On October 15 and 16, I was lucky enough to be a part of the Midwest Data Librarian Symposium as a participant and facilitator. The Symposium was organized by Kristin Briney and Brianna Marshall and held in Milwaukee, WI, at the University of Wisconsin-Milwaukee Golda Meir Library. The Symposium was developed with the idea of giving participants lots of discussion time, but also, some concrete ideas to take home.
The good news is, the slides, handouts, working documents, etc. are available online for everyone to use. Just look for the links in the Symposium schedule to view the materials. A more formal collection of these materials will be deposited in the University of Wisconsin-Milwaukee repository, so keep an eye out for that link.
The Symposium started with a full-morning workshop on Data Management Teaching Materials, led by Lisa Johnston. Participants were asked to submit their favorite teaching slide, idea, trick, etc. ahead of time, so each person had a chance to present during the workshop. We started with introductions, and then Lisa had us all arrange ourselves around the room by different criteria – distance travelled, experience level, data specialty, etc. Each change of position allowed us to meet new people before settling down to learn. Lisa then introduced the concept of backwards design: first define the goals (what do you want students to learn?), then define acceptable evidence that the goals have been met, and finally create an instruction session to accomplish those goals. With that in mind, the group presented their favorite ideas. The exchange was wonderful, ranging from tools for risk analysis to checklists, thinking exercises, memes, and more.
The afternoon discussion topic was Consulting on Data, led by Cynthia Hudson-Vitale. Cynthia based her discussion format on the World Cafe model. There were five topics that data librarians might be consulted about, including:
- Finding data
- Data management plans/funding requirements
- Data visualization
- Data archiving, preservation, sharing
We thought about these topics with regard to marketing/initiating contacts, workflows, and follow-up/assessment – and within these subtopics, tried to think about methods, outcomes, and strategies for addressing challenges. People were able to choose their topic(s) of interest and move around as sub-topics changed. As we moved around, a table leader helped us bring together ideas for getting the most out of consultations in those areas. Once again, lots of great ideas – be sure to check out the link to the group notes at the end of the Session Plan.
The next day, we started with my session on Data Curation. I put out some mini-scenarios (listed in the Session Plan) to get people thinking and had people put stickers on the scenarios they thought warranted saving. People moved into discussion groups that didn’t include colleagues, so there was more chance to learn about what others were doing. We also had some enthusiastic LIS students, and I asked them to split up as well. After giving broad definitions of curation and some basic appraisal criteria, I had the groups discuss curation and report back on legal or funder issues, curation policies, and proper documentation. We returned to the scenarios after discussing data curation, and generally there were fewer datasets that people felt needed to be saved. I think we ended up with more questions than answers, but that is the way it goes with data.
Our third session was led by Brianna Marshall and focused on creating elevator speeches to use when trying to bring in new partners to help with data. First we did a quick brainstorming session to think of partners (lots – see the list on the Session Plan) and then we chose 7 to work with. We all got a chance to create personas for the partner, which was enlightening. Then the group developed an elevator speech, and one group member delivered it after all the table discussions were finished. Links to the partner personas and elevator speeches are in the Session Plan as well, and I know they will be helpful to many people, because it is often hard to condense our ideas into a precise speech, plus, have a request for how that partner might help us. This exercise really pulled it all together.
Our final discussion session on Teaching Data Management was led by Heather Coates and built upon the ideas we had gathered in Lisa Johnston’s session the day before. Heather had us think about the 12 data information literacy (DIL) concepts (found in table 1 of this article: http://www.ijdc.net/index.php/ijdc/article/view/8.1.204/306) and then create a sample lesson plan for teaching one or two of these concepts, based on one of the scenarios describing a class, course, or workshop. Heather separated us based on familiarity with data or teaching, or both or neither, so each group had a mix of skills to work with. The lesson plan outline provided guidance for how to structure our work. Of course, none of the groups had a fully completed lesson plan, but the great thing was that each group had different ideas about how to teach, even when the group or topic was similar, so we all now have lots of good ideas to start with.
The format of the meeting encouraged lots of discussion, and each facilitator chose a different way to organize their discussions, and different types of outcomes. So not only did we all learn lots of new things about data librarianship, we also tested different ways to facilitate discussions for future teaching and events.
The symposium wrap-up was led by Jamene Brooks-Kieffer. It was an excellent way to pull together an amazing series of discussions. Jamene did the usual review of ideas, with a lovely garden metaphor to describe data services, and then she had us think about our local ‘growing conditions’ and the ideas we could transplant. And we had a writing exercise! For 7 minutes the whole group wrote about what we wanted to do, and how we might do it, when we got back to work. Then, just to keep us accountable, we discussed it with one of the other participants; hopefully, in 6 months, we’ll see if we managed to do something. Jamene also storified the tweets, so you can read some of the fun: https://storify.com/jbkieffer/mdls15-all-the-tweets-both-days
The great news is, there are people in many other Midwest libraries who want to make this an annual event, so be sure to keep your ears open. I crashed the symposium from Virginia (I went to school in London, Ontario so I am a displaced midwesterner) and you might want to as well. And as I mentioned, keep your eyes open for the link to all the materials in the University of Wisconsin-Milwaukee repository – you won’t be sorry.
The latest issue of the Journal of eScience Librarianship has been published! The issue’s focus is on “Targeting and Customizing Research Data Management Services (RDM).” The full issue is available at http://escholarship.umassmed.edu/jeslib/vol4/iss1/. Check it out!
Table of Contents, Volume 4, Issue 1 (2015)
Differences in the Data Practices, Challenges, and Future Needs of Graduate Students and Faculty Members
Travis Weller and Amalia Monroe-Gulick
Assessment of Data Management Services at New England Region Resource Libraries
Julie Goldman, Donna Kafel, and Elaine R. Martin
eScience in Action
Examination of Federal Data Management Plan Guidelines
Jennifer L. Thoegersen
Two Upcoming Data Management Events – Both Friday, November 20th, 2015
For more information see http://classguides.lib.uconn.edu/RDMR
· Morning Event: Data Tools Forum (9:30 am – 12 Noon)
· Afternoon Event: The 2nd Research Data Management Roundtable (1:15pm – 4pm)
Topic: “Engaging Faculty and Graduate Student Researchers at Our Institutions”
Location for both events: Faculty Conference Room, University of Massachusetts Medical School, 55 Lake Ave. North, Worcester, MA (map and directions)
Important: please register for each event if you plan to attend both!
Data Tools Forum 9:30am – 12 noon
The Data Tools Forum will feature a brief introduction on the integration of data science tools in the research landscape and overview presentations of three open-source tools commonly used for data extraction, wrangling, analysis, and presentation: OpenRefine, RStudio, and Jupyter. The forum is open to both researchers and librarians. (Librarians are encouraged to forward the attached Data Tools Forum flyer to researchers at their institutions.) Space is limited, so please register early. For questions about registration for this event, contact Zac Painter (firstname.lastname@example.org).
Participants will have lunch on their own. Options include the Sherman Cafeteria (in the Sherman Building, adjacent to the Medical School Building) or an off-campus restaurant.
Research Data Management Roundtable 1:15 – 4pm: “Engaging Faculty and Graduate Student Researchers at Our Institutions”
This is the second in a series of informal roundtable discussions on specific research data management topics. It is intended to establish and foster an active Community of Practice network for librarians working in research data management services. In this informal set of roundtable discussions participants will have the opportunity to share ideas and learn from each other. Space is limited, so please register early.
For questions about registration for the roundtable event, contact Tom Hohenstein (email@example.com).
Light refreshments will be available at each event.
Both events are sponsored by the New England e-Science Program.
One of the things I’m working on currently, with some of my colleagues, is developing a semester-long data information literacy course for graduate students in the College of Engineering here at the University of Michigan. In constructing this course, I have been thinking about how we could incorporate ACRL’s “Framework for Information Literacy for Higher Education” (hereafter “the Framework”), particularly its idea of threshold concepts. The Framework represents an effort to move beyond the prescriptive, skills-based course of instruction represented by ACRL’s 2000 “Information Literacy Competency Standards for Higher Education” and toward a less directive, student-centered model of education that promotes engagement with other fields. Threshold concepts, as defined within the Framework, are “those ideas in any discipline that are passageways or portals to enlarged understanding or ways of thinking and practicing within that discipline”.
The Framework comprises six threshold concepts that serve as its core, although it is not intended to be prescriptive or exhaustive. Librarians are encouraged to apply the Framework in ways that are relevant to their environment and educational objectives. It is with this encouragement in mind that I post some initial thoughts on how the Framework might inform our thinking in developing our DIL course.
- From the Framework: Authority Is Constructed and Contextual
- As applied to data: Your Data Are a Component of Your Professional Identity.
As a graduate student, you are developing your professional identity and becoming an authority in your field. The data you are generating or using in your research serve in part as an indicator of your expertise and credibility as a scholar. As such, it is important that you document and describe your data in ways that enable others to understand, evaluate, and trust your work in order to build your authority in your field of research.
- From the Framework: Information Creation as a Process
- As applied to data: Data Creation as Process
The processes of acquiring, preparing, analyzing and summarizing data that are done as a part of the research process affect the utility, accessibility and potential impact of the data. For example, digital data are often migrated from one format into another to enable the data to be interpreted by a particular software package. These processes are components of a larger data lifecycle which include considerations prior to acquiring data (such as planning and discovery) as well as after summarizing the data in reporting research findings (such as dissemination and preservation of the data itself). Defining the lifecycle of your data and seeing how the stages in your lifecycle are connected is critical for understanding how actions taken in one stage may affect another, advancing or restricting what you or others are able to do with the data.
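As a concrete illustration of one lifecycle step, here is a minimal sketch of a format migration: converting tabular CSV data to JSON so a different tool can read it. This uses only Python’s standard library, and the sample data and function name are hypothetical, not from any particular research project.

```python
# Sketch: migrating tabular data from CSV to JSON so it can be read
# by a different software package. Sample data are hypothetical.
import csv
import io
import json

def csv_to_json(csv_text: str) -> str:
    """Convert CSV text into a JSON array of row objects."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return json.dumps(rows, indent=2)

sample = "site,temperature_c\nA,21.5\nB,19.8\n"
print(csv_to_json(sample))
```

Even a trivial migration like this raises lifecycle questions: the CSV’s column order is lost in JSON objects, and every value arrives as a string, so type information must be documented somewhere else – exactly the kind of downstream effect the lifecycle view asks you to anticipate.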
- From the Framework: Information Has Value.
- As applied to data: Data Have Value Outside of the Purpose for Which They Were Generated
Research data are generally created with the intent of addressing a particular question or understanding a particular situation. However, data can also be repurposed and used by others outside of the original researcher or team to ask new questions or support new areas of inquiry. In addition, beyond supporting new research endeavors, data can also be a commodity, a tool for education, a means to persuade opinion or a means of better understanding our world.
- From the Framework: Research as Inquiry
- For Data: Data Have Long-Term Value
Data are often defined as the building blocks of research. Research is an iterative process, in which people return to previous findings to question them based on new knowledge or to pose increasingly specific or complex inquiries. As researchers return to past areas of inquiry, the data that underlie these inquiries need to have been preserved in ways that enable others to revisit and reuse them. In developing your data set, consider the ways in which you can preserve your data to support their continued use in fueling new areas of exploration.
- From the Framework: Scholarship as Conversation
- For Data: Data as a Component of the Scholarly Communications Process
Scholarship can be understood as a discourse in which communities of scholars, researchers, or professionals seek to communicate their insights and perspectives to designated audiences. Data often underlie the findings, arguments, or points being made in this discourse; without access to the data, it may be difficult to fully understand or trust those findings or arguments. In considering how to share your research, give your research data consideration and treatment on par with articles and other more traditional products of scholarly communication.
- From the Framework: Searching as Strategic Exploration
- For Data: Developing the Human Readable Elements of your Data Set
Data are designed to be consumed by machines (instruments, software, etc.) rather than by humans, as articles and most other publications are. However, as the consumption of data becomes increasingly important to researchers (and educators, government agencies, businesses, the general public, etc.), data producers need to consider how their data can be discovered, understood, and trusted by others by providing documentation and description written in clear and direct language.
This is only a preliminary exploration of how threshold concepts could potentially be used to inform teaching data information literacy. However, I believe that this is a useful area of study and I would love to see more rigorous work done to articulate further the connections between data and ACRL’s Framework for Information Literacy.
By Andrew Creamer, Scientific Data Management Specialist, Brown University Library
Tonight, On DMP, He Wrote…DoD Data Management and Research Resources Sharing Plans
Sheriff Mort Metzger: Mrs. Fletcher! I said, do me a favor, please, and tell me what goes on in this town!
Jessica Fletcher: I’m sorry, but…
Sheriff Mort Metzger: I’ve been here one year, this is my fifth murder. What is this, the death capital of Maine? On a per capita basis this place makes the south Bronx look like Sunny Brooke farms!
Jessica Fletcher: But I assure you Sheriff…
Sheriff Mort Metzger: I mean, is that why [former Sheriff] Tupper quit? He couldn’t take it anymore? Somebody really should’ve warned me, Mrs. Fletcher. Now, perfect strangers are coming to Cabot Cove to die! I mean look at this guy! You don’t know him, I don’t know him. He has no ID, we don’t know the first thing about this guy! (“Murder, She Wrote: Mirror, Mirror, on the Wall: Part 1 (#5.21)” (1989))
Like most of my colleagues in U.S. University Libraries’ research data services, I see a lot of business in helping our faculty with NSF data management plans and NIH data sharing plans. I see each new NSF/NIH proposal as an opportunity to learn new things and to help our faculty to improve and strengthen their plans. Within the NSF alone, there can be a lot of variation: a new directorate, division or office proposal that I have not worked with before or at least not very often. There can also be much variation in the type of award, such as instrumentation, training, dissertation improvement awards, CAREER, etc. That being said, I do enjoy new challenges, and when I am approached by a faculty member or student asking for help with drafting a plan for a funder that I have not had the opportunity to work with before, there is, frankly, a feeling of excitement.
Like Angela Lansbury’s Jessica Fletcher up in Cabot Cove, Maine, I take great joy in sleuthing. I snoop through the funder’s proposal guidelines, partner up with my colleague in the Office of Sponsored Projects to interrogate the program officers and come up with that ‘Eureka moment’: a draft of what’s required or recommended to include in the plan and the parameters we need to meet before I sit down to go over the facts with the faculty member.
Recently, I was asked to help two different faculty members with their data management and sharing plans for two separate funding programs within the U.S. Department of Defense. I set to work sleuthing. The first proposal, from the Air Force Office of Scientific Research (AFOSR), was actually quite prescriptive in terms of what items to include and how long the document should be (two pages):
- The types of data, software, and other materials to be produced in the course of the project, with notation marking those that are publicly releasable;
- How the data will be acquired;
- Time and location of data acquisition if they are scientifically pertinent;
- How the data will be processed;
- The file formats and the naming conventions that will be used;
- A description of the quality assurance and quality control measures during collection, analysis, and processing;
- If existing data are to be used, a description of their origins;
- A description of the standards to be used for data and metadata format and content;
- Plans and justifications for archiving the data;
- Appropriate timeframe for preservation; and
- If for legitimate reasons the data cannot be preserved, the plan will include a justification citing such reasons.
However, the second program, the DoD’s Congressionally Directed Medical Research Program’s Peer Reviewed Medical Research (PRMRP) awards, administered by the U.S. Army, was a mystery, a perfect stranger showing up in Cabot Cove. “I didn’t know the first thing about this CDMRP!” It was a case that needed to be solved.
Type of plan
The PRMRP proposal guidelines were quite clear that a Data and Research Resources Sharing Plan is required to be included in the proposal, within a file attachment along with other types of supporting documentation. Each document has to start on a new page, and all of them must be combined and uploaded as a single file labeled “Support.pdf.”
This is where the plot begins to thicken. People kept mentioning “1-page” as they were talking about the plan. Where were they getting this number? After a quick call, it turns out that as late as 2014, CDMRP proposals were requiring a one-page data and research resources plan, within a separate file attachment.
(See page 19 for an example: http://cdmrp.army.mil/funding/pa/14prmrpfpa_pa.pdf)
However, I wouldn’t let these details throw me off the scent of the trail!
After gathering all the guests together in the great hall, I showed them that the 2015 PRMRP guidelines clearly state that there are no page limits for any of these components, unless otherwise noted. AND the data and research resources plan is now combined with other supporting documents and uploaded as a single file labeled “Support.pdf.” So, in fact there is no page limit noted for data and research resources sharing plans in this PRMRP proposal! (Lightning flashes outside)
CDMRP Criteria for describing how data and resources generated during the performance of the project will be shared with the research community
As all the guests returned to their rooms, I still had the feeling that something just wasn’t right. The proposal guidelines had told me to refer to the General Application Instructions, Appendix 3, Section L for more information on the actual criteria for what to include in the plan. Yet what I thought were the expectations had turned up in Appendix 4, not Appendix 3. Then it hit me: the General Application appendix I had been given was labeled 2014! The appendices must have been amended for fiscal year 2015! (Lights go out; lightning flashes, bangs of thunder)
(See 2015 General Application Appendix 3, Section L here: https://ebrap.org/eBRAP/public/ViewFileTemplate.htm?fileTemplateId=1190500&fileType=pdf)
The Congressionally Directed Medical Research Programs: Policy on Sharing Data and Research Resources
Why had I found so many references in the Appendix to CDMRP’s expectations? Why did Section L seem so truncated, as if its language had been lifted from a larger text?
Of course! It had been lifted from a larger text! The CDMRP oversees several funding programs, the PRMRP just one among these. There must be an overarching policy that these other programs are quoting from!
(See the policy here: https://cdmrp.org/files/forms/generic/policyOnDataResourceSharing.pdf)
And there it was, The Congressionally Directed Medical Research Programs: Policy on Sharing Data and Research Resources, containing all the criteria for what data must be shared, the recommended methods for sharing of the data, and the methods for sharing of unique research resources.
The mystery was solved!
[Featured image: the “Pi-shaped” researcher – Vanderplas, J. (2014)]
I have been actively participating in conversations about the roles librarians take on in data-intensive settings for a few years now. Typically (not always) the focus of those conversations settles broadly on librarians’ need to become more hands-on with the tools being used by their patrons and to add to their data science toolkits. I recently came across an article written last year by Jake Vanderplas that discusses this same idea as applied to researchers in some depth, and references the above featured image of the “Pi-shaped” researcher. The pi-shaped researcher is contrasted with the t-shaped researcher as a metaphor for the skills that researchers need in order to fully take advantage of the research methods available to them in their respective fields. While Vanderplas points out that this pi-shaped vs. t-shaped description may not be quite right, he states that, “Regardless of what metaphor, definition, or label you apply to this class of researcher, it is clear that their [data science] skill set is highly valuable in both academia and industry…” and I could not agree more. However, while I very much advocate for (and try to initiate!) that kind of dialog within the library community, I am shifting the focus of this post toward the other leg of the pi: librarians need to be knowledgeable about their subject domains in order to meet the specific needs of their communities and to be able to participate in meaningful conversations with people outside their libraries.
I am making this point today because, although I have been developing my technical and mathematical skills continuously, in a few weeks I will be starting a new position that will require me to become increasingly familiar with a new subject domain: I have accepted a position as the new Assistant Head Librarian of the Harvard-Smithsonian Center for Astrophysics in Cambridge, MA, and up to this point I have considered myself no more than an astronomy/astrophysics hobbyist. I am not a complete stranger to physics or astronomy, but I acknowledge that the depth of my knowledge needs further development, and in situations like this it can be difficult to know where to start. I think this aspect of data librarianship is often overlooked, or generalizations are made across disciplines to the point where it’s challenging to relate to advice that seems to ignore the intricacies of your specific domain. For example, in my most recent position I worked in the biomedical sciences and had trouble applying data management education strategies commonly advocated by the library community, because they seemed to focus so heavily on NSF recommendations, which differ greatly from the NIH’s, and ignored much of the complexity that comes with working with HIPAA data. In response, I had to rapidly deepen my understanding of the work done by biomedical researchers, and it took quite a while to really understand how to navigate that situation. I now find myself entering another new domain, and I hope to speed up that learning curve as much as possible. The remainder of this post documents (in no particular order) some of the advice I have received and practices I have been employing to get adequately familiar with new data-intensive research domains.
Talk to people in the field
You could call this networking, but I think that description is a bit reductive; I’m not talking about handing out business cards at a conference. I have found that just having a conversation can really help you begin building relationships with people who work in the domain you’re entering. And I’m not just talking about librarians in that field! Go talk to some researchers as soon as possible. Just get coffee together or ask if you can visit their lab. Try to meet face-to-face and ask about the work their team does and some of the obstacles they encounter. Try attending an event or seminar where you know researchers in that field will be (e.g., a hack-a-thon). Ask what journals you should be paying attention to and who to follow on Twitter. You might be thinking, “I don’t really use Twitter,” and that’s fine, but I have personally found the platform to be a great place to follow what’s going on and to ask questions. You may even find a mentor this way, which is always helpful, especially when you’re just getting started.
Do some research / learn the jargon
Start with the basics if you need to. The e-Science Portal itself is one place where you can find useful guides on different scientific disciplines and even attend a “Science Bootcamp” to help get you started. There are also wonderful platforms like Khan Academy if you’re like me and learn better by watching videos and following along. If you start talking to researchers, you’ll also start hearing about things you’ve probably never thought about before, which will give you a pretty good list of topics to read up on. Be sure you have a solid understanding of the most important concepts in the field and the issues discussed at the field’s biggest conferences. I think this is true even if you start with a pretty strong background in a related field or a working knowledge of the databases and research methods employed in the domain; reading some articles that go beyond what you know is (in my opinion) essential to being able to empathize with your patrons and understand the complexity of their work.
Look at some data!
It’s important not to make assumptions about the data your patrons work with. It’s not wise to assume everything they work with is going to be tabular or sequential or flat. Without really looking at a few datasets, you cannot begin addressing data literacy in that field. “Best practices” for working with data may be generalizable to your new subject, but they may not be; you can’t know if you don’t look. By talking to researchers and learning more about the field, you will get a better sense of the types of data these researchers work with, but you should also get a sense of the data’s structure, the issues that surround how they are shared, and some common tools used by the people who handle the data most closely. For example, in my previous position R and SAS were the common go-to tools for analysis and data manipulation, and CSV was the most common format I came across. Now, though, I’m seeing that Python is the current standard in my new domain, and the file formats tend to be much less straightforward.
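To make the “look at some data” step concrete, here is a small sketch of a first pass over an unfamiliar tabular dataset using only Python’s standard library: list the columns, count the rows, and make a rough guess at each column’s type. The sample data and function name are hypothetical, purely for illustration.

```python
# Sketch: a first look at an unfamiliar tabular dataset using only
# the standard library. Sample data are hypothetical.
import csv
import io

def profile_csv(text: str) -> dict:
    """Report columns, row count, and a rough type guess per column."""
    rows = list(csv.DictReader(io.StringIO(text)))
    profile = {"columns": list(rows[0].keys()), "n_rows": len(rows), "types": {}}
    for col in profile["columns"]:
        values = [row[col] for row in rows]
        try:
            # If every value parses as a number, call the column numeric.
            [float(v) for v in values]
            profile["types"][col] = "numeric"
        except ValueError:
            profile["types"][col] = "text"
    return profile

sample = (
    "object_id,ra_deg,dec_deg,notes\n"
    "M31,10.68,41.27,spiral\n"
    "M42,83.82,-5.39,nebula\n"
)
print(profile_csv(sample))
```

Even this crude profile surfaces questions worth asking researchers: are the units documented anywhere, are missing values encoded consistently, and is flat CSV even the native format, or an export from something more complex?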
If you ever find yourself shifting your career to focus on a new domain, keep these things in mind. There’s always going to be a lot to learn, but that’s the nature of librarianship. Acknowledging that you have a lot to learn, though, is a good first step.
 Vanderplas, J. (2014, August 22). Hacking Academia: Data Science and the University. Retrieved September 21, 2015, from https://jakevdp.github.io/blog/2014/08/22/hacking-academia/
Posted by: Amanda Whitmire, Assistant Professor and Data Management Specialist at Oregon State University Libraries
There has been a lot of conversation lately regarding the use of institutional repositories (IR) for preserving and sharing research datasets. More specifically, #datalibs have been abuzz about the perception among some publishers that an IR would be an acceptable location for hosting datasets only if it can mint a digital object identifier (DOI) for said dataset. But, why the exclusive emphasis on DOIs?
At its simplest, a digital object identifier (DOI) is “a unique alphanumeric string assigned … to identify content and provide a persistent link to its location on the Internet” (APA Style). Among other things, assignment of a DOI to a thing is intended to:
- uniquely identify that thing (disambiguate it from other things), and
- provide a mechanism to enable persistent access to the thing (to both find it and get it).
That said, the purpose of many digital identifiers is exactly the same. CrossRef spearheaded the use of DOIs for identifying scholarly works as a means to ensure persistent citation and location (via persistent links) of journal articles. As such, DOIs have become synonymous with peer-reviewed publications, and “something like an implicit seal of approval from the Great Sky Guild of Academic Publishing”. In plain terms, a DOI is increasingly seen as imparting some aspect of legitimacy upon that which it has been assigned. That’s a problem. As CrossRef tells it: “CrossRef’s dominance as the primary DOI registration agency makes it easy to assume CrossRef’s *particular* application of the DOI as a scholarly citation identifier is somehow intrinsic to the DOI. The truth is, the DOI has nothing specifically to do with citation or scholarly publishing. It is simply an identifier that can be used for virtually any application.”
In other words, there’s nothing magical about DOIs when it comes to identifying or locating scholarly works. In the ecosystem of digital identifiers, a DOI is one of many good options. A more nuanced exploration of DOIs and other identifier schemes tells the same story. There is no greater technical benefit to using a DOI vs. some other digital identifier (a persistent uniform resource locator, or PURL, for example). The only real “advantage” of the DOI is that it is increasingly viewed as the only “acceptable” identifier by publishers, and therein lies the problem.
This narrowly held perspective on what constitutes an “acceptable” identifier has penetrated so deeply into the minds and habits of publishers that an IR that does not assign DOIs to its datasets may not be deemed an acceptable place for a researcher to deposit data in support of a publication. For example, the data policy at Scientific Data says that, “We are glad to support the use of institutional or project-specific repositories, if they are able to mint DataCite DOIs for hosted data” (emphasis mine). Scientific Data is not alone in this. Earth System Science Data also requires that submitted datasets be deposited in a repository that assigns DOIs.
Why does this matter?
Researchers are now presented with an ever-expanding selection of repositories where they can deposit their data to facilitate sharing. IRs evidence a commitment to persistence and longevity that is lacking in newer infrastructure. The idea that an IR would be deemed an unsuitable archive based solely on the fact that it does not assign a DOI is both absurd and counterproductive. The requirement of a DOI over other repository features (the existence of a preservation policy, for example) serves only to reduce the number of well-supported data preservation options for researchers.
So, what’s the take-away?
Data specialists are working at the forefront of rapid cultural and technological changes in how research is being conducted and shared. The development of useful, broadly applicable best practices for data preservation and sharing relies heavily upon collaboration, with thoughtful contributions from diverse groups working toward a shared goal (Force11 is a terrific example of this). Where standard practices don’t yet exist, researchers, publishers and other stakeholders in the scholarly community are making things up as they go (and very thoughtfully so, but still pretty much winging it). Academic libraries have a lot to offer in this space, and the suitability of using IRs for preserving datasets is an issue that we should not approach with timidity. Reach out and make connections with journals. Question their data policies and offer alternatives. Be bold, #datalibs! Your voice and your involvement are critical.

Source: http://crosstech.crossref.org/2013/09/dois-unambiguously-and-persistently-identify-published-trustworthy-citable-online-scholarly-literature-right.html
The Tisch Library at Tufts University in Medford/Somerville, Massachusetts is seeking a Librarian for Research Data. Please see the posting for a complete description of the position: http://tufts.taleo.net/careersection/ext/jobdetail.ftl?job=15001602&lang=en.
Contributed by Donna Kafel, Project Coordinator for the New England e-Science Program, Donna.Kafel@umassmed.edu
Andrew Johnson is the Research Data Librarian at the University of Colorado, Boulder, and PI for DataQ, “a collaborative platform for answering research data questions in academic libraries,” which launched in August. DataQ is a unique resource in that it provides a platform where librarians can submit research data management and curation questions, which in turn are fielded by the Editorial Team and answered by a DataQ Editor. DataQ is meant to be interactive: community members who have created a DataQ log-in account are welcome to add to the answers or post comments.
DataQ is funded by an IMLS Sparks! Ignition Grant for Libraries and co-sponsored by University of Colorado Boulder, the Greater Western Library Alliance, and the Great Plains Network.
I spoke with Andrew over the phone recently to learn more about the DataQ project that he and co-PI Megan Bresnahan have led since they were awarded the IMLS Sparks! Ignition grant in November 2014. Much of our discussion revolved around project management aspects of the DataQ grant. Here is an outline of our conversation:
Donna: How did you come up with the idea for DataQ?
Andrew: Megan actually came up with the idea while we were attending RDAP in Baltimore a few years ago. We were trying to think of ways that we could extend local support for librarians engaging in RDM services to the wider community, so she thought that a service like DataQ could be one way to do that. Prior to DataQ, I’d been active in the DataFOUR project (http://imls.gwla.org/), which was sponsored by the GWLA and GPN. The idea for the DataQ grant snowballed from DataFOUR and its aim to provide regional support for developing RDM services. Megan and I applied for the IMLS funding for DataQ with the support of GWLA and GPN, and of course our library administration at UC Boulder. In September we were awarded a one-year IMLS Sparks! grant, running from November 1, 2014 to October 31, 2015, to develop DataQ.
Donna: Can you explain the GWLA and GPN groups? Are they consortia?
Andrew: GWLA is a consortium of research libraries in the central and western United States. GPN was founded by researchers and is a consortium of Midwestern universities focused on cyberinfrastructure initiatives. The two groups collaborate on different projects, and host their annual meetings in conjunction.
Donna: I’m really impressed that in the course of a one year planning grant, you’ve pulled together such a large working group of Editors and launched DataQ –all well within the 12 month timeframe. Can you describe a bit about the project timeline and your working model?
Andrew: Yes, there were a lot of pieces to put together to make DataQ happen. We had a $25,000 budget to work with and a relatively short time to get the project up and running. In the first months we contracted with Drupal developers to create the site. In December we put out a Call for Editors. We were surprised by the overwhelming response to the Call. We had budgeted for eight Editors. It was really hard to limit ourselves to eight when so many highly qualified librarians with experience and expertise in data services responded that they were interested in participating in DataQ. Ultimately we were lucky in that we were able to expand the number of Editors from what we had budgeted as a few Editors received support from their institutions to attend our orientation meeting. We were also very lucky to have a separate group of librarians and other information professionals eager to participate in the project. Many of them accepted our invitation to be virtual project volunteers. They helped the project tremendously. In June, DataQ wasn’t ready for prime time, but we wanted to do a pre-launch of it by putting up a sample web form to collect questions from anonymous users. The pre-launch was a way for us to collect questions and populate DataQ with these initial questions prior to the actual launch. The DataQ volunteers helped us to gather many of these questions as well.
Donna: With the Editors being from all different geographic areas, how did you orient them to the project and develop a system for their workflows?
Andrew: We had an in-person training meeting in June that all the Editors attended that was held alongside the GWLA/GPN meeting. The meeting was very productive with all the Editors fully engaged in discussions as we planned the logistics for developing and implementing DataQ. We were able to develop Editorial workflows, establish a system for communication, brainstorm new ideas that went beyond what Megan and I had initially envisioned, and plan the project in the course of the short time we met.
Donna: What is the internal process that takes place when someone submits a question to the site?
Andrew: We have a listserv that includes all the Editors and the PIs. When a question is submitted, it gets sent to the listserv. Editors can then review the question. Any of them can opt to answer it on a first-come, first-served basis. The Editor who first responds composes an answer in an internal Google doc. We then have two Editors review the answer. Once an answer is approved, the Editor who authored the response posts it on the site.
Donna: Regarding the users who submit questions, are they anonymous?
Andrew: They can opt to be. We offer three options: users can choose to be anonymous, they can send along their e-mail in case the Editors need to get further information from them to answer their questions (and to let them know when an answer has been posted), or users can opt to sign in to get a DataQ user account. Accounts enable users to post comments on the DataQ site. DataQ is intended to be an interactive site. We hope that users will create user accounts and contribute their ideas and comments.
Donna: Are you seeing trends in the types of questions that users are submitting to DataQ?
Andrew: Yes, we’re seeing quite a few questions related to data citation, data documentation, and data sharing.
Donna: What is the sustainability plan for DataQ?
Andrew: That’s what we’re working on now, planning on how the project will move forward after the funding period. We may be applying for further funding to continue the project.
Donna: Will you be presenting DataQ at any national or regional conferences?
Andrew: We’ve been asked to present a few webinars on it which we’re really glad to do. Also a couple of the Editors will be presenting a poster on it at DLF. We hope to also present it at some other conferences in the coming months.
Donna: DataQ has filled a niche: providing expert answers to librarians’ specific RDM questions. Congratulations to you, Megan, and the entire team on getting DataQ up and running in an amazingly short time!
Submitted by Donna Kafel, Project Coordinator for the New England e-Science Program, firstname.lastname@example.org
The inaugural New England Research Data Management Roundtable was held last Tuesday, August 18th at the Du Bois Library at the University of Massachusetts Amherst campus. This roundtable is the first in a planned series of roundtable discussions targeted for New England librarians who are engaged in research data management services or who want to learn more about data librarianship. Sponsored by the National Network of Libraries of Medicine, New England Region, the NE RDM Roundtables will provide opportunities for New England librarians to compare notes, ask questions, share lessons learned, explore new working models, acquire fresh ideas for their workplaces and develop new partnerships.
This particular Roundtable event was intended for librarians in the RDM Community of Practice, i.e. librarians who are currently actively engaged in planning and/or delivering RDM services. (Note: future NE Roundtables will also be planned for an RDM Community of Interest). It was also preceded by a tour of the Massachusetts Green High Performance Computing Center in Holyoke, MA. Twenty-four librarians from multiple institutions, including four of the five University of Massachusetts campuses, the University of Connecticut, Boston University, Boston College, Harvard, MIT, the University of New Hampshire, Brandeis, Northeastern, Mt. Holyoke, and Drexel University, discussed the topic “Organizational structures for research data management services at our institutions.” Attendees were divided into five tables, with four to five attendees per table. At each table a member of the NE Roundtable planning team served as moderator for the discussion. The program was divided into two 45-minute sessions. During the first session, the discussion topic was structures within the library for delivering RDM services. The second session focused on partners on campus that support RDM services. The discussions revolved around specific questions. Time was given between the two sessions and at the end of the second session for each roundtable to report out.
Feedback on the Roundtable event has been quite positive. Attendees have noted that they like the opportunity to hear what their colleagues are doing and to discuss RDM issues, challenges, strengths, and their libraries’ service models. The New England e-Science Program plans to coordinate future Roundtables three times a year. Topics for these roundtables will be based on attendee recommendations.
The following is a summary of questions and bulleted attendee responses and comments from the Roundtable Discussion tables. For Topic 1 questions 3-6 and Topic 2 questions 1-5, the bulleted responses are grouped by theme.
Topic 1: Library Structures for Delivering RDM Services
1. What is the current structure for data management services at your library? What staff is involved and what are their relationships to each other and the work?
- Library director appointed a non-librarian project manager to be DM liaison between Office of Research and the Library. The library is not providing DM services but is incorporating “digital measures”—digitizing faculty CVs for all time (historical)
- Small undergraduate science library just starting out in RDM, no organizational structure yet. The science librarian has recently been assigned the DM role and is learning. He gave a presentation about RDM to faculty with 2 other librarians.
- Engineering and Data Services librarian started a year ago. He oversees all aspects of data services in library. Other librarians are involved –science librarian as liaison to science faculty and NE e-Science program, metadata librarian for help with metadata and ontologies, and Systems dept for software support (such as DMP Tool).
- An eScience team made up of three librarians from the Science Library led by one of these three.
- Working group made up of librarians from different disciplines and systems librarian. Most librarians involved are science/engineering, and IT is involved.
- Large research university library has had a DM task force for “way too long.” The task force includes librarian representatives from special collections, the science library, social sciences, library systems, and scholarly communications, and is coordinated by the director of the science library. Having a DH librarian on the team has helped it not to focus exclusively on STEM fields.
- Private academic health sciences library has a DM working group with reps from the library/archives/research labs, postdocs, and IT. The group meets 6x/year. The working group would like to hire a data expert to focus on archiving a large longitudinal study.
- Has Library Data Services Advisory Group, which started 1 ½ years ago. The group is made up of scholarly communications librarian, IR librarian, Associate Library Director, two outreach librarians, Head of Office of Sponsored Research, and representative from Research Computing.
2. How did this service begin and how has it changed over time?
- Service did not begin at small public university until the data services librarian started. It has since evolved: tweaks to the library’s Data Workshop series for faculty, PhD students, and some staff; an RDM libguide based on NSF requirements; a slightly customized DMP Tool.
- Started in 2012 with the E-Science Institute; an RDM services working group began last year (includes ~12 people: IR librarian, desktop services, 3 dept liaisons (science, soc science, and gov docs), academic technology, and an analysis expert)
- Started somewhat informally several years ago by three science & engineering librarians who co-created an RDM libguide. Always been more of a collegial staff than a hierarchical one. Some team members are specifically part of the Data and Specialized Services Dept.
- Got started by teaching workshops a while ago, were more successful with grad students than faculty.
- Started after the ARL “Future of Science Librarianship” conference; the library formed a team of subject specialists, and the scholarly comm. librarian sits in.
3. What strengths does your library have related to data management and how did you develop them?
- Has a dedicated RDM fellow
- Has a dedicated RDM librarian
- University is small enough so that small library team can manage all requests, enough background among library staff to serve most of population. Research population tilts more toward the natural/physical sciences so there are fewer disciplines to keep abreast of.
- Library has expertise in metadata services, building collections, describing information, enabling access. Staff is dedicated to helping faculty/students/staff. The library has a vision to create RDM jobs among the library staff.
- Some capacity for more in-depth consultations
- Focused team approach, specialized knowledge plus shared responsibility
- Library invested in infrastructure to support researchers—e.g. repository, research computing
- Has a Data & Specialized Services department
- Lots of varied expertise in large research university
- Strong IR
- A merged department with IT is very useful as IT people have good ideas about implementing DMPs
- New library administrator has strong RDM background and is committed to growing library RDM services
Perception of Library
- Library has established a good reputation through IR
- Library has existing working relationships with campus constituencies
- Good working relationship with Office of Research
- Developed short “quick bites” RDM introductions instead of long workshops
- Broader committee brings in stakeholders across campus (IT, sponsored programs)
- Getting the word out to the community, having services that resonate with users, built relationships with researchers
4. What weaknesses does your library have related to data management and how do you address them?
- Not sure if there are needed RDM services that the library is not aware of
- Struggling with a campus wide lack of cohesive outlook on RDM that makes for confusion
- Defining data management—it means different things to different people
- Haven’t been able to get researchers and students to enroll in library’s RDM courses
- No courage to stop doing what we’ve been doing for 20 yrs—e.g. reference shifts, low level
- Not clear how to avail expertise from the librarians who are outside of the data services team
- Short staffing limits what library can do
- Many liaisons are more focused on collections, don’t see relevance of RDM services or are fearful of change
- No central focused person to head library’s RDM team
- Members of RDM working group can’t dedicate time to work with liaisons
- Lack of RDM policies (common among many institutions)
- Lack an institutional repository or a holding center for data in progress
- Lack of funding
- Trying to initiate new library services as a lower level staff person—need support of library administrators and their involvement in securing campus buy-in
- Difficult to bring researchers together on a Balkanized campus
- Isolation from researcher community that library serves
5. What are your main program elements for data management services in the library and how do you conduct them?
RDM Working Groups (see descriptions of working groups in question 1)
- Developed LibGuide (noted by multiple individuals)
- Data Management Workshop series: an overview of RDM theory and applications in hour-long sessions held 1-2x/week; LibGuide; DMP Tool; consultation services by appt.
- DMP Consultations
- RDM team is made up of several librarians who consult on DMPs
- Library offers consultations, workshops, conducts training during Responsible Conduct of Research sessions
- Archiving older data sets, got a CLIR grant for collection of data, archiving a large longitudinal study of child health and clinical data, trying to hire a data person to focus on this.
- Work with Office of Sponsored Research to find out new grants and reach out to PIs
6. What would you like to be doing (as a library and as an individual) related to data management that you are not doing now?
- Have a seat at the table—a place in the formal campus structure where decisions on infrastructure and services are made
- Collaborate with Digital Humanities
- Have contact with research team throughout grant and project cycle
- Create a data IR (one library noted goal to use Dataverse for its data IR)
- Create an infrastructure similar to Purdue where library is the portal and telling the story and IT provides the infrastructure and the Library works closely with Office of Research on compliance
- Track where data is going
- Would like to get more liaison librarians involved with data management
- Have a dedicated librarian who is a focal point for RDM
- Have RDM training incorporated into 1st year grad student requirements
- Conduct RDM training in conjunction with Responsible Conduct of Research training
- Get an RDM course into the curriculum
Topic 2: Collaborations on Campus
1. Who are your current partners on campus?
Uncertainty and problems
- In early stages, trying to learn as much as possible
- Not sure where this is going
- We’re trying to figure out what to do next
- Very do-it-yourself and there are pockets everywhere
- How do services connect when there is no commitment to collaborate?
- Until there is a policy behind it, they will not fund/go further
- Recognize a need for a campus-wide “thing” but getting it moving – what do researchers want?
- Always library initiated
- When personnel change, connections change; developing relationships takes time
- Keep liaisons in the loop when working with faculty
- Sometimes faculty don’t come to the library or know of services, many are doing it on their own
- We’re making headway
- With IT and sponsored research, it can be one-sided and difficult at times
- Some campus admins are on board, some are not
Planning and ideas
- Library is the one thinking about this, talking with potential partners
- Services being offered: consultations, education, websites
- Library has back-channel communication with IT staff
- Repository available for some but not all institutions
- Going to create a team
- Putting together meeting of stakeholders on campus
- Partnerships are in their infancy. We want to reach the full community
- Survey on data needs
- We are doing the DMP Tool
- Considering a campus data summit
- Connections from open access policy are useful for data management policy/ practice discussion
- Ongoing discussion about campus infrastructure
- Finding ways to get to the faculty
- Relationships can lead to partnerships
- Partnerships stem out of just talking to other people
- Helps to have culture of open doors, availability to at least discuss
- Academic computing relationship is informal to semi-formal
- Referrals from Office of Research, this is a collaboration “waiting to happen”
- Research computing
  - Office of Research site: links to library data services
  - Policies for data ownership and management
  - Working on DMP Tool single sign-on
  - Co-host meetings for faculty
  - Host ELN jointly
  - Workshop participation, such as on data security & active storage topics
- Post-Doc office
- Office of sponsored programs
  - Info for libguides
  - Funding policies
  - Info about grants currently funded on campus
  - Access to DMPs already written
  - Instruction for DMP Tool
- Labs & offices
  - Small-scale instruction
  - Data to ingest into repository
  - Workshop on how to write a proposal, including DMP training
  - Training in specific areas
- Grad student office
  - Instruction and orientation, for example on cloud storage
- eScience institute
  - Building training modules
- Scholarly communications office
  - Open access
  - Public access policies
- Office of general counsel
2. On what programs do you collaborate with campus partners?
- Customizing the DMP Tool
- DMP consultations
- Co-presentations with sponsored programs
- Co-presentations with scholarly publications
- Outreach – visit seminars and institutes
- Tech fair – library repository
- Three services: webpage, consultations, education.
- Three data services: consultations for DMP’s or general data management topics; education and training; and data archiving either in data repositories or in our institutional data repository
- Have a data management libguide
- Data Services webpage
- Webpage for services – spells out what we mean by RDM; points people to the different contacts on campus for data lifecycles; everything in one place
3. Who would you like to collaborate with?
- Building collaborations with faculty
- Faculty are interested in library supporting them and being involved; some elements are there
- We’d love to hear more from OSP; there are often time constraints
- Sponsored projects workshops
Office of Research
- “Research day” – compliance
- Workshops / outreach
- DM workshop series
- Copyright classes to graduate students
- Data management for active research, for example ELN
- It would be great to have a university level strategy
- Get a partnership with preservation
- A data board that could help with developing services
- Webpage to point people to certain areas
- Stakeholders – would like broader outreach, a unified group across campus
- New faculty institute – IRB, funding/grants/DMP’s
4. What are the roadblocks?
Perception of Library
- Being seen as credible and useful. Libraries are seen as having a certain skill set. Need to have conversations and advocates in higher places, i.e. the Provost. There’s sometimes a disconnect in terms of what people know of the library’s services.
- Perception of library as rare books room
- People do their own thing and don’t depend on library
- Research data policy/ lack thereof
- No policies & policies that do exist people do not know about
- Pass a policy but can it be implemented, is it realistic in what the library/institution can do?
- No buy-in due to “high up” (policy driver)
- IT layoffs
- Turf wars, territoriality (we can do this ourselves)
- Staffing stability
- People saying something will happen by a date and it not happening
- Turnover of staff / loss of staff positions can put a hold on things
- Personalities can be a problem
- Campus IT can be hard to communicate with
- Limited capacity for new services
- Lack of consistency of funding
- Lack of structure/organization
- Other departments have other agendas, similar issues but different priorities. Timing can be an issue to work with people on different schedules
- Getting PI’s on board, they all do things differently
- PI’s may train lab really well in DMP, others do not
- Different needs for different researchers
- Decentralized means different parts don’t always communicate
- What does language mean? E.g., archiving, DM services. Have to define terms and how you are using them; a controlled vocabulary helps
5. What support is needed from the library or the institution?
- High level support / promotion
- Infrastructure – e.g. ELN, repository
- People network
- What are other stakeholder desires & interests? Know enough about campus to make solutions
- Institutional view of issues
- Quality metadata requirements need a repository librarian
- Need more support for archiving & storage
- Problem is librarians want to take on tasks / have to take on tasks but cannot give something up.
- Communicating with peers – learning what else is happening
- What about Social Sciences & Humanities?
- Library management has been helpful
- Professional development from the library
- Help from the library to make connections
6. What external support is needed?
- Professional development to broaden knowledge (like this!)
- Listserv of this group
- STS listserv is helpful but don’t brand as discipline specific
- How to host NE region listserv?
- Sharing experiences & training with other librarians
- Short videos on technical subjects, like bit rot, preservation of videos, subject repository vs. local storage
Submitted by guest contributor Amanda Rust, Digital Humanities Librarian, Assistant Director, Digital Scholarship Group, Northeastern University Libraries, email@example.com
About six months ago I began a new position as a Digital Humanities Librarian, and I am now lucky enough to work with humanities data from the nitty-gritty (helping researchers contact publishers to acquire historical newspaper data) to the broadly conceptual (how is historical cultural data made?).
So in composing this short post, I thought I’d start with some recent big-picture discussions, and then apply some of those concepts to cultural data, which is often library-produced data.
Lisa Gitelman’s edited volume Raw Data is an Oxymoron and Johanna Drucker’s reformulation of data and capta (first in this 2011 Digital Humanities Quarterly article) are two excellent places to start, and well-known in the digital humanities field. To give a likely overly broad summary: these works suggest that the very meaning of “data” has changed over time, and even what we think of as the most natural, obvious, “given” data is designed in some way. The experimenter chose to observe it, created instruments encoding choices on how to measure it, perhaps disregarded outliers, imposed categorization and storage once it was captured, and so on.
Not to say that other disciplines have never considered these ideas! On the Humanist email list, one of the oldest online spaces for digital humanities work, there was recently a thread where long-time moderator Willard McCarty prompted discussion of a resonant quote from Barry Lopez’s 1986 Arctic Dreams. In that book Lopez reflects on time spent with field biologists in the Arctic and “wonders” at the process of naming, the process of reducing what takes place “out there” to patterns that are statistically important, concluding that for the species under study: “No matter how long you watch, you will not see all it can do.”
So how do these theoretical considerations come into play when working with digital humanities projects? We are always confronting what’s been left out of the data. Researchers may start with open access data because it’s there, not because it’s the most relevant, immediately prompting us to notice that some core historical collections are only available via subscription. Why were some resources scanned and made open access, and others not? The vagaries of grant funding? The gaps between wealthy institutions that can afford to scan their collections, and those that cannot? The pressure on institutions to see special collections as a revenue stream?
Beyond the question of open or paid access, researchers are now asking detailed questions about libraries’ selection processes behind both preservation and digitization. Is the data representative of what was culturally significant in the past? What the library later determined to be significant? Who defines “significant”? Or was the original selection based on what was in good condition, or had clear copyright, or had multiple copies, or lacked multiple copies, or had a thematic focus that was easily grant-fundable? Libraries are often the producers of humanities data, or capta, so it is both thrilling and frightening when digital humanities scholars ask these uncomfortable questions.