One of the things I’m working on currently is developing a semester-long data information literacy course with some of my colleagues for graduate students in the College of Engineering here at the University of Michigan. In constructing this course, I have been thinking about how we could incorporate ACRL’s “Framework for Information Literacy for Higher Education” (hereafter referred to as “the Framework”), particularly the idea of threshold concepts. The Framework represents an effort to move beyond a prescriptive, skills-based course of instruction as represented by ACRL’s 2000 “Information Literacy Competency Standards for Higher Education” and towards a less directive, student-centered model of education that promotes engagement with other fields. Threshold concepts, as defined within the Framework, are “those ideas in any discipline that are passageways or portals to enlarged understanding or ways of thinking and practicing within that discipline”.
The Framework comprises six threshold concepts that serve as its core, although it is not intended to be prescriptive or exhaustive. Librarians are encouraged to apply the Framework in ways that are relevant to their environment and educational objectives. It is with this encouragement in mind that I post some initial thoughts on how the Framework might inform our thinking in developing our DIL course.
- From the Framework: Authority Is Constructed and Contextual
- As applied to data: Your Data Are a Component of Your Professional Identity.
As a graduate student, you are developing your professional identity and becoming an authority in your field. The data you are generating or using in your research serve in part as an indicator of your expertise and credibility as a scholar. As such, it is important that you document and describe your data in ways that enable others to understand, evaluate and trust your work, in order to build your authority in your field of research.
- From the Framework: Information Creation as a Process
- As applied to data: Data Creation as Process
The processes of acquiring, preparing, analyzing and summarizing data that are carried out as part of the research process affect the utility, accessibility and potential impact of the data. For example, digital data are often migrated from one format to another so that they can be interpreted by a particular software package. These processes are components of a larger data lifecycle, which includes considerations prior to acquiring data (such as planning and discovery) as well as after summarizing the data in reporting research findings (such as dissemination and preservation of the data itself). Defining the lifecycle of your data and seeing how its stages are connected is critical for understanding how actions taken in one stage may affect another, advancing or restricting what you or others are able to do with the data.
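As a concrete (and entirely hypothetical) illustration of one such migration step, the sketch below moves tabular data from CSV into JSON so that a JSON-consuming tool can read it; the file paths and column layout are invented for the example:

```python
import csv
import json

def csv_to_json(csv_path, json_path):
    """Migrate a CSV file to JSON: each row becomes a dictionary
    keyed by the CSV's column headers."""
    with open(csv_path, newline="") as f:
        rows = list(csv.DictReader(f))
    with open(json_path, "w") as f:
        json.dump(rows, f, indent=2)
    return len(rows)  # number of records migrated
```

Even a routine migration like this quietly changes the data (every value arrives in JSON as a string unless it is explicitly converted), which is exactly the kind of stage-to-stage interaction the lifecycle view is meant to surface.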
- From the Framework: Information Has Value.
- As applied to data: Data Have Value Outside of the Purpose for Which They Were Generated
Research data are generally created with the intent of addressing a particular question or understanding a particular situation. However, data can also be repurposed and used by others outside the original researcher or team to ask new questions or support new areas of inquiry. Beyond supporting new research endeavors, data can also serve as a commodity, a tool for education, a means of persuasion or a means of better understanding our world.
- From the Framework: Research as Inquiry
- For Data: Data Have Long-Term Value
Data are often defined as the building blocks of research. Research is an iterative process in which people return to previous findings to question them in light of new knowledge or to pose increasingly specific or complex inquiries. As researchers return to past areas of inquiry, the data that underlie those inquiries need to have been preserved in ways that enable others to revisit and reuse them. In developing your data set, consider the ways in which you can preserve your data to support their continued use in fueling new areas of exploration.
- From the Framework: Scholarship as Conversation
- For Data: Data as a Component of the Scholarly Communications Process
Scholarship can be understood as a discourse among communities of scholars, researchers, or professionals in which participants seek to communicate their insights and perspectives to designated audiences. Data often underlie the findings, arguments or points being made in this discourse. Without access to the data, it may be difficult to fully understand or trust the findings or arguments being made. In considering how to share your research, give your research data treatment on par with articles and other more traditional products of scholarly communication.
- From the Framework: Searching as Strategic Exploration
- For Data: Developing the Human-Readable Elements of Your Data Set
Data are designed to be consumed by machines (instruments, software, etc.) rather than by humans, as articles and most other publications are. However, as data become increasingly important to researchers (and educators, government agencies, businesses, the general public, etc.), data producers need to consider how their data can be discovered, understood and trusted by others by providing documentation and description written in clear and direct language.
This is only a preliminary exploration of how threshold concepts could potentially be used to inform teaching data information literacy. However, I believe that this is a useful area of study and I would love to see more rigorous work done to articulate further the connections between data and ACRL’s Framework for Information Literacy.
By Andrew Creamer, Scientific Data Management Specialist, Brown University Library
Tonight, On DMP, He Wrote…DoD Data Management and Research Resources Sharing Plans
Sheriff Mort Metzger: Mrs. Fletcher! I said, do me a favor, please, and tell me what goes on in this town!
Jessica Fletcher: I’m sorry, but…
Sheriff Mort Metzger: I’ve been here one year, this is my fifth murder. What is this, the death capital of Maine? On a per capita basis this place makes the south Bronx look like Sunny Brooke farms!
Jessica Fletcher: But I assure you Sheriff…
Sheriff Mort Metzger: I mean, is that why [former Sheriff] Tupper quit? He couldn’t take it anymore? Somebody really should’ve warned me, Mrs. Fletcher. Now, perfect strangers are coming to Cabot Cove to die! I mean look at this guy! You don’t know him, I don’t know him. He has no ID, we don’t know the first thing about this guy! (“Murder, She Wrote: Mirror, Mirror, on the Wall: Part 1 (#5.21)” (1989))
Like most of my colleagues in U.S. university libraries’ research data services, I get a lot of business helping our faculty with NSF data management plans and NIH data sharing plans. I see each new NSF/NIH proposal as an opportunity to learn new things and to help our faculty improve and strengthen their plans. Within the NSF alone there can be a lot of variation: a proposal may come from a directorate, division or office that I have not worked with before, or at least not very often. There can also be much variation in the type of award, such as instrumentation, training, dissertation improvement awards, CAREER, etc. That being said, I do enjoy new challenges, and when I am approached by a faculty member or student asking for help with drafting a plan for a funder that I have not had the opportunity to work with before, there is, frankly, a feeling of excitement.
Like Angela Lansbury’s Jessica Fletcher up in Cabot Cove, Maine, I take great joy in sleuthing. I snoop through the funder’s proposal guidelines, partner up with my colleague in the Office of Sponsored Projects to interrogate the program officers and come up with that ‘Eureka moment’: a draft of what’s required or recommended to include in the plan and the parameters we need to meet before I sit down to go over the facts with the faculty member.
Recently, I was asked to help two different faculty members with their data management and sharing plans for two separate funding programs within the U.S. Department of Defense. I set to work sleuthing. The first proposal, from the Air Force Office of Scientific Research (AFOSR), was actually quite prescriptive in terms of what items to include and how long the document should be (two pages):
- The types of data, software, and other materials to be produced in the course of the project, with notation marking those that are publicly releasable;
- How the data will be acquired;
- Time and location of data acquisition if they are scientifically pertinent;
- How the data will be processed;
- The file formats and the naming conventions that will be used;
- A description of the quality assurance and quality control measures during collection, analysis, and processing;
- If existing data are to be used, a description of their origins;
- A description of the standards to be used for data and metadata format and content;
- Plans and justifications for archiving the data;
- Appropriate timeframe for preservation; and
- If for legitimate reasons the data cannot be preserved, the plan will include a justification citing such reasons.
However, the second program, the DoD’s Congressionally Directed Medical Research Program’s Peer Reviewed Medical Research (PRMRP) awards, administered by the U.S. Army, was a mystery, a perfect stranger showing up in Cabot Cove. “I didn’t know the first thing about this CDMRP!” It was a case that needed to be solved.
Type of plan
The PRMRP proposal guidelines were quite clear that a Data and Research Resources Sharing Plan is required to be included in the proposal, within a file attachment along with other types of supporting documentation. Each document has to begin on a new page, and all are combined and uploaded as a single file labeled “Support.pdf.”
This is where the plot begins to thicken. People kept mentioning “1-page” as they were talking about the plan. Where were they getting this number? After a quick call, it turns out that as late as 2014, CDMRP proposals were requiring a one-page data and research resources plan, within a separate file attachment.
(See page 19 for an example: http://cdmrp.army.mil/funding/pa/14prmrpfpa_pa.pdf)
However, I wouldn’t let these details throw me off the scent of the trail!
After gathering all the guests together in the great hall, I showed them that the 2015 PRMRP guidelines clearly state that there are no page limits for any of these components, unless otherwise noted. AND the data and research resources plan is now combined with other supporting documents and uploaded as a single file labeled “Support.pdf.” So, in fact there is no page limit noted for data and research resources sharing plans in this PRMRP proposal! (Lightning flashes outside)
CDMRP Criteria for describing how data and resources generated during the performance of the project will be shared with the research community
As all the guests returned to their rooms, I still had the feeling that something just wasn’t right. The proposal guidelines had told me to refer to the General Application Instructions, Appendix 3, Section L for more information on the actual criteria for what to include in the plan. Yet I had found what I thought were the expectations in Appendix 4, not Appendix 3. Then it hit me: the General Application appendix I had been given was labeled 2014! The appendices must have been amended for fiscal year 2015! (Lights go out; lightning flashes, bangs of thunder)
(See 2015 General Application Appendix 3, Section L here: https://ebrap.org/eBRAP/public/ViewFileTemplate.htm?fileTemplateId=1190500&fileType=pdf)
The Congressionally Directed Medical Research Programs: Policy on Sharing Data and Research Resources
Why had I found so many references in the Appendix to CDMRP’s expectations? Why did Section L seem so truncated, as if its language had been lifted from a larger text?
Of course! It had been lifted from a larger text! The CDMRP oversees several funding programs, the PRMRP just one among these. There must be an overarching policy that these other programs are quoting from!
(See the policy here: https://cdmrp.org/files/forms/generic/policyOnDataResourceSharing.pdf)
And there it was, The Congressionally Directed Medical Research Programs: Policy on Sharing Data and Research Resources, containing all the criteria for what data must be shared, the recommended methods for sharing of the data, and the methods for sharing of unique research resources.
The mystery was solved!
- Vanderplas, J. (2014)
I have been actively participating in conversations about the roles librarians take on in data-intensive settings for a few years now. Typically (though not always) the focus of those conversations settles broadly on librarians’ need to become more hands-on with the tools being used by their patrons and to add to their data science toolkits. I recently came across an article written last year by Jake Vanderplas that discusses this same idea as applied to researchers in some depth, and references the above featured image of the “pi-shaped” researcher. The pi-shaped researcher is contrasted with the T-shaped researcher as a metaphor for the skills that researchers need in order to fully take advantage of the research methods available to them in their respective fields. While Vanderplas points out that this pi-shaped vs. T-shaped description may not be quite right, he states that, “Regardless of what metaphor, definition, or label you apply to this class of researcher, it is clear that their [data science] skill set is highly valuable in both academia and industry…” and I could not agree more. However, while I very much advocate for (and try to initiate!) that kind of dialog within the library community, I am shifting the focus of this post toward the other leg of the pi: librarians need to be knowledgeable about their subject domains in order to meet the specific needs of their communities and to be able to participate in meaningful conversations with people outside their libraries.
I am making this point today because although I have been developing my technical and mathematical skills continuously, in a few weeks I will be starting a new position that will require me to become increasingly familiar with a new subject domain: I have accepted a position as the new Assistant Head Librarian of the Harvard-Smithsonian Center for Astrophysics in Cambridge, MA, and up to this point I have considered myself no more than an astronomy/astrophysics hobbyist. I am not a complete stranger to physics or astronomy, but I acknowledge that the depth of my knowledge needs further development, and in situations like this it can be difficult to know where to start. I think this aspect of data librarianship is often overlooked, or generalizations are made across disciplines to the point where it’s challenging to relate to advice that seems to ignore the intricacies of your specific domain. For example, in my most recent position I worked in the biomedical sciences and had trouble applying data management education strategies that were commonly advocated by the library community, because they seemed to focus so heavily on NSF recommendations, which differ greatly from the NIH’s, and ignored much of the complexity that comes with working with HIPAA data. In response, I had to rapidly increase my understanding of the work done by biomedical researchers, and it took quite a while to really understand how to navigate that situation. I now find myself entering another new domain, and hope to speed up that learning curve as much as possible. The remainder of this post documents (in no particular order) some of the advice I have received and practices I have been employing to get adequately familiar with new data-intensive research domains.
Talk to people in the field
You could call this networking, but I think that description is a bit reductive; I’m not talking about handing out business cards at a conference. I have found that just having a conversation can really help you begin building relationships with people who work in the domain you’re entering. And I’m not just talking about librarians in that field! Go talk to some researchers as soon as possible. Get coffee together or ask if you can visit their lab. Try to get face-to-face and ask about the work their team does and some of the obstacles they encounter. Try attending an event or seminar where you know researchers in that field will be (e.g. a hack-a-thon). Ask what journals you should be paying attention to and who to follow on Twitter; you might be thinking right now, “I don’t really use Twitter,” and that’s fine, but I have personally found the platform to be a great place to follow what’s going on and to ask questions. You may even find a mentor this way, which is always helpful, especially when you’re just getting started.
Do some research / learn the jargon
Start with the basics if you need to. The e-Science Portal itself is one place where you can find useful guides on different scientific disciplines and even attend a “Science Bootcamp” to help get you started. There are also wonderful platforms like Khan Academy if you’re like me and learn better by watching videos and following along. If you start talking to researchers you’ll also start hearing about things you’ve probably never thought about before, which will give you a pretty good list of topics to read up on. Be sure you have a solid understanding of the most important concepts in the field and the issues discussed at the field’s biggest conferences. I think this is true even if you start with a pretty strong background in a related field or a working knowledge of the databases and research methods employed in the domain; reading some articles that go beyond what you know is (in my opinion) essential to being able to empathize with your patrons and understand the complexity of their work.
Look at some data!
It’s important not to make assumptions about the data your patrons work with. It’s not wise to assume that everything they work with is going to be tabular or sequential or flat. Without really looking at a few datasets you cannot begin addressing data literacy in that field. “Best practices” for working with data may be generalizable to your new subject, but they may not be; you can’t know if you don’t look. By talking to researchers and learning more about the field you will get a better sense of the types of data these researchers work with, but you should also get a sense of the data’s structure, the issues that surround how they are shared, and the common tools used by the people who handle the data most closely. For example, in my previous position R and SAS were the go-to tools for analysis and data manipulation, and CSV was the most common format I came across. Now, though, I’m seeing that Python is the current standard in my new domain and the file formats tend to be much less straightforward.
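Taking that first look can be as lightweight as the sketch below (standard library only; the file name and columns are hypothetical, and real data in a new domain may not be tabular at all):

```python
import csv

def peek(csv_path, n=5):
    """Print the column names and the first n rows of a CSV file,
    a quick check of what the data actually contain."""
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        print("columns:", header)
        for i, row in enumerate(reader):
            if i >= n:
                break
            print(row)
    return header
```

Binary formats common in some fields (FITS in astronomy, for example) will not open this way at all and need domain-specific libraries, which is part of the point: you cannot know until you look.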
If you ever find yourself shifting your career to focus on a new domain, keep these things in mind. There’s always going to be a lot to learn, but that’s the nature of librarianship. Acknowledging that you have a lot to learn though is a good first step.
 Vanderplas, J. (2014, August 22). Hacking Academia: Data Science and the University. Retrieved September 21, 2015, from https://jakevdp.github.io/blog/2014/08/22/hacking-academia/
Posted by: Amanda Whitmire, Assistant Professor and Data Management Specialist at Oregon State University Libraries
There has been a lot of conversation lately regarding the use of institutional repositories (IR) for preserving and sharing research datasets. More specifically, #datalibs have been abuzz about the perception among some publishers that an IR would be an acceptable location for hosting datasets only if it can mint a digital object identifier (DOI) for said dataset. But, why the exclusive emphasis on DOIs?
At its simplest, a digital object identifier (DOI) is “a unique alphanumeric string assigned … to identify content and provide a persistent link to its location on the Internet” (APA Style). Among other things, assignment of a DOI to a thing is intended to:
- uniquely identify that thing (disambiguate it from other things), and
- provide a mechanism to enable persistent access to the thing (to both find it and get it).
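The “find it and get it” half works through the doi.org resolver: appending the identifier to https://doi.org/ yields a link that redirects to the object’s current location, wherever that happens to be today. A minimal sketch, using a deliberately simplified validation pattern rather than the full DOI syntax:

```python
import re

# Simplified shape of a DOI: the directory indicator "10.",
# a numeric registrant code, a slash, then a registrant-assigned
# suffix. (The real syntax is more permissive than this.)
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

def doi_url(doi):
    """Return the persistent resolver URL for a DOI string,
    or None if the string does not look like a DOI."""
    doi = doi.strip()
    if not DOI_PATTERN.match(doi):
        return None
    return "https://doi.org/" + doi
```

Note that nothing in this construction is specific to any one registration agency; the identifier plus the resolver is the whole mechanism.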
That said, the purpose of many digital identifiers is exactly the same. CrossRef spearheaded the use of DOIs for identifying scholarly works as a means to ensure persistent citation and location (via persistent links) of journal articles. As such, DOIs have become synonymous with peer-reviewed publications, and “something like an implicit seal of approval from the Great Sky Guild of Academic Publishing”. In plain terms, a DOI is increasingly seen as imparting some aspect of legitimacy upon that which it has been assigned. That’s a problem. As CrossRef tells it: “CrossRef’s dominance as the primary DOI registration agency makes it easy to assume CrossRef’s *particular* application of the DOI as a scholarly citation identifier is somehow intrinsic to the DOI. The truth is, the DOI has nothing specifically to do with citation or scholarly publishing. It is simply an identifier that can be used for virtually any application.”
In other words, there’s nothing magical about DOIs when it comes to identifying or locating scholarly works. In the ecosystem of digital identifiers, a DOI is one of many good options. A more nuanced exploration of DOIs and other identifier schemes tells the same story. There is no greater technical benefit to using a DOI vs. some other digital identifier (a persistent uniform resource locator, or PURL, for example). The only real “advantage” of DOIs is that they are increasingly viewed as the only “acceptable” identifier by publishers, and therein lies the problem.
This narrowly held perspective on what constitutes an “acceptable” identifier has penetrated so deeply into the minds and habits of publishers that an IR that does not assign DOIs to its datasets may not be deemed an acceptable place for a researcher to deposit data in support of a publication. For example, the data policy at Scientific Data says that, “We are glad to support the use of institutional or project-specific repositories, if they are able to mint DataCite DOIs for hosted data” (emphasis mine). Scientific Data is not alone in this. Earth System Science Data also requires that submitted datasets be deposited in a repository that assigns DOIs.
Why does this matter?
Researchers are now presented with an ever-expanding selection of repositories where they can deposit their data to facilitate sharing. IRs evidence a commitment to persistence and longevity that is lacking in newer infrastructure. The idea that an IR would be deemed an unsuitable archive based solely on the fact that it does not assign a DOI is both absurd and counterproductive. The requirement of a DOI over other repository features (the existence of a preservation policy, for example) serves only to reduce the number of well-supported data preservation options for researchers.
So, what’s the take-away?
Data specialists are working at the forefront of rapid cultural and technological changes in how research is being conducted and shared. The development of useful, broadly applicable best practices for data preservation and sharing relies heavily upon collaboration, with thoughtful contributions from diverse groups working toward a shared goal (Force11 is a terrific example of this). Where standard practices don’t yet exist, researchers, publishers and other stakeholders in the scholarly community are making things up as they go (and very thoughtfully so, but still pretty much winging it). Academic libraries have a lot to offer in this space, and the suitability of using IRs for preserving datasets is an issue that we should not approach with timidity. Reach out and make connections with journals. Question their data policies and offer alternatives. Be bold, #datalibs! Your voice and your involvement are critical.
— http://crosstech.crossref.org/2013/09/dois-unambiguously-and-persistently-identify-published-trustworthy-citable-online-scholarly-literature-right.html
The Tisch Library at Tufts University in Medford/Somerville, Massachusetts is seeking a Librarian for Research Data. Please see the posting for a complete description of the position: http://tufts.taleo.net/careersection/ext/jobdetail.ftl?job=15001602&lang=en.
Contributed by Donna Kafel, Project Coordinator for the New England e-Science Program, Donna.Kafel@umassmed.edu
Andrew Johnson is the Research Data Librarian at the University of Colorado, Boulder, and PI for DataQ, “a collaborative platform for answering research data questions in academic libraries,” which launched in August. DataQ is a unique resource in that it provides a platform where librarians can submit research data management and curation questions, which are in turn fielded by the Editorial Team and answered by a DataQ Editor. DataQ is meant to be interactive: community members who have created a DataQ log-in account are welcome to add to the answers or post comments.
DataQ is funded by an IMLS Sparks! Ignition Grant for Libraries and co-sponsored by University of Colorado Boulder, the Greater Western Library Alliance, and the Great Plains Network.
I spoke with Andrew over the phone recently to learn more about the DataQ project that he and co-PI Megan Bresnahan have led since they were awarded the IMLS Sparks! Ignition grant in November 2014. Much of our discussion revolved around project management aspects of the DataQ grant. Here is an outline of our conversation:
Donna: How did you come up with the idea for DataQ?
Andrew: Megan actually came up with the idea while we were attending RDAP in Baltimore a few years ago. We were trying to think of ways that we could extend local support for librarians engaging in RDM services to the wider community, so she thought that a service like DataQ could be one way to do that. Prior to DataQ, I’d been active in the DataFOUR project (http://imls.gwla.org/), which was sponsored by the GWLA and GPN. The idea for the DataQ grant snowballed from DataFOUR and its aim to provide regional support for developing RDM services. Megan and I applied for the IMLS funding for DataQ with the support of GWLA and GPN, and of course our library administration at UC Boulder. In September we were awarded an IMLS Sparks! one year grant, from Nov. 1, 2014-October 31, 2015, to develop DataQ.
Donna: Can you explain the GWLA and GPN groups? Are they consortia?
Andrew: GWLA is a consortium of research libraries in the central and western United States. GPN was founded by researchers and is a consortium of Midwestern universities focused on cyberinfrastructure initiatives. The two groups collaborate on different projects, and host their annual meetings in conjunction.
Donna: I’m really impressed that in the course of a one year planning grant, you’ve pulled together such a large working group of Editors and launched DataQ –all well within the 12 month timeframe. Can you describe a bit about the project timeline and your working model?
Andrew: Yes, there were a lot of pieces to put together to make DataQ happen. We had a $25,000 budget to work with and a relatively short time to get the project up and running. In the first months we contracted with Drupal developers to create the site. In December we put out a Call for Editors. We were surprised by the overwhelming response to the Call. We had budgeted for eight Editors, and it was really hard to limit ourselves to eight when so many highly qualified librarians with experience and expertise in data services responded that they were interested in participating in DataQ. Ultimately we were lucky in that we were able to expand the number of Editors beyond what we had budgeted, as a few Editors received support from their institutions to attend our orientation meeting. We were also very lucky to have a separate group of librarians and other information professionals eager to participate in the project. Many of them accepted our invitation to be virtual project volunteers, and they helped the project tremendously. In June, DataQ wasn’t ready for prime time, but we wanted to do a pre-launch by putting up a sample web form to collect questions from anonymous users. The pre-launch was a way for us to collect questions and populate DataQ with them prior to the actual launch. The DataQ volunteers helped us gather many of these questions as well.
Donna: With the Editors being from all different geographic areas, how did you orient them to the project and develop a system for their workflows?
Andrew: We had an in-person training meeting in June that all the Editors attended that was held alongside the GWLA/GPN meeting. The meeting was very productive with all the Editors fully engaged in discussions as we planned the logistics for developing and implementing DataQ. We were able to develop Editorial workflows, establish a system for communication, brainstorm new ideas that went beyond what Megan and I had initially envisioned, and plan the project in the course of the short time we met.
Donna: What is the internal process that takes place when someone submits a question to the site?
Andrew: We have a listserv that includes all the Editors and the PIs. When a question is submitted, it gets sent to the listserv. Editors can then review the question, and any of them can opt to answer it on a first-come, first-served basis. The Editor who first responds composes an answer in an internal Google doc. We then have two Editors review the answer. Once an answer is approved, the Editor who authored the response posts it on the site.
Donna: Regarding the users who submit questions, are they anonymous?
Andrew: They can opt to be. We offer three options: users can choose to be anonymous, they can send along their e-mail in case the Editors need to get further information from them to answer their questions (and to let them know when an answer has been posted), or users can opt to sign in to get a DataQ user account. Accounts enable users to post comments on the DataQ site. DataQ is intended to be an interactive site. We hope that users will create user accounts and contribute their ideas and comments.
Donna: Are you seeing trends in the types of questions that users are submitting to DataQ?
Andrew: Yes, we’re seeing quite a few questions related to data citation, data documentation, and data sharing.
Donna: What is the sustainability plan for DataQ?
Andrew: That’s what we’re working on now, planning on how the project will move forward after the funding period. We may be applying for further funding to continue the project.
Donna: Will you be presenting DataQ at any national or regional conferences?
Andrew: We’ve been asked to present a few webinars on it which we’re really glad to do. Also a couple of the Editors will be presenting a poster on it at DLF. We hope to also present it at some other conferences in the coming months.
Donna: DataQ has filled a niche—providing expert answers to librarians’ specific RDM questions. Congratulations to you, Megan, and the entire team in getting DataQ up and running—in an amazingly short time!
Submitted by Donna Kafel, Project Coordinator for the New England e-Science Program
The inaugural New England Research Data Management Roundtable was held last Tuesday, August 18th at the Du Bois Library at the University of Massachusetts Amherst campus. This roundtable is the first in a planned series of roundtable discussions targeted for New England librarians who are engaged in research data management services or who want to learn more about data librarianship. Sponsored by the National Network of Libraries of Medicine, New England Region, the NE RDM Roundtables will provide opportunities for New England librarians to compare notes, ask questions, share lessons learned, explore new working models, acquire fresh ideas for their workplaces and develop new partnerships.
This particular Roundtable event was specifically intended for librarians in the RDM Community of Practice, i.e. librarians who are currently actively engaged in planning and/or delivering RDM services. (Note: future NE Roundtables will also be planned for an RDM Community of Interest.) It was also preceded by a tour of the Massachusetts Green High Performance Computing Center in Holyoke, MA. Twenty-four librarians from multiple institutions, including four of the five University of Massachusetts campuses, University of Connecticut, Boston University, Boston College, Harvard, MIT, University of New Hampshire, Brandeis, Northeastern, Mount Holyoke, and Drexel University, discussed the topic “Organizational structures for research data management services at our institutions.” Attendees were divided among five tables of four to five participants each, and at each table a member of the NE Roundtable planning team served as moderator for the discussion. The program was divided into two 45-minute sessions: during the first, the discussion topic was structures within the library for delivering RDM services; the second focused on partners on campus that support RDM services. The discussions revolved around specific questions, and time was given between the two sessions and at the end of the second session for each table to report out.
Feedback on the Roundtable event has been quite positive. Attendees have noted that they like the opportunity to hear what their colleagues are doing and to discuss RDM issues, challenges, strengths, and their libraries’ service models. The New England e-Science Program plans to coordinate future Roundtables three times a year. Topics for these roundtables will be based on attendee recommendations.
The following is a summary of questions and bulleted attendee responses and comments from the Roundtable Discussion tables. For Topic 1 questions 3-6 and Topic 2 questions 1-5, the bulleted responses are grouped by theme.
Topic 1: Library Structures for Delivering RDM Services
1. What is the current structure for data management services at your library? What staff is involved and what are their relationships to each other and the work?
- Library director appointed a non-librarian project manager to be DM liaison between Office of Research and the Library. The library is not providing DM services but is incorporating “digital measures”—digitizing faculty CVs for all time (historical)
- Small undergraduate science library just starting out in RDM, no organizational structure yet. Science librarian has been assigned the DM role recently and is learning. He gave a presentation about RDM to faculty along with 2 other librarians.
- Engineering and Data Services librarian started a year ago. He oversees all aspects of data services in the library. Other librarians are involved – science librarian as liaison to science faculty and NE e-Science program, metadata librarian for help with metadata and ontologies, and Systems dept for software support (such as DMP Tool).
- An eScience team made up of three librarians from the Science Library led by one of these three.
- Working group made up of librarians from different disciplines and systems librarian. Most librarians involved are science/engineering, and IT is involved.
- Large research university library has had a DM task force for “way too long.” It includes librarian representatives from special collections, the science library, social sciences, library systems, and scholarly communications, and is coordinated by the director of the science library. Having a DH librarian on the team has helped it not to focus exclusively on STEM fields.
- Private academic health sciences library has DM working group with reps from the library/archives/research labs, postdocs, and IT. The group meets 6x/year. The working group would like to hire a data expert to focus on archiving a large longitudinal study.
- Has Library Data Services Advisory Group, which started 1 ½ years ago. The group is made up of scholarly communications librarian, IR librarian, Associate Library Director, two outreach librarians, Head of Office of Sponsored Research, and representative from Research Computing.
2. How did this service begin and how has it changed over time?
- Service did not begin at small public university until the data services librarian started. It has evolved through tweaks to the library’s Data Workshop series for faculty, PhD students, and some staff; the library also has an RDM LibGuide based on NSF requirements and a slightly customized DMP Tool.
- Started in 2012 with the E-Science Institute; an RDM services working group began last year (includes ~12 people: IR librarian, desktop services, 3 dept liaisons (science, social science, and gov docs), academic technology, and an analysis expert).
- Started somewhat informally several years ago by three science & engineering librarians who co-created an RDM libguide. Always been more of a collegial staff than a hierarchical one. Some team members are specifically part of the Data and Specialized Services Dept.
- Got started by teaching workshops a while ago, were more successful with grad students than faculty.
- Started after the ARL “Future of Science Librarianship” conference; the library formed a team of subject specialists, and the scholarly communications librarian sits in.
3. What strengths does your library have related to data management and how did you develop them?
- Has a dedicated RDM fellow
- Has a dedicated RDM librarian
- University is small enough that a small library team can manage all requests, and there is enough background among library staff to serve most of the population. Research population tilts more toward the natural/physical sciences so there are fewer disciplines to keep abreast of.
- Library has expertise in metadata services, building collections, describing information, enabling access. Staff is dedicated to helping faculty/students/staff. The library has a vision to create RDM jobs among the library staff.
- Some capacity for more in-depth consultations
- Focused team approach, specialized knowledge plus shared responsibility
- Library invested in infrastructure to support researchers—e.g. repository, research computing
- Has a Data & Specialized Services department
- Lots of varied expertise in large research university
- Strong IR
- A merged department with IT is very useful as IT people have good ideas about implementing DMPs
- New library administrator has strong RDM background and is committed to growing library RDM services
Perception of Library
- Library has established a good reputation through IR
- Library has existing working relationships with campus constituencies
- Good working relationship with Office of Research
- Developed short “quick bites” RDM introductions instead of long workshops
- Broader committee brings in stakeholders across campus (IT, sponsored programs)
- Getting the word out to the community, having services that resonate with users, built relationships with researchers
4. What weaknesses does your library have related to data management and how do you address them?
- Not sure if there are needed RDM services that the library is not aware of
- Struggling with a campus wide lack of cohesive outlook on RDM that makes for confusion
- Defining data management—it means different things to different people
- Haven’t been able to get researchers and students to enroll in library’s RDM courses
- No courage to stop doing what we’ve been doing for 20 yrs—e.g. reference shifts, low level
- Not clear how to draw on the expertise of librarians who are outside of the data services team
- Short staffing limits what library can do
- Many liaisons are more focused on collections, don’t see relevance of RDM services or are fearful of change
- No central focused person to head library’s RDM team
- Members of RDM working group can’t dedicate time to work with liaisons
- Lack of RDM policies (common among many institutions)
- Lack of an institutional repository or a holding center for data in progress
- Lack of funding
- Trying to initiate new library services as a lower level staff person—need support of library administrators and their involvement in securing campus buy-in
- Difficult to bring researchers together on a Balkanized campus
- Isolation from researcher community that library serves
5. What are your main program elements for data management services in the library and how do you conduct them?
RDM Working Groups (see descriptions of working groups in question 1)
- Developed LibGuide (noted by multiple individuals)
- Data Management Workshop series—an overview of RDM theory and applications—hour-long sessions held 1-2x/week; LibGuide; DMP Tool; consultation services by appointment
- DMP Consultations
- RDM team is made up of several librarians who consult on DMPs
- Library offers consultations, workshops, conducts training during Responsible Conduct of Research sessions
- Archiving older data sets, got a CLIR grant for collection of data, archiving a large longitudinal study of child health and clinical data, trying to hire a data person to focus on this.
- Work with Office of Sponsored Research to find out about new grants and reach out to PIs
6. What would you like to be doing (as a library and as an individual) related to data management that you are not doing now?
- Have a seat at the table—a place in the formal campus structure where decisions on infrastructure and services are made
- Collaborate with Digital Humanities
- Have contact with research team throughout grant and project cycle
- Create a data IR (one library noted goal to use Dataverse for its data IR)
- Create an infrastructure similar to Purdue’s, where the library is the portal and tells the story, IT provides the infrastructure, and the library works closely with the Office of Research on compliance
- Track where data is going
- Would like to get more liaison librarians involved with data management
- Have a dedicated librarian who is a focal point for RDM
- Have RDM training incorporated into 1st year grad student requirements
- Conduct RDM training in conjunction with Responsible Conduct of Research training
- Get an RDM course into the curriculum
Topic 2: Collaborations on Campus
1. Who are your current partners on campus?
Uncertainty and problems
- In early stages, trying to learn as much as possible
- Not sure where this is going
- We’re trying to figure out what to do next
- Very do-it-yourself and there are pockets everywhere
- How do services connect when there is no commitment to collaborate?
- Until there is a policy behind it, they will not fund/go further
- Recognize a need for a campus-wide “thing” but getting it moving is hard – what do researchers want?
- Always library initiated
- When personnel change, connections change; developing relationships takes time
- Keep liaisons in the loop when working with faculty
- Sometimes faculty don’t come to the library or know of services, many are doing it on their own
- We’re making headway
- With IT and sponsored research, it can be one-sided and difficult at times
- Some campus admins are on board, some are not
Planning and ideas
- Library is the one thinking about this, talking with potential partners
- Services being offered: consultations, education, websites
- Library has back-channel communication with IT staff
- Repository available for some but not all institutions
- Going to create a team
- Putting together meeting of stakeholders on campus
- Partnerships are in their infancy. We want to reach the full community
- Survey on data needs
- We are doing the DMP Tool
- Considering a campus data summit
- Connections from open access policy are useful for data management policy/ practice discussion
- Ongoing discussion about campus infrastructure
- Finding ways to get to the faculty
- Relationships can lead to partnerships
- Partnerships stem out of just talking to other people
- Helps to have culture of open doors, availability to at least discuss
- Academic computing relationship is informal to semi-formal
- Referrals from Office of Research, this is a collaboration “waiting to happen”
- Research computing
- Office of research site–links to library data services
- Policies for data ownership and management
- Working on DMP Tool single sign-on
- co-host meetings for faculty
- host ELN jointly
- workshop participation, such as on data security & active storage topics
- Post-Doc office
- Office of sponsored programs
- info for libguides
- funding policies
- info about grants currently funded on campus
- access to dmps already written
- instruction for DMP Tool
- Labs & offices
- small scale instruction
- data to ingest into repository
- workshop on how to write proposal, including DMP training
- training in specific areas
- Grad student office
- instruction and orientation, for example on cloud storage
- eScience institute
- building training modules
- Scholarly communications office
- open access
- public access policies
- Office of General Counsel
2. On what programs do you collaborate with campus partners?
- Customizing the DMP Tool
- DMP consultations
- Co-presentations with sponsored programs
- Co-presentations with scholarly publications
- Outreach – visit seminars and institutes
- Tech fair – library repository
- Three services: webpage, consultations, education.
- Three data services: consultations for DMP’s or general data management topics; education and training; and data archiving either in data repositories or in our institutional data repository
- Have a data management libguide
- Data Services webpage
- Webpage for services – spells out what we mean by RDM; points people to the different contacts on campus for data lifecycles; everything in one place
3. Who would you like to collaborate with?
- Building collaborations with faculty
- Faculty are interested in library supporting them and being involved; some elements are there
- We’d love to hear more from OSP; there are often time constraints
- Sponsored projects workshops
Office of Research
- “Research day” – compliance
- Workshops / outreach
- DM workshop series
- Copyright classes to graduate students
- Data management for active research, for example ELN
- It would be great to have a university level strategy
- Get a partnership with preservation
- A data board that could help with developing services
- Webpage to point people to certain areas
- Stakeholders – would like broader outreach, a unified group across campus
- New faculty institute – IRB, funding/grants/DMP’s
4. What are the roadblocks?
Perception of Library
- Being seen as being credible and useful. Libraries are seen as having a certain skill set. Need to have conversations and advocates in higher places – i.e. Provost. There’s a disconnect sometimes in terms of what people know of services in library.
- Perception of library as rare books room
- People do their own thing and don’t depend on library
- Research data policy/ lack thereof
- No policies & policies that do exist people do not know about
- Pass a policy but can it be implemented, is it realistic in what the library/institution can do?
- No buy-in due to “high up” (policy driver)
- IT layoffs
- Turf wars, territoriality (we can do this ourselves)
- Staffing stability
- People saying something will happen by a date and it not happening
- Turnover of staff / loss of staff positions can put a hold on things
- Personalities can be a problem
- Campus IT can be hard to communicate with
- Limited capacity for new services
- Lack of consistency of funding
- Lack of structure/organization
- Other departments have other agendas, similar issues but different priorities. Timing can be an issue to work with people on different schedules
- Getting PI’s on board, they all do things differently
- PI’s may train lab really well in DMP, others do not
- Different needs for different researchers
- Decentralized means different parts don’t always communicate
- What does language mean? E.g. archiving, DM services. Have to define terms and how you are using them; controlled vocabulary
5. What support is needed from the library or the institution?
- High level support / promotion
- Infrastructure – e.g. ELN, repository
- People network
- What are other stakeholder desires & interests? Know enough about campus to make solutions
- Institutional view of issues
- Quality metadata requirements need a repository librarian
- Need more support for archiving & storage
- Problem is librarians want to take on tasks / have to take on tasks but cannot give something up.
- Communicating with peers – learning what else is happening
- What about Social Sciences & Humanities?
- Library management has been helpful
- Professional development from the library
- Help from the library to make connections
6. What external support is needed?
- Professional development to broaden knowledge (like this!)
- Listserv of this group
- STS listserv is helpful but don’t brand as discipline specific
- How to host NE region listserv?
- Sharing experiences & training with other librarians
- Short videos on technical subjects, like bit rot, preservation of videos, subject repository vs. local storage
Submitted by guest contributor Amanda Rust, Digital Humanities Librarian, Assistant Director, Digital Scholarship Group, Northeastern University Libraries, firstname.lastname@example.org
About six months ago I began a new position as a Digital Humanities Librarian, and I am now lucky enough to work with humanities data from the nitty-gritty (helping researchers contact publishers to acquire historical newspaper data) to the broadly conceptual (how is historical cultural data made?).
So in composing this short post, I thought I’d start with some recent big-picture discussions, and then apply some of those concepts to cultural data, which is often library-produced data.
Lisa Gitelman’s edited volume Raw Data is an Oxymoron and Johanna Drucker’s reformulation of data and capta (first in this 2011 Digital Humanities Quarterly article) are two excellent places to start, and well-known in the digital humanities field. To give a likely overly broad summary: these works suggest that the very meaning of “data” has changed over time, and even what we think of as the most natural, obvious, “given” data is designed in some way. The experimenter chose to observe it, created instruments encoding choices on how to measure it, perhaps disregarded outliers, imposed categorization and storage once it was captured, and so on.
Not to say that other disciplines have never considered these ideas! On the Humanist email list – one of the oldest online spaces for digital humanities work – there was recently a thread where long-time moderator Willard McCarthy prompted discussion of a resonant quote from Barry Lopez’s 1986 Arctic Dreams. In that book Lopez recounts his time in the Arctic with field biologists and “wonders” at the process of naming, the process of reducing what takes place “out there” to patterns that are statistically important, concluding that for the species under study: “No matter how long you watch, you will not see all it can do.”
So how do these theoretical considerations come into play when working with digital humanities projects? We are always confronting what’s been left out of the data. Researchers may start with open access data because it’s there, not because it’s the most relevant, immediately prompting us to notice that some core historical collections are only available via subscription. Why were some resources scanned and made open access, and others not? The vagaries of grant funding? The gaps between wealthy institutions that can afford to scan their collections, and those that cannot? The pressure on institutions to see special collections as a revenue stream?
Beyond the question of open or paid access, researchers are now asking detailed questions on libraries’ selection processes behind both preservation and digitization. Is the data representative of what was culturally significant in the past? What the library later determined to be significant? Who defines “significant”? Or was the original selection based on what was in good condition, or with clear copyright, or had multiple copies, or lacked multiple copies, or had a thematic focus that was easily grant fund-able? Libraries are often the producers of humanities data – or, capta — so it is both thrilling and frightening when digital humanities scholars ask these uncomfortable questions.
A request from Myrna Morales, Data Curation Graduate Student at University of Illinois at Urbana-Champaign, email@example.com
What: Request for identification of a data set
Why: Offer of assistance with data set by a data curation student
Course: Foundations in Data Curation
When: September-December 2015
We work with a Data Curation Specialization certification program team at the University of Illinois at Urbana-Champaign Graduate School of Library and Information Science (UIUC GSLIS). Taught since 2007 as part of the MSLIS program, this one-semester Foundations of Data Curation (DC) course integrates as much exposure to data issues and direct experience with data as possible.
In recent semesters we have found that hands-on experience with real data sets noticeably improves student class engagement and understanding. Students are able to work effectively upgrading, ingesting, and/or rescuing a dataset. For instance, students improve their skills by enriching documentation, structuring for ingestion, and reformatting to accessible formats.
Students select a dataset at the start of the course and continue working on it in phases: 1) investigating & selecting a dataset; 2) developing a data management plan for improving the dataset; and 3) implementing the plan given available time and resources. Each dataset has an associated contact but communication with the dataset contact is restricted until the student has demonstrated to the instructors that they have mastered an understanding of the data and related available resources including papers or reports in the peer reviewed literature.
There is an expectation that if a student substantially improves the metadata documentation or the state of the data, the repository will consider using the results of their work. For instance, the National Snow and Ice Data Center and the National Space Science Data Center have published datasets worked on by students, datasets that would not otherwise be publicly available. In addition to contributing to data availability, this approach represents an opportunity for a) students to provide a pointer to an example of data curation work on their vitae and b) repositories to enhance visibility of some data as well as to highlight their contributions to the education and training of a much needed workforce in data curation.
If you have data that require attention and are interested in having a data curation student work with your data sets as a class project, please contact us. We would need to know the name of the data set, the type of data, a summary of what work you feel is needed, the name and contact information for a point of contact for the student, as well as a pointer to the data or a mechanism to access it. The first day of class is August 26, so we would need this information by the beginning of that week.
Ruth Duerr, firstname.lastname@example.org, Ronin Institute for Independent Scholarship and Adjunct Professor Graduate School of Library and Information Science, UIUC
Myrna Morales, email@example.com, UIUC Graduate Student in Data Curation
By Jen Ferguson, Co-Chair of the e-Science Portal Editorial Board. Jen can be contacted at firstname.lastname@example.org
After many rounds of user feedback, testing, and revision, we are very pleased to unveil the revamped e-Science portal today. In addition to the aesthetic redesign, here are a few of the more significant changes we’ve made to the portal based on your comments and suggestions:
- Added a new ‘getting started with e-Science’ quick guide
- Moved the events calendar to the front page
- Added the Twitter feed to the front page of the portal
- Reorganized the content headings significantly. Data Management, in particular, received a major overhaul – it now includes separate sections on research data lifecycles, data management planning, data curation, reasons to cite data, etc.
- Links to data tools have been posted directly to relevant pages such as the data curation page
- Tidied up site navigation in general, and pared down the size of the footer
- Clarified the relationship between the eScience portal and its partner projects
Our editors have been hard at work too – they’ve weeded older content in favor of focusing on smaller selections of newer material. The portal depends heavily upon our crack team of editors, and we’re happy to let them shine a bit more in the revamped portal. You told us you wanted to know more about the people behind the pages, and we heard you! The bios and smiling faces of our editors are now featured more prominently, and we’ve also made it easier for you to contact them directly. Please indulge me in a quick Academy Awards-style shoutout list to recognize those without whom this launch would not have been possible. Thanks to:
- My partner in editorial board co-chair crime, Katie – for knowing what works, what doesn’t, and not being afraid to call ‘em like you see ‘em.
- Our editors Amanda, Andrew, Daina, Jake, Julie, Margaret, and Stacy – for your fresh ideas, your patience as we messed about with your content areas, and your willingness to jump into your content headfirst.
- Usability consultant Bethany – for adding your voice and guidance to our revamp efforts.
- Portal staffers Bob, Donna, and Elaine – for keeping things on track and the project moving forward. We put poor Bob through his paces with this launch! Luckily he’s still talking to us.
Last but not least – thank you, readers, for lending us your time, expertise, and energies with everything from card sort exercises to a couple of rounds of beta testing. By my estimation around 50 of you participated in this revamp in some way. We couldn’t have done it without you! The portal is not done – is a website ever ‘done’? – but we’ve reached a point where we feel ready to release. What do you think of it so far? We welcome your feedback – comment on this post, Tweet to @NERescience, or shoot one of the editors an email.
Submitted by Jake Carlson, Research Data Services Manager, University of Michigan, email@example.com
I recently came across three opinion pieces that got me thinking on the current state of data librarians.
The first one, “Stacking the Deck” by Professor Michael Stephens, was published in the Library Journal. He describes “the full stack employee,” as first articulated by tech writer Chris Messina, and then re-imagines this description into the library workplace. A full stack employee is someone who is always on, deeply invested and goes the extra mile. They continually seek out new ways of producing and innovating through the application of technology and best practices. They are deeply connected to their peers through social media and share what they are doing, not to purposefully make a name for themselves, but to give back and add value to their communities. It’s not that they know everything; it’s that they are driven to discover possibilities and to bring people together.
Sounds like a model employee type that every library would want to hire, right? But what about the librarians themselves? Are they “full stack” because they want to be, or because they feel they have to be just to do their jobs?
Which brings me to the second piece, “Hiring Data Librarians” written by Alexis Johnson and published on Scribd. Alex is a self-described new data librarian and writes on adjusting to the position. Data librarians are often asked to perform a great many tasks and to possess or acquire a great many skills to perform their functions. Alex’s experience was coming in to the position with one set of expectations and then having more and more responsibilities piled on because “you’re a bright young fellow.” These creeping additions that Alex describes led to feelings of inadequacy for never being able to do enough as a data librarian and an anxiety that comes with feeling that you have to devote nights and weekends to learning and skill development. Alex closes the piece describing an actual job ad for a data librarian that includes 5 areas of responsibility, each of which could be considered a full time job in and of itself.
Finally, Rick Anderson writes of a “Quiet culture war in research libraries, and what it means for librarians, researchers and publishers” in UKSG Insights earlier this month. He is not writing on data librarianship directly; instead, he describes two competing conceptions of the role of the modern research library. On one side are those who believe that the mission of the research library is to support the needs of its host institution. On the other side are those who would argue that libraries ought to focus on addressing larger issues of scholarly communication irrespective of institution. It is a lengthy piece and I cannot do it justice by trying to summarize it here. What drew my attention were his observations on how disagreements over the library’s fundamental mission play out in its operating culture and create tensions in the allocation of scarce resources to its programs and projects.
Taken together, these pieces present a potential problem for data librarians. I find Professor Stephens’s articulation of a full stack librarian interesting (though I do find the implicit equating of high performance in librarians to tech savviness and youth rather troubling). However, I am concerned that libraries as organizations will come to expect or demand such a complete commitment from hired data librarians without recognizing or providing the level of support needed for them to be successful. There are many, many ways that librarians could incorporate working with research data into their positions, but all too often I see job ads like the one described by Alex that overreach and ask for more than one person could possibly accomplish. What this type of job ad implies is that the hiring institution does not know what it wants to do in providing research data support, and in all likelihood will expect the person hired to figure it out for them. In this scenario, the hired librarian may not receive the resources or support needed to be successful. As Dorothea Salo has noted, the practice of hiring smart and talented librarians into ill-defined positions without providing them a solid base of support runs a high risk of burning out and driving away the very people libraries want to attract.
Developing data services is more than just hiring a librarian. It needs to be about the library as an organization making a commitment and investment of time, money and other resources to understand the needs of the communities (within or outside of the institution) and then to respond in ways that add value. This is not to say that libraries must have everything worked out beforehand, rather it is to recognize that getting into data will affect library organization and culture, and that a willingness to consider and openly support change will be needed to succeed. In other words, to support the full stacked librarian, we ought to consider how to build a full stacked library.
If you didn’t get to attend the 2015 New England Science Boot Camp that was held June 17-19th at Bowdoin College, no worries. And if you did attend boot camp, but would like the opportunity to review the interesting presentations, you can do that too!
All the presentations from the NE SBC 2015 from the Science Sessions, special Wednesday evening presentation, and the Capstone are now available on the Science Boot Camp for Librarians YouTube playlist at https://www.youtube.com/playlist?list=PLNtON4mU3aIdSsDOcOSGYcHjtlPJRLgDF
Check them out! And if you’d like to view what this year’s SBC topics were, check out the 2015 NE Science Boot Camp LibGuide.
The following job opportunities may be of interest to the e-Science Community:
George Washington University, Washington, DC: Research Services Coordinators (3 positions): https://www.gwu.jobs/postings/27542
Northeastern University, Boston, MA: Data Analytics/Visualization Specialist https://neu.peopleadmin.com/postings/35539
Reed College, Portland, OR: Data Services Librarian http://library.reed.edu/about/data-services-librarian
San Jose State University, Moss Landing Marine Laboratories: Senior Assistant Librarian, Tenure Track https://www.mlml.calstate.edu/sites/default/files/Tenure-track%20Assistant%20Librarian.pdf
University of California, Los Angeles: Scholarly Communications Librarian http://joblist.ala.org/modules/jobseeker/Scholarly-Communication-Librarian/30343.cfm
University of Missouri, Kansas City: Dental Scholarly Communications Relations and Outreach Librarian: https://myhr.umsystem.edu/psp/tamext/KCITY/HRMS/c/HRS_HRAM.HRS_CE.GBL?SiteId=8
The following two articles have just been posted in the Journal of eScience Librarianship:
Assessment of Data Management Services at New England Region Resource Libraries
Julie Goldman, Donna Kafel, and Elaine R. Martin
Submitted by guest contributor, Katie Houk, Health & Life Sciences Librarian at San Diego State University. Katie’s e-mail address is firstname.lastname@example.org
I made the move from a smaller, private university setting to a large, public teaching university across the country eight months ago. One of my priorities was to bring data management education and awareness to the campus. I’ve been fortunate enough to work with our graduate and research affairs office to send out an environmental scan survey and to get approximately 120 responses. Compared to the number of faculty on campus, that’s a rather limited response, but if you’ve ever tried to survey faculty, you know how excited my team was to get over 100 responses to something sent out at the end of the spring semester.
It didn’t come as a surprise that when asked what they needed help with most, faculty thought of the most pressing and immediate need – writing Data Management Plans. It also wasn’t too surprising that the next issue on the list was data storage and backup, followed by sharing data, and lastly, preservation issues. What is disappointing to me – and probably to many of you – is the lack of infrastructure and campus centrality needed to deal with these last three issues.
Almost as soon as I arrived on campus I was asked to put together a proposal for an institutional repository solution. Our white paper was thorough and we asked for a robust solution as well as the minimal faculty and staff power it would take to run it. Sadly, the library faces an uphill battle with legitimizing our place on campus and finds it hard to get funding for large projects that require more manpower. I have since learned, however, that there is a group on campus looking into a “data storage solution” but it has no librarian involvement, and possibly no faculty researcher involvement, either.
The disconnect between administration and faculty, and even between faculty and administration on one side and the library on the other, is a major impediment to creating the infrastructure required to help manage electronic data. If librarians and research faculty are not on the group looking at a campus-wide solution, will the implementation of such a thing actually be usable? Not likely, given that we consistently request a more robust IR platform than what is provided to our university system through the Chancellor’s office.
My current thoughts on the situation here are as follows:
- How does the library gain legitimacy as a unit to consult when designing cross-campus solutions for electronic data storage, backup, etc.?
- Does a university already strapped for funding want to enter the territory of trying to provide storage for research data when they don’t know or understand the amount or type of data being produced?
- Who will be in charge of this centralized solution and how will it be promoted and taught to a campus that is known for being very decentralized?
- How does the library avoid over-involving itself – understaffed as we are – in a situation where we are leading the charge and bringing these issues to light?
Have you struggled with these issues at your institution? How have you approached solving them (or have you ignored them out of necessity)? Do libraries need to have a collective plan or toolkit for helping solve these issues?
Contributed by Donna Kafel, Coordinator for the New England e-Science Program and Member of the NE Science Boot Camp Planning Group.
Along with my fellow New England Science Boot Campers, I headed Downeast this year for the seventh annual New England Science Boot Camp (SBC) at Bowdoin College in Brunswick, Maine, June 17-19th. Having heard many rave reviews about both Bowdoin and Brunswick, I was excited to have the opportunity to savor campus life there for a few short boot camp days. I instantly loved the Bowdoin campus and the town of Brunswick and soon found myself longing to be a Bowdoin student!
The science session topics for this year’s SBC were Cognitive Neuroscience, Marine Science, and Ornithology. Each of the boot camp science sessions features two faculty from selected New England colleges and universities. Generally the sessions are structured so that one faculty member first provides an overview of the science, followed by a second faculty member discussing the research he or she is conducting in the field. This year’s first science session was Cognitive Neuroscience. Dr. Erika Nyhus of Bowdoin College discussed key concepts and the types of classic experiments (remember Pavlov’s dogs and B.F. Skinner?) that laid the foundation for the field. Dr. Ann Maloney of UMass Medical School presented her research on altering neurocognition through videogames, specifically with children and teens with bipolar depression. Unfortunately, many young people with bipolar depression require heavy doses of multiple medications to treat depression, and these medications have many undesirable side effects, such as rapid weight gain. The focus of Dr. Maloney’s research is studying the effects of video gaming on weight gain and mood. Some preliminary findings from her research are that a significant number of her research participants with bipolar depression were able to stabilize their weight and required lower dosages of their psychiatric medications when they regularly engaged in active video games.
Wednesday evening featured a Literature and the History of Medicine themed talk by Dr. Ann Kibbie, of Bowdoin, “For the Blood is the Life: Dracula and the Early History of Blood Transfusion.” Dr. Kibbie discussed the perception of blood over the years in early medicine, the theories behind bloodletting as a way to restore wellness, and the early practice of blood transfusions—some of which involved humans receiving blood from animals. I found myself astounded that anyone could survive these early transfusions, from animals and other humans, without today’s technology of typing and crossmatching blood to ensure recipients are transfused with compatible blood.
The Marine Science session featured Dr. Barry Costa-Pierce of the University of New England discussing aquatic fisheries and the dire need to develop aquaculture in an environmentally sound way in order to feed the planet. Dr. Costa-Pierce noted that aquaculture and marine fisheries are often perceived negatively, as the popular press has covered antibiotic-laden fish farms extensively, and recommended that consumers find out where their fish is from, as the types of fisheries vary dramatically from one country to another. Dr. Whitney King, of Colby College, presented his research on Maine lakes: the impact phosphorus pollutants have had on increased algae growth and decreased oxygen in the lakes, and approaches to alleviating the lakes’ destruction.
Ornithology was the last of the SBC science sessions. Dr. Michael Reed, of Tufts University, was the overview speaker. While he did discuss bird basics and the interdisciplinary nature of ornithology research, what was striking about Dr. Reed’s talk is that he covered his use of library resources in depth—a topic that for years our NE Science Boot Campers have wanted faculty speakers to address! Ornithology is a field in which print and digital resources are used extensively. Dr. Reed discussed popular ornithology journals, the relation of scholarly societies to journal publishing, the increased availability of open source materials, his consults with Tufts Tisch Library staff when searching for obscure documents, his frequent use of interlibrary loan, students’ database search practices, and heavily-used ornithology data repositories such as ORNIS, the NA Breeding Bird Survey, Xeno-canto (a bird song sharing repository), and the Global Population Dynamics Database. These databases are heavily used and invaluable resources for ornithologists around the world. Dr. Nat Wheelwright of Bowdoin followed Dr. Reed’s presentation, starting off with a recording of a male Savannah sparrow. Dr. Wheelwright studies Savannah sparrows on the very remote Kent Island, where Bowdoin has a multidisciplinary field research station. In his presentation, “Incest avoidance in an island bird population,” Dr. Wheelwright discussed the extraordinarily rare instances of accidental incest in the diminishing Savannah sparrow population on Kent Island. It was interesting to hear his data management practices. He collects data in the field on Rite in the Rain pads, which are used frequently in field studies because they are water repellent. Every night he and other members of his team enter the data from the field studies into a database that has numerical limits enabling auto-correction for specific metrics.
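The nightly data-entry step described above—a database with numerical limits on specific metrics—is essentially range validation at the point of entry. A minimal sketch of the idea in Python (the metric names and limits here are hypothetical illustrations, not taken from the actual Kent Island database):

```python
# Hypothetical range-validation sketch for field data entry.
# Limits per metric: values outside the range are flagged for re-checking
# against the original field sheet before they enter the database.
LIMITS = {
    "wing_chord_mm": (50.0, 80.0),   # illustrative bounds only
    "mass_g": (15.0, 25.0),
}

def validate(metric, value):
    """Return (ok, message) after checking value against the metric's limits."""
    lo, hi = LIMITS[metric]
    if lo <= value <= hi:
        return True, "ok"
    return False, f"{metric}={value} outside [{lo}, {hi}]; re-check field sheet"

ok, msg = validate("mass_g", 19.3)    # plausible measurement: passes
bad, msg2 = validate("mass_g", 193)   # likely decimal-point slip: flagged
```

Catching a misplaced decimal point on the night of entry, while the field sheet is still at hand, is far cheaper than discovering it during later analysis.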
The SBC Capstone session featured a presentation by Thea Atwood, Engineering Librarian at UMass Amherst, and Cara Martin-Tetreault, Director of Sponsored Research at Bowdoin, on the OSTP directive for enabling public access to federally funded research output. Thea discussed the policy and federal agencies’ responses regarding data management plan requirements, and the OSTP’s impact on library data services. Cara discussed funders’ requirements for data management plans in grant proposals from an institutional perspective. In her discussion, Cara noted that if Bowdoin Science Librarian Sue O’Dell hadn’t initiated a discussion with her about library interest in research data management, she would never have thought of the library as a partner in supporting research data management at Bowdoin. Data management plans are one component among many other grant proposal requirements that sponsored research offices have to address, and Cara welcomed this working partnership with the library in supporting researchers’ data management plans.
The second half of the session was a breakout activity. The week before SBC, every Capstone attendee had been sent one research case from the New England Collaborative Data Management Curriculum (NECDMC) to read ahead of the Capstone, in preparation for the activity: writing and reviewing a data management plan. The Capstone attendees were divided into assigned groups of four or five, and all group members had been assigned the same research case. Five cases from NECDMC were featured in the Capstone, and every two groups had the same case. The groups were tasked with writing a data management plan based on the case. After 40 minutes, each group swapped its data management plan with the other group that had been assigned the same case, and the groups reviewed each other’s data management plans. After the data management plans were all reviewed, scribes for each group gave their data management plans and reviews to one of the Capstone organizers, and everyone returned to the auditorium for a whole group discussion. The group was asked several questions about their experience writing and reviewing data management plans. When asked what worked well in writing the data management plans, attendees noted having someone in their group with subject expertise, breaking down and mapping the data components of the case, and labeling the data as qualitative or quantitative. Challenges in writing the data management plans included not knowing the requirements of specific funding agencies, not having an institutional policy, and being unfamiliar with terminology or instrumentation. For some cases, Capstone attendees noted that it would have been helpful to have disciplinary knowledge. Attendees noted that they liked being able to review another group’s data management plan on the same case, as it gave them an opportunity to see the data management components of the research case through different eyes.
This writing and reviewing data management plan activity was designed to give attendees the experience of reviewing a research project in a discipline that they may be unfamiliar with, and identifying the key data management components that would need to be addressed in a data management plan—a process that a librarian consulting on a data management plan would do in actual practice.
The cases, the groups’ data management plans, and the reviews can be viewed on the Capstone page of the 2015 Science Boot Camp LibGuide under the “Capstoners Data Management Plan and Critiques” section.
So that’s a not-so-brief recap of this year’s New England Science Boot Camp. Many thanks to this year’s gracious Science Boot Camp host, Sue O’Dell, and the members of the New England Science Boot Camp Planning Committee for all their hard work over the year putting together this rich and unique learning and networking event. I’ll announce when the boot camp videos are available for viewing–stay tuned!
In the MORNING, tour a world-class computational center in Holyoke, MA. The Massachusetts Green High Performance Computing Center (MGHPCC) serves the growing research computing needs of five of the most research-intensive universities in Massachusetts: BU, Harvard, MIT, Northeastern, and the University of Massachusetts. The computers in the MGHPCC run millions of virtual experiments per month, supporting thousands of researchers in Massachusetts and around the world.
In the AFTERNOON, attend the first Research Data Management Roundtable discussion at the University of Massachusetts, Amherst. This roundtable is an informal gathering of librarians actively engaged in data services (e.g., planning data services, serving on a library data services advisory group, consulting on DMPs, data curation, teaching RDM) and the first in a series of discussions focusing on practical details and learning from our colleagues about research data management. The discussion topics for this session are our organizational structures, both within the library and across campus.
Later Roundtable events will reach out to librarians just beginning to work in research data management.
Further information on these events is available at the libguide August 2015 eScience Events in MA – Tour & Roundtable, http://classguides.lib.uconn.edu/nerdmtable
Registration for both events is here: http://goo.gl/forms/3a1DJKocFF
Registration opens on July 1 and closes on August 7. Space is limited to 25 participants. The Roundtable event is sponsored by the New England Regional Medical Library’s eScience Advisory Board. For details, contact Donna Kafel at Donna.Kafel@umassmed.edu.
Posted on behalf of Kristen Burgess, Assistant Director for Research and Informatics, Donald C. Harrison Health Sciences Library, University of Cincinnati Libraries
The University of Cincinnati (UC) Libraries seek an Assistant Director for Health Sciences Library (HSL) and Henry R. Winkler Center for the History of the Health Professions (Winkler Center) Operations.
The Assistant Director for HSL and Winkler Center Operations provides leadership and coordination for the daily operations of the Health Sciences Library and Winkler Center. In collaboration with other members of the HSL leadership team, the Assistant Director assists with development of policies and procedures, implementation of the UC Libraries strategic plan, facilities management and scheduling, and financial and human resources allocation. The Assistant Director plays a central role in developing new programs and coordinating collection development and management.
For the full position description and information about how to apply, see http://bit.ly/1IxeKfp. UC is an EEO/AA employer.