Feed aggregator

The Diversity of Data Management : Practical Approaches for Health Sciences Librarianship Webcast

e-Science Portal Blog - Thu, 03/19/2015 - 15:04

The Lamar Soutter Library at the University of Massachusetts Medical School in Worcester, MA is hosting a viewing of the MLA webcast, The Diversity of Data Management:  Practical Approaches for Health Sciences Librarianship, on Wednesday, April 22 from 2-3:30 pm.

As noted by the Medical Library Association, this webcast is designed to provide health sciences librarians with an introduction to data management, including how data are used within the research landscape, and the current climate around data management in biomedical research. Three librarians working with data management at their institutions will present case studies and examples of products and services they have implemented, and provide strategies for and success stories about what has worked to get data management services up and running at their libraries.

Attending the webcast is free of charge, but space is limited so advance registration is required.  If you would like to register to attend the webcast in Worcester, click here.

HHS responds to the 2013 OSTP Memo: NIH and Data Management Plans

e-Science Portal Blog - Tue, 03/10/2015 - 17:32

Submitted by guest contributor Daina Bouquin, Data & Metadata Services Librarian, Weill Cornell Medical College of Cornell University, dab2058@med.cornell.edu

In response to the Office of Science and Technology Policy (OSTP) 2013 memo regarding public access to federally funded research, five Health and Human Services agencies released their long-awaited implementation plans at the end of February 2015. The OSTP memo, released two years ago in February 2013, instructed federal agencies with research and development budgets of $100 million or more to develop strategies to make the results of federally funded research freely available to the public within a year of publication; this directive includes research data as a research result to be shared with the public. The recent updates came from the HHS’s National Institutes of Health (NIH), Centers for Disease Control and Prevention (CDC), Food and Drug Administration (FDA), Agency for Healthcare Research and Quality (AHRQ), and Office of the Assistant Secretary for Preparedness and Response (ASPR), whose plans all address scientific publications and research data with corresponding discovery and access points in PubMed Central and eventually healthdata.gov. However, for this post I will focus primarily on NIH’s “evolving” data policies and point out the RFI that librarians can respond to in order to help shape that evolving process.

For the last few years, I have regularly heard statements from librarians and others that seem to equate the NSF General Data Management Plan Policy with the NIH Data Sharing Policy and Public Access Policy. However, these funder requirements are drastically different both in implementation and result, and the above-described announcements make this all the more clear. Namely, the NSF has robust requirements for all researchers to submit Data Management Plans as part of their grant applications, whereas the NIH does not. Rather, the NIH requires public access to funded manuscripts, as well as a statement in a section of the agency’s grant applications addressing whether and how data will be shared. This second requirement applies only to researchers requesting $500,000 or more in direct costs from NIH for any one year. The NIH does not require a formal DMP, though, nor is there any process in place by which the NIH ensures that data is actually being shared by the researchers it funds, though data sharing is actively encouraged. I have personally found that this situation makes it a challenge to illustrate to NIH-funded researchers the importance of writing a DMP: when the funder is not asking for more robust planning, it can be difficult (though not impossible) to convince researchers to put in the effort needed to plan thoroughly.

The NIH’s recently released announcements responding to the OSTP Memo make very few updates in regard to Data Management Plans, as the HHS agencies see data policies as “evolving” and recognize that much of the agencies’ funded research data resides externally to the agencies themselves. As of right now, HHS has no shared repository for deposit of HHS agencies’ research data or catalog of associated metadata. The plan presented notes that an internal HHS Enterprise Data Inventory will serve as the catalog for all HHS data products and will eventually be linked to HealthData.gov. The NIH announcement did, however, specifically note the following in its “Further Steps Under Consideration” section on Data Management Plans:

“NIH is supporting an Institute of Medicine study of clinical trial data sharing… In an interim report on this topic, the IOM noted that a cultural change has occurred in discussions about clinical data sharing. Rather than exploring whether it should occur, the focus is on how it should be accomplished”

“NIH will explore the development of policies to require NIH-funded researchers to make the data underlying the conclusions of peer-reviewed scientific research publications freely available in public repositories at the time of publication in machine readable formats… NIH is taking steps to ensure all NIH-funded researchers develop data management plans…. As a first step, the 2003 NIH Data Sharing Policy will be modified to require that all NIH-funded researchers develop data management plans.”


Therefore, much of the recently released NIH response gives vague reference to what is being planned, but little detail on execution of those plans—specifically, what DMP requirements will be executed and when that execution is anticipated to occur. The NIH stance seems to be defined thusly: “NIH will determine the additional steps needed to ensure that the merits of digital data management plans are considered during the peer review process for extramural research grants and contracts” yet much is still unclear regarding what is to be expected.

Librarians working in biomedical research environments should continue to advocate that researchers write robust DMPs regardless of whether or not they are a requirement of their funders and should be sure to be aware of the following regarding NIH requirements:

NIH Data Sharing Policy 

The new sharing policy for genomic data

The separate data policies by NIH institute 

The list of the NIH’s preferred data sharing repositories 

And just for good measure here’s the NIH data sharing FAQ

Also useful is the “data sharing workbook”

Librarians can also refer researchers to DMP examples in biology, like those gathered by the New England Collaborative Data Management Curriculum

Furthermore, I encourage librarians to consider responding to the following Request for Information to help shape NIH data resources developed through the National Library of Medicine:


The National Library of Medicine needs input on the Library’s future in a Big Data world!

This is your chance to influence how some of the NIH’s most prominent data and information resources will be developed and envisioned in the future! 

Respond to the RFI at: www.nlm.gov/RFI

Deadline: 3/13

Topic: NLM seeks input regarding the strategic vision for the NLM to ensure that it remains an international leader in biomedical data and health information. In particular, comments are being sought regarding the current value of and future need for NLM programs, resources, research and training efforts and services (e.g., databases, software, collections). Your comments can include but are not limited to the following topics:

1 – Current NLM elements that are of the most, or least, value to the research community (including biomedical, clinical, behavioral, health services, public health and historical researchers) and future capabilities that will be needed to support evolving scientific and technological activities and needs.

2 – Current NLM elements that are of the most, or least, value to health professionals (e.g., those working in health care, emergency response, toxicology, environmental health and public health) and future capabilities that will be needed to enable health professionals to integrate data and knowledge from biomedical research into effective practice.

3 – Current NLM elements that are of most, or least, value to patients and the public (including students, teachers and the media) and future capabilities that will be needed to ensure a trusted source for rapid dissemination of health knowledge into the public domain.

4 – Current NLM elements that are of most, or least, value to other libraries, publishers, organizations, companies and individuals who use NLM data, software tools and systems in developing and providing value-added or complementary services and products and future capabilities that would facilitate the development of products and services that make use of NLM resources.

5 – How NLM could be better positioned to help address the broader and growing challenges associated with: Biomedical informatics, “big data” and data science; Electronic health records; Digital publications; or Other emerging challenges/elements warranting special consideration.

IDCC 15 – Part 2 (It’s a big conference)

e-Science Portal Blog - Mon, 03/02/2015 - 11:38

Last week in her blog post, Margaret discussed the twitter feed from the International Data Curation Conference (IDCC) that took place on Feb 9th to the 12th. I was fortunate enough to be able to attend and participate this year, and as it is a premier event for data professionals, I’d like to add a bit more about the conference.

The theme this year was “Ten years back, ten years forward: achievements, lessons and the future for digital curation”. Tony Hey, formerly of Microsoft Research and now a Fellow at the University of Washington, gave the opening keynote.  He did a very nice job of illustrating how far we have come in the past ten years. Data management and curation are now recognized as important issues and discussed in high-profile venues like Science and Nature.  However, he also noted that we still have some very serious problems to address. Funding for curation is often based locally, but use of digital data is global. More and more data repositories and tools are coming online, but support for these initiatives is still quite fragile and we have lost some important resources (RIP Arts & Humanities Data Service).

This tension between how far we have come vs. how far we have yet to go was echoed in a panel session titled “Why is it taking so long?” moderated by Carly Strasser from DataCite. Some of the panelists pointed to a lack of incentives, infrastructure and support as barriers to progress. However, others noted that actually quite a lot of progress had been made when one considers the scope of the changes in culture and practice that we are championing.

Presentations on Data Education struck a similar tone. Liz Lyon, from the School of Information Studies at the University of Pittsburgh, noted that roles for Data Professionals are becoming more prominent and defined, but the educational path to prepare oneself to perform these roles is still unclear. iSchools at Pitt and the University of North Carolina, whose program was described by Helen Tibbo, are seeking to position themselves as the places to fill this need.

Though awareness of curation has increased, we still have a ways to go in training academics in curation.  Research done by Daisy Abbott from the University of Glasgow demonstrated a gap between graduate students’ perception that curating their work is important and their reporting that they lack the expertise to curate it effectively. Fortunately, we have Aleksandra Pawlik and others from the Software Sustainability Institute offering Data Carpentry workshops to help raise data literacy levels of researchers.

The program with presentation slides is available on the IDCC15 website, and the papers will soon be published in the International Journal of Digital Curation. The location of IDCC16 has yet to be announced, but I highly recommend attending if you get the chance.

IDCC15 – I Couldn’t Go But I Followed on Twitter

e-Science Portal Blog - Fri, 02/20/2015 - 10:58

I enjoy going to conferences. I love learning new things and getting new ideas.  I really love the way I’m inspired by the people I meet. But, I can’t go to every conference. Like most people, my university library budget is limited and my own budget is limited. However, as more people in libraries and data take to Twitter and other social media, I can go to conferences vicariously.

From February 9-12 I was vicariously at the 10th International Data Curation Conference in London, England.  While I wish I had been there in person, it is possible I would have been so tempted by the sights of London that I might have skipped the meeting.

There is a Storify of the conference available if you want to have a look at all the events and photos and comments. Watching the #idcc15 feed each day made me envious but also excited, as I read about the successes and new ideas that were being discussed during the various programs. Great morning coffee and lunchtime reading. A few highlights you might want to check out:

While there are differences between US and UK regulations, we can learn from programs that work at any institution. Presentations by Imperial College London, Oxford Brookes University, and University of Edinburgh are summarized here, with links to some good resources.

It is also helpful to learn from the researcher’s viewpoint. Purdue’s Data Curation Profiles were the focus of one talk that dealt with the Technology Acceptance Model. A second talk examined whether research supervisors were prepared to provide advice and guidance. Slides and papers for both talks are linked from this summary.

The Edinburgh group, mentioned above, has a great blog. A couple of posts there talk about IDCC15: one covers the first day, and another looks at how the 80/20 rule applies to RDM tools (if you haven’t heard about the 80/20 rule, also known as the Pareto principle, check out the Wikipedia article).

 A useful Storify covers RDM training for librarians . There are slides embedded in the page, so have a look at the various curricula that were presented.

While we focus on eScience here at the portal, there are also data things going on in other subjects.  If you’ve always wanted to learn a bit about digital humanities, try this video, “The stuff we forget: Digital Humanities, digital data, and the academic cycle” by Melissa Terras, Director of University College London Centre for Digital Humanities.

 This final blog post  recommendation gives you an idea of some of the other subjects covered in the meeting with  links to the talks

By the way, I use TweetDeck to keep track of multiple things on Twitter.  There are some basic instructions here.  I have the regular stream of people and organizations I follow in the first column, and after that I have columns for hashtags I’m interested in, such as the #idcc15 label for meeting tweets or #medlibs for medical librarians.  When a conference is over, and I have favorited the tweets I want to follow up on later, I can delete its column. Favorites is another column in my TweetDeck.

7 Recommended Resources for E-Science Newbies

e-Science Portal Blog - Tue, 02/17/2015 - 19:52

Submitted by Donna Kafel, Project Coordinator for the e-Science Portal and the New England e-Science Program for Librarians.

During a recent meeting of the e-Science Portal’s Editorial Board, portal editors suggested that we create a downloadable document, perhaps titled “An Introduction to e-Science” that would provide an annotated list of the best overviews and introductory resources for librarians and library students new to the concept of e-Science and library based data services.  The e-Science Portal team  thought this was a great idea and we have it on our action item list for after the portal redesign is completed this spring.

In the meantime, there are a lot of e-Science newbies out there right now who are at a loss as to where to begin, and who may like some of this information a little sooner.  Looking at all the content packed into library guides on data management, hundreds of journal articles, and data webinars can be a bit overwhelming for those just starting out. Here are seven resources that can help newbies start out on the road to figuring out what is meant by the term e-Science and  how it impacts scholarly communication, library roles in e-Science, the structure of the scientific research environment, data types and data management.

1.  The Fourth Paradigm:  Don’t be intimidated, I’m not recommending that people read the entire book in one sitting! (But it’s worth going back to read individual chapters).  The Fourth Paradigm’s Foreword and the first chapter “Jim Gray on eScience:  A Transformed Scientific Method”  nicely illustrate how the integration of computers and evolving technologies has revolutionized the way science is conducted.

2.  The e-Science Thesaurus is a great place for newbies to learn terms and concepts and to find related references.  Some of the entries include interviews with librarians who are actively engaged in e-Science (for some interesting interviews, check out Data Curation Profiles Toolkit, Implementing a Data Sharing/Management Policy, and Informationist).

3.  What is e-Science and How Should it be Managed? :  captures the essence of e-science, critical roles for librarians, and the importance of open data sharing.

4.  A nice overview of e-Science and roles for librarians:

a)      Cyberinfrastructure, Data, and Libraries, Part 1. A Cyberinfrastructure Primer for Librarians (2007) – Part one of a primer for librarians on the major issues and terminologies of e-Science.

b)      Cyberinfrastructure, Data, and Libraries, Part 2 – Part two: the role of libraries in data management and how librarians can participate in the downstream and upstream phases of the research cycle.

5.  Data Types (4 min YouTube video)—describes the diverse entities that come under the umbrella term data and the different ways data is captured.

6.  A Day in the Life of an Academic Researcher Part 1 (7 minute YouTube video) and A Day in the Life of an Academic Researcher Part 2 (5 minute YouTube video) explain the research environment and the different roles played by members of a research team.

7.  The Journal of eScience Librarianship (JeSLIB) :  specifically dedicated to the advancement of e-Science librarianship, JeSLIB  includes peer-reviewed research  and “e-Science in Action” articles on topics such as research data management, librarians embedded on research teams, data services, data curation, and data sharing and re-use.

Glitter on the Highway: Data on the Website

e-Science Portal Blog - Wed, 02/11/2015 - 12:09

By Andrew Creamer, Scientific Data Management Specialist, Brown University

Glitter on the mattress
Glitter on the highway
Glitter on the front porch
Glitter on the hallway 
Love Shack, The B-52s, Pierson, Schneider, Strickland, EMI (1989).

Recently I was reading through the drafts of the Data Management Plans (DMPs) and Broader Impacts sections that were submitted with faculty NSF proposals through our data management plan service for 2014-2015. As I reviewed these data management plans, one of the commonalities I noticed was the ubiquity of statements that data would be linked from the project website or personal website. Listed as either a tool for dissemination or post-project archiving and access, or in some cases both, there was data on the website. In a few cases data on the website was conspicuously the only option listed for dissemination or post-project archiving. Most often it was mentioned nested in among other options; for example, for dissemination, the investigators would say they would disseminate the data by sharing it on their personal or project websites, depositing it in some type of data sharing repository, and publishing the results in academic journals and presenting these at scientific meetings. As I looked over these drafts I could see where in each occurrence I had marked a comment asking the investigators for more information about what they meant by putting data on a personal or project website and to please have a conversation with me regarding this option.

The opaque “data on the website issue” comes up in almost every conversation I have had with faculty using our DMP service: “So, you say here that you have a website. How exactly are you storing and making your data available on your website? Who is responsible for doing and maintaining this, etc.” This conversation can go many ways, of course. While some faculty mean that they are depositing the data into a repository and have a persistent link that they will place on their personal or project website in a citation that will link out to the data, some faculty mean that they have a personal server, and in some alarming cases, a web server, where they will place and link to data on their website. While the former intention also leads one down a line of important questioning about suitability and sustainability, such as which repository, what kind of persistent link, etc., it is the latter scenario, of course, that concerns us research data management librarians the most.

In their article published in PLOS ONE last summer, How Do Astronomers Share Data? Reliability and Persistence of Datasets Linked in AAS Publications and a Qualitative Study of Data Practices among US Astronomers, Pepe et al. (2014) provided evidence that we can use in conversations with investigators about considering alternatives to storing data on their web or personal servers. Their findings showed that putting data on a personal or project-based website were the third and fourth most popular data sharing practices among astronomers, after emailing data or placing it on an FTP-style site. Then they looked through the external links to data published in a defined period of astronomy literature and found:

“This exploratory analysis reveals three key findings. First, since the inception of the web in the early 1990’s, astronomers have increasingly used links in articles to cite datasets and other resources which do not fit in the traditional referencing schemes for bibliographic materials. Second, as for nearly every resource on the web, availability of linked material decays with time: old links to astronomical materials are more likely to be broken than more recent ones. Third, links to “personal datasets”, i.e., links to potential data hosted on astronomers’ personal websites, become unreachable much faster than links to curated “institutional datasets”.” (Pepe et al. 2014)

The practice of placing data on a website may be entrenched in the data sharing practices of certain scientific communities, but as research data management librarians we need to be sure that we do not become numb to its ubiquity. Instead, we must continue to question researchers about what they mean, and list ways that we can still help to make data accessible from their website while mitigating the myriad issues related to storing data on web servers or personal servers, e.g., lack of backup, lack of persistent identifiers, no long-term preservation strategy, insufficient metadata, link rot, diminished discoverability, and the access risks that arise when one person is solely responsible for making data accessible.
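Link rot in particular is easy to demonstrate during a consultation. The following is only a minimal sketch in Python, not anything used by Pepe et al. or by our DMP service: the function names `check_link` and `check_links` are made up for illustration, and it assumes that a HEAD request is a good enough first pass (some servers reject HEAD, so a "broken" result deserves a manual look).

```python
from urllib import error, request


def check_link(url, opener=request.urlopen, timeout=10):
    """Classify one URL as 'ok', 'broken', or 'unreachable'.

    `opener` is a hypothetical hook so the function can be tried
    without network access; by default it is urllib's urlopen.
    """
    try:
        # HEAD asks for headers only, so no dataset is downloaded.
        req = request.Request(url, method="HEAD")
        with opener(req, timeout=timeout) as resp:
            code = resp.status
    except error.HTTPError as exc:
        # The server answered, but with an error status such as 404.
        code = exc.code
    except (error.URLError, OSError, ValueError):
        # DNS failure, timeout, refused connection, or malformed URL.
        return url, "unreachable"
    return url, ("ok" if code < 400 else "broken")


def check_links(urls, opener=request.urlopen):
    """Return a dict mapping each URL to its status."""
    return dict(check_link(u, opener=opener) for u in urls)
```

Passing the data links from a draft DMP or a lab website to `check_links` gives a quick list of which ones still resolve. A link that resolves today says nothing about backup or preservation, so this is a conversation starter rather than an assessment.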

On the publisher side, last spring PLOS added this text to their Data Availability Policy: “Note that it is not acceptable for the authors to be the sole named individuals responsible for ensuring data access.” This one sentence has also been helping me in the endeavor to dissuade researchers from stating that if another researcher wants or needs access to their data, then he or she can just contact them as the sole means of data access or access their data on their personal website as sole means of data dissemination. So let us hope that research funders will also begin pushing back on researchers that want to use their personal or project websites and their personal and web servers as the sole means of data dissemination or storage location for post-project access.

Citation: Pepe A, Goodman A, Muench A, Crosas M, Erdmann C (2014) How Do Astronomers Share Data? Reliability and Persistence of Datasets Linked in AAS Publications and a Qualitative Study of Data Practices among US Astronomers. PLoS ONE 9(8): e104798. doi:10.1371/journal.pone.0104798

A gentle introduction to Docker for reproducible research

e-Science Portal Blog - Thu, 01/29/2015 - 16:27

Submitted by guest contributor Stacy Konkiel, Director of Marketing & Research, Impactstory, stacy.konkiel@gmail.com.

By now, many data management librarians are familiar with the concept of reproducible research. We know why it’s important and how to (theoretically) make it happen (thorough documentation, putting data and code online, writing an excellent Methods section in a journal article, etc).

But if a scientist asked you for a single recommended reading on how to make their computational research reproducible, what would you send them?

I’d suggest “Using docker for reproducible computational publications” by Melissa Gymrek (a Bioinformatics PhD student at Harvard/MIT).

In her post, Gymrek introduces Docker, a “lightweight virtual machine” (more precisely, a container platform) that allows a researcher to package a complete computing environment as an image. Other researchers can download that image and run it to reproduce results using the original researcher’s code and data.

No need to download and install R packages, or to figure out how to make someone else’s code play well with their operating system. Just install Docker, enter a simple line at the command line, and–boom–they’ve got the original researcher’s environment running in a container on their own computer, ready to reproduce someone else’s findings.

Docker is already popular in the software development world, and is gaining popularity with bioinformaticists and other computational researchers. Learn more about Docker and how it can work for reproducible research on Melissa Gymrek’s blog.

Winter is the perfect time for a virtual conference or webinar!

e-Science Portal Blog - Wed, 01/28/2015 - 15:19

There’s been a flurry of upcoming virtual conferences and webinars springing up and providing educational opportunities while obviating the need for travel in wintry weather. In a previous post, I had noted the upcoming DataONE webinar series that begins on Feb. 10th with the webinar “Open Data and Science:  Towards Optimizing the Research Process.”

NISO is sponsoring a six-hour (11 am – 5 pm EST) virtual conference on Feb. 18th:  “Scientific Data Management:  Caring for your Institution and its Intellectual Wealth.” Hosted by Todd Carpenter, Executive Director of NISO, the program includes speakers from the Dept. of Energy, Emory, Tufts, Oregon State University, UIUC, the Center for Open Science, and the RMap project. The final session will be a roundtable discussion. Program topics for the conference include:

  • Data management practice meets policy
  • Uses for the data management plan
  • Building data management capacity and functionality
  • Citing and curating datasets
  • Connecting datasets with other products of scholarship
  • Changing researchers’ practices
  • Teaching data management techniques

Finally (although I suspect I’ll soon be adding to this snowballing list), Elsevier is sponsoring the webinar “Institutional & Research Repositories:  Characteristics, Relationships and Roles” on Feb. 26th from 11 am-12:15 pm (EST).


DataONE is launching a new webinar series

e-Science Portal Blog - Thu, 01/22/2015 - 09:51

DataONE is launching a new Webinar Series (www.dataone.org/webinars) focused on open science, the role of the data lifecycle, and achieving innovative science through shared data and ground-breaking tools.

The first of the series is a presentation and discussion led by Dr Jean-Claude Guédon from the Université de Montréal titled:

“Open Data and Science: Towards Optimizing the Research Process”.

Tuesday February 10th 9 am Pacific / 10 am Mountain / 11am Central / 12 noon Eastern

The abstract for the talk and registration details can be found at: www.dataone.org/upcoming-webinar.

Webinars will be held the 2nd Tuesday of each month at 12 noon Eastern Time.  They will be recorded and made available for viewing later the same day. A Q&A forum will also be available to attendees and later viewers alike.

More information on the DataONE Webinar Series can be found at: www.dataone.org/webinars .


R: Addressing the Intimidation Factor

e-Science Portal Blog - Wed, 01/21/2015 - 16:22

Submitted by guest contributor Daina Bouquin, Data & Metadata Services Librarian, Weill Cornell Medical College of Cornell University, dab2058@med.cornell.edu

Working with students and researchers to help them better manage and work with their research data is a big part of the librarian’s role in a data-intensive setting. Much of the time, though, the librarian also needs to think critically about and advise on tools used in different parts of the data life cycle, including the data pre-processing and analysis phases of a research project. Increasingly, I find myself dealing with this sort of situation in my library. For example, a student comes to the library with a question about getting access to some tools for working with her data; this might mean that the student needs help restructuring some spreadsheets or with another data manipulation task, but more often than not the student is also seeking statistical software and tools for data visualization. In my experience, this type of situation has been more common than requests for help on data management plans or research documentation. This type of reference interaction is also where many librarians and information professionals first encounter, and have to start discussing, R programming.

R is a free statistical programming language with a notorious learning curve, but students and researchers are increasingly seeing the value in tackling that curve. I was fortunate enough to take some advanced statistics courses throughout my educational career and learned R in a trial-by-fire set-up. I also co-instructed an introductory course on Computational Health Informatics this past summer wherein we taught introductory R functionalities. Therefore, when patrons come to the library looking for help getting started with R, I feel confident helping them. However, I know that when R comes up in discussions with my colleagues, they do not always feel confident assessing whether it is worthwhile to advise a student to learn R or just run some stats tests in Excel. My colleagues also are often intimidated when it comes to R because they are not confident that they understand how to troubleshoot and find resources for students just getting started with the program.  As a result of witnessing this type of situation on many occasions, I present here my attempt at lowering the intimidation factor surrounding R for librarians. You do not need to become an R programmer to know how to approach it critically and to be able to help others get started.

I just want to start by saying, though, that I almost always encourage students to pursue learning R rather than pushing them toward Excel or a statistics program that they would need to purchase, as our library does not offer regular access to stats software on our computers.  R is also much more robust for working with data than Excel. However, I realize that some students and researchers just want to get their work done and want nothing to do with learning a new programming language. At that point, I generally very briefly point R out to them anyway in case the student ever does decide that it might be useful to learn. If the student is unsure whether R is what she is looking for, however, I ask the following questions:

  • Have you ever worked with data using code? (e.g. Stata, SAS)
  • Would you be willing to spend some time learning how to use a new tool?
  • Are the statistical tests you need to run somewhat complex?
  • Will you need to repeat the steps for how you cleaned up your data?
  • Will you need to repeat the steps for how you analyzed your data?
  • Will visualizing your data be very important to you on this project?
  • Is your data in more than one format?

If the student answers “yes” to a few of these questions, I would strongly encourage them to use R rather than a tool like Excel. Check out Chris Leonard’s discussion on the R Blog for more information on the Excel vs. R question. With the resources and jargon below under your belt, you will feel more comfortable approaching R programming if you and your patron do decide that R is a good choice.

One of the first resources I usually point new users to is Quick R. Quick R provides new learners and experienced users alike with “a roadmap and the code necessary to get started quickly, and orient yourself for future learning” with R. If you are unfamiliar with the concept of data types, I encourage you to look through the “Data Types” section of Quick R: understanding how R users talk about data will make unfamiliar terms less intimidating right off the bat.
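As a rough illustration of the data types vocabulary mentioned above, here is a minimal sketch using only base R. The variable names and values (`ids`, `ages`, `smoker`, `patients`) are made up for this example and do not come from Quick R.

```r
# Made-up example values; none of these come from a real data set.
ids    <- c("p01", "p02", "p03")            # character vector
ages   <- c(34, 51, 27)                     # numeric vector
smoker <- factor(c("no", "yes", "no"))      # factor (categorical variable)

# A data frame holds columns of different types, one row per observation.
patients <- data.frame(id = ids, age = ages, smoker = smoker)

# class() reports the type of an object; str() summarizes its structure.
class(ages)
class(smoker)
class(patients)
str(patients)
```

Asking a patron to run `class()` or `str()` on their own data is often a quick way to find out which of these terms applies to what they are working with.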

There is some other basic jargon you should be aware of when talking about R with patrons as well. For instance, if you are using R, you will likely need to use R packages. “Packages are collections of R functions, data, and compiled code in a well-defined format” (Quick R). The place where packages are stored is called the library. R comes with a standard set of packages when you install it, but others are available for download and installation. You can install packages by running the following command with the name of the package you need:

> install.packages("name_of_package")

Once installed, packages need to be loaded into the session to be used. This can be done using the command:

> library(name_of_package)

There are also buttons in interfaces like RStudio that can help you install and load packages without needing to write commands.
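The install-then-load steps above can be combined so that a package is only downloaded when it is missing. The following is a minimal sketch, not a standard R function: the helper name `load_or_install` is hypothetical, and the example call uses `stats` only because it ships with R and therefore triggers no download.

```r
# Hypothetical helper: install a package only if it is missing, then load it.
load_or_install <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)  # downloads from CRAN; needs a network connection
  }
  # character.only = TRUE tells library() that pkg holds the name as a string
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}

# "stats" is part of every R installation, so nothing is downloaded here.
load_or_install("stats")
```

Note that `install.packages()` takes the package name in quotes, while `library()` accepts it with or without quotes; that difference is a common source of confusion for beginners.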

“R function” is another term you’ll likely hear if you get questions about R. Functions let you store a set of commands under a name so that they are easy to read and reuse. For example, this is how one could write a function that subtracts one number from another. The function in the example is called f1:

> f1 <- function(x,y) {x-y}
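To show how a function like this is used once it has been defined, here is a short sketch that repeats the definition and then calls it. The input values are arbitrary and chosen only for illustration.

```r
# Same function as above: subtracts the second argument from the first.
f1 <- function(x, y) {
  x - y
}

# Arguments can be matched by position...
by_position <- f1(10, 3)

# ...or by name, in which case the order does not matter.
by_name <- f1(y = 3, x = 10)

print(by_position)
print(by_name)
```

Both calls compute 10 minus 3, so they give the same result.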

There are entire packages of functions written by others to help users accomplish complicated tasks. For example, if a researcher decides she needs to run some regression diagnostics, there are pre-written functions to accomplish this task in the package called “car”. When the researcher installs the car package and loads it from her library, she will be able to access functions to run her diagnostics. You can view an example of this and many other statistical analysis functions using Quick R.
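As one possible illustration of the regression diagnostics scenario, the sketch below fits a linear model to `mtcars` (a data set built into R) and, if the car package happens to be installed, asks it for variance inflation factors. The choice of model and of `vif()` as the diagnostic are assumptions made for this example; they are not the specific workflow shown on Quick R.

```r
# Fit a simple linear regression on a data set that ships with R.
model <- lm(mpg ~ wt + hp, data = mtcars)

# Only call into car if it is installed, so the script does not fail.
if (requireNamespace("car", quietly = TRUE)) {
  # Variance inflation factors flag predictors that are highly correlated.
  print(car::vif(model))
} else {
  message("Package 'car' is not installed; run install.packages(\"car\") first.")
}

# Base R summary of the fitted model, available with or without car.
print(summary(model))
```

Writing `car::vif()` calls the function directly from the package without loading it via `library(car)`, which makes it clear to a newcomer where the function comes from.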

I also tend to point researchers and my colleagues toward general reference material if they are looking for more granular help getting started with R programming. The following have been very useful in the past:

Additionally, the following tutorial resources are usually very well received:

And one should never neglect the help available through the R Community:

The resources noted above, and many others, are listed in a resource guide that I developed on R and Data Mining, which can be found here.

In summary, you do not need to read all of these resources on R to help others work with it. By going through some of the above material and familiarizing yourself with the terminology and resources associated with R, you will be well equipped to help with common R problems. R is challenging, but like all new things, exposure is the only way to get used to it. Start small with terminology and basic documentation; in this way you will gain the confidence and knowledge necessary to begin working on reference transactions that involve R programming.


A Model of Collaborative Education Efforts in Data Management: the Virginia Data Management Boot Camp

e-Science Portal Blog - Thu, 01/15/2015 - 12:42

Submitted by guest contributor Yasmeen Shorish, Physical & Life Sciences Librarian at James Madison University.


Question: How do you deliver the same data management training to graduate students, faculty, and staff simultaneously? How do you deliver that content not just at your own institution, but also to six other institutions across the state?

Answer: Very carefully, with a lot of cooperation, collaboration, and some technical wizardry thrown in as well. This is the story of seven Virginia institutions who stopped repeating content individually and started getting real – real collaborative.

In January 2013, the libraries at the University of Virginia (UVA) and Virginia Tech (VT) teamed up to produce a “Data Management Bootcamp” for graduate students on their campuses. Utilizing telepresence technology, speakers could interact with participants at either school in large, virtual sessions as opposed to discrete events at each venue. Librarian interest in this event resulted in the addition of three more institutions in 2014: James Madison University (JMU), George Mason University (GMU), and Old Dominion University (ODU). UVA, VT, JMU, and GMU have an existing telepresence setup called 4-VA, and it was not difficult, technology-wise, to add ODU so that it could participate fully as well. Librarians from these five institutions, including myself, formed a planning group to produce the “2014 Virginia Data Management Bootcamp.”

However, expanding a program from two locations to five locations does present some complications. Can everyone connect simultaneously? Do the screens get too cluttered when everyone is connected? How do we decide what content is most appropriate for five very different institutions? The 2014 Bootcamp began planning in the summer of 2013. A series of virtual meetings among the planning group resulted in an agenda that included understanding research data, operational data management, data documentation and metadata, file formats and transformations, storage and security, DMPTool and funding agencies, rights and licensing, protection and privacy, and preservation and sharing. It was a lot to cover in two full days, with a third half-day for local discussion. The full agenda can be found on this LibGuide.

The group debriefed after the 2014 event and discussed what 2015 should look like. We knew that the next event should be less dense, as that much content in two days was somewhat overwhelming.  The College of William & Mary (WM) and Virginia Commonwealth University (VCU) both expressed a desire to participate. With some technological work involving bridges, WebEx, and patience, the Virginia Data Management Bootcamp was able to expand to include these universities. Happily, increasing the number of participating institutions did not increase the complexity very much. One change that may have had the most impact was that the planning group decided to add more in-person meetings to work through curriculum ideas. We found that as a group, we could accomplish more in a shorter amount of time when we were gathered around one table, discussing ideas.

Using pre- and post-assessment surveys helped us zero in on some areas for change. We wanted to build in more interactivity and limit the amount of lecture for each area. We also wanted to engage the audience in the research cycle more intentionally than we had been. We redesigned the three-day event in smaller chunks, with more local discussion and more hands-on activities. A full schedule can be found on this LibGuide.

Can other states or groups of libraries produce a cross-institutional data management outreach program? Yes!

What if they lack a fancy telepresence room? Still, yes! There are viable alternatives that may have a different look and feel, but can still accomplish the same goal.

Want to launch a cross-institutional program of your own?

The best way to get started is to first get a sense of who would want to participate. Propose the workshop and form a planning group. The number of participating venues will shape what technology you use to bring it all together. WebEx may be appropriate, or even a Google Hangout (although image quality could be a concern).

How much time can you set aside for the workshop? One day? Three days? That will determine what gets covered and how. The more hands-on engagement that you can work into the program, the more likely you are to keep interest across sites.

Determine a meeting schedule for the planning group and decide which meeting method (virtual vs. in person) will be more effective. Individually, each site will need to coordinate with its own campus partners to make it as big an event as they wish. Assessment of some kind is necessary to determine what could change if you do it all over again.

Collaborative education efforts such as these can help institutions leverage the expertise that is naturally distributed. Setting a foundational learning outcome for data management is an achievable goal and a good way to build a community of practice in your local region.





Just published: Journal of eScience Librarianship special issue on data literacy

e-Science Portal Blog - Mon, 01/12/2015 - 14:34

The latest issue of the Journal of eScience Librarianship (JESLIB) has just been published! It is available at http://escholarship.umassmed.edu/jeslib/vol3/iss1/

 Table of Contents

Volume 3, Issue 1 (2014)


What is Data Literacy?
Elaine R. Martin

Full-Length Papers

Planning Data Management Education Initiatives: Process, Feedback, and Future Directions
Christopher Eaker

A Spider, an Octopus, or an Animal Just Coming into Existence? Designing a Curriculum for Librarians to Support Research Data Management
Andrew M. Cox, Eddy Verbaan, and Barbara Sen

An Analysis of Data Management Plans in University of Illinois National Science Foundation Grant Proposals
William H. Mischo, Mary C. Schlembach, and Megan N. O’Donnell

Initiating Data Management Instruction to Graduate Students at the University of Houston Using the New England Collaborative Data Management Curriculum
Christie Peters and Porcia Vaughn

EScience in Action

Research Data MANTRA: A Labour of Love
Robin Rice

Building Data Services From the Ground Up: Strategies and Resources
Heather L. Coates

Building the New England Collaborative Data Management Curriculum
Donna Kafel, Andrew T. Creamer, and Elaine R. Martin

Lessons Learned From a Research Data Management Pilot Course at an Academic Library
Jennifer Muilenburg, Mahria Lebow, and Joanne Rich

Gaining Traction in Research Data Management Support: A Case Study
Donna L. O’Malley

The New England Collaborative Data Management Curriculum Pilot at the University of Manitoba: A Canadian Experience
Mayu Ishida

Are you interested in submitting to JESLIB? Please refer to author guidelines at http://escholarship.umassmed.edu/jeslib/styleguide.html

Share your projects at the NE e-Science Symposium’s Poster Session

e-Science Portal Blog - Mon, 01/05/2015 - 17:40

One of the most popular attractions of the annual University of Massachusetts and New England Librarian e-Science Symposium is its poster session. The symposium poster session offers an ideal venue for librarians and library school students who are involved in e-Science and RDM projects and/or research to share their findings and exchange ideas with interested colleagues.  The poster session also includes a  contest, in which judges review the posters to determine the best in these three categories:  Most Informative in Communicating e-Science Librarianship, Best Example of e-Science in Action, and Best Poster Overall.

Interested? If you haven’t yet registered for the e-Science Symposium, make that your first step, as registration is filling quickly. Then, write your poster proposal and submit it following these instructions by the proposal deadline of Feb. 6.

Want to see some examples? The e-Science Symposium conference site features archived posters from past symposia. For links to the past six symposia, visit the e-Science Symposium conference page.

Got questions?  For further details, or questions regarding the poster contest, please contact Raquel Abad at raquel.abad@umassmed.edu



Two new articles featured in the Journal of eScience Librarianship

e-Science Portal Blog - Fri, 12/19/2014 - 12:29

The Journal of eScience Librarianship (JeSLIB) has just published the following two articles:

These two articles are part of Volume 3, Issue 1 of JeSLIB that will be published in January. An announcement will be made when the issue is published.



Registration now open for 7th annual New England e-Science Symposium

e-Science Portal Blog - Wed, 12/10/2014 - 17:01

Registration is now open for the 7th annual University of Massachusetts and New England Area Librarian e-Science Symposium, to be held on Thursday, April 9, 2015. For details and to register, visit the 2015 e-Science Symposium conference site. Registration is on a first-come, first-served basis and will be capped at 90 people.

Librarians: the original research data managers

e-Science Portal Blog - Wed, 12/10/2014 - 15:45

Submitted by guest contributor Nancy Glassman, Assistant Director for Informatics, D. Samuel Gottesman Library, Albert Einstein College of Medicine

In conjunction with Albert Einstein College of Medicine’s Faculty Development Program I lead an introduction to research data management workshop. Attendees usually include a mix of clinical and basic science faculty, as well as a few postdocs and graduate students. To set the stage at a recent workshop, I asked the group if they were surprised to have a librarian as the instructor. Taken aback by nodding heads around the table, I quickly recovered my composure and decided to make the most of this “teachable moment.”

All of the workshop’s attendees use the library’s resources and services, but as long as things are running smoothly and they find the information they need, they don’t really need to think about how it was made available to them. Many library users are unaware of what librarians actually do, and that’s just fine. But it’s worthwhile to take a few minutes to show researchers how the traditional library services they use almost every day require a similar, if not the same, skill set as managing research data.

Librarians are, arguably, the original data managers. Think about it. Librarians have been managing data and information in one form or another for thousands of years, practically since the dawn of the written word. Archaeologists in Turkey have found collections of stone tablets dating back to the 17th-13th centuries BCE containing early forms of metadata.(1) These examples describe metadata concepts such as attribution and versioning:

“Written by the Hand of Lu, son of Nuggassar, in the presence of Anuwanza …”

“This tablet was damaged. In the presence of Mahhuzi and Halwalu, I, Duda, restored it…” (1)

Fast forward to the library of the twenty-first century. We work and live in the era of big data in which “everything is available for free on the Internet.” Who makes sense of this information overload? Who selects, catalogs, curates, backs up, and makes available relevant sources of information? Who helps users cite these resources properly? Who safeguards patron information?

  • Librarians are experts at making data meaningful and easily discoverable. Look no further than the library’s catalog, a classic example of metadata in action. In medical libraries MeSH (Medical Subject Headings) is used to categorize material by subject. Author names and titles are standardized. Call numbers make it easy to find items on bookshelves.
  • Although librarians are not copyright lawyers, we do have a lot of practice navigating copyright, licensing agreements, and open access as part of our regular activities. This includes negotiating with vendors, managing interlibrary loan, as well as public- and open-access initiatives (including the NIH Public Access Policy).
  • Researchers rely on librarians for help in finding relevant, evidence-based information. In addition to being experienced searchers of online databases such as PubMed, Embase, and Web of Science, we also mine the “deep web” to find those elusive resources.
  • Librarians are familiar with the rules and nuances of proper citation and attribution practices. We support many citation management programs, including EndNote, RefWorks, and Mendeley, and teach students how to cite correctly and avoid plagiarism.
  • Data comes in a lot of different packages, and long-term preservation and data storage are important aspects of managing research data. Over the millennia we have maintained and preserved collections of tablets, scrolls, manuscripts, maps, audiovisual materials, print books and journals, e-books, e-journals, websites, blogs, wikis, and data sets.

Although the media and the volume of data have changed radically over time, the expertise to manage all of this remains essentially the same. Librarians are particularly adept at adapting to change. Helping researchers manage their data is a logical extension of a long-standing tradition.

After the workshop, one attendee approached me, and acknowledged that at first he was skeptical about taking a class on research data management led by a librarian, but after I described the ways traditional librarian skills apply, it all made sense.  Conversations like this can open users’ eyes to librarians’ wide range of information management skills and may lead to new and interesting partnerships.


1.  Casson L. Libraries in the ancient world. New Haven: Yale University Press; 2001. xii, 177 p. p 13.

ACRL Digital Curation Interest Group Call for Proposals

e-Science Portal Blog - Mon, 12/08/2014 - 13:02

The following announcement is posted on behalf of the ACRL Digital Curation Interest Group Team.

The ACRL Digital Curation Interest Group is looking for proposals for our Spring Webinars and for ALA Annual 2015.   The group would like to host three webinars in the Spring and have 3-4 panelists for ALA Annual 2015.  So please consider submitting a short abstract proposal!

CFP for our Spring webinars:

We invite proposals on topics germane to digital curation activities including (but not limited to) the following topics:

  • Documentation and organization
  • Digital preservation
  • Digital curation software and tools
  • Metadata specialists
  • Non-institutional repositories
  • Skills needed/Skills learned to tackle digital curation
  • Specific data management procedures such as file naming
  • Data purchased from vendors
  • Careers in digital curation
  • Digital curation lifecycle

We seek webinars of 60 minutes in length (including time for questions). If you have an idea for a webinar, please send a short description of it to Megan Toups at mtoups@trinity.edu by January 31, 2015.

CFP for ALA Annual 2015:
We are putting together a panel of 3-4 people to present for ~10 minutes each covering digital curation from a variety of perspectives.  Panelists will present and then engage the audience in a productive conversation on digital curation.

We’d love to have a diverse set of panelists representing a variety of different digital curation perspectives: research data, archives and digital curation, theory, practice, etc. Want to be a part of this interesting panel? Please submit a short description of what you’d like to present to Megan Toups at mtoups@trinity.edu by January 31, 2015.

Thank you for your submissions!

The DCIG Team–Megan Toups, Suzanna Conrad, Rene Tanner

RDAP15 Call for Proposals

e-Science Portal Blog - Mon, 12/08/2014 - 12:12

RDAP15, the sixth annual Research Data Access and Preservation Summit, is accepting proposals (max. 300 words) for panels, interactive posters, lightning talks, and discussion tables. Themes for RDAP15 were selected by this year’s planning committee with input from previous years’ attendees and RDAP community members.

These are the proposal deadlines for the 2015 RDAP Summit:

December 19, 2014: Panel Presentations Submissions Due
January 16, 2015: Interactive Posters and Lightning Talks Submissions Due

For further details see RDAP15’s Call for Proposals webpage.

Metadata Services for Research Data Management Call for Presentations

e-Science Portal Blog - Tue, 12/02/2014 - 12:11

The ALCTS interest group of ALA has issued a Call for Presentations for the program “Metadata Services for Research Data Management,” to be held during the ALCTS Virtual Preconference “Planning for the Evolving Role of Metadata Librarians” prior to the ALA annual meeting in June 2015 in San Francisco. Deadline for proposals is this Friday, Dec. 5th. See the full announcement on the Metadata Interest Group blog.


Evolving Scholarly Record and the Evolving Stewardship Ecosystem – Workshop Series

e-Science Portal Blog - Mon, 12/01/2014 - 09:58

OCLC is sponsoring a series of workshops that build upon the framework presented in its recent research report The Evolving Scholarly Record. Workshops will be held in Washington, DC, Chicago, San Francisco, and Amsterdam. Seating is limited so you are encouraged to register now. See announcement for further details.

