Feed aggregator

A gentle introduction to Docker for reproducible research

e-Science Portal Blog - Thu, 01/29/2015 - 16:27

Submitted by guest contributor Stacy Konkiel, Director of Marketing & Research, Impactstory, stacy.konkiel@gmail.com.

By now, many data management librarians are familiar with the concept of reproducible research. We know why it’s important and how to (theoretically) make it happen (thorough documentation, putting data and code online, writing an excellent Methods section in a journal article, etc.).

But if a scientist asked you for a single recommended reading on how to make their computational research reproducible, what would you send them?

I’d suggest “Using docker for reproducible computational publications” by Melissa Gymrek (a Bioinformatics PhD student at Harvard/MIT).

In her post, Gymrek introduces Docker, a “lightweight virtual machine” that allows a researcher to package up a complete computing environment that can be shared, so that other researchers can run it and reproduce results using the original researcher’s code and data.

No need to download and install R packages, or to figure out how to make someone else’s code play well with their operating system. Just install Docker, enter a single command, and, boom, they’ve got a virtual machine running on their computer that they can log into to reproduce someone else’s findings.
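To make that workflow concrete, here is a hedged sketch of what that “single command” might look like (the image name is an illustrative assumption, not taken from Gymrek’s post):

```
# download a shared analysis environment and start an interactive R session in it
docker run -it rocker/r-base
```

Here `rocker/r-base` is a publicly available Docker image that bundles R; a researcher would substitute the image name published alongside the paper they want to reproduce.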

Docker is already popular in the software development world, and is gaining popularity with bioinformaticists and other computational researchers. Learn more about Docker and how it can work for reproducible research on Melissa Gymrek’s blog.

Winter is the perfect time for a virtual conference or webinar!

e-Science Portal Blog - Wed, 01/28/2015 - 15:19

There’s been a flurry of upcoming virtual conferences and webinars springing up, providing educational opportunities while obviating the need for travel in wintry weather. In a previous post, I noted the upcoming DataONE webinar series that begins on Feb. 10th with the webinar “Open Data and Science: Towards Optimizing the Research Process.”

NISO is sponsoring a six-hour (11 am – 5 pm EST) virtual conference on Feb. 18th: “Scientific Data Management: Caring for Your Institution and Its Intellectual Wealth.” Hosted by Todd Carpenter, Executive Director of NISO, the program includes speakers from the Dept. of Energy, Emory, Tufts, Oregon State University, UIUC, the Center for Open Science, and the RMap project. The final session will be a roundtable discussion. Program topics for the conference include:

  • Data management practice meets policy
  • Uses for the data management plan
  • Building data management capacity and functionality
  • Citing and curating datasets
  • Connecting datasets with other products of scholarship
  • Changing researchers’ practices
  • Teaching data management techniques

Finally (although I suspect I’ll soon be adding to this snowballing list), Elsevier is sponsoring the webinar “Institutional & Research Repositories: Characteristics, Relationships and Roles” on Feb. 26th from 11 am–12:15 pm (EST).


DataONE is launching a new webinar series

e-Science Portal Blog - Thu, 01/22/2015 - 09:51

DataONE is launching a new Webinar Series (www.dataone.org/webinars) focused on open science, the role of the data lifecycle, and achieving innovative science through shared data and ground-breaking tools.

The first of the series is a presentation and discussion led by Dr. Jean-Claude Guédon from the Université de Montréal titled:

“Open Data and Science: Towards Optimizing the Research Process”

Tuesday, February 10th, 9 am Pacific / 10 am Mountain / 11 am Central / 12 noon Eastern

The abstract for the talk and registration details can be found at: www.dataone.org/upcoming-webinar.

Webinars will be held the 2nd Tuesday of each month at 12 noon Eastern Time.  They will be recorded and made available for viewing later the same day. A Q&A forum will also be available to attendees and later viewers alike.

More information on the DataONE Webinar Series can be found at: www.dataone.org/webinars.


R: Addressing the Intimidation Factor

e-Science Portal Blog - Wed, 01/21/2015 - 16:22

Submitted by guest contributor Daina Bouquin, Data & Metadata Services Librarian, Weill Cornell Medical College of Cornell University, dab2058@med.cornell.edu

Working with students and researchers to help them better manage and work with their research data is a big part of the librarian’s role in a data-intensive setting. Much of the time, though, the librarian also needs to think critically about and advise on tools used in other parts of the data life cycle, including the data pre-processing and analysis phases of a research project. Increasingly, I find myself dealing with this sort of situation in my library. For example, a student comes to the library with a question about getting access to some tools for working with her data; this might mean that the student needs help restructuring some spreadsheets or with another data manipulation task, but more often than not the student is also seeking statistical software and tools for data visualization. In my experience, this type of situation has been more common than requests for help on data management plans or research documentation. This type of reference interaction is also where many librarians and information professionals begin to have discussions about and encounters with R programming.

R is a free statistical programming language with a notorious learning curve, but students and researchers are increasingly seeing the value in tackling that curve. I was fortunate enough to take some advanced statistics courses throughout my educational career and learned R in a trial-by-fire set-up. I also co-instructed an introductory course on Computational Health Informatics this past summer wherein we taught introductory R functionality. Therefore, when patrons come to the library looking for help getting started with R, I feel confident helping them. However, I know that when R comes up in discussions with my colleagues, they do not always feel confident assessing whether it is worthwhile to advise a student to learn R or to just run some stats tests in Excel. My colleagues are also often intimidated by R because they are not confident that they understand how to troubleshoot and find resources for students just getting started with the program. Having witnessed this type of situation on many occasions, I present here my attempt at lowering the intimidation factor surrounding R for librarians. You do not need to become an R programmer to approach it critically and to be able to help others get started.

I should start by saying, though, that I almost always encourage students to pursue learning R rather than pushing them toward Excel or a statistics program that they would need to purchase, as our library does not offer regular access to stats software on our computers. R is also much more robust for working with data than Excel. However, I realize that some students and researchers just want to get their work done and want nothing to do with learning a new programming language. At that point, I generally point R out to them very briefly anyway, in case the student ever does decide that it might be useful to learn. If the student is unsure whether R is what she is looking for, I ask the following questions:

  • Have you ever worked with data using code? (e.g. Stata, SAS)
  • Would you be willing to spend some time learning how to use a new tool?
  • Are the statistical tests you need to run somewhat complex?
  • Will you need to repeat the steps for how you cleaned up your data?
  • Will you need to repeat the steps for how you analyzed your data?
  • Will visualizing your data be very important to you on this project?
  • Is your data in more than one format?

If the student answers “yes” to a few of these questions, I strongly encourage them to use R rather than a tool like Excel. Check out Chris Leonard’s discussion on the R Blog for more information on the Excel vs. R question. And with the below resources and jargon under your belt, you will feel more comfortable approaching R programming if you and your patron do decide that R is a good choice.

One of the first resources I usually point new users to is Quick R. Quick R provides new learners and experienced users alike with “a roadmap and the code necessary to get started quickly, and orient yourself for future learning” with R. I encourage librarians and patrons to look through the “Data Types” section of Quick R if you are unfamiliar with the concept of data types, as understanding how R users talk about data will make unfamiliar terms feel less intimidating right off the bat.

There is some other basic jargon you should be aware of when talking about R with patrons as well. For instance, if you are using R, you will likely need to use R packages. “Packages are collections of R functions, data, and compiled code in a well-defined format” (Quick R). The place where packages are stored is called the library. R comes with a standard set of packages when you install it, but others are available for download and installation. You can install packages by running the following command with the name of the package you need:

> install.packages("name_of_package")

Once installed, packages need to be loaded into the session to be used. This can be done using the command:

> library(name_of_package)

There are also buttons on interfaces like R Studio that can help you install and load packages without needing to write commands.

R function is another term you’ll likely hear if you get questions about R. Functions let you bundle a series of commands into a named, reusable unit that is easy to read and run. For example, this is how one could write a function that subtracts one number from another. The function in the example is called f1:

> f1 <- function(x,y) {x-y}
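Once defined, f1 can be called like any built-in function (the numbers below are arbitrary examples):

```r
# define the subtraction function from above
f1 <- function(x, y) { x - y }

# call it with two numbers
f1(10, 3)  # returns 7
f1(3, 10)  # returns -7
```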

There are entire packages of functions written by others to help users accomplish complicated tasks. For example, if a researcher decides she needs to run some regression diagnostics, there are pre-written functions to accomplish this task in the package called “car”. When the researcher installs the car package and loads it from her library, she will be able to access the functions to run her diagnostics. You can view an example of this and many other statistical analysis functions using Quick R.
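As a rough sketch of that workflow (the model and diagnostics below are illustrative choices, not taken from Quick R; mtcars is a sample dataset that ships with R):

```r
# install the car package once, then load it from the library
install.packages("car")
library(car)

# fit a linear regression on R's built-in mtcars dataset
fit <- lm(mpg ~ wt + hp, data = mtcars)

# run regression diagnostics using pre-written car functions
outlierTest(fit)  # Bonferroni test for the most extreme outlier
vif(fit)          # variance inflation factors (multicollinearity check)
qqPlot(fit)       # quantile-quantile plot of studentized residuals
```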

I also tend to point researchers and my colleagues toward general reference material if they are looking for more granular help getting started with R programming. The following have been very useful in the past:

Additionally, the following tutorial resources are usually very well received:

And one should never neglect the help available through the R Community:

The resources noted above, and many others are listed in a resource guide that I developed on R and Data Mining, which can be found here.

In summary, you do not need to read all of these resources on R to help others work with it. By going through some of the above material and familiarizing yourself with the terminology and resources associated with R, you will be well equipped to help with common R problems. R is challenging, but like all new things, exposure is the only way to get used to it. Start small with terminology and basic documentation; in this way you will gain the confidence and knowledge necessary to begin working on reference transactions that involve R programming.


A Model of Collaborative Education Efforts in Data Management: the Virginia Data Management Boot Camp

e-Science Portal Blog - Thu, 01/15/2015 - 12:42

Submitted by guest contributor Yasmeen Shorish, Physical & Life Sciences Librarian at James Madison University.


Question: How do you deliver the same data management training to graduate students, faculty, and staff simultaneously? How do you deliver that content not just at your own institution, but also to six other institutions across the state?

Answer: Very carefully, with a lot of cooperation, collaboration, and some technical wizardry thrown in as well. This is the story of seven Virginia institutions who stopped repeating content individually and started getting real – real collaborative.

In January 2013, the libraries at the University of Virginia (UVA) and Virginia Tech (VT) teamed up to produce a “Data Management Bootcamp” for graduate students on their campuses. Utilizing telepresence technology, speakers could interact with participants at either school in large, virtual sessions as opposed to discrete events at each venue. Librarian interest in this event resulted in the addition of three institutions in 2014: James Madison University (JMU), George Mason University (GMU), and Old Dominion University (ODU). UVA, VT, JMU, and GMU have an existing telepresence set-up called 4-VA, and it was not difficult, technology-wise, to add ODU as a full participant as well. Librarians from these five institutions, including myself, formed a planning group to produce the “2014 Virginia Data Management Bootcamp.”

However, expanding a program from two locations to five locations does present some complications. Can everyone connect simultaneously? Do the screens get too cluttered when everyone is connected? How do we decide what content is most appropriate for five very different institutions? Planning for the 2014 Bootcamp began in the summer of 2013. A series of virtual meetings among the planning group resulted in an agenda that included understanding research data, operational data management, data documentation and metadata, file formats and transformations, storage and security, DMPTool and funding agencies, rights and licensing, protection and privacy, and preservation and sharing. It was a lot to cover in two full days, with a third half-day for local discussion. The full agenda can be found on this LibGuide.

The group debriefed after the 2014 event and discussed what 2015 should look like. We knew that the next event should be less dense, as that much content in two days was somewhat overwhelming.  The College of William & Mary (WM) and Virginia Commonwealth University (VCU) both expressed a desire to participate. With some technological work involving bridges, WebEx, and patience, the Virginia Data Management Bootcamp was able to expand to include these universities. Happily, increasing the number of participating institutions did not increase the complexity very much. One change that may have had the most impact was that the planning group decided to add more in-person meetings to work through curriculum ideas. We found that as a group, we could accomplish more in a shorter amount of time when we were gathered around one table, discussing ideas.

Using pre- and post-assessment surveys helped us zero in on some areas for change. We wanted to build in more interactivity and limit the amount of lecture for each area. We also wanted to engage the audience in the research cycle more intentionally than we had been. We redesigned the three-day event into smaller chunks, with more local discussion and more hands-on activities. A full schedule can be found on this LibGuide.

Can other states or groups of libraries produce a cross-institutional data management outreach program? Yes!

What if they lack a fancy telepresence room? Still, yes! There are viable alternatives that may have a different look and feel, but can still accomplish the same goal.

Want to launch a cross-institutional program of your own?

The best way to get started is to first get a sense of who would want to participate. Propose the workshop and form a planning group. The number of participating venues will shape what technology you use to bring it all together. WebEx may be appropriate, or even a Google Hangout (although image quality could be a concern).

How much time can you set aside for the workshop? One day? Three days? That will determine what gets covered and how. The more hands-on engagement that you can work into the program, the more likely you are to keep interest across sites.

Determine a meeting schedule for the planning group and decide which meeting method (virtual vs. in person) will be more effective. Individually, each site will need to coordinate with its own campus partners to make it as big an event as they wish. Assessment of some kind is necessary to determine what could change if you do it all over again.

Collaborative education efforts such as these can help institutions leverage the expertise that is naturally distributed. Setting a foundational learning outcome for data management is an achievable goal and a good way to build a community of practice in your local region.





Just published: Journal of eScience Librarianship special issue on data literacy

e-Science Portal Blog - Mon, 01/12/2015 - 14:34

The latest issue of the Journal of eScience Librarianship (JESLIB) has just been published! It is available at http://escholarship.umassmed.edu/jeslib/vol3/iss1/

 Table of Contents

Volume 3, Issue 1 (2014)


What is Data Literacy?
Elaine R. Martin

Full-Length Papers

Planning Data Management Education Initiatives: Process, Feedback, and Future Directions
Christopher Eaker

A Spider, an Octopus, or an Animal Just Coming into Existence? Designing a Curriculum for Librarians to Support Research Data Management
Andrew M. Cox, Eddy Verbaan, and Barbara Sen

An Analysis of Data Management Plans in University of Illinois National Science Foundation Grant Proposals
William H. Mischo, Mary C. Schlembach, and Megan N. O’Donnell

Initiating Data Management Instruction to Graduate Students at the University of Houston Using the New England Collaborative Data Management Curriculum
Christie Peters and Porcia Vaughn

EScience in Action

Research Data MANTRA: A Labour of Love
Robin Rice

Building Data Services From the Ground Up: Strategies and Resources
Heather L. Coates

Building the New England Collaborative Data Management Curriculum
Donna Kafel, Andrew T. Creamer, and Elaine R. Martin

Lessons Learned From a Research Data Management Pilot Course at an Academic Library
Jennifer Muilenburg, Mahria Lebow, and Joanne Rich

Gaining Traction in Research Data Management Support: A Case Study
Donna L. O’Malley

The New England Collaborative Data Management Curriculum Pilot at the University of Manitoba: A Canadian Experience
Mayu Ishida

Are you interested in submitting to JESLIB? Please refer to author guidelines at http://escholarship.umassmed.edu/jeslib/styleguide.html

Share your projects at the NE e-Science Symposium’s Poster Session

e-Science Portal Blog - Mon, 01/05/2015 - 17:40

One of the most popular attractions of the annual University of Massachusetts and New England Librarian e-Science Symposium is its poster session. The symposium poster session offers an ideal venue for librarians and library school students who are involved in e-Science and RDM projects and/or research to share their findings and exchange ideas with interested colleagues.  The poster session also includes a  contest, in which judges review the posters to determine the best in these three categories:  Most Informative in Communicating e-Science Librarianship, Best Example of e-Science in Action, and Best Poster Overall.

Interested? If you haven’t yet registered for the e-Science Symposium, make that your first step, as registration is filling quickly. Then, write your poster proposal and submit it following these instructions by the proposal deadline of Feb. 6.

Want to see some examples? The e-Science Symposium conference site features archived posters from past symposia. For links to the past six symposia, visit the e-Science Symposium conference page.

Got questions? For further details, or questions regarding the poster contest, please contact Raquel Abad at raquel.abad@umassmed.edu.



Two new articles featured in the Journal of eScience Librarianship

e-Science Portal Blog - Fri, 12/19/2014 - 12:29

The Journal of eScience Librarianship (JeSLIB) has just published the following two articles:

These two articles are part of Volume 3, Issue 1 of JeSLIB that will be published in January. An announcement will be made when the issue is published.



Registration now open for 7th annual New England e-Science Symposium

e-Science Portal Blog - Wed, 12/10/2014 - 17:01

Registration is now open for the 7th annual University of Massachusetts and New England Area Librarian e-Science Symposium, to be held on Thursday, April 9, 2015. For details and to register, visit the 2015 e-Science Symposium conference site. Registration is on a first-come, first-served basis and will be capped at 90 people.

Librarians: the original research data managers

e-Science Portal Blog - Wed, 12/10/2014 - 15:45

Submitted by guest contributor Nancy Glassman, Assistant Director for Informatics, D. Samuel Gottesman Library, Albert Einstein College of Medicine

In conjunction with Albert Einstein College of Medicine’s Faculty Development Program I lead an introduction to research data management workshop. Attendees usually include a mix of clinical and basic science faculty, as well as a few postdocs and graduate students. To set the stage at a recent workshop, I asked the group if they were surprised to have a librarian as the instructor. Taken aback by nodding heads around the table, I quickly recovered my composure and decided to make the most of this “teachable moment.”

All of the workshop’s attendees use the library’s resources and services, but as long as things are running smoothly and they find the information they need, they don’t really need to think about how it was made available to them. Many library users are unaware of what librarians actually do, and that’s just fine. But it’s worthwhile to take a few minutes to show researchers how the traditional library services they use almost every day require similar, if not the same, skill set as managing research data.

Librarians are, arguably, the original data managers. Think about it. Librarians have been managing data and information in one form or another for thousands of years, practically since the dawn of the written word. Archaeologists in Turkey have found collections of stone tablets dating back to the 17th-13th centuries BCE containing early forms of metadata.(1) These examples describe metadata concepts such as attribution and versioning:

“Written by the Hand of Lu, son of Nuggassar, in the presence of Anuwanza …”

“This tablet was damaged. In the presence of Mahhuzi and Halwalu, I, Duda, restored it…” (1)

Fast forward to the library of the twenty-first century. We work and live in the era of big data, in which “everything is available for free on the Internet.” Who makes sense of this information overload? Who selects, catalogs, curates, backs up, and makes available relevant sources of information? Who helps users cite these resources properly? Who safeguards patron information?

  • Librarians are experts at making data meaningful and easily discoverable. Look no further than the library’s catalog, a classic example of metadata in action. In medical libraries, MeSH (Medical Subject Headings) is used to categorize material by subject. Author names and titles are standardized. Call numbers make it easy to find items on bookshelves.
  • Although librarians are not copyright lawyers, we do have a lot of practice navigating copyright, licensing agreements, and open access as part of our regular activities. This includes negotiating with vendors, managing interlibrary loan, as well as public- and open-access initiatives (including the NIH Public Access Policy).
  • Researchers rely on librarians for help in finding relevant, evidence-based information. In addition to being experienced searchers of online databases such as PubMed, Embase, and Web of Science, we also mine the “deep web” to find those elusive resources.
  • Librarians are familiar with the rules and nuances of proper citation and attribution practices. We support many citation management programs, including EndNote, RefWorks, and Mendeley, and teach students how to cite correctly and avoid plagiarism.
  • Data comes in a lot of different packages, and long-term preservation and storage are important aspects of managing research data. Over the millennia we have maintained and preserved collections of tablets, scrolls, manuscripts, maps, audiovisual materials, print books and journals, e-books, e-journals, websites, blogs, wikis, and data sets.

Although the media and the volume of data have changed radically over time, the expertise to manage all of this remains essentially the same. Librarians are particularly adept at adapting to change. Helping researchers manage their data is a logical extension of a long-standing tradition.

After the workshop, one attendee approached me, and acknowledged that at first he was skeptical about taking a class on research data management led by a librarian, but after I described the ways traditional librarian skills apply, it all made sense.  Conversations like this can open users’ eyes to librarians’ wide range of information management skills and may lead to new and interesting partnerships.


1. Casson L. Libraries in the Ancient World. New Haven: Yale University Press; 2001. p. 13.

ACRL Digital Curation Interest Group Call for Proposals

e-Science Portal Blog - Mon, 12/08/2014 - 13:02

The following announcement is posted on behalf of the ACRL Digital Curation Interest Group Team.

The ACRL Digital Curation Interest Group is looking for proposals for our Spring webinars and for ALA Annual 2015. The group would like to host three webinars in the Spring and to have 3–4 panelists for ALA Annual 2015. So please consider submitting a short abstract proposal!

CFP for our Spring webinars:

We invite proposals on topics germane to digital curation activities including (but not limited to) the following topics:

  • Documentation and organization
  • Digital preservation
  • Digital curation software and tools
  • Metadata specialists
  • Non-institutional repositories
  • Skills needed/Skills learned to tackle digital curation
  • Specific data management procedures such as file naming
  • Data purchased from vendors
  • Careers in digital curation
  • Digital curation lifecycle

We seek webinars of 60 minutes in length (including time for questions). If you have an idea for a webinar, please send a short description of it to Megan Toups at mtoups@trinity.edu by January 31, 2015.

CFP for ALA Annual 2015:
We are putting together a panel of 3-4 people to present for ~10 minutes each covering digital curation from a variety of perspectives.  Panelists will present and then engage the audience in a productive conversation on digital curation.

We’d love to have a diverse set of panelists representing a variety of different digital curation perspectives: research data, archives and digital curation, theory, practice, etc. Want to be a part of this interesting panel? Please submit a short description of what you’d like to present to Megan Toups at mtoups@trinity.edu by January 31, 2015.

Thank you for your submissions!

The DCIG Team–Megan Toups, Suzanna Conrad, Rene Tanner

RDAP15 Call for Proposals

e-Science Portal Blog - Mon, 12/08/2014 - 12:12

RDAP15, the sixth annual Research Data Access and Preservation Summit, is accepting proposals (max. 300 words) for panels, interactive posters, lightning talks, and discussion tables. Themes for RDAP15 were selected by this year’s planning committee with input from previous years’ attendees and RDAP community members.

These are the proposal deadlines for the 2015 RDAP Summit:

December 19, 2014: Panel Presentations Submissions Due
January 16, 2015: Interactive Posters and Lightning Talks Submissions Due

For further details see RDAP15’s Call for Proposals webpage.

Metadata Services for Research Data Management Call for Presentations

e-Science Portal Blog - Tue, 12/02/2014 - 12:11

The ALCTS Metadata Interest Group of ALA has issued a Call for Presentations for the program “Metadata Services for Research Data Management,” to be held during the ALCTS Virtual Preconference “Planning for the Evolving Role of Metadata Librarians” prior to the ALA annual meeting in June 2015 in San Francisco. The deadline for proposals is this Friday, Dec. 5th. See the full announcement on the Metadata Interest Group blog.


Evolving Scholarly Record and the Evolving Stewardship Ecosystem – Workshop Series

e-Science Portal Blog - Mon, 12/01/2014 - 09:58

OCLC is sponsoring a series of workshops that build upon the framework presented in its recent research report The Evolving Scholarly Record. Workshops will be held in Washington, DC, Chicago, San Francisco, and Amsterdam. Seating is limited so you are encouraged to register now. See announcement for further details.


New England Science Boot Camp is heading Downeast!

e-Science Portal Blog - Thu, 11/20/2014 - 10:13

The upcoming 2015 New England Science Boot Camp will be held June 17-19 on the beautiful campus of Bowdoin College in Brunswick, Maine.  Plans for session topics and activities are currently underway and will be announced in the next few months.


Broader Impacts and Data Management Plans

e-Science Portal Blog - Thu, 11/13/2014 - 13:57

By Andrew Creamer, Scientific Data Management Specialist, Brown University

The National Science Foundation (NSF) explains that Data Management Plans are to be “reviewed as an integral part of the proposal, coming under Intellectual Merit or Broader Impacts or both, as appropriate for the scientific community of relevance.” As the librarian responsible for writing data management and sharing plans, I was invited to be a part of my institution’s Broader Impacts Committee, which aims to “help Brown faculty and researchers respond effectively to the Broader Impacts criterion and other outreach requirements of governmental funding agencies.” For example, it helps to build collaborations between K-12 educators in my state and the university’s researchers, and it promotes a database for sharing STEM curricula, among other activities.

The NSF views Broader Impacts through the lens of societal outcomes:

NSF values the advancement of scientific knowledge and activities that contribute to the achievement of societally relevant outcomes. Such outcomes include, but are not limited to: full participation of women, persons with disabilities, and underrepresented minorities in science, technology, engineering, and mathematics (STEM); improved STEM education and educator development at any level; increased public scientific literacy and public engagement with science and technology; improved well-being of individuals in society; development of a diverse, globally competitive STEM workforce; increased partnerships between academia, industry, and others; improved national security; increased economic competitiveness of the United States; and enhanced infrastructure for research and education.

Recently I was asked to speak at a Broader Impacts Workshop for faculty. In my presentation I focused on several ways that a proposal’s DMP can connect with the societal outcomes described in its Broader Impacts. For example, researchers detail in their NSF DMPs when and how they will make their data and research products available to other researchers and/or the public, and how they will archive and preserve access to their research products after the project ends. They also outline the dissemination strategy for their projects’ research products, which can include citing and sharing the projects’ data, metadata, and code in their publications and presentations, and depositing these items into a data-sharing repository. Retaining, preserving, and making accessible the data, metadata, and code, along with the resulting publications, maximizes the potential for replication and reproduction of research results, thereby furthering the impact of the project by making it possible for the data and research products to be discovered, used, repurposed, and cited to aid in new research and discoveries.

Ways the Library Can Support Broader Impacts and Preserve and Disseminate Related Research Products

  • The library can advise on selecting optimal file formats and media in which data can be stored, shared, and accessed. Proprietary software and data formats used to collect and capture data can impact the potential for a dataset to be of use by others. Researchers can work with the library to identify and export their data files into data-sharing and preservation-friendly formats.
  • The library can collaborate with researchers to create the documentation and contextual details (metadata) that make their data discoverable and meaningful to others. The library can help researchers locate metadata schemas, standards, and ontologies for a specific discipline, and it can also help create metadata for data being prepared for upload to a data-sharing repository.
  • Depositing Broader Impacts curricula and data into a repository is one of the best ways for researchers to ensure that their research products can be discovered and used by others. It is also the easiest way to locate and access data years after a project ends. Libraries can offer a number of repository-related services: they can help researchers choose and evaluate potential repositories, and they can offer an institutional repository (IR) as an option for some researchers to publish, archive, and preserve their projects’ data after the projects end.
  • More libraries are offering a global persistent identifier service for researchers wishing to maximize the dissemination and discoverability of their datasets. A digital object identifier (DOI) is one way the library can give researchers and the public a reliable way to locate and cite data. Through a service such as EZID, the library can issue researchers DOIs even when their datasets are not in its IR. For example, it can mint DOIs for datasets deposited in NCBI databases under accession numbers, so that researchers can cite these datasets in their publications, presentations, and grant reports. The library also mints DOIs for researchers whose publishers require a DOI for the datasets underlying their manuscripts, or for compliance with publishers’ data availability and data archiving policies.

While researchers may not have thought of the library in connection with societal outcomes and disseminating research data, we librarians hope they will begin to see it as the ideal institutional space for planning data retention; appraising which research products should be retained, archived, and preserved; exploring options for sharing and long-term preservation-friendly file formats; creating documentation and metadata to make data discoverable and useful; publishing and archiving data in a repository; citing data; and disseminating and measuring the impact of data.


November 2014: recent job postings

e-Science Portal Blog - Wed, 11/12/2014 - 12:39

From around the web (mostly from the ALA job list): here’s a list of recent job openings that may be of interest to the e-Science Community:

California State University, East Bay Library:  Health Sciences and Scholarly Communications Librarian

California State University, San Marcos:  Health Sciences and Human Services Librarian

Cornell University Library:  Director of Preservation Services

Dartmouth College:  Research and Education Librarian, Biomedical Libraries

Head of Education, Research and Clinical Services

Institute for Health Metrics and Evaluation, University of Washington:  Data Indexer

Iowa State University:  Science & Technology Librarian (Engineering & Physical Sciences)

New York University:  Research Data Management Librarian

Pennsylvania State University:  Science Data Librarian

Tufts University:  Research & Instruction Librarian

University of California at Los Angeles (UCLA):  Geospatial Resources Librarian

University of New Hampshire:  Life Sciences and Agriculture Librarian

University of New Mexico Libraries:  Research Services Librarian for the Engineering, Life & Physical Sciences


Upcoming Digital Science workshops at Tufts and UMass Medical School

e-Science Portal Blog - Fri, 11/07/2014 - 17:45

The following announcement has been posted on behalf of the Boston Library Consortium and Digital Science. For information about the workshop or to register, please contact Susan Stearns at sstearns@blc.org

Addressing the Emerging Needs of the Research Ecosystem: An Invitation

The Boston Library Consortium and Digital Science invite you to attend a free workshop focused on the management, dissemination, and collaboration around research data in the university.  Today’s research ecosystem is increasingly complex and includes players from many different departments and groups within the academy: research and sponsored program staff, the CIO and IT staff, library deans/directors and their scholarly communications and research data management librarians, university marketing and communications staff and, of course, the researchers themselves.

Meeting the diverse requirements of these varied groups in efficient and cost-effective ways requires that quality data are able to flow in and out of university information systems, often populating such diverse technologies as grants management systems, researcher profiles, institutional repositories, and enterprise data warehouses.  Non-traditional measures of research impact such as Altmetrics and the increasingly prevalent funder mandates create new challenges for universities as they look to ensure a robust research information management environment.

Our goal for this workshop is to assemble a representative cross-section of stakeholders from a variety of BLC institutions. The workshop will bring together experts from Digital Science, a technology company with a focus on the sciences that provides software and tools to support the research ecosystem, and speakers with direct experience of evaluating and implementing research information management systems and services. We hope you will actively encourage your colleagues to attend.

Two options are available for the workshop as indicated below. BLC is considering offering live-streaming of one or both sessions if there is adequate interest.

Friday, November 21st at Tufts University, Medford Campus – 9:30am – 2:30pm; lunch included

Tuesday, November 25th at the University of Massachusetts Medical School, Worcester – 10:00am – 3:00pm; lunch included

Workshop speakers will include: Jonathan Breeze, CEO of Symplectic; Mark Hahnel, CEO of Figshare; and the Vice Provost for Research (or equivalent) from a local Boston Library Consortium member institution.

To register or for further information, send an e-mail to sstearns@blc.org indicating which of the above sessions you are interested in attending.

Learning About Git

e-Science Portal Blog - Fri, 11/07/2014 - 13:37

Submitted by guest contributor Daina Bouquin, Data & Metadata Services Librarian, Weill Cornell Medical College of Cornell University, dab2058@med.cornell.edu

The role of the data librarian extends far beyond helping researchers write data management plans. Librarians working where data-intensive science happens spend their time answering questions about the entire data life cycle: data pre-processing, analysis, visualization, and validation are all important, and sometimes highly intricate, parts of the research process. As a data services librarian, I have personally found myself advising researchers to rework their workflows to take advantage of available tools that can make their research more replicable, efficient, and shareable at these various stages. Unfortunately, I do not always have hands-on experience with the tools and techniques I recommend, nor is it possible for me to have experience with every tool available to researchers in computational environments. I do believe, however, that it is important for me to get as much hands-on experience as possible with the most useful, commonly used tools, so that I can develop both refined expertise in my field and empathy for my patrons. E-Science Portal editor Donna Kafel recently wrote a wonderful post reflecting on self-learning and its challenges, drawing on advice from others. Here, I aim to outline how I am making use of some of the excellent advice offered in that post, while focusing on an area of the data life cycle that I believe is sometimes oversimplified in discussion: the version control processes inherent in good data management.

“Be single-minded. Identify one topic or skill you want to learn and focus on mastering it.” – Donna Kafel, Challenges of Self Learning

I decided the advice I would take most to heart from Donna’s self-learning post was the above takeaway. It rang true with me because I regularly run into trouble by trying to tackle too many new topics at once. If I don’t use something regularly, it’s difficult for me to become proficient, especially with technically challenging tools. It makes sense that I should focus on mastering a single skill before moving on to anything new, but how to choose what to focus on? This is where Version Control Systems (VCS), or “Revision Control Systems,” come in. VCSs are incredibly diverse in both complexity and application, and while I rarely see them discussed at length by librarians, I find them exceedingly important to researchers in collaborative environments. I regularly read discussions of file naming as an approach to controlling versions and aiding researchers in a multitude of data management processes, and I do not want to discredit that discussion because it is so important (check out some of the great writing on this topic right here on the portal blog!), but I’m hoping to extend the conversation a bit more in this post. Below I focus on Git, both as a self-learning opportunity and as an incredibly useful VCS.


Git is a technology that “records changes to a file or set of files over time so that you can recall specific versions later”1. You can use Git with just about any type of file, but it is primarily used by people working with code. Oftentimes people use simpler version-control methods, like copying files into a time-stamped directory, but this tactic is risky: one could forget which directory files are stored in or accidentally write over the wrong file. Careful file naming helps here, but an even better approach is using a tool like Git.1
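As a minimal sketch of that record-keeping workflow (the file name and commit message below are my own hypothetical examples, and the commands assume Git is installed):

```shell
# Work in a throwaway directory so the example is self-contained
cd "$(mktemp -d)"

# Initialize a new, empty Git repository
git init

# Create a file and stage it so Git starts tracking it
echo 'print("hello")' > analysis.py
git add analysis.py

# Record a snapshot of the staged files with a descriptive message
# (the -c flags supply an identity in case none is configured yet)
git -c user.name="Example" -c user.email="example@example.org" \
    commit -m "Add initial analysis script"

# List the recorded versions, one line per snapshot
git log --oneline
```

Each `commit` is a recallable version, which is exactly the safety net that time-stamped folder copies try, less reliably, to provide.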

Git is what is called a Distributed Version Control System (DVCS), but DVCSs are easier to understand if you first understand Centralized Version Control Systems (CVCSs). A CVCS has a single server that contains all the versioned files a group of people are working on. Individuals “check out” files from that central place, so everyone knows to some extent what other people on the project are doing. Administrators control who can do what, so there is a centralized authority, making a CVCS easier to manage than local version control solutions. Examples of CVCSs include the popular Apache tool Subversion.1

[Figure: centralized version control – Chacon, 2014]

There are, though, some drawbacks to using a CVCS, namely the single server. If the server goes down, no one can make any changes to anything being worked on; and if the server is damaged and its data corrupted, the people working on the project are completely reliant on there being sufficient backups of all versions of their files. This, again, is quite risky.

To mitigate this problem, DVCSs were developed. In distributed systems (like Git), people do not just check out the latest version of a file; they completely “mirror” the repository. That way, if the server dies, anyone who mirrored the repository can copy it back to the server and restore it. Every time someone clones the repository, the data is fully backed up.
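A sketch of that mirroring idea, using local bare repositories to stand in for servers (all directory names here are hypothetical):

```shell
# Work in a throwaway directory so the example is self-contained
cd "$(mktemp -d)"

# Stand-in for the central server: a "bare" repository holds the
# full history but no working files
git init --bare central.git

# A collaborator's clone mirrors the repository, history and all
git clone central.git alice
cd alice
echo "results" > data.txt
git add data.txt
git -c user.name="Alice" -c user.email="alice@example.org" \
    commit -m "Add results file"
git push origin HEAD
cd ..

# If central.git were destroyed, any full clone could rebuild it
# by pushing its complete copy to a fresh server
git init --bare restored.git
git -C alice push --mirror ../restored.git
```

Because every clone carries the entire history, the `--mirror` push at the end reconstructs the server from an ordinary collaborator's copy.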

[Figure: distributed version control – Chacon, 2014]

Distributed systems are also capable of working well with several remote repositories at once, allowing people to collaborate with multiple groups in different ways concurrently on the same project.1

However, I did not decide to focus my single-minded self-learning on Git only because it is so useful for version control; I wanted to learn as many skills as possible while still staying focused. In learning to use Git, I would have more opportunity to learn the Bash Unix shell. Although I have some background with command-line interfaces, I am still a beginner with the terminal, and I figured that learning Git would make me much more proficient at navigating my computer via the command line, which in turn could build my confidence to learn a Linux operating system. Learning Git would also help me learn GitHub, which is growing in popularity by the day as a place to store and share code, and GitHub’s graphical user interface would help get me off the ground. So I found Git to be a great door-opener to many other skillsets on my list of self-learning goals.

Thus, I have begun learning to use Git and GitHub. I got some hands-on experience by participating in a Software Carpentry Bootcamp this past summer, but I didn’t find the time to follow up on it; I was not staying focused on learning a single new skill. So now I am regrouping. I have primarily been using the resources listed below, though there is much more out there. These resources are a great place to start, and having made some headway in my own reading of them, I hope to be trying out Git more in the very near future.

Pro Git – a great free eBook and videos on getting started with and better understanding Git and version control. I used this excellent book in writing this post.

Pro Git Documentation External Links – tutorials, books, and videos to help get you started.

Even if you don’t think learning to use Git is right for you, learning more about the tools researchers are using to work with their data and getting a look under the hood about how those technologies work can be a great way to continue to grow professionally. I hope you all have the opportunity to join me in exploring a new skill and share your experiences with the e-Science Portal Community.


1. Chacon, S. (2014). Pro Git. Berkeley, CA: Apress. http://git-scm.com/book/en/v2

And just in case you weren’t already overwhelmed, here’s a great TED Blog post on places to learn how to code!

Dr. Bruce Alberts: Science and the World’s Future

e-Science Portal Blog - Fri, 10/24/2014 - 17:57

Science and the World’s Future
Lecture given by Bruce Alberts, Professor of Science and Education, UCSF
Part of the Sanger Series at Virginia Commonwealth University, Richmond, VA

Bruce Alberts’ lecture was a review of his career, focused on the lessons he learned along the way and why they matter for the future of science research and the earth.

He failed his initial PhD exam at Harvard but earned the degree six months later after more research. This taught him that a good strategy is key to success in scientific research, and that negative results are okay.

Alberts started his own lab at age 28, and he believes it should be easier for researchers to set up their own labs earlier in their careers; for that, funding needs to change.

After many years of research, Alberts became president of the National Academy of Sciences (NAS) and began learning about science policy. Science allows humans to gain a deep understanding of the natural world, and we can use this knowledge to predict future events and problems. Many government officials wanted NAS reports kept secret or changed, but Alberts felt that science was for all and that NAS was providing independent, science-based policy advice, so there could be no changes or secrecy. Now the full text of a report goes on the website as soon as the government receives it.

Alberts’ work with NAS and as editor of Science magazine led him to international work with science academies. He said that science and technology developed in North America or Europe cannot always be exported to the countries that need them. Countries need national, merit-based science institutions to help with policy and to support science; only local scientists have the credibility to rescue a nation from misguided local policies. Alberts’ examples were AIDS in Africa and the polio vaccine in Nigeria. He feels that every nation’s success depends on more of the creativity, rationality, openness, and tolerance inherent to science: what Pandit Jawaharlal Nehru of India called “scientific temper.”

Alberts suggested strategies to help the world’s future:

  1. Education – active learning, open access; start by changing college science teaching, since that is where high school science teachers learn science. (Science special issue, April 19, 2013: Grand Challenges in Science Education, and the Education Portal http://portal.scienceintheclassroom.org/)
  2. Promote science knowledge as a public good – open access again, not just papers but other educational materials, e.g., http://www.ibiology.org/
  3. Empower the best young scientists – the Global Young Academy
  4. Develop scientists as connectors – science communication; scientists need to connect with policy makers and the public, for example through the AAAS Science & Technology Policy Fellowship program
  5. Develop and harness research evidence to improve policies.

What can librarians do?

Obviously, information literacy is huge when it comes to making sure students, and future voting adults, can find the information they need to make decisions about health, technology, and science. Teaching regularly about the reliability of websites and other information sources must be part of this training.

I think librarians can also help harness the research evidence needed to improve policies. We have excellent search skills, and many of us already have experience doing systematic reviews, which is what is needed to find all the evidence.

If you want to read more about Bruce Alberts, this interview by Jane Gitschier is good: Scientist Citizen: An Interview with Bruce Alberts

I liked this quote used by Alberts:

“The society of scientists is simple because it has a directing purpose: to explore the truth. Nevertheless, it has to solve the problem of every society, which is to find a compromise between the individual and the group. It must encourage the single scientist to be independent, and the body of scientists to be tolerant. From these basic conditions, which form the prime values, there follows step by step a range of values: dissent, freedom of thought and speech, justice, honor, human dignity and self respect.

Science has humanized our values. Men have asked for freedom, justice and respect precisely as the scientific spirit has spread among them.”

—  Jacob Bronowski, Science and Human Values, 1956
