Feed aggregator

IDCC 15 – Part 2 (It’s a big conference)

e-Science Portal Blog - Mon, 03/02/2015 - 11:38

Last week in her blog post, Margaret discussed the Twitter feed from the International Data Curation Conference (IDCC) that took place February 9–12. I was fortunate enough to be able to attend and participate this year, and as it is a premier event for data professionals, I’d like to add a bit more about the conference.

The theme this year was “Ten years back, ten years forward: achievements, lessons and the future for digital curation”. Tony Hey, formerly of Microsoft Research and now a Fellow at the University of Washington, gave the opening keynote. He did a very nice job of illustrating how far we have come in the past ten years. Data management and curation are now recognized as important issues and discussed in high-profile venues like Science and Nature. However, he also noted that we still have some very serious problems to address. Funding for curation is often local, but use of digital data is global. More and more data repositories and tools are coming online, but support for these initiatives is still quite fragile, and we have already lost some important resources (RIP Arts & Humanities Data Service).

This tension between how far we have come vs. how far we have yet to go was echoed in a panel session titled “Why is it taking so long?” moderated by Carly Strasser from DataCite. Some of the panelists pointed to a lack of incentives, infrastructure and support as barriers to progress. However, others noted that actually quite a lot of progress had been made when one considers the scope of the changes in culture and practice that we are championing.

Presentations on data education struck a similar tone. Liz Lyon, from the School of Information Studies at the University of Pittsburgh, noted that roles for data professionals are becoming more prominent and defined, but the educational path that prepares one to perform these roles is still unclear. The iSchools at Pitt and the University of North Carolina, whose program was described by Helen Tibbo, are seeking to position themselves as the places to fill this need.

Though awareness of curation has increased, we still have a ways to go in training academics in curation. Research done by Daisy Abbott from the University of Glasgow demonstrated a gap between graduate students’ perception that curating their work is important and their reported lack of expertise to curate that work effectively. Fortunately, we have Aleksandra Pawlik and others from the Software Sustainability Institute offering Data Carpentry workshops to help raise the data literacy levels of researchers.

The program with presentation slides is available on the IDCC15 website, and the papers will soon be published in the International Journal of Digital Curation. The location of IDCC16 has yet to be announced, but I highly recommend attending if you get the chance.

IDCC15 – I Couldn’t Go But I Followed on Twitter

e-Science Portal Blog - Fri, 02/20/2015 - 10:58

I enjoy going to conferences. I love learning new things and getting new ideas. I really love the way I’m inspired by the people I meet. But I can’t go to every conference. Like most people, my university library budget is limited and my own budget is limited. However, as more people in libraries and data services take to Twitter and other social media, I can go to conferences vicariously.

From February 9–12, the 10th International Data Curation Conference took place in London, England. While I wish I had been there, it is possible I would have been so tempted by the sights of London that I might have skipped the meeting.

There is a Storify of the conference available if you want to have a look at all the events and photos and comments. Watching the #idcc15 feed each day made me envious but also excited, as I read about the successes and new ideas that were being discussed during the various programs. Great morning coffee and lunchtime reading. A few highlights you might want to check out:

While there are differences between US and UK regulations, we can learn from programs that work at any institution. Presentations by Imperial College London, Oxford Brookes University, and University of Edinburgh are summarized here, with links to some good resources.

It is also helpful to learn from the researcher’s viewpoint. Purdue’s Data Curation Profiles were the focus of one talk, which dealt with the Technology Acceptance Model. A second talk examined whether research supervisors were prepared to provide advice and guidance. Slides and papers for both talks are linked from this summary.

The Edinburgh group, mentioned above, has a great blog, and a couple of posts there cover IDCC15: one on the first day and another looking at how the 80/20 rule applies to RDM tools (if you haven’t heard of the 80/20 rule, also known as the Pareto principle, check out the Wikipedia article).

A useful Storify covers RDM training for librarians. There are slides embedded in the page, so have a look at the various curricula that were presented.

While we focus on eScience here at the portal, there are also data things going on in other subjects. If you’ve always wanted to learn a bit about digital humanities, try this video, “The stuff we forget: Digital Humanities, digital data, and the academic cycle” by Melissa Terras, Director of the University College London Centre for Digital Humanities.

This final blog post recommendation gives you an idea of some of the other subjects covered at the meeting, with links to the talks.

By the way, I use TweetDeck to keep track of multiple things on Twitter; there are some basic instructions here. I have the regular stream of people and organizations I follow in the first column, and after that I have columns for hashtags I’m interested in, such as the #idcc15 label for meeting tweets or #medlibs for medical librarians. When a conference is over, and I have favorited the tweets I want to follow up on later, I can delete its column. Favorites is another column in my TweetDeck.

7 Recommended Resources for E-Science Newbies

e-Science Portal Blog - Tue, 02/17/2015 - 19:52

Submitted by Donna Kafel, Project Coordinator for the e-Science Portal and the New England e-Science Program for Librarians.

During a recent meeting of the e-Science Portal’s Editorial Board, portal editors suggested that we create a downloadable document, perhaps titled “An Introduction to e-Science,” that would provide an annotated list of the best overviews and introductory resources for librarians and library students new to the concept of e-Science and library-based data services. The e-Science Portal team thought this was a great idea, and we have it on our action item list for after the portal redesign is completed this spring.

In the meantime, there are a lot of e-Science newbies out there right now who are at a loss as to where to begin, and who would like some of this information a little sooner. Looking at all the content packed into library guides on data management, hundreds of journal articles, and data webinars can be a bit overwhelming for those just starting out. Here are seven resources that can help newbies start out on the road to figuring out what is meant by the term e-Science and how it impacts scholarly communication, library roles in e-Science, the structure of the scientific research environment, data types, and data management.

1.  The Fourth Paradigm:  Don’t be intimidated; I’m not recommending that people read the entire book in one sitting! (But it’s worth going back to read individual chapters.) The Fourth Paradigm’s Foreword and the first chapter, “Jim Gray on eScience: A Transformed Scientific Method,” nicely illustrate how the integration of computers and evolving technologies has revolutionized the way science is conducted.

2.  The e-Science Thesaurus is a great place for newbies to learn terms and concepts, and to find related references. Included in some of the entries are interviews with librarians who are actively engaged in e-Science (for some interesting interviews, check out Data Curation Profiles Toolkit, Implementing a Data Sharing/Management Policy, and Informationist).

3.  What is e-Science and How Should it be Managed?: captures the essence of e-science, critical roles for librarians, and the importance of open data sharing.

4.  A nice overview of e-Science and roles for librarians:

a) Cyberinfrastructure, Data, and Libraries, Part 1. A Cyberinfrastructure Primer for Librarians (2007) – Part one of a primer for librarians on the major issues and terminology of e-Science.

b) Cyberinfrastructure, Data, and Libraries, Part 2 – Part two covers the role of libraries in data management and how librarians can participate in the downstream and upstream phases of the research cycle.

5.  Data Types (4 min YouTube video)—describes the diverse entities that come under the umbrella term data and the different ways data is captured.

6.  A Day in the Life of an Academic Researcher Part 1 (7 minute YouTube video) and A Day in the Life of an Academic Researcher Part 2 (5 minute YouTube video) explain the research environment and the different roles played by members of a research team.

7.  The Journal of eScience Librarianship (JeSLIB) :  specifically dedicated to the advancement of e-Science librarianship, JeSLIB  includes peer-reviewed research  and “e-Science in Action” articles on topics such as research data management, librarians embedded on research teams, data services, data curation, and data sharing and re-use.

Glitter on the Highway: Data on the Website

e-Science Portal Blog - Wed, 02/11/2015 - 12:09

By Andrew Creamer, Scientific Data Management Specialist, Brown University

Glitter on the mattress
Glitter on the highway
Glitter on the front porch
Glitter on the hallway 
Love Shack, The B-52s, Pierson, Schneider, Strickland, EMI (1989).

Recently I was reading through the drafts of the Data Management Plans (DMPs) and Broader Impacts sections that were submitted with faculty NSF proposals through our data management plan service for 2014–2015. As I reviewed these data management plans, one of the commonalities I noticed was the ubiquity of statements that data would be linked from the project website or a personal website. Whether listed as a tool for dissemination, as a means of post-project archiving and access, or in some cases both, there was data on the website. In a few cases, data on the website was conspicuously the only option listed for dissemination or post-project archiving. Most often it was nested in among other options; for example, investigators would say they would disseminate the data by sharing it on their personal or project websites, depositing it in some type of data sharing repository, publishing the results in academic journals, and presenting them at scientific meetings. As I looked over these drafts, I could see where at each occurrence I had marked a comment asking the investigators for more information about what they meant by putting data on a personal or project website, and to please have a conversation with me regarding this option.

The opaque “data on the website” issue comes up in almost every conversation I have had with faculty using our DMP service: “So, you say here that you have a website. How exactly are you storing and making your data available on your website? Who is responsible for doing and maintaining this?” This conversation can go many ways, of course. Some faculty mean that they are depositing the data into a repository and will place a persistent link on their personal or project website, in a citation that links out to the data. Other faculty mean that they have a personal server, and in some alarming cases a web server, where they will place and link to the data itself. While the former intention also leads one down a line of important questions about suitability and sustainability, such as which repository and what kind of persistent link, it is the latter scenario, of course, that concerns us research data management librarians the most.

In their article published in PLOS ONE last summer, “How Do Astronomers Share Data? Reliability and Persistence of Datasets Linked in AAS Publications and a Qualitative Study of Data Practices among US Astronomers,” Pepe et al. (2014) provided evidence that we can use in conversations with investigators about alternatives to storing data on their web or personal servers. Their findings showed that putting data on a personal or project-based website was the third and fourth most common data sharing practice among astronomers, after emailing data or placing it on an FTP-style site. They then looked through the external links to data published in a defined period of the astronomy literature and found:

“This exploratory analysis reveals three key findings. First, since the inception of the web in the early 1990′s, astronomers have increasingly used links in articles to cite datasets and other resources which do not fit in the traditional referencing schemes for bibliographic materials. Second, as for nearly every resource on the web, availability of linked material decays with time: old links to astronomical materials are more likely to be broken than more recent ones. Third, links to “personal datasets”, i.e., links to potential data hosted on astronomers’ personal websites, become unreachable much faster than links to curated “institutional datasets”.” (Pepe et al. 2014)

The practice of placing data on a website may be entrenched in the data sharing practices of certain scientific communities, but as research data management librarians we need to be sure that we do not become numb to its ubiquity. Instead, we must continue to ask researchers what they mean, and list ways that we can still help make data accessible from their website while mitigating the myriad issues of storing data on web servers or personal servers: lack of backup, lack of persistent identifiers, no long-term preservation strategy, insufficient metadata, link rot, diminished discoverability, and the access risks when a single person is solely responsible for making data accessible.

On the publisher side, last spring PLOS added this text to their Data Availability Policy: “Note that it is not acceptable for the authors to be the sole named individuals responsible for ensuring data access.” This one sentence has also helped me dissuade researchers from stating that another researcher who wants or needs access to their data can simply contact them (as the sole means of data access) or retrieve it from their personal website (as the sole means of data dissemination). So let us hope that research funders will also begin pushing back on researchers who want to use their personal or project websites, and their personal and web servers, as the sole means of data dissemination or post-project storage.

Citation: Pepe A, Goodman A, Muench A, Crosas M, Erdmann C (2014) How Do Astronomers Share Data? Reliability and Persistence of Datasets Linked in AAS Publications and a Qualitative Study of Data Practices among US Astronomers. PLoS ONE 9(8): e104798. doi:10.1371/journal.pone.0104798

A gentle introduction to Docker for reproducible research

e-Science Portal Blog - Thu, 01/29/2015 - 16:27

Submitted by guest contributor Stacy Konkiel, Director of Marketing & Research, Impactstory, stacy.konkiel@gmail.com.

By now, many data management librarians are familiar with the concept of reproducible research. We know why it’s important and how to (theoretically) make it happen (thorough documentation, putting data and code online, writing an excellent Methods section in a journal article, etc.).

But if a scientist asked you for a single recommended reading on how to make their computational research reproducible, what would you send them?

I’d suggest “Using docker for reproducible computational publications” by Melissa Gymrek (a Bioinformatics PhD student at Harvard/MIT).

In her post, Gymrek introduces Docker, a “lightweight virtual machine” that allows a researcher to package a complete computing environment that other researchers can run to reproduce results using the original researcher’s code and data.

No need to download and install R packages, or to figure out how to make someone else’s code play well with their operating system. Just install Docker, enter a simple line at the command line, and, boom, they’ve got a virtual machine running on their computer that they can log into to reproduce someone else’s findings.

Docker is already popular in the software development world, and is gaining popularity with bioinformaticians and other computational researchers. Learn more about Docker and how it can work for reproducible research on Melissa Gymrek’s blog.
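To make the idea concrete, here is a hypothetical Dockerfile for an R-based analysis (the base image tag, package list, directory layout, and script name are all illustrative, not taken from Gymrek’s post):

```dockerfile
# Pin a specific base image so every rebuild starts from the same R version
FROM rocker/r-ver:3.1.2

# Install the R packages the (hypothetical) analysis depends on
RUN R -e "install.packages(c('ggplot2', 'dplyr'), repos='http://cran.r-project.org')"

# Copy in the code and data behind the published results
COPY analysis/ /home/analysis/
WORKDIR /home/analysis

# Running the container re-executes the full analysis
CMD ["Rscript", "run_analysis.R"]
```

A collaborator would then only need to build and run this container to reproduce the environment, regardless of their host operating system.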

Winter is the perfect time for a virtual conference or webinar!

e-Science Portal Blog - Wed, 01/28/2015 - 15:19

There’s been a flurry of upcoming virtual conferences and webinars springing up, providing educational opportunities while obviating the need for travel in wintry weather. In a previous post, I noted the upcoming DataONE webinar series that begins on Feb. 10th with the webinar “Open Data and Science: Towards Optimizing the Research Process.”

NISO is sponsoring a six-hour (11 am – 5 pm EST) virtual conference on Feb. 18th: “Scientific Data Management: Caring for your Institution and its Intellectual Wealth.” Hosted by Todd Carpenter, Executive Director of NISO, the program includes speakers from the Dept. of Energy, Emory, Tufts, Oregon State University, UIUC, the Center for Open Science, and the RMap project. The final session will be a roundtable discussion. Program topics for the conference include:

  • Data management practice meets policy
  • Uses for the data management plan
  • Building data management capacity and functionality
  • Citing and curating datasets
  • Connecting datasets with other products of scholarship
  • Changing researchers’ practices
  • Teaching data management techniques

Finally (although I suspect I’ll soon be adding to this snowballing list), Elsevier is sponsoring the webinar “Institutional & Research Repositories: Characteristics, Relationships and Roles” on Feb. 26th from 11 am–12:15 pm (EST).


DataONE is launching a new webinar series

e-Science Portal Blog - Thu, 01/22/2015 - 09:51

DataONE is launching a new Webinar Series (www.dataone.org/webinars) focused on open science, the role of the data lifecycle, and achieving innovative science through shared data and ground-breaking tools.

The first of the series is a presentation and discussion led by Dr. Jean-Claude Guédon from the Université de Montréal, titled:

“Open Data and Science: Towards Optimizing the Research Process”.

Tuesday, February 10th, 9 am Pacific / 10 am Mountain / 11 am Central / 12 noon Eastern

The abstract for the talk and registration details can be found at: www.dataone.org/upcoming-webinar.

Webinars will be held the 2nd Tuesday of each month at 12 noon Eastern Time.  They will be recorded and made available for viewing later the same day. A Q&A forum will also be available to attendees and later viewers alike.

More information on the DataONE Webinar Series can be found at: www.dataone.org/webinars.


R: Addressing the Intimidation Factor

e-Science Portal Blog - Wed, 01/21/2015 - 16:22

Submitted by guest contributor Daina Bouquin, Data & Metadata Services Librarian, Weill Cornell Medical College of Cornell University, dab2058@med.cornell.edu

Working with students and researchers to help them better manage and work with their research data is a big part of the librarian’s role in a data-intensive setting. Much of the time, though, the librarian also needs to think critically about, and advise on, tools used in other parts of the data life cycle, including the pre-processing and analysis phases of a research project. Increasingly, I find myself dealing with this sort of situation in my library. For example, a student comes to the library with a question about getting access to some tools for working with her data; this might mean that the student needs help restructuring some spreadsheets or some other data manipulation task, but more often than not the student is also seeking statistical software and tools for data visualization. In my experience, this type of situation has been more common than requests for help with data management plans or research documentation. This type of reference interaction is also where many librarians and information professionals begin to encounter, and have discussions about, R programming.

R is a free statistical programming language with a notorious learning curve, but students and researchers are increasingly seeing the value in tackling that curve. I was fortunate enough to take some advanced statistics courses during my education and learned R in a trial-by-fire setting. I also co-instructed an introductory course on Computational Health Informatics this past summer, wherein we taught introductory R functionality. Therefore, when patrons come to the library looking for help getting started with R, I feel confident helping them. However, I know that when R comes up in discussions with my colleagues, they do not always feel confident assessing whether it is worthwhile to advise a student to learn R or to just run some statistical tests in Excel. My colleagues are also often intimidated by R because they are not confident that they understand how to troubleshoot and find resources for students just getting started with the program. Having witnessed this type of situation on many occasions, I present here my attempt at lowering the intimidation factor surrounding R for librarians. You do not need to become an R programmer to approach it critically and to help others get started.

I want to start by saying, though, that I almost always encourage students to pursue learning R rather than pushing them toward Excel or a statistics program that they would need to purchase, as our library does not offer regular access to stats software on our computers. R is also much more robust for working with data than Excel. However, I realize that some students and researchers just want to get their work done and want nothing to do with learning a new programming language. Even then, I generally point R out to them briefly anyway, in case the student ever decides that it might be useful to learn. If the student is unsure whether R is what she is looking for, I ask the following questions:

  • Have you ever worked with data using code? (e.g. Stata, SAS)
  • Would you be willing to spend some time learning how to use a new tool?
  • Are the statistical tests you need to run somewhat complex?
  • Will you need to repeat the steps for how you cleaned up your data?
  • Will you need to repeat the steps for how you analyzed your data?
  • Will visualizing your data be very important to you on this project?
  • Is your data in more than one format?

If the student answers “yes” to a few of these questions, I strongly encourage them to use R rather than a tool like Excel. Check out Chris Leonard’s discussion on the R Blog for more information on the Excel vs. R question. And with the below resources and jargon under your belt, you will feel more comfortable approaching R programming if you and your patron do decide that R is a good choice.

One of the first resources I usually point new users to is Quick R. Quick R provides new learners and experienced users alike with “a roadmap and the code necessary to get started quickly, and orient yourself for future learning” with R. I encourage librarians and patrons to look through the “Data Types” section of Quick R if you are unfamiliar with the concept of data types, as understanding how R users talk about data will make unfamiliar terms much less intimidating right off the bat.

There is some other basic jargon you should be aware of when talking about R with patrons as well. For instance, if you are using R, you will likely need to use R packages. “Packages are collections of R functions, data, and compiled code in a well-defined format” (Quick R). The place where packages are stored is called the library. R comes with a standard set of packages when you install it, but others are available for download and installation. You can install packages by running the following command with the name of the package you need:

> install.packages("name_of_package")

Once installed, packages need to be loaded into the session to be used. This can be done using the command:

> library(name_of_package)

There are also buttons on interfaces like R Studio that can help you install and load packages without needing to write commands.

“Function” is another term you’ll likely hear if you get questions about R. Functions let you store a set of commands under an easy-to-read name so they can be reused. For example, this is how one could write a function, here called f1, that subtracts one number from another:

> f1 <- function(x, y) { x - y }
> f1(5, 2)  # returns 3

There are entire packages of functions written by others to help users accomplish complicated tasks. For example, if a researcher decides she needs to run some regression diagnostics, there are pre-written functions for this task in the package called “car”. When the researcher installs the car package and loads it from her library, she will be able to access the functions to run her diagnostics. You can view an example of this and many other statistical analysis functions on Quick R.
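As a rough sketch of that install-load-use workflow (the dataset and model below are illustrative, though vif() is one of the regression diagnostics the car package provides):

```r
# Install the car package from CRAN (one-time), then load it into the session
install.packages("car")
library(car)

# Fit a linear model on R's built-in mtcars example dataset
fit <- lm(mpg ~ wt + hp, data = mtcars)

# vif() computes variance inflation factors, a standard regression diagnostic
vif(fit)
```

The pattern is the same for any package: install once, load per session, then call the functions it provides.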

I also tend to point researchers and my colleagues toward general reference material if they are looking for more granular help getting started with R programming. The following have been very useful in the past:

Additionally, the following tutorial resources are usually very well received:

And one should never neglect the help available through the R Community:

The resources noted above, and many others are listed in a resource guide that I developed on R and Data Mining, which can be found here.

In summary, you do not need to read all of these resources on R to help others work with it. By going through some of the above material and familiarizing yourself with the terminology and resources associated with R, you will be well equipped to help with common R problems. R is challenging, but like all new things, exposure is the only way to get used to it. Start small with terminology and basic documentation– in this way you will gain the confidence and knowledge necessary to begin working on reference transactions that involve R programming.


A Model of Collaborative Education Efforts in Data Management: the Virginia Data Management Boot Camp

e-Science Portal Blog - Thu, 01/15/2015 - 12:42

Submitted by guest contributor Yasmeen Shorish, Physical & Life Sciences Librarian at James Madison University.


Question: How do you deliver the same data management training to graduate students, faculty, and staff simultaneously? How do you deliver that content not just at your own institution, but also to six other institutions across the state?

Answer: Very carefully, with a lot of cooperation, collaboration, and some technical wizardry thrown in as well. This is the story of seven Virginia institutions who stopped repeating content individually and started getting real – real collaborative.

In January 2013, the libraries at the University of Virginia (UVA) and Virginia Tech (VT) teamed up to produce a “Data Management Bootcamp” for graduate students on their campuses. Utilizing telepresence technology, speakers could interact with participants at either school in large virtual sessions, as opposed to discrete events at each venue. Librarian interest in this event resulted in the addition of three more institutions in 2014: James Madison University (JMU), George Mason University (GMU), and Old Dominion University (ODU). UVA, VT, JMU, and GMU have an existing telepresence set-up called 4-VA, and it was not difficult, technology-wise, to add ODU as a full participant as well. Librarians from these five institutions, including myself, formed a planning group to produce the “2014 Virginia Data Management Bootcamp.”

However, expanding a program from two locations to five does present some complications. Can everyone connect simultaneously? Do the screens get too cluttered when everyone is connected? How do we decide what content is most appropriate for five very different institutions? Planning for the 2014 Bootcamp began in the summer of 2013. A series of virtual meetings among the planning group resulted in an agenda that included understanding research data, operational data management, data documentation and metadata, file formats and transformations, storage and security, the DMPTool and funding agencies, rights and licensing, protection and privacy, and preservation and sharing. It was a lot to cover in two full days, with a third half-day for local discussion. The full agenda can be found on this LibGuide.

The group debriefed after the 2014 event and discussed what 2015 should look like. We knew that the next event should be less dense, as that much content in two days was somewhat overwhelming.  The College of William & Mary (WM) and Virginia Commonwealth University (VCU) both expressed a desire to participate. With some technological work involving bridges, WebEx, and patience, the Virginia Data Management Bootcamp was able to expand to include these universities. Happily, increasing the number of participating institutions did not increase the complexity very much. One change that may have had the most impact was that the planning group decided to add more in-person meetings to work through curriculum ideas. We found that as a group, we could accomplish more in a shorter amount of time when we were gathered around one table, discussing ideas.

Using pre- and post-assessment surveys helped us zero in on some areas for change. We wanted to build in more interactivity and limit the amount of lecture for each area. We also wanted to engage the audience in the research cycle more intentionally than we had been. We redesigned the three-day event in smaller chunks, with more local discussion and more hands-on activities. A full schedule can be found on this LibGuide.

Can other states or groups of libraries produce a cross-institutional data management outreach program? Yes!

What if they lack a fancy telepresence room? Still, yes! There are viable alternatives that may have a different look and feel, but can still accomplish the same goal.

Want to launch a cross-institutional program of your own?

The best way to get started is to first get a sense of who would want to participate. Propose the workshop and form a planning group. The number of participating venues will shape what technology you use to bring it all together. WebEx may be appropriate, or even a Google Hangout (although image quality could be a concern).

How much time can you set aside for the workshop? One day? Three days? That will determine what gets covered and how. The more hands-on engagement that you can work into the program, the more likely you are to keep interest across sites.

Determine a meeting schedule for the planning group and decide which meeting method (virtual vs. in person) will be more effective. Individually, each site will need to coordinate with its own campus partners to make it as big an event as they wish. Assessment of some kind is necessary to determine what could change if you do it all over again.

Collaborative education efforts such as these can help institutions leverage the expertise that is naturally distributed. Setting a foundational learning outcome for data management is an achievable goal and a good way to build a community of practice in your local region.





Just published: Journal of eScience Librarianship special issue on data literacy

e-Science Portal Blog - Mon, 01/12/2015 - 14:34

The latest issue of the Journal of eScience Librarianship (JESLIB) has just been published! It is available at http://escholarship.umassmed.edu/jeslib/vol3/iss1/

Table of Contents

Volume 3, Issue 1 (2014)


What is Data Literacy?
Elaine R. Martin

Full-Length Papers

Planning Data Management Education Initiatives: Process, Feedback, and Future Directions
Christopher Eaker

A Spider, an Octopus, or an Animal Just Coming into Existence? Designing a Curriculum for Librarians to Support Research Data Management
Andrew M. Cox, Eddy Verbaan, and Barbara Sen

An Analysis of Data Management Plans in University of Illinois National Science Foundation Grant Proposals
William H. Mischo, Mary C. Schlembach, and Megan N. O’Donnell

Initiating Data Management Instruction to Graduate Students at the University of Houston Using the New England Collaborative Data Management Curriculum
Christie Peters and Porcia Vaughn

eScience in Action

Research Data MANTRA: A Labour of Love
Robin Rice

Building Data Services From the Ground Up: Strategies and Resources
Heather L. Coates

Building the New England Collaborative Data Management Curriculum
Donna Kafel, Andrew T. Creamer, and Elaine R. Martin

Lessons Learned From a Research Data Management Pilot Course at an Academic Library
Jennifer Muilenburg, Mahria Lebow, and Joanne Rich

Gaining Traction in Research Data Management Support: A Case Study
Donna L. O’Malley

The New England Collaborative Data Management Curriculum Pilot at the University of Manitoba: A Canadian Experience
Mayu Ishida

Are you interested in submitting to JESLIB? Please refer to author guidelines at http://escholarship.umassmed.edu/jeslib/styleguide.html

Share your projects at the NE e-Science Symposium’s Poster Session

e-Science Portal Blog - Mon, 01/05/2015 - 17:40

One of the most popular attractions of the annual University of Massachusetts and New England Librarian e-Science Symposium is its poster session. The poster session offers an ideal venue for librarians and library school students involved in e-Science and research data management (RDM) projects and/or research to share their findings and exchange ideas with interested colleagues. It also includes a contest, in which judges review the posters to determine the best in three categories: Most Informative in Communicating e-Science Librarianship, Best Example of e-Science in Action, and Best Poster Overall.

Interested? If you haven’t yet registered for the e-Science Symposium, make that your first step, as registration is filling quickly. Then, write your poster proposal and submit it following these instructions by the proposal deadline of Feb. 6.

Want to see some examples? The e-Science Symposium conference site features archived posters from past symposia. For links to the past six symposia, visit the e-Science Symposium conference page.

Got questions?  For further details, or questions regarding the poster contest, please contact Raquel Abad at raquel.abad@umassmed.edu



Two new articles featured in the Journal of eScience Librarianship

e-Science Portal Blog - Fri, 12/19/2014 - 12:29

The Journal of eScience Librarianship (JeSLIB) has just published the following two articles:

These two articles are part of Volume 3, Issue 1 of JeSLIB that will be published in January. An announcement will be made when the issue is published.



Registration now open for 7th annual New England e-Science Symposium

e-Science Portal Blog - Wed, 12/10/2014 - 17:01

Registration is now open for the 7th annual University of Massachusetts and New England Area Librarian e-Science Symposium, to be held on Thursday, April 9, 2015. For details and to register, visit the 2015 e-Science Symposium conference site. Registration is on a first-come, first-served basis and will be capped at 90 people.

Librarians: the original research data managers

e-Science Portal Blog - Wed, 12/10/2014 - 15:45

Submitted by guest contributor Nancy Glassman, Assistant Director for Informatics, D. Samuel Gottesman Library, Albert Einstein College of Medicine

In conjunction with Albert Einstein College of Medicine’s Faculty Development Program I lead an introduction to research data management workshop. Attendees usually include a mix of clinical and basic science faculty, as well as a few postdocs and graduate students. To set the stage at a recent workshop, I asked the group if they were surprised to have a librarian as the instructor. Taken aback by nodding heads around the table, I quickly recovered my composure and decided to make the most of this “teachable moment.”

All of the workshop’s attendees use the library’s resources and services, but as long as things are running smoothly and they find the information they need, they don’t really need to think about how it was made available to them. Many library users are unaware of what librarians actually do, and that’s just fine. But it’s worthwhile to take a few minutes to show researchers how the traditional library services they use almost every day require similar, if not the same, skill set as managing research data.

Librarians are, arguably, the original data managers. Think about it. Librarians have been managing data and information in one form or another for thousands of years, practically since the dawn of the written word. Archaeologists in Turkey have found collections of stone tablets dating back to the 17th-13th centuries BCE containing early forms of metadata.(1) These examples describe metadata concepts such as attribution and versioning:

“Written by the Hand of Lu, son of Nuggassar, in the presence of Anuwanza …”

“This tablet was damaged. In the presence of Mahhuzi and Halwalu, I, Duda, restored it…” (1)

Fast forward to the library of the twenty-first century. We work and live in the era of big data in which “everything is available for free on the Internet.” Who makes sense of this information overload? Who selects, catalogs, curates, backs up, makes available relevant sources of information?  Who helps users cite these resources properly? Who safeguards patron information?

  •  Librarians are experts at making data meaningful and easily discoverable. Look no further than the library’s catalog, a classic example of metadata in action. In medical libraries, MeSH (Medical Subject Headings) is used to categorize material by subject. Author names and titles are standardized. Call numbers make it easy to find items on bookshelves.
  •  Although librarians are not copyright lawyers, we have a lot of practice navigating copyright, licensing agreements, and open access as part of our regular activities. This includes negotiating with vendors, managing interlibrary loan, and supporting public- and open-access initiatives (including the NIH Public Access Policy).
  • Researchers rely on librarians for help in finding relevant, evidence-based information. In addition to being experienced searchers of online databases such as PubMed, Embase, and Web of Science, we also mine the “deep web” to find those elusive resources.
  •  Librarians are familiar with the rules and nuances of proper citation and attribution practices. We support many citation management programs, including EndNote, RefWorks, and Mendeley, and teach students how to cite correctly and avoid plagiarism.
  •  Data comes in many different packages, and long-term preservation and data storage are important aspects of managing research data. Over the millennia we have maintained and preserved collections of tablets, scrolls, manuscripts, maps, audiovisual materials, print books and journals, e-books, e-journals, websites, blogs, wikis, and data sets.

Although the media and the volume of data have changed radically over time, the expertise to manage all of this remains essentially the same. Librarians are particularly adept at adapting to change. Helping researchers manage their data is a logical extension of a long-standing tradition.

After the workshop, one attendee approached me, and acknowledged that at first he was skeptical about taking a class on research data management led by a librarian, but after I described the ways traditional librarian skills apply, it all made sense.  Conversations like this can open users’ eyes to librarians’ wide range of information management skills and may lead to new and interesting partnerships.


1.  Casson L. Libraries in the ancient world. New Haven: Yale University Press; 2001. xii, 177 p. p 13.

ACRL Digital Curation Interest Group Call for Proposals

e-Science Portal Blog - Mon, 12/08/2014 - 13:02

The following announcement is posted on behalf of the ACRL Digital Curation Interest Group Team.

The ACRL Digital Curation Interest Group is seeking proposals for our Spring webinars and for ALA Annual 2015. The group would like to host three webinars in the Spring and have 3-4 panelists at ALA Annual 2015, so please consider submitting a short abstract proposal!

CFP for our Spring webinars:

We invite proposals on topics germane to digital curation activities including (but not limited to) the following topics:

  • Documentation and organization
  • Digital preservation
  • Digital curation software and tools
  • Metadata specialists
  • Non-institutional repositories
  • Skills needed/Skills learned to tackle digital curation
  • Specific data management procedures such as file naming
  • Data purchased from vendors
  • Careers in digital curation
  • Digital curation lifecycle

We seek webinars of 60 minutes in length (including time for questions). If you have an idea for a webinar, please send a short description of it to Megan Toups at mtoups@trinity.edu by January 31, 2015.

CFP for ALA Annual 2015:
We are putting together a panel of 3-4 people to present for ~10 minutes each covering digital curation from a variety of perspectives.  Panelists will present and then engage the audience in a productive conversation on digital curation.

We’d love to have a diverse set of panelists representing a variety of different digital curation perspectives–research data, archives and digital curation, theory, practice, etc.  Want to be a part of this interesting panel?  Please submit a short description of what you’d like to present to Megan Toups at mtoups@trinity.edu by January 31, 2015.

Thank you for your submissions!

The DCIG Team–Megan Toups, Suzanna Conrad, Rene Tanner

RDAP15 Call for Proposals

e-Science Portal Blog - Mon, 12/08/2014 - 12:12

RDAP15, the sixth annual Research Data Access and Preservation Summit, is accepting proposals (max. 300 words) for panels, interactive posters, lightning talks, and discussion tables. Themes for RDAP15 were selected by this year’s planning committee with input from previous years’ attendees and RDAP community members.

These are the proposal deadlines for the 2015 RDAP Summit:

December 19, 2014: Panel Presentations Submissions Due
January 16, 2015: Interactive Posters and Lightning Talks Submissions Due

For further details see RDAP15’s Call for Proposals webpage.

Metadata Services for Research Data Management Call for Presentations

e-Science Portal Blog - Tue, 12/02/2014 - 12:11

The ALCTS Metadata Interest Group of ALA has issued a Call for Presentations for the program “Metadata Services for Research Data Management,” part of the ALCTS Virtual Preconference “Planning for the Evolving Role of Metadata Librarians” to be held prior to the ALA annual meeting in June 2015 in San Francisco. The deadline for proposals is this Friday, Dec. 5th. See the full announcement on the Metadata Interest Group blog.


Evolving Scholarly Record and the Evolving Stewardship Ecosystem – Workshop Series

e-Science Portal Blog - Mon, 12/01/2014 - 09:58

OCLC is sponsoring a series of workshops that build upon the framework presented in its recent research report The Evolving Scholarly Record. Workshops will be held in Washington, DC, Chicago, San Francisco, and Amsterdam. Seating is limited so you are encouraged to register now. See announcement for further details.


New England Science Boot Camp is heading Downeast!

e-Science Portal Blog - Thu, 11/20/2014 - 10:13

The upcoming 2015 New England Science Boot Camp will be held June 17-19 on the beautiful campus of Bowdoin College in Brunswick, Maine.  Plans for session topics and activities are currently underway and will be announced in the next few months.


Broader Impacts and Data Management Plans

e-Science Portal Blog - Thu, 11/13/2014 - 13:57

By Andrew Creamer, Scientific Data Management Specialist, Brown University

The National Science Foundation (NSF) explains that Data Management Plans are to be “reviewed as an integral part of the proposal, coming under Intellectual Merit or Broader Impacts or both, as appropriate for the scientific community of relevance.” As the librarian responsible for writing data management and sharing plans, I was invited to join my institution’s Broader Impacts Committee, which aims to “help Brown faculty and researchers respond effectively to the Broader Impacts criterion and other outreach requirements of governmental funding agencies.” For example, the committee helps build collaborations between K-12 educators in my state and the university’s researchers, and it promotes a database for sharing STEM curricula, among other activities.

The NSF views Broader Impacts through the lens of societal outcomes:

NSF values the advancement of scientific knowledge and activities that contribute to the achievement of societally relevant outcomes. Such outcomes include, but are not limited to: full participation of women, persons with disabilities, and underrepresented minorities in science, technology, engineering, and mathematics (STEM); improved STEM education and educator development at any level; increased public scientific literacy and public engagement with science and technology; improved well-being of individuals in society; development of a diverse, globally competitive STEM workforce; increased partnerships between academia, industry, and others; improved national security; increased economic competitiveness of the United States; and enhanced infrastructure for research and education.

Recently I was asked to speak at a Broader Impacts Workshop for faculty. In my presentation I focused on several ways that a proposal’s DMP can connect with the societal outcomes described in its Broader Impacts. For example, researchers detail in their NSF DMPs when and how they will make their data and research products available to other researchers and/or the public, and how they will archive and preserve access to their research products after the project ends. They also outline a dissemination strategy, which can include citing and sharing the project’s data, metadata, and code in publications and presentations and depositing these items into a data-sharing repository. Retaining, preserving, and making accessible the data, metadata, and code, along with the resulting publications, maximizes the potential for replication and reproduction of research results. It also furthers the impact of the project by making it possible for the data and research products to be discovered, used, repurposed, and cited in new research and discoveries.

Ways the Library Can Support Broader Impacts and Preserve and Disseminate Related Research Products

  • The library can advise on selecting optimal file formats and media in which data can be stored, shared, and accessed. Proprietary software and data formats used to collect and capture data can limit a dataset’s usefulness to others. Researchers can work with the library to identify and export their data files into data-sharing and preservation-friendly formats.
  • The library can collaborate with researchers to create the documentation and contextual details (metadata) that make their data discoverable and meaningful to others. The library can help researchers locate metadata schemas, standards, and ontologies for a specific discipline, and it can also help create metadata for data being prepared for upload to a data-sharing repository.
  • Depositing their Broader Impacts curricula and data into a repository is one of the best ways for researchers to ensure that their research products can be discovered and used by others. It is also the easiest way to locate and access data years after a project ends. Libraries can offer a number of repository-related services: they can help researchers choose and evaluate potential repositories, and they can offer an institutional repository (IR) as an option for researchers to publish, archive, and preserve their projects’ data after the projects end.
  • More libraries are offering a global persistent identifier service for researchers wishing to maximize the dissemination and discoverability of their datasets. A digital object identifier (DOI) gives researchers and the public a stable way to locate and cite data. Through EZID, for example, the library can issue researchers DOIs even when their datasets are not in the IR, such as DOIs for datasets deposited in NCBI databases under accession numbers, so that researchers can cite those datasets in their publications, presentations, and grant reports. The library also mints DOIs for researchers who are required by publishers to submit a DOI for the datasets underlying their manuscripts, or for compliance with publishers’ data availability and data archiving policies.

While researchers may not have thought about the library when it comes to societal outcomes and disseminating research data, we librarians hope that they will begin to see the library as the ideal institutional space to plan for data retention; appraise which research products should be retained, archived, and preserved; explore options for sharing and long-term preservation-friendly file formats; create documentation and metadata that make data discoverable and useful; publish and archive data in a repository; cite data; and disseminate and measure the impact of data.

