Feed aggregator

Lost in Translation?

e-Science Portal Blog - Wed, 05/20/2015 - 09:44

I think one of the most challenging aspects of being a data librarian is figuring out how to talk to people about what you do and the services that you provide. Metadata, curation, and archiving all mean different things to different people, assuming that they are familiar with these terms at all. Even using the word “data” is a dangerous proposition. Researchers in the Arts & Humanities may not see themselves as working with data. Librarians may also have questions about what constitutes “data”, given that definitions are often fairly broad.

As a result I find myself doing a lot of translating between different groups of people. In talking with faculty and librarians I will try to get a sense of how they think of research data and how they describe the issues that are relevant to them. In attempting to make connections with the people I interact with I tend to use full definitions first and then introduce particular terms later on. For example, I’ll talk about how critical it is for people to have access to contextual information about research data so that they will be able to understand the work that was done and trust the data, rather than bring up metadata.

Which brings me to the conundrum of the term “data information literacy”.

In writing up the 2011 article on DIL for portal, Michael Fosmire, C.C. Miller, Megan Sapp Nelson and I employed the term “data information literacy” to deliberately distinguish our work from “data literacy” for two reasons.

First, data literacy generally refers to how the data are used or manipulated to produce research outputs as opposed to how the data are managed, shared or curated. These things are certainly related to each other and a number of the DIL Competencies we came up with venture into this territory, but the perspectives and approaches are distinct from each other.  We thought that this distinction was an important one to make.

Second, we really wanted to make connections between data librarians and information literacy librarians and to affirm that each had important contributions to make in this area. We saw this as a “big tent” area for librarians where expertise and skill sets from multiple types of librarians would be needed to be successful. Data was (and perhaps still is) a foreign area to many librarians and so couching the work we were doing in something that was much more familiar and accepted made sense.

However, “data information literacy” as a term does not mean anything to people outside of academic libraries. This is not really surprising. “Information literacy” doesn’t really mean much outside of the library community either (with the possible exception of education) and that community has struggled a bit with how to present itself to faculty, students and others. From what little I know about this community there have been and continue to be discussions about changing “information literacy” to other labels such as “information fluency” to describe their work. These discussions highlight the difficulty of finding a term that succinctly encapsulates the work that librarians do in ways that are meaningful both to ourselves and to others.

I recently published an article with Marianne Stowell Bracke about a semester-long course we taught to graduate students in the College of Agriculture at Purdue. We used the term “data literacy” to describe our work in the article because the venue we published in reaches beyond the library community, but perhaps more importantly this is how she and I connected with our students and our sponsors. We wrestled with the decision of what term to use in the article, but in the end chose to be authentic to how we discussed our work with our constituencies.

I don’t know that “data information literacy” as a term has really caught on yet with librarians, or if it ever will. And really that’s okay. I still see value in making the distinction between “data literacy” and “data information literacy”, but it’s more important that we connect with our communities in ways that they can understand and relate to. For now, I’m willing to trade shared terminology for forward progress.

For another take on “translating” as a component of Data Librarianship, check out this article by Kirsten Partlo published in IASSIST Quarterly.

Removing Barriers to Re-use: Object Model Structure and Description

e-Science Portal Blog - Tue, 05/12/2015 - 12:48

By Andrew Creamer, Scientific Data Management Specialist, Brown University Library

Last week my colleague, Elli Mylonas, and I presented a paper at ACRL New England on “New Object Models and APIs: Foregrounding Re-use in a Digital Repository.” One aspect that we explored in the paper was the set of issues we have encountered with the representation and description of digital objects and their relationships to other objects. These are natural growing pains associated with Institutional Repositories’ (IRs) transitions to accepting datasets in a variety of formats from multiple disciplines, in addition to their traditional publication/manuscript text file collections.

As a research data management librarian, I endeavor to encourage the researchers at my institution to store, back up, and archive their data appropriately (and in the most appropriate formats), describe their data and digital scholarship products sufficiently, and deposit these items responsibly into appropriate data-sharing repositories, which in some cases is our IR. The pursuit and collection of scientific datasets in our IR, the Brown Digital Repository (BDR), is a relatively recent initiative for us; the BDR is and has been an established digital archive for special collections and digital projects for the Digital Humanities. So after a year of depositing science-related data and collaborating with Ann Caldwell, our Metadata Librarian, and Joseph Rhoads, our Repository Manager, and the Library’s programmers, the ACRLNE paper allowed Elli, Ann, Joseph and me an opportunity to reflect on and evaluate these objects and their representation in the BDR, and to think of ways that we can improve their discovery, access, use and possible re-use.

Here is a sample of the description/structure issues that we are currently working on:

Connecting a Dataset to a Publication

Last year we were approached by two researchers interested in archiving their dataset in the BDR to comply with the new PLOS data access policy, which requires authors to submit a data availability statement with a permanent link to publicly access their data as a requirement for submitting a manuscript and for publication.

In this instance we deposited the data into the BDR and provided the authors with a DOI; however, upon review of the record, I noticed that the metadata initially failed to describe the relationship between the dataset and the related PLOS article. While the reader of the online PLOS article can easily link to the BDR and the dataset, the BDR user was unable to access the PLOS article from the record or discover it in the metadata. So now we are working on making this relationship apparent in the metadata (MODS) and the structure (RDF).

This model (publication + minimal dataset underlying publication or publication + supplementary dataset) will be an increasingly important object model as publishers begin to require the deposit of the datasets underlying published articles, for peer review and/or public access, and also for when we begin to archive datasets related to dissertations.

BDR link:

https://repository.library.brown.edu/studio/item/bdr:384192/

Publication link:

http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0112544

Exposing Relationships for Data Deposited in Another Repository with Related Digital Objects

A pair of researchers asked us to mint them a DOI for RNA sequence files they had deposited in an NCBI database so that they could use it to cite the location of their data in their manuscripts, and they had some experiment-level metadata that they also wanted to archive in the BDR that could facilitate the possible re-use and repurposing of their dataset.

While the DOI resolves to the data’s location in the Sequence Read Archive (SRA), the DOI and the location were not exposed on the record. In order to facilitate re-use, we added the “Related Item: Is Referenced By” element and attribute to the MODS (Metadata Object Description Schema) record to describe the two publications that are based on the dataset.

Although this information was now described in the MODS, the structure of the BDR at that time prevented the exposure of these relations on the record. Thus, users of the BDR could not easily ascertain that this record is for a dataset that is not in the BDR, and they could not easily link out to the data’s actual location, or to the two publications based on the dataset.
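To make the MODS description above more concrete, here is a minimal Python sketch of what adding “Related Item: Is Referenced By” entries to a dataset record might look like. It is an illustration only, not the BDR's actual workflow: the function names, titles, URL and DOIs are hypothetical placeholders, and only the MODS element and attribute names (relatedItem, type="isReferencedBy", titleInfo, identifier, location/url) come from the MODS schema itself.

```python
import xml.etree.ElementTree as ET

# MODS version 3 namespace published by the Library of Congress.
MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)


def q(tag):
    """Return a namespace-qualified MODS tag name."""
    return f"{{{MODS_NS}}}{tag}"


def add_referencing_publication(mods_root, title, doi):
    """Record a publication that is based on the dataset (hypothetical helper)."""
    related = ET.SubElement(mods_root, q("relatedItem"), {"type": "isReferencedBy"})
    title_info = ET.SubElement(related, q("titleInfo"))
    ET.SubElement(title_info, q("title")).text = title
    ET.SubElement(related, q("identifier"), {"type": "doi"}).text = doi
    return related


def build_dataset_record(dataset_title, data_location_url, publications):
    """Build a small MODS record for a dataset that lives in another repository."""
    root = ET.Element(q("mods"))
    title_info = ET.SubElement(root, q("titleInfo"))
    ET.SubElement(title_info, q("title")).text = dataset_title
    # Where the data actually lives (for example, an external archive).
    location = ET.SubElement(root, q("location"))
    ET.SubElement(location, q("url")).text = data_location_url
    for title, doi in publications:
        add_referencing_publication(root, title, doi)
    return root


if __name__ == "__main__":
    # All values below are placeholders, not real records.
    record = build_dataset_record(
        "Example RNA sequence dataset",
        "https://example.org/archive/EXAMPLE-ACCESSION",
        [
            ("First example article", "10.0000/example.1"),
            ("Second example article", "10.0000/example.2"),
        ],
    )
    print(ET.tostring(record, encoding="unicode"))
```

Describing the relationship this way only records it in the metadata; whether a user sees it still depends on the repository's record view exposing those elements.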

BDR link: https://repository.library.brown.edu/studio/item/bdr:402997/

Exposing Experiment-Level Metadata

We also struggled with ways to help the researchers make visible their experiment-level metadata, which derived from a disciplinary data standard for describing RNA files. We found the only way for us to expose this information on the record with the structure at that time was in the MODS using the “Note” element and “type” attribute to record the experiment-level metadata descriptors such as cell origin and instrument, which essentially made it a free-text field. But while this exposed the information on the record, it prevented the BDR user from seeing the actual metadata descriptors: you could see “homo sapiens” but not “organism”, and you could see “melanocyte” but not “cell type”, etc.
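The following Python sketch illustrates the display problem described above. It assumes, hypothetically, that each descriptor is stored in the “type” attribute of a MODS note and the value in the note text; the two render functions are invented for illustration and are not the BDR's code. A view that prints only the note text loses the descriptor, while a view that also reads the attribute keeps it.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"


def q(tag):
    """Return a namespace-qualified MODS tag name."""
    return f"{{{MODS_NS}}}{tag}"


def add_descriptor_note(mods_root, descriptor, value):
    """Store one experiment-level descriptor as a typed MODS note."""
    note = ET.SubElement(mods_root, q("note"), {"type": descriptor})
    note.text = value
    return note


def render_values_only(mods_root):
    """Mimic a record view that shows only the note text."""
    return [note.text for note in mods_root.iter(q("note"))]


def render_with_labels(mods_root):
    """Mimic a record view that also shows the descriptor from the type attribute."""
    return [
        f"{note.get('type', 'note')}: {note.text}"
        for note in mods_root.iter(q("note"))
    ]


if __name__ == "__main__":
    record = ET.Element(q("mods"))
    add_descriptor_note(record, "organism", "homo sapiens")
    add_descriptor_note(record, "cell type", "melanocyte")
    print(render_values_only(record))
    print(render_with_labels(record))
```

The second view is closer to what a re-user of the data needs, since a value such as “melanocyte” is far more useful when the reader can see that it describes the cell type.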

In another case, we had a researcher deposit the code and datasets for a dynamic visualization of a computational analysis. Among its related objects were a pre-print in arXiv, a PLOS article, and a copy of the code in a code versioning repository, Bitbucket. While we did describe these relationships in the MODS, these again were not exposed on the record view, so users could not find the code, pre-print or the published article. Instead of exposing the experiment-level metadata in the notes of the record or making it an attached supplementary file, this researcher preferred to code this information directly into an HTML page and the visualization itself, and to add the location of the code to his abstract.

BDR link: https://repository.library.brown.edu/studio/item/bdr:351764/

Exposing Relationships in the Structure

This same researcher and a colleague later asked us to deposit a .pdf file of an abstract and a .pdf file of a conference poster. The researcher wanted us to mint a DOI for the poster that could be cited in the abstract hand-out he would give to colleagues, so that these conference attendees could find the poster online after the conference. This gave us the perfect opportunity to work with the Library’s programmers to create a way to expose these relations in the RDF statement. The records now show, under Relations, the header “Parent” for the poster and “Child” for the abstract. The BDR programmers are changing this so that the abstract record shows the header “Is Part Of” with a link to the poster (the parent object), and the poster record shows the header “Has Parts” with a link to the related abstract.

BDR link: https://repository.library.brown.edu/studio/item/bdr:414548/

Proposed view: https://repository.library.brown.edu/studio/preview/item/bdr:414548/
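As a rough illustration of the parent/child display described above, the Python sketch below takes “is part of” statements and derives the headers each record would show under Relations. It is an assumption-laden sketch, not the BDR implementation: the statements are plain (child, parent) pairs rather than real RDF, and the identifiers "example:abstract" and "example:poster" are placeholders.

```python
from collections import defaultdict


def relation_headers(part_of_statements):
    """Map each object to the links shown under "Is Part Of" and "Has Parts".

    part_of_statements is an iterable of (child_id, parent_id) pairs,
    each meaning "child is part of parent".
    """
    relations = defaultdict(lambda: {"Is Part Of": [], "Has Parts": []})
    for child, parent in part_of_statements:
        # The child record links up to its parent ...
        relations[child]["Is Part Of"].append(parent)
        # ... and the parent record links down to the child.
        relations[parent]["Has Parts"].append(child)
    return dict(relations)


if __name__ == "__main__":
    # Placeholder identifiers: the abstract is part of the poster.
    headers = relation_headers([("example:abstract", "example:poster")])
    for object_id, links in headers.items():
        print(object_id, links)
```

Storing a single statement and deriving both directions keeps the two record views consistent, but it only works when it is clear which object is the parent and which is the child.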

But while this structure change is in the right direction and will temporarily help with items that have this parent/child dichotomy, we still have to work out how we will make other, more complex relationships visible, where it is less apparent which object is the parent and which is the child.

A Work in Progress…

So after a year open for business as a science data repository, we have found that we have had an inconsistent approach to describing related objects and that our structure was a barrier to making these relationships visible; BDR users currently cannot easily discover an item and then find and link out to the locations of its related objects. While we can predict that some object models will be basically standard (publication + related dataset), the RNA and visualization data ingest examples above have taught us that we have to be prepared for more “unicorn” situations, scenarios where the depositors will have to communicate and collaborate with us to describe and represent their digital scholarship in the way that they feel best meets their needs and the needs of the BDR users.

Science Boot Camp Registration is open!

e-Science Portal Blog - Tue, 04/28/2015 - 14:25

Registration is now open for the 2015 New England Science Boot Camp! For further information and to register, visit the Science Boot Camp 2015 Lib Guide at http://classguides.lib.uconn.edu/SBC2015

This year’s Science Boot Camp will be held at Bowdoin College, June 17-19. Now in its seventh year, Science Boot Camp provides a fun and casual setting where New England science faculty present educational sessions on their respective science domains to librarians. Science topics for this year’s boot camp include Cognitive Neuroscience, Marine Science, and Ornithology. There will also be a special evening presentation, “History of Diabetes,” on Wednesday, June 17th. The Capstone on Friday, June 19th will feature a hands-on session, “Writing and Reviewing Data Management Plans.”

Prior to the official start of the boot camp program, Science Boot Campers can opt to take tours of the Bowdoin College Museum of Art and/or the George J. Mitchell Department of Special Collections & Archives, on Wednesday morning, June 17th. Science Boot Campers are also invited to participate in an optional tour of Bowdoin’s Coastal Studies Center Marine Lab on Orr’s Island on Friday afternoon after the conclusion of Science Boot Camp. (SBC registrants are required to indicate if they will participate in one or more of these activities on the SBC registration form <https://webapps.umassd.edu/events/library/?ek=539>.)

Science Boot Camp provides librarians with valuable continuing education at a low cost, and offers three options for attendees: full registration with overnight lodging, commuter registration, or one-day registration.

This year, Bowdoin College is offering overnight accommodations for Tuesday June 16th and/or Friday June 19th, at additional cost to campers. (Campers who would like to stay Tuesday and/or Friday evening will pay a separate fee directly to Bowdoin College. Details about this option can be found at http://classguides.lib.uconn.edu/content.php?pid=665848&sid=5513558 ) Getting to Bowdoin is easy by car, bus, or the Amtrak Downeaster train. For further details see http://www.bowdoin.edu/about/visiting/directions.shtml

If you’ve never been to Science Boot Camp, visit the e-Science Portal’s Science Boot Camp page at http://esciencelibrary.umassmed.edu/science_bootcamp where you’ll find descriptions, links to past SBC LibGuides, and links to SBC videos!

Are you curious about what you can expect to learn at Science Boot Camp 2015? Here are the learning objectives for the 2015 Science Boot Camp science and Capstone sessions:

For each of the focus topics covered at Science Boot Camp’s science sessions, Science Boot Campers will be able to:

  • Explain the structure of the field and its foundational ideas
  • Understand and be able to use terminologies for the field
  • Identify the big questions that this field is exploring
  • Discuss new directions for research in this field
  • Discuss what questions research in this field is addressing
  • Understand how research is conducted, what instrumentation is used, and how data is captured
  • Identify how researchers share information within their fields beyond publications
  • Share insights into what current research in the field is discovering and implications of these discoveries
  • Share insights into how researchers in specific fields collaborate with librarian subject specialists now and how they might collaborate in the future.
  • Identify new ways that librarians can support their research communities

Additionally, following the “Hands-On Writing and Reviewing Data Management Plans” Capstone, Science Boot Campers will be able to:

  • Understand how actions by the OSTP and funders have led to requirements for data management plans
  • Write a basic data management plan based on an actual research case
  • Identify gaps in a data management plan requiring additional information from researcher(s)
  • Review and critique a data management plan written by others
  • Begin to appreciate why disciplinary terminology matters when writing or reviewing a data management plan

Heather Coates is building a data management curriculum at her institution, one partnership at a time

e-Science Portal Blog - Mon, 04/27/2015 - 08:09

In the four short years since joining IUPUI’s University Library, Heather Coates has built a data management program from scratch, forged partnerships across campus (and throughout the larger university), and also served the school’s scholarly communication needs overall.

What can we learn from her experiences?

Recently, I emailed with Heather to learn more about her recent successful data management course, offered in partnership with her campus’ Clinical and Translational Sciences Institute (CTSI), as well as how she’s managed to balance her data management outreach and education efforts with her many other duties–an experience that many of us understand all too well!

Tell me about your current role at IUPUI.

My title (Digital Scholarship & Data Management Librarian) is a bit of a mouthful, but is a fairly accurate description of what I do. My primary role is to provide data services to the IUPUI campus. As a part of the Center for Digital Scholarship, I also work with the Scholarly Outreach Librarian, the Scholarly Communication Librarian, and the Digital Humanities Librarian to provide a range of research support services. Primarily, this includes education and advocacy for altmetrics and other sources of evidence demonstrating research impact.

I am also the liaison to the Fairbanks School of Public Health. All Center librarians, myself included, also hold liaison roles to keep us connected to information literacy and reference services. It can be really challenging to balance all of these roles, but each of these roles helps me to do the others better.

You didn’t start out as a librarian, correct? What drew you to data librarianship?

It was completely serendipitous. I finished the MLIS program expecting to become a medical librarian or subject librarian in psychology. Finding a job for my husband and me in the same city after the housing crash proved to be difficult. So when I saw the posting for my current position, I pounced. The more I learned about data curation, the more interested I became. While preparing my interview presentation, suddenly all the hard-learned lessons from my years in research and coursework in health informatics made sense. It was really exciting to finally feel like I had found my niche.

You recently presented at ACRL 2015 on the data management class you created at IUPUI in partnership with the NIH-funded Indiana Clinical and Translational Sciences Institute (CTSI). Tell me about that course.

I came across the Data Management Team at the Indiana CTSI during my environmental scan of the campus. I can’t remember exactly how I stumbled across the field of clinical data management, but once I was aware of their expertise, I had to reach out.

The team leader, Bob Davis, was very open and willing to talk and eager to hear that the Library was interested in providing data management training. Bob’s team is very experienced, but they are funded on a cost-recovery model. Although they offer several workshops, they have very little time to provide the in-depth training that I wanted to develop. They also have a much deeper and narrower focus on research data management than I could take.

My goal was to develop a broad curriculum that would meet the needs of researchers in the social, life, and physical sciences, as well as the health sciences. Bob and his team graciously shared their training materials and expertise, which hugely shaped the program. They also attended the pilot lab and provided very helpful feedback that continues to shape how we offer training. Homing in on the instructional design has been a combination of applying the evidence and trial and error, but their perspective on the balance of content has been instrumental.

What was the most successful aspect of that course?

Attendees really enjoyed the discussions, especially with researchers from other disciplines. The activity that has been the most popular is the data outcomes mapping exercise, even though most novice researchers struggle with it.

What parts of the course would you change going forward?

Oh, so many things! There is so much room for improvement, especially the instructional design and delivery. I’m still figuring out how to talk less, let the students lead the discussion, and get them more engaged with the activities.

Why did you decide to offer the course outside of the libraries instead of as a library-based workshop series?

Although we have a strong liaison program, many of the strong relationships between the library and faculty across campus are based on instruction and collection development. I felt that it was really important to acknowledge that no one person or group can know everything necessary to teach data management or provide data services. So building relationships with other research support units across campus has been a consistent effort since I started. Research takes a village!

You founded IUPUI’s library data management service in 2011. What’s been your biggest success (and your biggest challenge) since then?

The biggest challenge has been getting the attention of our faculty, staff, and students long enough to tell them what the library has to offer. No answers to that one yet, except that word of mouth is powerful. So I focus on making one connection at a time.

For librarians just getting started on designing and offering data management workshops, what resources would you recommend?

Many of us are the only data librarian on our campuses, so building an external network outside your institution is crucial. The institutional network is necessary to get your job done; the external network is really important for peer support. It’s so nice to have a group of dedicated, brilliant peers who care as much about data as I do!

For more information on Heather’s work, visit her blog, the IUPUI Data Services website, and follow Heather on Twitter. You can also find copies of much of her formally published work on the IUPUIScholarWorks repository.

Highlighting Resources from the New England e-Science Program

e-Science Portal Blog - Fri, 04/24/2015 - 17:48

Submitted by Donna Kafel, Project Coordinator for the E-Science Portal and New England e-Science Program

Staying on top of conference proceedings is challenging, whether you’re physically attending a conference, or wistfully following attendees’ tweets and links to presentations from afar. The number of conferences, symposia, workshops, camps, and national conference sessions featuring RDM-related topics has surged, and keeping abreast of all the great output from these events is difficult.

Over the years, the New England e-Science Program team, based at the Lamar Soutter Library, University of Massachusetts Medical School, has made a concerted effort to capture the rich content from its two key conferences: the annual University of Massachusetts and New England Area Librarian e-Science Symposium and the New England Science Boot Camp. The e-Science Symposium conference pages include detailed agendas, links to presentation slides, and posters. Recordings of presentations for the past three e-Science Symposia are available for viewing on the UMass Medical School/New England Area Librarians e-Science Symposium YouTube Channel. The Science Boot Camp page on the e-Science Portal is one of the most heavily trafficked content areas of the portal, featuring descriptions of each of the seven NE Science Boot Camps, LibGuides and videos of Science Boot Camp presentations.

ACRL online course: What you need to know about writing data management plans

e-Science Portal Blog - Thu, 04/23/2015 - 10:54

Registration is open for an upcoming ACRL e-Learning online course, “What You Need to Know about Writing Data Management Plans,” that will be offered April 27-May 15, 2015.

Big Data and the Collaborative Web Shaping NIH’s Vision and Future Programs

e-Science Portal Blog - Mon, 04/13/2015 - 14:48

Submitted by guest contributor, Katie Houk, Health & Life Sciences Librarian, San Diego State University

I’d like to recap and present a few of my thoughts on the first presentation of the 7th Annual e-Science Symposium, but first I need to congratulate the Lamar Soutter Library at the University of Massachusetts Medical School, the National Network of Libraries of Medicine New England Region, and the Boston Library Consortium for the best Symposium that they have held so far. Now in its seventh iteration, the symposium presented a cohesive and interesting schedule with excellent speakers, a range of posters, and much food for thought on the state of data and its management in the sciences.

Our first speaker of the morning was Dr. Philip Bourne, newly appointed Associate Director for Data Science at the National Institutes of Health. Dr. Bourne was kind enough to Skype in, quite early in the morning, from California in order to present to our group. Dr. Bourne’s talk covered many topics on his mind, and even included some of his personal ideas about the direction of the National Library of Medicine after the current director retires. He started by recommending two books which are currently shaping his thoughts and outlook on technology and data: The Second Machine Age and BOLD. Some of his more interesting statements, in my personal opinion, were comments on the NIH creating a genomic data sharing policy, and that soon data sharing plans would be required for all awards from the NIH, not just those over a certain amount. They are currently looking at how to enforce these data management plans, and are concerned that DMPs are not machine-readable. The NIH is wondering if they should start thinking about standardizing the plans in some way, which could also aid in them becoming machine-readable in the future. Dr. Bourne is also very interested in legitimizing data as a form of scholarship. He gave an anecdote that he has a paper that has been cited by over 19,000 people, but he knows nobody has read it because it’s about data. However, nobody is citing the actual data because data citation is not yet standardized, nor considered as prestigious as citing someone’s published paper.

Dr. Bourne spent the bulk of his talk speaking about the Big Data to Knowledge (BD2K) program at the NIH. This program was created after he came into his new position and it focuses mainly on accelerating discovery and making replicable experiments. It is looking to bring together communities, policies, and infrastructure in order to create research and outcomes that are efficient, sustainable, and collaborative. While Dr. Bourne realizes that individual labs are not as concerned with big policies, I believe he hopes that successes like the ENIGMA project will show the usefulness of standardized protocols and homogenized data in bringing together data from separate sites to uncover population-level discoveries. Dr. Bourne spoke many times about the community involvement aspect of building policies and new programs at the NIH. He pays homage to the idea of open access and using openness and community involvement to build better programs and services, and to make more health discoveries. Bourne’s big ideas for future changes at the National Library of Medicine include the library being more open and more collaborative with the community to develop programs. He feels it should be more effective in its use of open access materials, and should function more like a digital public library. I find it interesting that when I explored some of the projects in the Big Data to Knowledge program, many of them were librarian-ish in nature, in that they are mainly discussing solutions to the struggles of how to define, describe, collect, and organize research data into something that is discoverable, useful, and possibly reusable.

I wonder if Dr. Bourne’s vision for the projects occurring within the NIH’s BD2K includes librarians as part of the community involved in these projects? It seems unlikely to me, since when asked how or if librarians could benefit from and contribute to the new focus of the Data Science section, Dr. Bourne was at a bit of a loss. This conundrum is nothing new to our profession, but it is my opinion that now is an opportune moment for intrepid librarians and database programmers to be coming forward and pointing out that our professional expertise in organizing, describing and accessing knowledge is an excellent reason to be involved in many of the larger projects occurring at the NIH.

Part of the BD2K program is the development of something called the Commons, which is essentially a repository hosted in the cloud. For the Commons there will be a set of interoperability guidelines, and the design is to force individual companies to create Commons-compliant software by providing extra funding to researchers using such products. The unique aspect of the Commons is that it will also provide cloud supercomputing, research tools, and APIs that labs can use to conduct their data research analyses. The third speaker of the morning, Dr. Kuo, makes me wonder if the Commons will be as successful as they’re dreaming it will be, mainly because most researchers on the ground are doing such complex work that they need cheap, fast and flexible options that meet their very specific needs. Compliance with outside mandates often increases the cost and decreases the flexibility of products. Perhaps the question is: Is the NIH big enough and powerful enough to force mandates on individual researchers that will be followed and can be enforced?

As later symposium speakers pointed out, many of the ideas that Dr. Bourne touched on have been attempted before, so it will be interesting to see how these projects pan out in the future.

A few of my personal takeaway ponderings:

  1. Time to read up!

  2. What would more community involvement in developing programs at the NLM look like?

  3. Are librarians being included on the data projects that are essentially concerned with the types of issues we’ve been dealing with as a profession for centuries – just with mostly physical materials so far? Should they be?

  4. Will the Commons be able to provide the affordability and flexibility researchers need to conduct their varied projects?

  5. Will there be a standardized form and a requirement for DMPs to be in XML when the NIH finally mandates that all proposals must include one?

  6. If scientists are being mandated to use standards and think about interoperability, where will they find out about which standards are available and the best to use? (Or rather, where do librarians go to find this out, and how can this scattered information be collected and accessed more efficiently?)

What do you think?

References:

Link 1: http://www.worldcat.org/title/second-machine-age-work-progress-and-prosperity-in-a-time-of-brilliant-technologies/oclc/867423744&referer=brief_results

Link 2: http://www.worldcat.org/title/bold-how-to-go-big-achieve-success-and-impact-the-world/oclc/897424074&referer=brief_results

ACRL 2015: Sounds familiar

e-Science Portal Blog - Tue, 04/07/2015 - 12:55

This was my first ACRL national conference, but will hopefully not be my last. Attendees really were spoilt for choice at this conference – there were far too many sessions on a wide variety of topics to do justice to them all. For this blog post, I thought I’d do a quick recap on a few presentations that stuck with me.

I attended a few sessions related to library and institutional data that were interesting for their analogies to data in the e-science world. Common themes included the notion that a lot of data is collected, but very little is acted upon. Data that is acted upon is mostly related to compliance issues rather than analyzed with an eye toward strategic decision-making. Culture, money, talent, political tensions and time were noted as barriers to putting library and institutional data to best use. There are privacy concerns, as well as uncertainty over how the data may be used. Sounds familiar, right?

For the panel session called Getting Started With Library Value, librarians from five institutions described their strategies for demonstrating library value. Speakers noted the need to focus, prioritize, organize, and simplify; to align efforts carefully with various institutional cycles such as those related to academics, budgets, and evaluation; and to be mindful of the strategic vs. operational tensions that exist within libraries. Granted, none of this was explicitly about e-science per se, but the ideas certainly resonated, and…sounded familiar!

Poster sessions were one highlight of this conference for me. I particularly remember these on topics of potential interest to our blog audience:

Sprouting STEMs: Brooklyn College Library staff described their Science Information Internship program to expose STEM and health sciences majors to science librarianship.

Nurturing a Data Management Community: This poster detailed the efforts of UIUC’s Research Data Services Interest Group, particularly their meeting series on a range of data-related topics, involving both library staff and external speakers.

If I have a complaint about ACRL it’s that there was so much to see in relatively little time that it felt like I missed out on a lot of sessions I very much wanted to attend. I guess that’s a good problem to have. And of course, it’s always nice to see friends and former colleagues from near and far, including several of our very own e-science portal editors and staff. (Yes, sometimes it takes a conference many miles away to get us in the same room!)

Hope to see you in Baltimore for ACRL 2017!

An Inside View of the OSTP Memo Responses on Research Data Management

e-Science Portal Blog - Fri, 04/03/2015 - 14:11

Submitted by guest contributor, Jonathan Petters, data management consultant in Johns Hopkins Data Management Services. Prior to Hopkins, Jon was a AAAS Science and Technology Policy Fellow in the Department of Energy’s Office of Science.  Before that he did atmospheric science research.

By now we’ve seen most of the Public Access Plans in response to the February 2013 OSTP Memo “Increasing Access to the Results of Federally Funded Scientific Research”.  I was a little disappointed to see that, though the memo was written more than two years ago, US research funding agencies (in general) will be meeting only the minimal requirements specified with respect to digital research data*.  I had hoped the funding agencies would more quickly assist and encourage researchers in changing their data management and sharing behaviors.

Though I’m disappointed I shouldn’t be surprised; I was a science policy fellow in one of these funding agencies for two years.  In addition to learning how funding agencies go about their business, I learned quite a bit about the development of and response to this particular memo.  Considering what I learned as a fellow, it’s not all that surprising we are where we are.  I outline some of the possible reasons below, from my own myopic view.  Let’s call it ‘informed speculation’.

First thing to note: it’s NOT because the federal government is full of lazy folks waiting for retirement.   I have the utmost respect for the government employees I had the pleasure of working with.  The federal government is full of bright and hard-working individuals, and my experiences learning from them are what led me to my current position.

Possible reasons….

(In)ability to track research data

Robust infrastructure to track extramural research data doesn’t always exist.  The value in improved research data management and sharing has been widely considered only recently**.  Funding agencies have been tracking proposals and technical reports comprehensively regarding extramural research for many years, but not necessarily for the data produced by this research.  In some disciplines a funding agency might have detailed knowledge about the research data produced, and for others little to none.

In some cases the inclusion of data management plans within grant and contract proposals is going to be the first time an agency receives information specifically and comprehensively about extramural research data produced through its funding.*** We can’t expect a funding agency to plan to divert resources to new data infrastructure or management when it doesn’t even know what data is where and in what state.

Agencies know something about data produced through intramural research since they directly administer it, and this could explain why more specific digital data guidance is given with respect to the M 13-13 memo.

Program Officers = Reformed Researchers

Funding agencies creating these Public Access Plans are made up largely of former (or current) researchers.  Just like academic researchers, these staff members (and administrators) have their personal impressions of the place of research data and data sharing in the research enterprise.  Sure, there are several research disciplines where data management and sharing are valued highly (e.g. astronomy, environmental science, genomics).  However, we shouldn’t expect funding agency staff in disciplines that haven’t valued them to suddenly embrace a whole new view of digital research data, any more than we can expect their funded researchers to.

Advisory Committees

Advisory committee input could provide some important background for Public Access Plan development.  Funding agencies receive trusted guidance on research directions from these advisory committees, made up of leading researchers in their fields.   If these advisory committees don’t push for more data infrastructure or data sharing in a particular research discipline because they think resources are better allocated elsewhere, it’s less likely the respective program officers will be fighting for research data management either.

How the OSTP Memo is/was communicated

After a memo is released, each funding agency affected communicates the memo and its effects through the hierarchy.  In general this communication starts from the top of the organization and trickles down to those program officers several layers below.  As you can imagine, how a memo is communicated throughout the hierarchy (e.g., its importance, its impacts, its directives) helps to create the environment for the agency’s response.   If the purpose of and reasons for focusing on research data management are lost along the way, we could have agency staff members, or supervisors even, who don’t understand “why we’re doing this”****.

Note that this scenario is NOT unique to the government; it describes the kind of thing that might happen in any large organization, and maybe happens in yours.  Effective communication in a large organization takes time and effort.

Lack of Budget

NO NEW MONEY.  Funding agencies love unfunded mandates as much as we all do.  How does an agency set the level of priority in meeting these new OSTP requirements, with no additional budget and in the midst of many competing priorities germane to their mission?

Put all of these reasons together and we can gain some understanding of why funders here might not have gone as far as we research data management folks would like.  I’m confident we’ll all get there though.  The currency of the research enterprise has been manuscripts for many generations.  Other research products, like data and code, will find their place of importance. Eventually.

Hey, how do impactful executive memos like the February 2013 OSTP Memo get developed anyway?

In this case, a small working group with representatives from the funding agencies and OSTP discussed and hammered out a draft over several months.  In many cases these representatives were digital data and publication experts, who may have communicated these discussions to others in their agency during the drafting process.  This draft was circulated to all the funding agencies (and their administrators) for comment, and each agency circulated the memo through its organization as deemed appropriate.  This circulation is generally wide, ensuring that all relevant agency parties get a chance to weigh in.  The agency comments were then acted upon by OSTP and the memo was finalized and released.

 

*I’m not talking about publications in this blog post.

**It’s probably because of that Internet thing (which might just be a fad anyway).

***NSF is a possible exception since they’ve been gathering data management plans for four years now.  I thought they’d be in the best position to move forward with refined guidelines.

****There’s a great Dilbert cartoon that exemplifies this hierarchical telephone game but I couldn’t find it.

The Diversity of Data Management : Practical Approaches for Health Sciences Librarianship Webcast

e-Science Portal Blog - Thu, 03/19/2015 - 15:04

The Lamar Soutter Library at the University of Massachusetts Medical School in Worcester, MA is hosting a viewing of the MLA webcast, The Diversity of Data Management:  Practical Approaches for Health Sciences Librarianship, on Wednesday, April 22 from 2-3:30 pm.

As noted by the Medical Library Association, this webcast is designed to provide health sciences librarians with an introduction to data management, including how data are used within the research landscape, and the current climate around data management in biomedical research. Three librarians working with data management at their institutions will present case studies and examples of products and services they have implemented, and provide strategies for and success stories about what has worked to get data management services up and running at their libraries.

Attending the webcast is free of charge, but space is limited so advance registration is required.  If you would like to register to attend the webcast in Worcester, click here.

HHS responds to the 2013 OSTP Memo: NIH and Data Management Plans

e-Science Portal Blog - Tue, 03/10/2015 - 17:32

Submitted by guest contributor Daina Bouquin, Data & Metadata Services Librarian, Weill Cornell Medical College of Cornell University, dab2058@med.cornell.edu

In response to the Office of Science and Technology Policy (OSTP) 2013 memo regarding public access to federally funded research, five Health and Human Services agencies released their long awaited implementation plans at the end of February 2015. The OSTP memo, released two years ago in February 2013, instructed federal agencies with research and development budgets of (or exceeding) $100 million to develop strategies to make the results of federally funded research freely available to the public within a year of publication; this directive includes research data as a research result to be shared with the public. The recent updates came from the HHS’s National Institutes of Health (NIH), Centers for Disease Control and Prevention (CDC), Food and Drug Administration (FDA), Agency for Healthcare Research and Quality (AHRQ), and Office of the Assistant Secretary for Preparedness and Response (ASPR), whose plans all address scientific publications and research data with corresponding discovery and access points in PubMed Central and eventually healthdata.gov. However, for this post I will primarily focus on NIH’s “evolving” data policies and point out the RFI that librarians can respond to in order to help shape that evolving process.

For the last few years, I have regularly heard statements from librarians and others that seem to equate the NSF General Data Management Plan Policy with the NIH Data Sharing Policy and Public Access Policy. However, these funder requirements are drastically different both in implementation and result, and the above-described announcements make this all the more clear. Namely, the NSF has robust requirements for all researchers to submit Data Management Plans as part of their grant applications, whereas the NIH does not. Rather, the NIH requires public access to funded manuscripts, as well as a statement in a section of the agency’s grant applications addressing whether and how data will be shared; this second requirement only applies to researchers requesting $500,000 or more in direct costs from NIH in any one year. The NIH does not require a formal DMP though, nor is there any process in place by which the NIH ensures that data is actually being shared by the researchers that they fund, though data sharing is actively encouraged. I have personally found that this situation has made it a challenge to illustrate to NIH funded researchers the importance of writing a DMP: when the funder is not asking for more robust planning, it can be difficult (though not impossible) to convince researchers to put in the necessary effort to thoroughly plan.

The NIH’s recently released announcements responding to the OSTP Memo make very few updates in regard to Data Management Plans, as the HHS agencies see data policies as “evolving” and recognize that much of the agencies’ funded research data resides externally to the agencies themselves. As of right now, HHS has no shared repository for deposit of HHS agencies’ research data or catalog of associated metadata. The plan presented notes that an internal HHS Enterprise Data Inventory will serve as the catalog for all HHS data products and will eventually be linked to HealthData.gov. The NIH announcement did however specifically note the following in its “Further Steps Under Consideration” section on Data Management Plans:

“NIH is supporting an Institute of Medicine study of clinical trial data sharing… In an interim report on this topic, the IOM noted that a cultural change has occurred in discussions about clinical data sharing. Rather than exploring whether it should occur, the focus is on how it should be accomplished”

“NIH will explore the development of policies to require NIH-funded researchers to make the data underlying the conclusions of peer-reviewed scientific research publications freely available in public repositories at the time of publication in machine readable formats… NIH is taking steps to ensure all NIH-funded researchers develop data management plans…. As a first step, the 2003 NIH Data Sharing Policy will be modified to require that all NIH-funded researchers develop data management plans.”

(http://grants.nih.gov/grants/NIH-Public-Access-Plan.pdf)

Therefore, much of the recently released NIH response gives vague reference to what is being planned, but little detail on execution of those plans, specifically what DMP requirements will be executed and when that execution is anticipated to occur. The NIH stance seems to be defined thus: “NIH will determine the additional steps needed to ensure that the merits of digital data management plans are considered during the peer review process for extramural research grants and contracts.” Yet much is still unclear regarding what is to be expected.

Librarians working in biomedical research environments should continue to advocate that researchers write robust DMPs regardless of whether or not they are a requirement of their funders and should be sure to be aware of the following regarding NIH requirements:

NIH Data Sharing Policy 

The new sharing policy for genomic data

The separate data policies by NIH institute 

The list of the NIH’s preferred data sharing repositories 

And just for good measure here’s the NIH data sharing FAQ

Also useful is the “data sharing workbook”

Librarians can also refer researchers to DMP examples in biology, like those gathered by the New England Collaborative Data Management Curriculum

Furthermore, I encourage librarians to consider contributing to the following Request for Information to help shape NIH data resources developed through the National Library of Medicine:

—–

The National Library of Medicine needs input on the Library’s future in a Big Data world!

This is your chance to influence how some of the NIH’s most prominent data and information resources will be developed and envisioned in the future! 

Respond to the RFI at: www.nlm.gov/RFI

Deadline: 3/13

Topic: NLM seeks input regarding the strategic vision for the NLM to ensure that it remains an international leader in biomedical data and health information. In particular, comments are being sought regarding the current value of and future need for NLM programs, resources, research and training efforts and services (e.g., databases, software, collections). Your comments can include but are not limited to the following topics:

1 – Current NLM elements that are of the most, or least, value to the research community (including biomedical, clinical, behavioral, health services, public health and historical researchers) and future capabilities that will be needed to support evolving scientific and technological activities and needs.

2 – Current NLM elements that are of the most, or least, value to health professionals (e.g., those working in health care, emergency response, toxicology, environmental health and public health) and future capabilities that will be needed to enable health professionals to integrate data and knowledge from biomedical research into effective practice.

3 – Current NLM elements that are of most, or least, value to patients and the public (including students, teachers and the media) and future capabilities that will be needed to ensure a trusted source for rapid dissemination of health knowledge into the public domain.

4 – Current NLM elements that are of most, or least, value to other libraries, publishers, organizations, companies and individuals who use NLM data, software tools and systems in developing and providing value-added or complementary services and products and future capabilities that would facilitate the development of products and services that make use of NLM resources.

5 – How NLM could be better positioned to help address the broader and growing challenges associated with: Biomedical informatics, “big data” and data science; Electronic health records; Digital publications; or Other emerging challenges/elements warranting special consideration.

IDCC 15 – Part 2 (It’s a big conference)

e-Science Portal Blog - Mon, 03/02/2015 - 11:38

Last week in her blog post, Margaret discussed the Twitter feed from the International Data Curation Conference (IDCC) that took place from Feb 9th to the 12th. I was fortunate enough to be able to attend and participate this year, and as it is a premier event for data professionals, I’d like to add a bit more about the conference.

The theme this year was “Ten years back, ten years forward: achievements, lessons and the future for digital curation”. Tony Hey, formerly of Microsoft Research and now a Fellow at the University of Washington, was the opening Keynote.  He did a very nice job of illustrating how far we have come in the past ten years. Data management and curation are now recognized as important issues and discussed in high-profile venues like Science and Nature.  However, he also noted that we still have some very serious problems to address. Funding for curation is often based locally, but use of digital data is global. More and more data repositories and tools are coming online, but support for these initiatives is still quite fragile and we have lost some important resources (RIP Arts & Humanities Data Service).

This tension between how far we have come vs. how far we have yet to go was echoed in a panel session titled “Why is it taking so long?” moderated by Carly Strasser from DataCite. Some of the panelists pointed to a lack of incentives, infrastructure and support as barriers to progress. However, others noted that actually quite a lot of progress had been made when one considers the scope of the changes in culture and practice that we are championing.

Presentations on Data Education struck a similar tone. Liz Lyon, from the School of Information Studies at the University of Pittsburgh, noted that roles for Data Professionals are becoming more prominent and defined, but the educational path to prepare oneself to perform these roles is still unclear. iSchools at Pitt and the University of North Carolina, whose program was described by Helen Tibbo, are seeking to position themselves as the places to fill this need.

Though awareness of curation has increased, we still have a ways to go in training academics in curation.  Research done by Daisy Abbott from the University of Glasgow demonstrated a gap between the perception among graduate students that curating their work is important and their reporting that they lack the expertise to curate their work effectively. Fortunately, we have Aleksandra Pawlik and others from the Software Sustainability Institute offering Data Carpentry workshops to help raise data literacy levels of researchers.

The program with presentation slides is available on the IDCC15 website, and the papers will soon be published in the International Journal of Digital Curation. The location of IDCC16 has yet to be announced, but I highly recommend attending if you get the chance.

IDCC15 – I Couldn’t Go But I Followed on Twitter

e-Science Portal Blog - Fri, 02/20/2015 - 10:58

I enjoy going to conferences. I love learning new things and getting new ideas.  I really love the way I’m inspired by the people I meet. But, I can’t go to every conference. Like most people, my university library budget is limited and my own budget is limited. However, as more people in libraries and data take to Twitter and other social media, I can go to conferences vicariously.

From February 9-12 I was (vicariously) at the 10th International Data Curation Conference in London, England.  While I wish I had been there, it is possible I would have been so tempted by the sights of London that I might have skipped the meeting.

There is a Storify of the conference available if you want to have a look at all the events and photos and comments. Watching the #idcc15 feed each day made me envious but also excited, as I read about the successes and new ideas that were being discussed during the various programs. Great morning coffee and lunchtime reading. A few highlights you might want to check out:

While there are differences between US and UK regulations, we can learn from programs that work at any institution. Presentations by Imperial College London, Oxford Brookes University, and University of Edinburgh are summarized here, with links to some good resources.

It is also helpful to learn from the researcher’s viewpoint. Purdue’s Data Curation Profiles were the focus of one talk that dealt with the Technology Acceptance Model. A second talk examined whether research supervisors were prepared to provide advice and guidance. Slides and papers for both talks are linked from this summary.

The Edinburgh group, mentioned above, has a great blog, and a couple of posts there talk about IDCC15: one covering the first day and another looking at how the 80/20 rule applies to RDM tools (if you haven’t heard about the 80/20 rule, also known as the Pareto principle, check out the Wikipedia article).

A useful Storify covers RDM training for librarians. There are slides embedded in the page, so have a look at the various curricula that were presented.

While we focus on eScience here at the portal, there are also data things going on in other subjects.  If you’ve always wanted to learn a bit about digital humanities, try this video, “The stuff we forget: Digital Humanities, digital data, and the academic cycle” by Melissa Terras, Director of University College London Centre for Digital Humanities.

This final blog post recommendation gives you an idea of some of the other subjects covered in the meeting, with links to the talks.

By the way, I use TweetDeck to keep track of multiple things on Twitter.  There are some basic instructions here.  I have the regular stream of people and organizations I follow in the first column, and after that I have columns for hashtags I’m interested in, such as the #idcc15 label for meeting tweets or #medlibs for medical librarians.  When conferences are over, and I have favorited the tweets I want to follow up on later, I can delete the column. Favorites is another column in my TweetDeck.

7 Recommended Resources for E-Science Newbies

e-Science Portal Blog - Tue, 02/17/2015 - 19:52

Submitted by Donna Kafel, Project Coordinator for the e-Science Portal and the New England e-Science Program for Librarians.

During a recent meeting of the e-Science Portal’s Editorial Board, portal editors suggested that we create a downloadable document, perhaps titled “An Introduction to e-Science” that would provide an annotated list of the best overviews and introductory resources for librarians and library students new to the concept of e-Science and library based data services.  The e-Science Portal team  thought this was a great idea and we have it on our action item list for after the portal redesign is completed this spring.

In the meantime, there are a lot of e-Science newbies out there right now who are at a loss as to where to begin, and who may like some of this information a little sooner.  Looking at all the content packed into library guides on data management, hundreds of journal articles, and data webinars can be a bit overwhelming for those just starting out. Here are seven resources that can help newbies start out on the road to figuring out what is meant by the term e-Science and  how it impacts scholarly communication, library roles in e-Science, the structure of the scientific research environment, data types and data management.

1.  The Fourth Paradigm:  Don’t be intimidated, I’m not recommending that people read the entire book in one sitting! (But it’s worth going back to read individual chapters).  The Fourth Paradigm’s Foreword and the first chapter “Jim Gray on eScience:  A Transformed Scientific Method”  nicely illustrate how the integration of computers and evolving technologies has revolutionized the way science is conducted.

2.  The e-Science Thesaurus is a great place for newbies to learn terms and concepts, and to find related references.  Included in some of the entries are interviews with librarians who are actively engaged in e-Science (for some interesting interviews, check out Data Curation Profiles Toolkit, Implementing a Data Sharing/Management Policy  and Informationist).

3.  What is e-Science and How Should it be Managed? :  captures the essence of e-science, critical roles for librarians, and the importance of open data sharing.

4.  A nice overview of e-Science and roles for librarians:

a)      Cyberinfrastructure, Data, and Libraries, Part 1. A Cyberinfrastructure Primer for Librarians (2007) – Part one of a primer for librarians on the major issues and terminologies of e-Science.

b)      Cyberinfrastructure, Data, and Libraries, Part 2 – Part two: the role of libraries in data management and how librarians can participate in the downstream and upstream phases of the research cycle.

5.  Data Types (4 min YouTube video)—describes the diverse entities that come under the umbrella term data and the different ways data is captured.

6.  A Day in the Life of an Academic Researcher Part 1 (7 minute YouTube video) and A Day in the Life of an Academic Researcher Part 2 (5 minute YouTube video) explain the research environment and the different roles played by members of a research team.

7.  The Journal of eScience Librarianship (JeSLIB) :  specifically dedicated to the advancement of e-Science librarianship, JeSLIB  includes peer-reviewed research  and “e-Science in Action” articles on topics such as research data management, librarians embedded on research teams, data services, data curation, and data sharing and re-use.

Glitter on the Highway: Data on the Website

e-Science Portal Blog - Wed, 02/11/2015 - 12:09

By Andrew Creamer, Scientific Data Management Specialist, Brown University

Glitter on the mattress
Glitter on the highway
Glitter on the front porch
Glitter on the hallway 
Love Shack, The B-52s, Pierson, Schneider, Strickland, EMI (1989).

Recently I was reading through the drafts of the Data Management Plans (DMPs) and Broader Impacts sections that were submitted with faculty NSF proposals through our data management plan service for 2014-2015. As I reviewed these data management plans, one of the commonalities I noticed was the ubiquity of statements that data would be linked from the project website or personal website. Listed as either a tool for dissemination or post-project archiving and access, or in some cases both, there was data on the website. In a few cases data on the website was conspicuously the only option listed for dissemination or post-project archiving. Most often it was mentioned nested in among other options; for example, for dissemination, the investigators would say they would disseminate the data by sharing it on their personal or project websites, depositing it in some type of data sharing repository, and publishing the results in academic journals and presenting these at scientific meetings. As I looked over these drafts I could see where in each occurrence I had marked a comment asking the investigators for more information about what they meant by putting data on a personal or project website and to please have a conversation with me regarding this option.

The opaque “data on the website issue” comes up in almost every conversation I have had with faculty using our DMP service: “So, you say here that you have a website. How exactly are you storing and making your data available on your website? Who is responsible for doing and maintaining this, etc.” This conversation can go many ways, of course. While some faculty mean that they are depositing the data into a repository and have a persistent link that they will place on their personal or project website in a citation that will link out to the data, some faculty mean that they have a personal server, and in some alarming cases, a web server, where they will place and link to data on their website. While the former intention also leads one down a line of important questioning about suitability and sustainability, such as which repository, what kind of persistent link, etc., it is the latter scenario, of course, that concerns us research data management librarians the most.

In their article published in PLOS ONE last summer, How Do Astronomers Share Data? Reliability and Persistence of Datasets Linked in AAS Publications and a Qualitative Study of Data Practices among US Astronomers, Pepe et al. (2014) provided evidence that we can use in conversations with investigators about considering alternatives to storing data on their web or personal servers. Their findings showed that putting data on a personal or project-based website were the third and fourth most popular data sharing practices among astronomers, after emailing data or placing it on an FTP-style site. Then they looked through the external links to data published in a defined period of astronomy literature and found:

“This exploratory analysis reveals three key findings. First, since the inception of the web in the early 1990’s, astronomers have increasingly used links in articles to cite datasets and other resources which do not fit in the traditional referencing schemes for bibliographic materials. Second, as for nearly every resource on the web, availability of linked material decays with time: old links to astronomical materials are more likely to be broken than more recent ones. Third, links to “personal datasets”, i.e., links to potential data hosted on astronomers’ personal websites, become unreachable much faster than links to curated “institutional datasets”.” (Pepe et al. 2014)

The practice of placing data on a website may be entrenched in the data sharing practices of certain scientific communities, but as research data management librarians we need to be sure that we do not become numb to its ubiquity. Instead we must continue to ask researchers what they mean, and suggest ways to keep data accessible from their website while mitigating the many problems of storing data on web servers or personal servers: lack of backups, lack of persistent identifiers, no long-term preservation strategy, insufficient metadata, link rot, diminished discoverability, and the access risk when one person is solely responsible for making data accessible.

On the publisher side, last spring PLOS added this text to their Data Availability Policy: “Note that it is not acceptable for the authors to be the sole named individuals responsible for ensuring data access.” This one sentence has helped me dissuade researchers from stating that the sole means of data access is for other researchers to contact them, or that their personal website is the sole means of data dissemination. So let us hope that research funders will also begin pushing back on researchers who want to use their personal or project websites, and their personal and web servers, as the sole means of data dissemination or as the storage location for post-project access.

Citation: Pepe A, Goodman A, Muench A, Crosas M, Erdmann C (2014) How Do Astronomers Share Data? Reliability and Persistence of Datasets Linked in AAS Publications and a Qualitative Study of Data Practices among US Astronomers. PLoS ONE 9(8): e104798. doi:10.1371/journal.pone.0104798

A gentle introduction to Docker for reproducible research

e-Science Portal Blog - Thu, 01/29/2015 - 16:27

Submitted by guest contributor Stacy Konkiel, Director of Marketing & Research, Impactstory, stacy.konkiel@gmail.com.

By now, many data management librarians are familiar with the concept of reproducible research. We know why it’s important and how to (theoretically) make it happen (thorough documentation, putting data and code online, writing an excellent Methods section in a journal article, etc.).

But if a scientist asked you for a single recommended reading on how to make their computational research reproducible, what would you send them?

I’d suggest “Using docker for reproducible computational publications” by Melissa Gymrek (a Bioinformatics PhD student at Harvard/MIT).

In her post, Gymrek introduces Docker, a “lightweight virtual machine” (more precisely, a container platform) that allows a researcher to package a complete computing environment as an image that can be shared online. Other researchers can then run that environment and log into it to reproduce results using the original researcher’s code and data.

No need to download and install R packages, or to figure out how to make someone else’s code play well with their operating system. Just install Docker, enter a simple line at the command line, and–boom–they’ve got a virtual machine running on their computer that they can log into to reproduce someone else’s findings.
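To give a rough sense of what that “simple line” might look like, here is a minimal sketch in Python that assembles a docker run command and prints it. The image name is hypothetical and used for illustration only; a real paper would publish its own image name, and the flags an author recommends may differ.

```python
import shlex


def docker_run_command(image, command=None, interactive=True):
    """Build the argument list for a `docker run` invocation."""
    args = ["docker", "run", "--rm"]  # --rm removes the container when it exits
    if interactive:
        args.append("-it")  # keep stdin open and attach a terminal
    args.append(image)
    if command:
        args.extend(command)  # optional command to run inside the container
    return args


# Hypothetical image name, for illustration only.
cmd = docker_run_command("example-lab/paper-analysis:1.0", ["/bin/bash"])
print(shlex.join(cmd))
```

The printed line is what a reader would type at the command line once Docker is installed; passing the same list to subprocess.run would launch it from Python instead.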

Docker is already popular in the software development world, and is gaining popularity with bioinformaticists and other computational researchers. Learn more about Docker and how it can work for reproducible research on Melissa Gymrek’s blog.

Winter is the perfect time for a virtual conference or webinar!

e-Science Portal Blog - Wed, 01/28/2015 - 15:19

There’s been a flurry of upcoming virtual conferences and webinars springing up and providing educational opportunities while obviating the need for travel in wintry weather. In a previous post, I had noted the upcoming DataONE webinar series that begins on Feb. 9th with the webinar “Open Data and Science: Towards Optimizing the Research Process.”

NISO is sponsoring a six-hour (11 am – 5 pm EST) virtual conference on Feb. 18th: “Scientific Data Management: Caring for your Institution and its Intellectual Wealth.” Hosted by Todd Carpenter, Executive Director of NISO, the program includes speakers from the Dept. of Energy, Emory, Tufts, Oregon State University, UIUC, the Center for Open Science, and the RMap project. The final session will be a roundtable discussion. Program topics for the conference include:

  • Data management practice meets policy
  • Uses for the data management plan
  • Building data management capacity and functionality
  • Citing and curating datasets
  • Connecting datasets with other products of scholarship
  • Changing researchers’ practices
  • Teaching data management techniques

Finally (although I suspect I’ll soon be adding to this snowballing list), Elsevier is sponsoring the webinar “Institutional & Research Repositories: Characteristics, Relationships and Roles” on Feb. 26th from 11 am-12:15 pm (EST).

Call for Participation for Content Editors for the e-Science Portal

Blog: Current Projects - Thu, 12/12/2013 - 11:48

The Editorial Board of the e-Science Portal for New England Librarians is looking for librarians who are passionate about emerging trends in science librarianship and interested in working as part of an editorial team to become Content Editors for the e-Science Portal for New England Librarians. Launched in 2011, the e-Science Portal is a resource for librarians, library students, information professionals, and interested individuals to learn about and discuss:

  • Library roles in e-Science
  • Fundamentals of domain sciences
  • Emerging trends in supporting networked scientific research

Currently the Editorial Board is reorganizing its content and expanding coverage to better serve the information needs of librarians interested in e-Science, new trends in science librarianship and scholarly communication, and ways that libraries are addressing the issues of the networked data age. The e-Science portal is built on a Drupal platform.

Content editors are needed for the following e-Science portal content areas:

  • Data Information Literacy:  resources, courses, information needs of researchers
  • Emerging Trends & Technologies:  new roles, emerging technologies, repository tools
  • Scholarly Communication:  publishing data (including peer review, journal policies), sharing, altmetrics, citing data, identifiers, Open Data, Open Science, Open Access
  • Professional Development and Continuing Education:  competencies, courses, e-Science symposia, related professional associations and conferences, recommended websites and blogs

This call for participation is not restricted to New England librarians. Requirements for Content Editor positions include a time commitment of 3 hours per month for the following activities:

  • Identifying, annotating, and posting links to relevant resources on the content area page
  • Reviewing the content page to ensure functioning links and current information
  • Communicating via an e-mail discussion list with other members of the Editorial Board
  • Attending Editorial Board meetings: while in-person attendance at Editorial Board meetings is preferred, arrangements can be made for Content Editors outside the NE region to attend meetings remotely.

Content Editors can refer to the e-Science Portal’s Selection Criteria for guidelines on selecting resources. The e-Science Portal for New England Librarians is funded by the National Network of Libraries of Medicine New England Region. Stipends will be paid to appointed Content Editors.

For further details about the Content Editor positions, please contact me at Donna.Kafel@umassmed.edu.

New England Collaborative Data Management Curriculum is now available

Blog: Current Projects - Tue, 11/12/2013 - 13:40

The Lamar Soutter Library at the University of Massachusetts Medical School is pleased to announce that the New England Collaborative Data Management Curriculum (NECDMC) is now available! NECDMC has been designed for teaching research data management to students in the health sciences, sciences and engineering fields, at both the undergraduate and graduate levels.

NECDMC is divided into a series of instructional modules that cover key concepts from the National Science Foundation’s recommendations for data management plans. Unique to NECDMC is the collection of research cases that provide disciplinary context and illustrate data management issues in a range of research settings such as a biomedical research lab, a qualitative health study, outpatient clinics, and an aerospace engineering text lab. These cases make it easier for students in various disciplines to visualize data management activities in the course of day to day research work and to understand the impact of research data management best practices on the success of team research.

Librarians and faculty are welcome to use the lecture content and PowerPoint slides in the curriculum’s modules and adapt them to suit the needs of their local audiences, in accordance with the provisions of NECDMC’s Creative Commons Attribution-Share Alike-Noncommercial License.

The curriculum is a collaborative project stemming from the IMLS-funded Frameworks of a Data Management Curriculum, created by librarians from the University of Massachusetts Medical School and Worcester Polytechnic Institute. NECDMC’s lecture content, slides, research cases, data management plans, and activities have been developed by a team of librarians from the University of Massachusetts Medical School, University of Massachusetts Amherst, Tufts, Northeastern, and the Marine Biological Laboratory and Woods Hole Oceanographic Institution. Funding for the NECDMC project is provided by the National Network of Libraries of Medicine for New England.

NECDMC staff is now looking for partners to pilot the curriculum and test the effectiveness of its teaching materials for students in different settings and disciplines. If you’re considering using the curriculum materials, consider being a pilot partner! For more details, please contact me at donna.kafel@umassmed.edu.
