Submitted by guest contributor Chris Eaker, graduate student at the University of Tennessee School of Information Sciences.
I’ve been thinking a lot recently about data sharing by scientists, especially since I’m in the midst of applying for jobs in which I will need to make the case to researchers at my institution to share their data. Some disciplines already do a lot of sharing amongst their colleagues, but many do not. A culture shift must happen for data sharing to occur in any real and meaningful way.
I’ve also given a lot of thought to reasons I could use to persuade a scientist to share his or her research data, and there are many quite valid ones. But the thing that keeps me guessing is the flip side of the data sharing coin: data reuse. It won’t matter if a scientist shares his or her research data if nobody is willing to use it in their own research. We not only need to encourage researchers to share their data, but also encourage them to seek out data sets published by other scientists. In fact, my opinion is that using existing data should be their first choice. Not only does it save them the time of collecting data, but it also saves money for whoever or whatever organization is funding the research. My sense is that researchers are collecting a lot of data that is similar, if not identical, to data other researchers have already collected. Why reinvent the wheel? We’ve come to a point where sharing data is easier than ever, with data repositories for all kinds of disciplines ready and eager to host data (assuming it meets their requirements).
From my point of view, we must not neglect the reuse side in pursuit of the sharing side. Thus, in conjunction with the reasons why a researcher should share data, we also need to determine the best reasons why a scientist should give preference to using existing data over collecting new data. Of course, some research projects have to have current data, and that is understandable, but many do not, and probably more than we think.
What are your thoughts on this issue? What are some of the reasons we could use in our interactions with scientists?
Recognizing data’s potential for driving innovation, on May 9th President Obama signed an executive order detailing steps towards making government-held data more accessible to the public and entrepreneurs. Details about the order and steps government agencies are taking to follow it are noted on the White House blog posting Landmark Steps to Liberate Open Data.
Registration is ending soon! Don’t lose your chance to join in the fifth annual Science Boot Camp at UMass Amherst June 12-14th—a unique opportunity where librarians can learn about science in a fun and casual camp-like setting!
Science Boot Camp features three science lecture sessions. Each science lecture includes an overview of a science and examples of current research, presented by expert scientists from around New England. The Science Boot Camp Capstone Session focuses on ideas, skills or innovative projects relevant to librarians.
This year’s Science Boot Camp features the following topics:
· Public Health
· Analytical Chemistry
· How to talk to researchers (Capstone)
· Lightning rounds: opportunity for campers to talk about their work/projects/ideas
Where else but Science Boot Camp do you get this easy-going and easy-on-the-budget opportunity to meet and mix with science, health sciences, and engineering librarians and library students from New England and beyond? We offer flexible opportunities for attending boot camp, including overnight, commuter or one-day options.
For more information and to register check out the 2013 Science Boot Camp guide at http://guides.library.umass.edu/BootCamp2013
To view videos of presentations from past Science Boot Camps, check out the e-Science portal’s Science Boot Camp page.
Submitted by Christopher Erdmann, Head Librarian of the Harvard-Smithsonian Center for Astrophysics at Harvard University.
Since I announced the Data Scientist Training for Librarians (#DST4L) course (for more background see the DST4L blog and/or the following Library Journal story), over 100 librarians from across the world have contacted me to ask whether the course is available online or will be offered again.
Unfortunately, we were never able to stream the training sessions online, mainly because the course was experimental. In the early stages of the course, we also got a taste of how hands-on work was challenging in a virtual environment and ended it fairly quickly.
DST4L will conclude at the end of the month, and at a minimum, the class would like to share our accomplishments and feedback at an open event and answer questions from all who are interested.
Event: Data Scientist Training for Librarians Tells All
Date: June 4
Students will discuss projects which include, but are not limited to, NASA ADS, NY Times, DOE and Internet Archive data, and will share their thoughts on the course, for instance, how starting with OpenRefine is a good idea.
Jennifer Prentice, Dr. Rong Tang (Simmons GSLIS) and I have been hard at work planning an event which we intend to stream online (details TBD). In the meantime, follow our blog as we will be posting group projects and details about live streaming ahead of the event.
The following announcement was posted on behalf of DRYAD.
We are delighted to announce some recent and upcoming developments at Dryad.
First, we recently launched a redesigned website at http://datadryad.org with lots of new content. Some of the highlights include:
- Information about the recently announced submission fees and pricing plans which will become effective Sept. 2013: https://www.datadryad.org/pages/pricing
- An Ideas Forum where you can let us know what features you’d like us to work on next, vote or comment on ideas submitted by others, and check back to see our responses: https://datadryad.uservoice.com
- An Integrated Journals page that helps depositors see which journals are coordinating the submission process with Dryad, figure out which stage in the publication process to submit data for your chosen journal, and more: https://www.datadryad.org/pages/integratedJournals
Second, all Dryad members, prospective members, and interested parties are invited to the first annual membership meeting in Oxford, UK on Friday, May 24. This is part of a series of exciting events in Oxford that week spotlighting trends in scholarly communication with an emphasis on research data, including a Symposium on the Now and Future of Data Publication on Wednesday, May 22nd, and an ORCID Outreach Meeting with a special joint Dryad-ORCID Symposium on Research Attribution on Thursday, May 23rd. Remote attendance will be available. For more information, please see: https://www.datadryad.org/pages/membershipMeeting
As you may be aware, Dryad is a nonprofit organization. Membership is open to a diverse range of stakeholder organizations, including but not limited to journals, scientific societies, publishers, research institutions, libraries, and funding organizations. To learn more about becoming a member, please see: https://www.datadryad.org/pages/membershipOverview
If you work with a journal, society, publisher, library, funder or other research organization and would like to learn more about how Dryad can help you support the data publishing needs of your researchers, and how you can help support Dryad, please do not hesitate to contact us.
Executive Director, Dryad
The latest issue of the Journal of eScience Librarianship (JeSLIB) focuses on the role of the Informationist or Embedded Librarian in the scientific research process. The theme of this issue comes from the Professional Development Day conference, Embedded with the Scientists: Librarians’ Roles in the Research Process, that was hosted by the Lamar Soutter Library, UMass Medical School, in conjunction with the NN/LM New England (National Network of Libraries of Medicine New England) on Nov. 7, 2012.
Featured in this issue are articles by health science librarians who are working as informationists in NLM Administrative Supplements for Informationist Services in NIH-funded Research Projects, and Chris Shaffer’s (University Librarian and Associate Professor at Oregon Health and Science University) keynote address, The Role of the Library in the Research Enterprise.
Beginning with this latest issue, JeSLIB is experimenting with data from Altmetric to display article-level metrics for each of our articles. The Altmetric score is a measure of the attention an article has received online, including social media mentions, news coverage, and online reference manager counts. Contact JeSLIB’s editors and let us know what you think!
The Medical Library Association’s Annual Meeting and Exhibition, “One Health: Information in an Interdependent World,” is being held in Boston this year. Most of the meeting events will take place at the Hynes Convention Center.
Scanning through the Official Program, there are a number of e-Science related presentations. Below is a bulleted list of them along with a listing of e-Science related posters at poster sessions 1-3. The locations of each session are noted alongside the session name. (HCC=Hynes Convention Center)
Follow the MLA 13 Twitter feed: #mlanet13. For updates, information, and reflections check out the MLA’13 Conference Blog.
Friday, May 3:
8:00 am-12:00 pm
CE Class: Data Curation for Information Professionals (HCC, Room 204) Onsite registration may be available, but course may have reached maximum attendance. For further details see One Health Continuing Education Courses
Sunday, May 5:
7-9 AM Sunrise Seminar: The Librarian as a Professional in the Modern Research Organization (HCC, Level Three, Room 311)
1:30-2:30 Poster Session 1 (HCC, Level Two, Hall of Exhibits):
#117 Frameworks for a Data Management Curriculum for Science, Health Sciences, and Engineering Students
#213 One Integrated Health Record: The Librarian’s Role in Linking Patients to their Personal Health Data and Contextual Information
#270 Data Management Needs Assessment
4:30-6 Medical Informatics Section (HCC, Level Two, Room 206)
- Linked Data: Lessons Learned from International Bioinformatics Hubs
- An Introduction to the Semantic Web and Linked Open Data
- Linked Open Data and Biomedical Research: A Survey of Current International Efforts
- Assessment of a User-Centered Ontology to Support the Selection of and Linking among Bioinformatics Resources
Medical Library Education Section (HCC, Level Three, Room 303)
- The Library’s Role in e-Science Programs in Research Universities
2013 National Program Committee: This Just in: Lightning Talks on One Health (HCC, Level Three, Room 312)
- Building a Web Portal of Data Sharing Repositories and Data Sharing Policies: A Contribution to the Data Sharing Initiative at the NIH
Monday, May 6:
1:30-2:30 Poster Session 2 (HCC, Level Two, Hall of Exhibits)
#30 All It Takes Is One: Single-Session Data Literacy Instruction
#134 Hosting a Seminar Series to Engage the Biomedical Research Community
#251 Assessing the Information Needs of Early Career Biomedical Researchers
Session: Global Data Sharing to Advance Science and Environmental Aspects of Global Health (HCC, Level Three, Room 311)
- One Academic Library’s Response to Data Management
- Introducing Researchers to Data Management: Pedagogy and Strategy
- Supporting the Local Research Data Environment via Cross-Campus collaboration and Leveraging of National Expertise
- DataShare: Facilitating Scientific Data Sharing
Session: How Data Collection and Ethics Intersect in Eliminating Health Disparities (HCC Level Three, Room 303)
- Changing the World with Data Collection, One Exam at a Time: Clinicians and Librarians Map the Way
- If Not Us, Then Who? Medical Librarians using Information Advocacy to Promote Health Equity
- Welcoming Users to Digital Libraries: Redesigning an Open Access Repository for Community Engaged Health Research
- Enhancing Library-Based Services for Clinical and Translational Researchers
Tuesday, May 7:
Poster Session 3 (HCC, Level Two, Hall of Exhibits)
#11 Planning Educational Outreach with Future Researchers
#55 Regional Medical Library-Sponsored E-Science Activities: A Qualitative Survey and Lessons Learned
Session: Altmetrics and Revolutions: Web-Native Science and the Future of Scholarly Communication (HCC, Level Three, Room 312)
Session: Open Access in Action: Trends, Policies and Institutional Activities in Support of Open Information (HCC, Level Three, Room 309)
Session: Librarians as Researchers: Practicing What we Preach in Scholarly Communications: specifically two talks—Altmetrics: Determining the Full Impact of Scholarship, and Librarian Readiness for Research Partnerships (HCC, Level Three, Room 305)
Session: Veterinary Medical Libraries Section: From Bench to Bedside: Building Interprofessional Innovations (HCC, Level Three, Room 313)
Initiated and led by Charles Bailey, Publisher of Digital Scholarship, the Digital Curation interest group on LinkedIn currently has 956 members. Bailey frequently posts links to resources related to data management, digital curation, and preservation, and is very well known for his annual Research Data Curation Bibliography.
The description for the Digital Curation group notes:
“This group discusses digital curation, which the Digital Curation Centre defines as “maintaining, preserving and adding value to digital research data throughout its lifecycle.” The DCC’s digital curation lifecycle model includes these steps: conceptualise, create, access and use, appraise and select, dispose, ingest, preservation action, reappraise, store, access and reuse, and transform. The group does not deal with “content curation” (see “What is Content Curation?,” http://bit.ly/uxifrb).”
The group is open to all LinkedIn members–if you’re on LinkedIn, search for Digital Curation under the “Groups” heading.
The Duraspace Hot Topics Webinar Series will be hosting three upcoming webinars about VIVO. VIVO is an open community, an information model, and an open source semantic web application that supports the advancement of scholarship by integrating and sharing information about scholars, their activities and outputs at a single institution.
The series includes three webinars. The first one, Overview of VIVO, will be held on May 14 at 11 AM EDT. The second webinar, Case Studies: VIVO at Colorado, Brown, Duke, and Weill Cornell Medical College, will be held June 4 at 11 AM EDT. The third webinar, VIVO Technical Deep Dive, will focus on technology and participation and will be held on June 11 at 11 AM EDT.
For full details and links for registering for the webinars, see the Duraspace announcement.
A case study about commonly observed errors when researchers prepare data for sharing and archiving, “Common Errors in Ecological Data Sharing,” by Karina E. Kervin, William K. Michener, and Robert B. Cook, has just been published in the Journal of eScience Librarianship. While the findings in this case study relate to ecological data, they are applicable to any field in which meticulous data management is critical to the discovery, access, and usability of archived data sets.
I just came across a user-friendly online course booklet that teaches librarians the fundamentals of research data management: “RDM for Librarians.” Created by Sarah Jones and Marieke Guy of the Digital Curation Centre and Miggie Pickton of the University of Northampton, this straightforward tutorial covers the basics of research data management, potential RDM library services, and institutional RDM policies.
The course includes compelling examples to illustrate concepts in a lively way. Check out C. Titus Brown’s My Data Management Plan satire on page 9, which is followed by a useful table that compares Brown’s witty data management plan entries to model answers from NIH, ICPSR, and the University of Bristol.
Beyond the basics of RDM, the course also covers potential library roles, suggests solutions for barriers to data sharing, and illustrates the initiation of library-based RDM services using the example of the University of Northampton.
David Dietrich, Advisory Technical Consultant in EMC’s Global Education Services Organization, focuses on issues related to Big Data and data analytics, and is a participant in MIT’s bigdata@csail initiative. He recently wrote an interesting piece on EMC’s InFocus blog titled The Dirty Little Secret of Big Data Projects. In the piece, Dietrich includes the Data Analytics Lifecycle that EMC developed, which is a helpful model for understanding the steps involved in data analytics.
In this post, Dietrich focuses on a particularly tedious and time-consuming step of the Data Analytics Lifecycle: Data Prep. This step involves cleaning and conditioning data into a structure that can be analyzed. Dietrich notes that the Data Prep step can easily take up 80% of the time of a project. Because cleaning and prepping data is so labor intensive, one unfortunate consequence is that only 1-2% of a project’s data typically end up in an organization’s Enterprise Data Warehouse–the remaining 98% is often unused and inaccessible!
Check out his post for more details about the Data Analytics Lifecycle, step #2 Data Prep, and relevant tools.
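For readers who have not done this kind of work, here is a rough idea of what "cleaning and conditioning" can look like in practice. This is only a minimal sketch in Python, not anything taken from Dietrich's post or from EMC's tools: the column names (`site`, `value`), the list of missing-value markers, and the `clean_rows` helper are all assumptions made up for illustration.

```python
import csv
import io

# Hypothetical markers for missing values; real datasets will differ.
MISSING = {"", "na", "n/a", "null", "-"}

def clean_rows(raw_csv):
    """Return cleaned, de-duplicated rows from CSV text (illustrative only)."""
    reader = csv.DictReader(io.StringIO(raw_csv))
    cleaned, seen = [], set()
    for row in reader:
        # Normalize header names and trim stray whitespace from cells.
        record = {k.strip().lower(): (v or "").strip() for k, v in row.items()}
        # Replace missing-value markers with None.
        record = {k: (None if v.lower() in MISSING else v) for k, v in record.items()}
        # Convert the assumed numeric column; unparseable entries become None.
        try:
            if record["value"] is not None:
                record["value"] = float(record["value"])
        except ValueError:
            record["value"] = None
        # Skip exact duplicates of rows already kept.
        key = tuple(sorted(record.items(), key=lambda kv: kv[0]))
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(record)
    return cleaned

sample = "Site, Value\nA, 1.5\nA, 1.5\nB, n/a\nC, abc\n"
rows = clean_rows(sample)
```

Even this toy example has to make several judgment calls (what counts as missing, what to do with unparseable numbers, when two rows are duplicates), which hints at why the step can consume so much of a project's time.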
The following internship opportunity was forwarded from Code4lib.org.
“Reporting to the Archivist, the Data Engineer Intern plays a key role in a pilot project exploring methods of connecting text, digital images, and taxonomic data sets. Intern will build elegant, reusable code to connect archival collections with biological specimen records, and will have the opportunity to invent creative ways of visualizing the resultant data sets. This project is funded under “Connecting Content: A Collaboration to Link Field Notes to Specimens and Published Literature,” a National Leadership Grant from the Institute of Museum and Library Services. This position is a 135 hour internship, with a stipend of $4000.00 and the option of receiving course credit. Hours must be completed between 9AM-6PM, Monday – Friday, during the summer of 2013.
ESSENTIAL DUTIES AND RESPONSIBILITIES:
Work with Archivist to plan 3-month software project
Build elegant and reusable code to interact with RESTful web services
Explore methods of visualizing taxonomic data sets on the web and in Google Earth
Present results to project partners and Academy staff
Work onsite at the California Academy of Sciences
Follow all Academy safety regulations
Other duties as assigned”
For further details, check out the position on the code4lib job posting at http://jobs.code4lib.org/job/7439/
The ACRL Digital Curation Interest Group is sponsoring a webinar tomorrow, April 17th at 2 pm EST, featuring Michele Reilly and Anita Dryden from the University of Houston. They will discuss “Creation of an In-House DMP Tool at the University of Houston Libraries.”
Registration is required to receive the link to the webinar and virtual space is limited to 100 attendees. For further information and to register: http://connect.ala.org/node/204100
Submitted by Jen Ferguson, Data Services Librarian, Northeastern University.
I’ve recently returned from RDAP13, the Research Data Access & Preservation Summit in Baltimore. I’m still processing a lot of what I heard at this fantastic meeting. For this blog post, I thought I’d call out some themes from the talks that particularly stood out for me. Disclaimer: these are recreated from memory and/or my sketchy notes, so if you were there and have a different take on things, please comment!
Theme 1: It’s gotta be easy for the researchers.
Mark Leggott of U PEI/DiscoveryGarden talked about the fact that researchers are squirreling data away in places like Dropbox. They’re developing a slick Islandora tool to pull data directly from Dropbox for deposit in repositories; this data ‘pull’ can happen on an as-needed or scheduled basis.
Cerys Willoughby spoke about LabTrove, the open electronic lab notebook (ELN) from their group at U Southampton. This is a project that targets a sweet spot in a universe dominated by very expensive proprietary ELNs. LabTrove is a bloglike ELN that automatically captures some metadata, such as date and user, and also allows users to custom-define some of their own metadata. Cerys had some observations about their users’ self-defined metadata. They’ve found that their ELN users tend to either love (rare) or hate (common) applying metadata to their notebooks; in fact, they found that over 50% of the ELNs in use have had exactly zero user-defined metadata applied. The group has only made one metadata field mandatory in LabTrove; despite this, they’ve found that many users have simply entered a space into that sole required field rather than supply any information!
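The single-space workaround is easy to picture in code. The sketch below is not LabTrove's implementation (I have not seen their code); it is just a hypothetical Python illustration of the difference between checking that a required field is present and checking that it actually contains something. The field name `title` and both helper functions are assumptions.

```python
def is_meaningful(value):
    """True only if the value is text with at least one non-whitespace character."""
    return isinstance(value, str) and value.strip() != ""

def validate_entry(metadata, required=("title",)):
    """Return the required fields that are missing or effectively blank."""
    return [field for field in required if not is_meaningful(metadata.get(field))]

# A lone space passes a naive "is the field non-empty?" check but fails here.
blank_problems = validate_entry({"title": " "})
good_problems = validate_entry({"title": "PCR run 3"})
```

Of course, a stricter check only moves the problem: a user who does not want to supply metadata can type "x" instead of a space, which is why the ease-of-use theme matters more than the validation rule.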
Renata Curty presented on her group’s content analysis of data management plans from NSF awardees. There were many interesting findings in their study, but here’s my personal favorite: many awardees said they planned to archive their data in a not-yet-existent institutional repository. (Okay, so this isn’t quite about making it easy – though depositing to a nonexistent IR is pretty easy – I just loved this little tidbit and wanted to call it out again.)
Theme 2: It’s gotta be rewarding for the researchers.
This idea first surfaced in a lightning talk by Nic Weber, who talked about the carrot of ‘fortune and fame’ (such as it exists in academic publishing) and the power of vanity metrics, calling out PLoS as an example that does a pretty good job of generating and displaying those metrics. He made the argument that data usage is, in and of itself, an assessable demonstration of data curation impact.
Heather Piwowar talked about altmetrics, largely focusing on ImpactStory. ImpactStory, an altmetric aggregator, uses eye-catching badges like ‘highly saved’ and ‘highly discussed’ to help reveal/surface use of datasets as well as other types of content. Badge information is pulled from a variety of sources, ranging from PLoS, PMC and Mendeley, to GitHub, Figshare and Slideshare. Altmetrics like these afford a relatively simple means to give recognition and reward to those who participate, without that minor stumbling block of trying to change the whole scholarly publishing paradigm to do so.
Theme 3: It’s gotta be easy and rewarding for us too!
I think it’s safe to say that this liaison business can sometimes be a bit frustrating. We certainly have our champions amongst the faculty and researchers, but getting through to some other folks seems nearly impossible at times. That’s why I liked a couple of things I heard at RDAP that suggested that playing to our champions is not such a bad strategy.
Amy Nurnberger described how Columbia has leveraged relationships between publishers, faculty, and Columbia’s library – specifically their digital repository, Academic Commons. The library approached ‘data-friendly’ publishers like PLoS and ESA for lists of Columbia faculty that had published in their titles. They then focused outreach efforts on the faculty on the lists as they were known to be, well, open to open data! They also worked out an agreement with these publishers for Columbia to get the publication-related data deposits for their repository. This is a nice win/win because Columbia gets the data, and the publishers get the ability to offer the researcher’s local repository as an archiving option, with an end result that the datasets are more closely tied to the publication. This is such an elegant approach that it had me slapping my head that it hadn’t occurred to me!
Along related lines, Mark Parsons spoke about the Research Data Alliance, and touched briefly on the leadership model of positive deviance. This model suggests that rather than trying to change the ‘unreachable’ folks in a group, you look for and focus on the people who are already engaging in the kinds of behaviors you want to see, and then you encourage/nudge them to do more. Eventually those behaviors will ‘pollinate’ others in the group you’re trying to influence. I hadn’t heard of the positive deviance model before, but reading more about it is on my (lengthy!) list of to-dos that I’ve taken away from RDAP13.
Where can librarians get the opportunity to immerse themselves in science–get an overview of science subjects, learn about cutting-edge research, share new ideas with colleagues, and network with science, health sciences, and engineering librarians from around the country (and sometimes beyond)?
At the annual New England Science Boot Camp for Librarians! The NE Science Boot Camp, now in its fifth year, will be held June 12-14th on the Amherst campus of the University of Massachusetts.
This year’s SBC offers some very educational and relevant sessions on:
- Analytical chemistry
- Public Health
- Capstone: a how-to session for librarians on conducting research interviews–guidelines, demo, and practice breakout session
See the 2013 Science Boot Camp Guide for more details and to register.
Last Wednesday’s (April 3rd) very well attended University of Massachusetts and New England Librarian e-Science Symposium, held at the Hoagland Pincus conference center in Shrewsbury, MA, featured several thoughtful presentations on topics such as:
- planning research computing services
- the partnership between the University of California at San Diego’s Supercomputing Center and the UCSD Library
- group discussion about the Journal of eScience Librarianship
- e-Science activities in the New England region
- DataONE project
- research on factors that lead to successful library engagement in e-Science
- University of Virginia’s Scientific Data Consulting Group
- Value-based indicators for data reuse
The slides from these presentations can now be viewed on the 2013 e-Science symposium Overview conference page.
Video of the symposium presentations is forthcoming and will be posted on the e-Science portal.
ASIS&T is publishing presentation slides from this past week’s (April 4th and 5th) RDAP 13 conference. They can be viewed on ASIS&T’s slideshare site.
DRYAD researcher Heather Piwowar is well known for her research on data citation and her advocacy for open data sharing. In her blog Research Remix, Heather frequently contributes her ideas and research findings on factors related to data sharing, data reuse and the impact of sharing data on citation rates.
Recently Tim Vines, editor of Molecular Ecology and past member of the DRYAD consortium, interviewed Heather to find out more about her research background and how it fueled her interest in data archiving and open data. Check out Peggy Schaefer’s DRYAD post about this interview: An interview with Heather Piwowar: on data archiving, open notebook science, and discovering your impact flavor.