By Andrew Creamer, Scientific Data Management Specialist, Brown University Library
Six years ago Ubogu and Sayed (2008) conducted a survey of members of the Networked Digital Library of Theses and Dissertations (NDLTD) about their handling of ETDs and their associated data. They found that most of the institutions had no policy on the stewardship of raw data related to a thesis/dissertation, and such data was only stored if provided by the author as a supplementary file along with the full text. Only one institution at that time had a relationship between its electronic theses and dissertations (ETD) program and its research data management center/program. I would love it if this study were repeated to see whether these numbers have changed.
In 2011, Collie and Witt wrote a superb article supporting the positioning of libraries to collect ETD data. Indeed, they felt at the time, and may very well still feel today, that “Dissertation datasets represent ‘low-hanging fruit’ for universities who are developing institutional data collections” (p.166). While I agree 100% with their proposal that these collections are valuable and should be a strategic priority for libraries, especially as my colleagues and I embark on this journey at my own institution, I am more than disappointed with their choice of metaphor, one that equates collecting student ETD data with little or no effort. On the contrary, I argue that setting up a sustainable service to archive ETD data can be a lot of work (but worth it)!
Indeed, the preparation and assessment phases can take a great deal of leg work. In the spring of this year Kate Thornhill and Lisa Palmer presented on a practical project at UMass Medical School that explored graduate students’ awareness of the ability to submit separate data files along with the PDF of their electronic theses and dissertations to the institutional repository. Kate, Lisa and their project team colleagues (disclaimer: Donna and I were on that team) wanted to have an idea of the types of data that graduate students were producing and the nature of the data that they might be submitting along with their ETDs, so Kate conducted a survey and some interviews to gather very useful information on the number, formats, and sizes of the data files the graduate students were producing, and their interest in submitting data along with their ETDs. If you have not seen this poster, you should, and you should also reproduce it at your institution.
Lately I have been sharing with colleagues the potential value and opportunities for the library to archive and build searchable collections of ETD-associated data sets produced on our campus, and I have received positive responses: “What a good idea! The data associated with my thesis is in a box in my PI’s basement” and “I can see how that would be useful–I don’t even know where my data is anymore.” However, even with such buy-in, we are quickly finding we are going to need a ladder to harvest this “low” fruit because there are many hurdles to climb and aspects for us to consider before actively seeking from the students the data files associated with their ETDs.
1. What’s Out There?
Kate, Lisa and colleagues wanted to know the nature of the data students might be collecting and interested in sharing, because they wanted to know whether the institutional repository could handle the needs and demand were they ever to actively do outreach to build an ETD data collection. Data also means different things to different people. Which data would they want to submit: raw data? Analyzed data? What about data documentation and code? What file formats? What file sizes? What is the level of description? The list goes on.
2. Where Does the Buck Stop?
Many universities have an ETD submission process that involves the graduate school and library. Yet, as my colleague Jean Bauer points out, people often overlook the local aspect: the departmental coordinators and administrative assistants who are usually the first point of contact for graduate students to get information about ETD requirements, deadlines, and reminders for getting their ETD and graduation paperwork submitted. Thus, if the library wants to begin advertising to graduate students that they can submit their data files as well, then it needs to be sure that all the nodes along this chain of communication, not just the deans in the graduate school, are aware of it and have the same information. The issue here, of course, is that it takes a lot of legwork to do outreach with the subject librarians and departmental staff, and to find opportunities to share this with graduate students at some point in their studies so that they are aware of the option to submit data. One issue we are encountering here is updating all the stakeholders’ web pages so that they have the same information.
3. Collections Policy
Since data can take so many forms, a one-size-fits-all policy for ETD data would be difficult to apply. Instead there should be an appraisal process in place to prioritize the collection of ETD data sets that do not necessarily have an established disciplinary repository that would be a better and more logical home for the data, one where researchers in that field would most likely look. In the case described above, a good policy would be to look at each submission on a case-by-case basis, and then create a separate record for the data set with metadata and a minted DOI that points out to where the data is, e.g., a 40 GB file of RNA sequence data in the Sequence Read Archive (SRA) at NCBI. As a result, if a user finds the record for the ETD, then he or she can find the associated record with experiment-level metadata for the data set, and then follow the DOI out to the SRA to access the data. A 40 GB file of RNA sequence data would be better placed in the SRA than in an institutional repository anyway, because biomedical researchers would logically go there to access such data, and it can better accommodate the large file sizes associated with raw sequencing data. Yet, this type of attention requires an investment of energy and effort on the side of the library to mediate these submissions and dedicate staff time to assist with metadata, mint DOIs, etc. This also raises the question of scalability: if the number of data sets submitted with ETDs were to suddenly increase as a result of good buy-in, outreach, and user satisfaction, could the library then keep up the pace of that investment and maintain the same quality of individual attention?
4. Embargoes and Encumbered Data
Managing ETDs comes with its own set of issues. Libraries usually offer some sort of embargo option for students’ ETDs, for example. The question here is if an ETD is placed under an embargo, would the data set also need to be embargoed? There are also concerns about copyright and intellectual property, sensitive data and identifiers, etc. that go along with any digital object placed into an institution’s repository. Thus, a question for the library to discuss with the graduate school is whether there should be a different process for ingesting ETD data sets or should they be treated the same as the ETDs, and what policies, indemnity clauses, and deposit agreements would be appropriate. In addition, since so many graduate students have data wrapped up in the work of their advisors, the usual questions about ownership apply. Here are two examples of such deposit licenses students from the University of Virginia agree to before depositing their ETDs and related data sets:
So as you can see, the height of this fruit is relative to where you’re standing. From an informal survey of some peer institutions, here are two scenarios of how ETD data curation is currently being handled (and I would love if the E-Science Community Blog readers could chime in and share more with me):
Scenario #1: There is no direct policy or concerted outreach to obtain ETD data sets from students. Instead, the ingest of data related to ETDs is student-initiated, through department program admins, the graduate school, or the repository. Students either pass supplementary files to the repository as part of the ETD submission process, or they submit the data set to the repository separately from the ETD, in which case it may be linked back. The supplementary data receives no special treatment or separate record, i.e., no metadata is created for the supplementary files. The graduate school is not involved in terms of policy. The library or graduate school may mention the option to submit supplementary data files in the submission guidelines, but no special ingest permissions, policies, etc. have been created specifically to address ETD supplementary data sets (Ex. see number 9: http://library.gwu.edu/etds/steps.php).
Scenario #2: There is communication among department program admins, the graduate school, and library that submitting a data set along with an ETD is an option, and it is described in guidelines and communicated to graduate students; the data is either collected via supplementary files along with the submission of the ETD or deposited separately and then linked back to the ETD; submitted data sets receive metadata and their own digital object records. There is some sort of official license agreement or indemnity clause that the depositor (student) agrees to, stating that there are no IP, PII, or other sensitive-data restrictions (Ex. Harvard’s Dataverse deposit agreement: http://thedata.org/book/data-deposit-terms).
If you have some more ideas on this topic, please email me at email@example.com.
Update: Since I posted this I have heard back from colleagues looking at this same issue. Here is a really great idea from Sarah Shreeves to evaluate the ETD supplemental files in our repositories: http://hdl.handle.net/2142/35314. Shreeves, Sarah L. 2013. Supplemental Files in Electronic Theses and Dissertations: Implications for Policy and Practice [Poster Abstract]. Poster presented at the 8th International Digital Curation Conference, Amsterdam, Netherlands, January 14-17, 2013.
Acknowledgement: Thank you to Jean Bauer for the suggested title of this post.
Submitted by Simmons GSLIS student Jennifer Chaput, recipient of the 2014 Science Boot Camp Fellows scholarship.
As a new library science student, I was intrigued when I heard about Science Boot Camp, and am grateful that I was able to attend as a Student Fellow. A chance to connect with a group of science librarians and hear talks from researchers in different fields sounded like a great opportunity to see what the field is really like. This year’s Boot Camp was held on the campus of the University of Connecticut in Storrs.
I was nervous about how much of the content might apply to me, since I am not working in the field yet, but I never found that to be a problem. Everyone was friendly and welcoming, and by the end of the first day I already felt part of the group. The sense of camaraderie continued through the week. Meeting a wide variety of librarians was one of the best parts of Boot Camp for me, and I came away with many new friends and contacts in the field. I enjoyed being able to speak to librarians who work in a variety of settings. My career goal is to work in a hospital library, but after meeting many academic science librarians, I’m interested in learning more about that aspect of librarianship as well.
The bulk of Boot Camp is based around sessions in which scientists working in various fields present their research so we can learn about current trends in the field and how librarians can assist them. This year’s sessions were about Computer Science, Evolution, and Pharmacology. Within those fields, there’s an astounding variety of research being conducted. One of my favorite sessions was on tapeworms, of all things! During the Evolution session, Janine Caira from the University of Connecticut presented an engaging and dynamic talk about her work classifying and studying tapeworms in sharks and rays. I realized that it’s increasingly important for a researcher or scientist to be able to communicate their work well to a wide variety of audiences. While she was speaking on a more scientific level to us, I thought about how I would try to explain the content to a friend or family member. Working with researchers to understand their work and their needs for information and data management is a key role that librarians can play in the process of scientific research.
The theme of communication carried into our capstone session on Citizen Science. From ways to involve the public in scientific observation such as fish counts or birdwatching, to a rap about climate change from Dr. Jonathan Garlick of Tufts University, a variety of ways to get people involved and interested in science were presented. Dr. Robert Stevenson from University of Massachusetts Boston summed up the entire Boot Camp well when he said that “librarians are silent partners to scientists.”
I came away from Boot Camp feeling energized and excited about my new career path, and am already looking forward to attending again next year. Although it’s almost overwhelming to see what options are out there and the different ways librarians function in the scientific fields, it’s also nice to see how broad the field is and to know that almost any direction is possible.
The video recordings of this year’s New England Science Boot Camp, held at the University of Connecticut, are now available. The recorded sessions include Computer Science, Evolutionary Biology, Pharmaceutical Sciences and a special Capstone session on Communicating Science. Included in each recording are the concluding question and answer portions of the science and Capstone sessions. To view the recordings, see the 2014 Science Boot Camp YouTube Channel.
Submitted by guest contributor Brianna Marshall, Digital Curation Coordinator at University of Wisconsin at Madison.
In June, I started as the Digital Curation Coordinator at the University of Wisconsin-Madison Libraries. This is my first professional position, so some of the ideas below apply to new jobs of any kind.
1. Simplify your explanation of what you do. Data services folks have incredibly varied backgrounds, titles, and responsibilities. It can be really hard to explain what we do to anyone, even our colleagues. For instance, my job is newly created and the title is a mouthful, so very few people beyond my search committee understand what I am supposed to be doing. To translate, I tell people I’m a data management librarian who works with the institutional repository. This helps both my colleagues and researchers that I work with understand my role by answering the questions I already know they’ll ask.
2. Anticipate the challenge of understanding the culture of your institution. I don’t think of myself as particularly naïve, but I will say that I thought this would be simpler. In retrospect, I think I put too much pressure on myself to know it all right away. However, that’s just not possible – a lot comes down to getting to know past events: the history of relationships, projects, and group dynamics. It takes waiting and watching, asking questions and listening carefully. Sometimes it takes stumbling upon information. It was helpful when colleagues told me it was normal to still feel somewhat out of the loop as late as a year into the job – suddenly I felt much more relaxed about my confusion!
3. Take the time to meet people one on one. At first I felt strange about this – getting coffee with a colleague felt too fun to be work. In my former life as an hourly worker I had never done this on the clock! However, I am confident this will pay dividends. Getting work done is a matter of relationships. You rarely get the real story in a committee meeting; you get it by talking to someone one on one. Coming into this job, I was worried that people might think I was trying to overshadow the existing data-related work happening across campus. Getting to know people on an individual level allowed my intentions to shine through and helped us figure out how to collaborate. As a bonus, these relationships have helped me get acclimated much more quickly – rather than still feeling like the new girl, I’m starting to feel like just another member of the team at UW.
4. Re-learning how to strategize. I always prided myself on being a long-term thinker: I like to plan, execute, and enjoy the fruits of my labors. With a new job, though, it’s tougher to see how I fit into the big picture on campus. How does my work with the institutional repository affect my work with data information literacy and how will that affect my ability to get a data curation pilot up and running? When I was in grad school, my ultimate goal was to get a job, so I tailored my activities to that objective and timeline. Now, it seems that my goals are moving targets with quickly shifting timelines. The scope of my job description is broad due to its very nature as a new and exploratory role. It has been helpful for me to have a supervisor that I can pepper with questions as needed. What is appropriate? Has anything like this been done here before? Do I have resources available? This helps me focus in on how the pieces fit together. It’s tempting to try to speed up to catch up to the impressive projects undertaken by other institutions (Purdue and University of Minnesota, I’m looking at you!) but I remind myself: one bite at a time!
5. Introduce yourself to the other data librarians/ archivists/ technologists/ coordinators out there. We are a diverse group. Some of us are scientists, some of us have computer science/IT backgrounds, and others still are English majors who found themselves working with data and are still a bit in awe of how far they’ve come from the career expectations they had as 12 year olds (raises hand). We’re all different, and as someone with a mixed LIS/IT background I am truly excited about all I can learn from my peers. I’ve met exceptionally cool people through conferences and online communities like Twitter. Saying hello and perhaps asking for advice is only hard in your head – in my experience, people in this field are very receptive and friendly, so by all means go for it.
I hope these ideas help anyone out there starting a new data-related job! What are your tips?
Below are some recent job opportunities for data librarians. If you would like to disseminate news about job opportunities at your institution on e-Science Community, please contact me at firstname.lastname@example.org.
University of Virginia: Senior Research Data Scientist (MLS not required, MS or PhD in Science/Engineering preferred)
Submitted by guest contributor Daina Bouquin, Data & Metadata Services Librarian, Weill Cornell Medical College of Cornell University, email@example.com
When do you use a database, and when do you use a spreadsheet? This simple question is key when designing a data management strategy, but neither tool is always the better choice; rather, you need to determine which tool is best suited to the task at hand. Spreadsheets get a bad name because they are so easy to misuse, but following some best practices for tabular data will keep you from making the most common mistakes. Meanwhile, databases are typically seen as being overly complicated and involving difficult-to-learn skills, but there are more than a few resources online to get you started with them (like Stanford’s Database course). This brief walkthrough will help you determine whether to go with a spreadsheet or a database on your next project.
First, it’s important to recognize that both spreadsheets and databases can be useful in manipulating data. Where these two tools differ is in how they store and manipulate data.
In spreadsheets, data values are stored in cells, with many cells making up an array of rows and columns. Cells can “refer” to each other and carry out processes on other cell values. Spreadsheets have taken over functions that, in the past, were carried out with paper and pencil in ledgers and worksheets (like financial record keeping) because they enable data to be recalculated much more quickly and efficiently than by hand. When one value in a column changes, totals and other formulas entered into cells are automatically recalculated. It’s important to note, though, that spreadsheets are not ideal for long-term data storage and only offer relatively simple query options. They also do not easily guard data integrity, and offer little protection from data corruption. Spreadsheets are great for tracking simple lists, but they have real limitations. Many people think of MS Excel when they think of spreadsheets, but other platforms like Google Docs have spreadsheet applications as well.
With databases, data are usually stored in multiple tables. Each table is given a name and has columns and rows. Each row in the table is called a record, and each record typically has a value for each column in the table. Database tables are typically used to store raw data, meaning that data in rows are not the result of some manipulation or function, as they can be in a spreadsheet. Databases also allow you to enforce relationships between records in different tables so that the data can then be retrieved through querying. Querying is like asking questions of the data to pull information into formatted reports (e.g. an invoice). In this way databases easily manage large amounts of data and maintain data integrity better than spreadsheets typically do. Likewise, databases are better for long-term storage of records that may change, and they have a much larger storage capacity than spreadsheets. Some database tools include MySQL, SQL Server, Oracle, MS Access, and REDCap.
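To make the ideas of tables, records, relationships, and querying concrete, here is a minimal sketch. It uses SQLite through Python's built-in `sqlite3` module, which is not one of the tools named above and is chosen only because it needs no installation. The table names, column names, and sample values are all hypothetical.

```python
import sqlite3

# In-memory SQLite database; every table, column, and value here is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces relationships when asked

# Two named tables. Each row in a table is one record.
conn.executescript("""
    CREATE TABLE patients (
        patient_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL
    );
    CREATE TABLE visits (
        visit_id   INTEGER PRIMARY KEY,
        patient_id INTEGER NOT NULL REFERENCES patients(patient_id),
        visit_date TEXT NOT NULL,
        weight_kg  REAL NOT NULL
    );
""")

conn.executemany(
    "INSERT INTO patients VALUES (?, ?)",
    [(1, "Patient A"), (2, "Patient B")],
)
conn.executemany(
    "INSERT INTO visits (patient_id, visit_date, weight_kg) VALUES (?, ?, ?)",
    [(1, "2014-01-15", 70.0), (1, "2014-02-15", 72.0), (2, "2014-01-20", 80.0)],
)

# Querying: "ask a question" that pulls related records from both tables.
rows = conn.execute("""
    SELECT p.name, v.visit_date, v.weight_kg
    FROM visits AS v
    JOIN patients AS p ON p.patient_id = v.patient_id
    ORDER BY p.name, v.visit_date
""").fetchall()

# Data integrity: a visit for a patient that does not exist is refused.
try:
    conn.execute(
        "INSERT INTO visits (patient_id, visit_date, weight_kg) "
        "VALUES (99, '2014-03-01', 65.0)"
    )
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

for row in rows:
    print(row)
print("Orphan visit rejected:", orphan_rejected)
```

The raw measurements are stored once, in `visits`, and the relationship to `patients` is checked by the database itself rather than by whoever happens to be typing into a cell.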
Some questions to ask when deciding between the two:
- If you use a spreadsheet, would changes in one spreadsheet require you to make changes in others?
- Would the amount of data be manageable in a spreadsheet?
- Would you need several spreadsheets to contain related data?
- Would the data you are looking for be easy to find in a spreadsheet?
- Do multiple people need to access the data?
Answering yes to a few of these questions means you may want to consider going with a database over a spreadsheet for your project.
In summary, you may want to go with a database if:
- Multiple people will need to access the file
- The data is subject to change
- You want to store data long term
- You need a lot of storage space
- You need to generate multiple reports based on the same data. For example: a clinical researcher wants to see a group of patients’ average weight by month, another researcher only wants to see that measure for a certain subset of patients, and yet another researcher may be interested in instead seeing the median weight grouped by age. Rather than build three spreadsheets with different views, it would be easier and more efficient to make a database that would allow for queries to generate all three reports from one source.
Both spreadsheets and databases have their place; just try to avoid forcing a spreadsheet to do the work of a database. It’ll save you a headache.
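The three reports in the example above can be sketched against a single table. This is only an illustration: it assumes SQLite via Python's built-in `sqlite3` module, a hypothetical `weights` table, a hypothetical `cohort` column standing in for the "subset of patients", and made-up values. SQLite has no built-in median function, so the third report computes the median in Python with `statistics.median`.

```python
import sqlite3
from statistics import median

# One source table; all names and values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE weights (
        patient_id INTEGER,
        age        INTEGER,
        cohort     TEXT,
        month      TEXT,
        weight_kg  REAL
    )
""")
conn.executemany(
    "INSERT INTO weights VALUES (?, ?, ?, ?, ?)",
    [
        (1, 40, "A", "2014-01", 70.0),
        (1, 40, "A", "2014-02", 72.0),
        (2, 40, "B", "2014-01", 80.0),
        (2, 40, "B", "2014-02", 78.0),
        (3, 55, "A", "2014-01", 90.0),
        (3, 55, "A", "2014-02", 87.0),
    ],
)

# Report 1: average weight by month, all patients.
avg_by_month = conn.execute(
    "SELECT month, AVG(weight_kg) FROM weights GROUP BY month ORDER BY month"
).fetchall()

# Report 2: the same measure, restricted to a subset of patients.
avg_by_month_cohort_a = conn.execute(
    "SELECT month, AVG(weight_kg) FROM weights "
    "WHERE cohort = ? GROUP BY month ORDER BY month",
    ("A",),
).fetchall()

# Report 3: median weight grouped by age (median computed in Python).
weights_by_age = {}
for age, weight in conn.execute("SELECT age, weight_kg FROM weights ORDER BY age"):
    weights_by_age.setdefault(age, []).append(weight)
median_by_age = {age: median(ws) for age, ws in weights_by_age.items()}

print(avg_by_month)
print(avg_by_month_cohort_a)
print(median_by_age)
```

If a measurement is corrected in `weights`, all three reports pick up the change the next time they are run, with no copies to keep in sync.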
A friend recently shared this article. It’s an interesting read that suggests that – education and experience being equal – women are less confident than their male peers, and that this lack of confidence in women negatively impacts their careers.
As I read the piece it triggered a memory – an old but vivid one.
Years ago I worked at a biotech company. My coworkers were almost all male. We came from similar backgrounds; most of us were recent college grads with degrees in biology or chemistry. We had plenty of chances to talk and bond over what was often monotonous work, so we got to know each other quite well. Sometimes we’d chat about science to pass the time.
One day I was talking science with a coworker while we worked on a large batch of samples. (He was the odd man out in our little group, since he’d majored in physics.) He made a pronouncement on some arcane biology topic that now escapes me. I knew what he said wasn’t right, and after a pause I jokingly called him on it.
He laughed and said: “OK, you got me. I’m really not sure – you’re probably right. I’m just the physics guy, remember?”
I said: “Well, you sure sounded like you were positive about it!”
Him: “Yeah, but that’s 90% of life, isn’t it? Sounding like you know what you’re talking about.”
Lighthearted as this conversation was, I can still remember being dumbstruck by his comment – the ol’ light bulb moment. I realized how self-assured my male coworkers sounded when they talked – about science, and just about everything else. I was much more tentative, sprinkling my speech with loopholes and deflections. It was a revelation – maybe they really DIDN’T know a lot more than I did; they just sounded like they did!
Per the 2013 Demographics Survey of ALA members, over 80% of librarians are female. IF librarians are mostly women, and IF women tend to be less confident than men, and IF a lack of self-assurance hurts women in their careers – what does that mean for libraries? And in particular, what does that mean in situations where libraries are pushing boundaries, reinventing themselves, and working to insert themselves and their librarians into the research enterprise in new ways?
There are, of course, many factors that contribute to the success of new initiatives. Much can rise and fall on environmental factors, like the support of library administration, institutional aspirations, and budgetary pressures. Subject matter background certainly helps librarians work closely with their departments, though I’ve argued before that I don’t see discipline knowledge as crucial. After reading this article, I’m wondering if part of what that subject matter expertise gives librarians is confidence. An extra dose of confidence is helpful for any of us, and may be even more welcome in situations where libraries and librarians are forging new paths.
What do you think? Does this article resonate with you, or not? Do you see any connections with our library work? Please comment here and/or on Twitter – #NERescience.
Simmons GSLIS is offering a Scientific Research Data Management class, 532G-01, this upcoming fall semester at Simmons’ Boston and Simmons West (Mt Holyoke) campuses simultaneously via videocasting on Saturday mornings from 9 am to 12 pm. The class will be taught by a team of librarians that includes Dr. Elaine Martin, Rebecca Reznik-Zellen, and Donna Kafel from UMass Medical School, Andrew Creamer from Brown University, and Regina Raboin from Tufts University. The instructors will alternate between the two campuses: one week teaching at the Simmons’ Boston campus and videocasting to the students at Mt. Holyoke, and the alternate week vice versa. The class is open to enrolled Simmons students as well as interested librarians. (Registration for non-Simmons students begins in August.) The first class begins on Sept. 6th.
This course, LIS 532G, Scientific Research Data Management, uses the case study method to prepare students from all academic backgrounds for roles in scientific research data management. It explores the current and emerging roles for information professionals in managing large or small volumes of research data sets. The course provides students with the skill set relevant to that of a data librarian whose job involves helping researchers manage and curate research data sets. The course examines the data practices of researchers in scientific fields such as biomedicine and engineering as examples of how researchers produce data and how they use these data for purposes of inquiry. Students learn about the purposes and tools of research data production and data reuse, data lifecycles and data reference interviews, data management practices, and the strategies of offering data consultancy services to researchers. Current issues regarding citing datasets, Open Access policies, and embedding the librarian as a member of a research team will also be addressed. The course will feature guest lectures by data scientists, data librarians and data archivists. Assignments include a series of readings, case study assignments, data reference interviews with researchers, and the development of data reference interview tools and data management plans for real research projects.
Full tuition for this 3 credit class is $3486, plus a $50 activity fee. Auditing the class (no credit or grade earned) is a less expensive option and is half the current tuition ($1743 for non-Simmons grads, $400 for any Simmons grads). For further details, see Simmons Forms and Policies.
If you’re interested in the class, please contact the Admissions Office at Simmons at firstname.lastname@example.org and discuss with the Admissions Office your preference for enrolling for credit or auditing. The office will send you an application. An official copy of your master’s degree transcript is a required component of the application.
New England librarians (or those outside of NE who don’t mind a long drive!) who are interested in learning more about e-Science librarianship and the management of scientific research data may want to consider enrolling in or auditing this class.
This summer ACRL is sponsoring an e-learning online course, “What You Need to Know about Writing Data Management Plans,” from July 14 to August 1, 2014. The course teaches participants the elements of a comprehensive data management plan and is taught by Dee Ann Allison and Kiyomi Deards of the University of Nebraska-Lincoln. See ACRL announcement for more details.
The National Digital Stewardship Residency (NDSR) model was “designed to develop the next generation of stewards to collect, manage, preserve, and make accessible our digital assets.” NDSR was developed by the Library of Congress and was initially piloted in Washington DC. Beginning in September, the NDSR will be piloted in Boston at Harvard, MIT, Northeastern, Tufts, and WGBH. See NDSR announcement for further details!
Submitted by Donna Kafel, Project Coordinator for the NE Librarians’ e-Science Program, University of Massachusetts Medical School
The comment “Well done!!” is music to my ears. Last Friday I had the pleasure of hearing this music from many of the Science Boot Campers at the conclusion of the sixth annual New England Librarians’ Science Boot Camp hosted by Carolyn Mills and her colleagues at the University of Connecticut. After months and months of careful planning, it was very rewarding to hear Boot Campers’ positive feedback and realize that they value the unique science-immersive learning opportunity that the New England Science Boot Camp Planning Group strives to provide each year.
My colleagues in the New England Science Boot Camp Planning Group and I meet regularly throughout the year to plan the next Science Boot Camp. In the course of our work planning boot camps, group members have developed specific expertise and project management skills. One of the most fulfilling aspects of participating in the group is seeing a new Science Boot Camp come to fruition after our initial brainstorming sessions and multiple planning meetings, e-mail discussions, and the massive team efforts led by the librarian who is hosting boot camp at her campus. Each year the group works out the details to plan this 2 ½ day event: reviewing attendees’ evaluations of past boot camps, planning topics for future camps, selecting and inviting faculty speakers, working with the next campus host to plan budget, facilities and logistics, setting up a new Science Boot Camp LibGuide and registration, and making sure we broadly disseminate boot camp announcements to librarians and library students interested in science librarianship— just to name a few.
In planning each boot camp, the Planning Group sticks to the original New England Science Boot Camp mission: to provide science, health sciences, and engineering librarians and interested library students a science-immersive and affordable continuing education event with opportunities to network and share ideas in a fun, laid-back setting. Keeping the event affordable has been made possible through the sponsorships of these organizations: the National Network of Libraries of Medicine New England Region, the Boston Library Consortium, the University of Massachusetts Amherst, University of Massachusetts Boston, University of Massachusetts Dartmouth, University of Massachusetts Medical School, College of the Holy Cross, Tufts University, University of Connecticut, and Worcester Polytechnic Institute.
It’s inspiring to see the Librarians’ Boot Camp model being adapted around the country and in Canada. (For details about the other Science Boot Camps, please see Margaret Henderson’s April 22nd e-Science Community post “Science Boot Camps for Librarians 2014”). Each of our groups can learn a lot from each other, our group members, our faculty presenters, and our current and future boot campers.
In closing, I’d like to acknowledge the following members of the New England Science Boot Camp Planning Group:
Mary Adams and Elizabeth Winiarz of the University of Massachusetts Dartmouth; Paulina Borrego, Naka Ishii, and Maxine Schmidt of the University of Massachusetts Amherst; Tina Mullins of the University of Massachusetts Boston; Andrew Creamer (now at Brown University), Sally Gore, Elaine Martin, and myself from the University of Massachusetts Medical School; Bijan Esfahani of Worcester Polytechnic Institute, Regina Raboin of Tufts University, Barbara Merolli of the College of the Holy Cross, and Carolyn Mills of the University of Connecticut…..
….And give a big shout out to our gracious colleagues at the University of Connecticut who hosted this year’s Science Boot Camp. Many thanks to Carolyn Mills, Sharon Giovenale, Valori Banfi, and the rest of the UCONN Science Boot Camp 2014 hosting team!
…And be sure to check out Sally Gore’s post “Hello Mudder, Hello Fadduh” and see her notes and sketches from boot camp….
…And stay tuned—I will be announcing when the Science Boot Camp videorecordings are available!
Submitted by: Jake Carlson, Associate Professor of Library Science/Data Services Specialist, Purdue University, email@example.com, @jrcarlso
Recently I had the pleasure of co-teaching a group of graduate students in a semester-long data information literacy program. Among their many interests was learning how to organize their data files and folders in a logical fashion so that they could easily find what they needed, when they needed it. Locating a specific component of their data often devolved into a needle-in-a-haystack search because they had named their files based on whatever thoughts were going through their heads at the moment. This problem was compounded when they had to find files from, or share files with, their advisor, peers, or collaborators.
We spent a class session discussing naming conventions for files and folders as a means to alleviate this situation. Naming conventions are a means of communicating descriptive and useful information through the name given to a particular file. These names are generated through the consistent application of articulated rules that have been vetted and agreed upon by participating individuals. Well-chosen naming conventions make it easier not only to identify the content of a file at a glance, but also to understand how any given file relates to other files in the collection.
Generating an effective naming convention is an investment of time and effort. Naturally every naming convention will be unique to the environment in which it was created, but we covered some common considerations for getting started. They are:
- Identify the commonalities and important distinctions between the data files. This may include things like author, date, type of experiment, procedure, etc. Naming conventions are usually composed of multiple elements. Ideally, these elements should be meaningful to the intended audience and significant enough to include as a part of the file or folder name. One way to approach this exercise would be to consider the stages of the data lifecycle and what happens in each stage.
- Find the right number of elements and characters. Including too many elements in a naming convention weighs it down and reduces its usefulness; too few elements create ambiguity. Four to five elements are usually sufficient. Similarly, too many characters can cause problems in transferring files. Consider using meaningful abbreviations where possible and err on the side of brevity.
- Define the elements and acceptable entries. Be sure that these decisions are documented and accessible. A naming convention will break down if not followed consistently, so a reference document will need to be made available to all. To allow for some flexibility, you may want to include a “keyword” element that accommodates a free-text description and further conveys the content of a file to a user.
- Decide upon the order of the elements. The order of the elements in the naming convention will determine how they are listed and how they are grouped together. Consider what is important to your audience. Do they want files organized primarily by chronology, by author, or some other means? Start out by listing the general elements first and then move towards the more specific ones.
- Versioning. If a version number is included, be sure to define what constitutes a new version of the file and how lesser revisions will be accommodated in the documentation. Avoid using words like “final”, “update”, or “new” in the file name, as they lose meaning over time.
Using these and other guidelines, we had our students develop their own naming conventions to apply to the data they were working on. This assignment was well received by students as it was something they could apply right away. Several students reported sharing what they learned with their peers and making an effort to develop a naming convention for their lab.
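To make the considerations above concrete, here is a minimal Python sketch of one possible convention. Everything in it is an assumption made for illustration: the five elements (date, author, experiment type, keyword, version), their order, the list of allowed experiment types, and the 60-character cap are hypothetical choices, not the conventions our students actually adopted.

```python
import re
from datetime import date

# Hypothetical convention: DATE_AUTHOR_EXPERIMENT_KEYWORD_vNN.ext
# All element names and allowed values here are illustrative assumptions.
ALLOWED_EXPERIMENTS = {"pcr", "elisa", "survey"}  # the documented "acceptable entries"
MAX_LENGTH = 60  # arbitrary cap that errs on the side of brevity

PATTERN = re.compile(
    r"^(?P<date>\d{8})_(?P<author>[a-z0-9-]+)_(?P<experiment>[a-z0-9-]+)"
    r"_(?P<keyword>[a-z0-9-]+)_v(?P<version>\d{2,})\.(?P<ext>[A-Za-z0-9]+)$"
)


def _clean(text):
    """Lowercase the text and replace runs of other characters with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def build_filename(when, author, experiment, keyword, version, ext):
    """Assemble a file name, general elements first, specific ones last."""
    if experiment not in ALLOWED_EXPERIMENTS:
        raise ValueError(f"unknown experiment type: {experiment!r}")
    if version < 1:
        raise ValueError("version numbers start at 1")
    parts = [
        when.strftime("%Y%m%d"),  # year-first dates sort chronologically
        _clean(author),
        experiment,
        _clean(keyword),          # free-text element for flexibility
        f"v{version:02d}",        # a number instead of "final" or "new"
    ]
    if not all(parts):
        raise ValueError("every element must contain letters or digits")
    name = "_".join(parts) + "." + ext.lstrip(".")
    if len(name) > MAX_LENGTH:
        raise ValueError(f"name is longer than {MAX_LENGTH} characters")
    return name


def parse_filename(name):
    """Return the elements of a conforming name as a dict, or None."""
    match = PATTERN.match(name)
    return match.groupdict() if match else None


example = build_filename(date(2014, 6, 10), "JCarlson", "pcr", "Liver Samples", 2, "csv")
print(example)
print(parse_filename(example))
print(parse_filename("final_data_NEW.csv"))  # does not follow the convention
```

Putting the date first means a plain alphabetical listing is also a chronological one; a lab that browses mainly by author or experiment would reorder the elements accordingly. The parsing function doubles as a quick check for files that have drifted from the agreed rules.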
References and More Information on Naming Conventions:
North Carolina Dept of Cultural Resources (2008) “Best Practices for File-Naming” http://www.ncdcr.gov/Portals/26/PDF/guidelines/filenaming.pdf
Santaguida, V. (2010) “Folder and File Naming Convention – 10 Rules for Best Practice” http://www.exadox.com/en/articles/file-naming-convention-ten-rules-best-practice
Smith, E. (2011) “Folder Hierarchy Best Practices for Digital Asset Management” http://www.damlearningcenter.com/resources/articles/best-practices-for-folder-organization/
Tomorrow is the opening day for the 6th annual New England Science Boot Camp which is being hosted on the beautiful University of Connecticut campus in Storrs, CT. All the Science and Capstone sessions will be videorecorded and posted on the New England Area Librarians Science Boot Camp YouTube page within a few weeks.
Be sure to follow SBC happenings on Twitter at #sciboot14!
Submitted by guest contributor, Katie Houk, Research & Instruction Librarian at Tufts University Hirsh Health Sciences Library. firstname.lastname@example.org
If you have not read my previous blog, “Ask and You Shall Receive” I recommend you go give it a brief read before diving into this follow-up post.
I recently returned from Savannah, GA, where I was honored to be invited to give a 2-hour interactive presentation based on the first module of the New England Collaborative Data Management Curriculum (NECDMC). Being the new and experimental presenter for the conference committee, I was placed in the very last time slot for the entire conference. As we know from our own conference experiences, this usually doesn’t bode well for attendance, unless you happen to be a celebrity or are presenting a hot topic. Fortunately, my champion vocally advocated for attending my session throughout the entire conference and there ended up being about 16 attendees. Since it was advertised as a presentation and not necessarily a workshop, many were surprised that I made them do work, talk to each other, and speak up throughout the session. Thankfully, the general reaction from those who stayed for the majority of the presentation was highly positive and I am enormously thankful for the opportunity.
My overall impressions of the conference and group of people were also very positive. Compared to the librarian conferences I typically attend, it was an intimate experience – I don’t think the number of participants even filled the entire hotel it was hosted in. The intimacy meant that I felt more comfortable attending the business meetings, planning committees and other activities of the society because I saw the same people over and over. However, this is how the phrase “be careful what you wish for” suddenly popped into my head at the end of the program planning committee meeting – after I had committed to joining the society and being a co-convener of a day’s worth of programming – all before I’d even given my presentation! While I am very excited to be able to take on this leadership role, it was kind of surprising how quickly it happened – and that it isn’t even in an organization within my expertise.
Some thoughts I’m mulling over since the conference:
- People are naturally wary of the unknown, but scientists tend to be curious and experimental in nature, so I think they reacted more favorably to my workshop-style presentation than I expected.
- Seize opportunities boldly and enthusiastically. Others seem to respond more openly and helpfully to those who show enthusiasm and passion rather than act too reserved and shy.
- Attending, volunteering and being involved in science or health societies instead of just library associations may be a better way to be seen and heard, as well as become actively involved in promoting the worth of libraries on a national level.
- Scientists seem to know more about what their favorite product companies can do for them than their own institutions – especially librarians (they still have little idea what we do aside from deliver them articles).
- Small conferences seem to be much harder to fund and organize.
- Never be afraid to ask, but be careful what you wish for.
Forwarding the following job announcement from the National Library of Medicine:
The National Library of Medicine (NLM), located on the National Institutes of Health (NIH) campus, in Bethesda, Maryland is recruiting recent library science graduates to fill entry level librarian positions. The positions offer a unique opportunity to work at the world’s largest biomedical library, with a mission of acquiring, organizing, and disseminating the biomedical knowledge for the benefit of the public’s health.
Positions are available in:
Web Site Development and Social Media
- Support site development, or new responsive web design for MedlinePlus
- Contribute to social media initiatives of NLM
- Support development and maintenance of NLM web sites by assisting with content management, usability, accessibility, information architecture, plain language, navigation and mobile access
- Acquire materials for the NLM collection and support the licensing of electronic resources
- Create and maintain serial records which serve as the underlying data for various systems throughout NLM; provide quality assurance of NLM serial records in local and national databases to ensure accurate journal citations in databases such as PubMed and PMC (PubMed Central)
Preservation; Digital Preservation
- Provide proper management, preservation and care of historical and non-historical collections, including monographs, serials, archives, manuscripts, oral histories, prints, photographs, posters, ephemera, motion pictures, video recordings, sound recordings, and other materials
- Participate in digital technology, digital imaging and preservation of analog and digital formats
- Organize consumer health information about diseases, conditions, and wellness, in both English and Spanish through MedlinePlus, the NLM consumer health web site
Data and Literature Management
- Design qualitative and quantitative assessments of tools and processes used in the indexing of biomedical literature
- Provide technical and research support for automated (machine-assisted) indexing initiatives involving biomedical literature
- Assist with data content review and editing of bibliographic citations and Web pages, including HTML or XML tagging and metadata application, to ensure data quality and consistency
- Test and evaluate NLM search systems, including the content in the systems and the interfaces used to access the systems
- Participate in customer service, training and outreach services for NLM systems, such as PubMed
Health Services Research, Public Health and Health Information Technology
- Engage with the public health and health services research communities in order to create and manage health information resources that serve their needs
- Support development of knowledge and information resources to promote interoperable exchange of data and information using standardized vocabularies and codesets, standardized survey tools and assessment instruments, and common data elements and measures
Data Science and Big Data
- Assist with initiatives to enhance access to biomedical data sets resulting from publicly funded research
- Analyze and develop guidance related to emerging policies that promote data sharing and open science
- Participate in projects to engage science communities of practice in standards efforts, including common data elements initiatives
Pay: GS-9 level with a pay rate of $52,146
Benefits: health insurance, and other benefits
Eligibility: Must have a library degree from an accredited school; must have a cumulative GPA of 3.0 or higher; must have graduated on or after 12/27/10 and be a citizen of the United States
Apply for NLM positions through the NIH Pathways for Recent Graduates (Librarian) Program of USAJobs: https://www.usajobs.gov/GetJob/ViewDetails/371420100 from June 2 – June 6, 2014
We encourage the submission of a cover letter identifying the area(s) you are most interested in working in, so that we can determine the area of our organization to which you are best suited.
NLM and NIH are dedicated to building a workforce that reflects diversity. NLM hires, promotes, trains, and provides career development based on merit, without regard to race, color, religion, national origin, sex (including gender identity), parental status, marital status, sexual orientation, age, disability, genetic information, or political affiliation.
In addition to an interesting, challenging work environment, NLM has a great location on the campus of the National Institutes of Health in Bethesda, Maryland. It is a short Metro ride from Washington D.C. and a short walk from Bethesda’s thriving restaurant and retail district.
For questions regarding these positions, please contact Kathel Dunn, Associate Fellowship Coordinator, National Library of Medicine, Kathel.email@example.com, ph 301.435.4083
Posted at the request of Susan Cole, Assistant Director for Scholarly Resources & Services, Science Librarian, Colby College.
Social Sciences Data Librarian
Colby College Libraries invites applications for a Social Sciences Data Librarian, a new position
in the Scholarly Resources & Services (SRS) group. The Colby Libraries seeks a candidate with
knowledge and enthusiasm to raise campus awareness of data literacy (data curation,
management, and preservation) with the potential to build library services for faculty and student
research. The data librarian may assist faculty with development of data management plans for
grant applications, assist with general data stewardship, as well as serve as a resource to
library colleagues for data and statistical support. The librarian will serve as liaison primarily to
departments in the social sciences or interdisciplinary subject areas, providing information
literacy and research instruction, individual consultations, and collection development.
● Graduate degree in library or information science from an accredited institution or
equivalent is preferred; alternate education and experience may be considered
● Undergraduate or advanced degree in the social sciences or sciences
● Knowledge of data management, curation, and preservation principles and practices
● Experience teaching information literacy and/or data literacy in an academic library
● Experience with statistical software as well as data from governmental and private
● Familiarity with geospatial analysis
● Excellent analytical, oral and written communication and presentation skills
● Commitment to service in a liberal arts setting
● Commitment to professional development
● Flexibility, creativity, energy, and ability to work in a changing environment, and to work
collaboratively as a member of a goal-oriented team
Position open July 1, 2014.
Applicants should address their materials to the chair of the Search Committee, Lisa McDaniels,
and send the following electronically in PDF format to Stephanie Frost (firstname.lastname@example.org).
● Cover letter
● Curriculum vitae
● Statement of teaching philosophy
● Graduate transcripts
● Three letters of recommendation
Founded in 1813, Colby is the 12th-oldest private liberal arts college in the country. Highly
selective, the college serves 1800 students. The 714-acre Mayflower Hill campus located in
central Maine is near inland lakes, an hour from the coast, and three hours from Boston.
Waterville and surrounding areas offer a reasonable cost of living in a beautiful setting. The
Colby College Libraries are central to scholarship and a key part of the Colby academic
program. There are three libraries with a professional staff of 13 librarians. A significant staff
reorganization in 2012 resulted in the Libraries being poised for transformational change in the
provision of services, instruction, and collections. The mission of the Scholarly Resources &
Services group of seven librarians is to support faculty and student research in an innovative
environment. Colby librarians are faculty without rank, eligible for sabbaticals and are expected
to contribute to creative, scholarly, and professional activities, and to participate in library-wide
and campus-wide service. For more information about the Libraries, visit www.colby.edu/library
Colby is an Equal Opportunity/Affirmative Action employer, committed to excellence through
diversity, and strongly encourages applications and nominations of persons of color, women,
and members of other underrepresented groups. For more information about the College,
please visit the Colby Web site: www.colby.edu
Posted on behalf of Chris Eaker, Vice Chair of the DataONE User Group
Registration is now open for the 2014 DataONE Users Group Meeting: http://www.dataone.org/dataone-users-group
In 2009 DataONE was established following a successful application to the “Sustainable Digital Data Preservation and Access Network Partners (DataNet)” Solicitation from NSF. The goal of DataONE was to “enable new science and knowledge creation through universal access to data about life on Earth and the environment that sustains it”. Through DataONE, participants have designed, developed and deployed a robust cyberinfrastructure (CI) with innovative services, and directly engaged and educated a broad stakeholder community. Five years later we have reached the end of that award and are excited to communicate our achievements, technologies and plans for Phase II that will take us through to 2019.
Join the DataONE group in Frisco, CO July 6-7th to learn more about its planned activities, provide feedback on development and network with other DataONE Users. There will be a number of break-out sessions (including one focussed on the DataONE Member Node network), community-led round table discussions and a poster reception for the community to highlight their projects of relevance to the DataONE community. On Monday July 7th there will be a half-day session on the DMPTool, version 2.
The DataONE Users Group meeting is conveniently co-located with the Summer ESIP meeting so head out a few days early to enjoy Colorado and learn more about DataONE.
There have been several recent job postings related to e-Science librarianship. Here is a list of the ones I’ve come across. If you would like to disseminate news about job opportunities at your institution on e-Science Community, please contact me at email@example.com.
Boston College, Chestnut Hill, MA: Digital Scholarship Librarian for the Sciences and Social Sciences. Full job description available at http://www.bc.edu/content/bc/libraries/about/jobs/staff.html
Hampshire College, Amherst, MA: Interdisciplinary Science Librarian. Full job description available at https://jobs.hampshire.edu/index.cgi?&JA_m=JASDET&JA_s=344
Columbia University, NY: Data Services & Emerging Technologies Librarian. Full job description available at http://www.arl.org/leadership-recruitment/job-listings/record/a0Id000000EV1plEAD#.U3PRP3ZLqec
Case Western Reserve University, Cleveland, OH: 2 open positions: Digital Research Services Librarian for the Sciences, and Digital Learning & Scholarship Librarian. The full job descriptions and application information are available at http://www.case.edu/finadmin/humres/employment/career.html
Carnegie Mellon, Pittsburgh, PA: Research Data Services Librarian. Full job description available at https://cmu.taleo.net/careersection/2/jobdetail.ftl?job=100763&src=JB-10246
Virginia Tech, Blacksburg, VA: Research Data Consultant. Full job description available at https://listings.jobs.vt.edu/postings/48197
Argonne National Laboratory, Argonne, IL: Science Librarian 1. Full job description available at http://www.aplitrak.com/?adid=bXBzdWxsaXZhbi40NDE2Mi4xMzUyQGFubC5hcGxpdHJhay5jb20
Indiana State University, Terre Haute, IN: Data Curation Librarian. Job description available at http://lib.indstate.edu/about/jobs/DataCurationLibrarian.pdf
University of Florida, Gainesville, FL: Agricultural Sciences and Digital Initiatives Librarian. Full job description available at http://www.uflib.ufl.edu/pers/documents/AgriSciLibrarianSearchComm.pdf
University of California, San Diego, CA: Director of Metadata Services. Full job description available at http://academicaffairs.ucsd.edu/aps/adeo/recruitment/jobDetails.asp?PositionNumber=10-739
If you would like to disseminate news about job opportunities at your institution, let e-Science Community know!
As you might have noticed, the e-Science Portal has been going through a period of transition. We’ve welcomed many new content editors in recent months, and now we’re starting to tackle the design and usability of the portal.
For this, dear readers, we’re asking for your help.
We’d like to invite you to participate in online usability studies of the e-Science portal. It’s a simple, low-intensity way of contributing to the project if you don’t have a lot of time to spare. Not in New England? Still in library school? Not a regular user of the portal? None of that matters! All we’re asking for is about 30 minutes of your time to do some usability testing from the comfort of your own, er, computer.
If you’re interested in participating in the usability studies, we ask that you fill out a (super-short, really!) application by Friday, May 23rd.
Thanks for your consideration, and for your support of the portal.
The upcoming MLA annual meeting is a great time to learn more about e-Science and data. There are some excellent panels and posters, and some group meetings that will allow you to connect with lots of people with similar interests in helping researchers. Be sure to introduce yourself and ask questions. If you need more hints about attending MLA, see the compilation of ideas at the end of this #medlibs chat blog post. And don’t hesitate to introduce yourself to me, Margaret Henderson, if you see me there. I love to talk science and data (and embroidery, if you want a break from the other stuff).
SIG business meetings (Special Interest Groups)
- Informationist – Sunday, May 18, 7-8:55 am
- Molecular Biology and Genomics – Tuesday, May 20, 7-8:55 am
- Institutional Animal Care and Use – Tuesday, May 20, 11:30 am -12:25pm
Section Business meetings
- Medical Informatics – Tuesday, May 20, 4:30-5:55 pm
- Information Building Blocks: Open Data Initiatives and Trends – Sunday, May 18, 4:30-5:55 pm
- Evolution of the Librarian: New and Changing Roles – Monday, May 19, 10:30-11:55 am (2 talks on data; the final speaker covers working with research teams)
- Librarian’s Role in the Translational Science Research Team – Monday, May 19, 2-3:25 pm
- Top Technology Trends VII – Tuesday, May 20, 6-7:30 pm
- Plenary Session 4: MLA ‘14 Panel – Professional Identity Reshaped
There are lots of posters on data and science when you search the keywords. In many cases you can view the posters online right now, so you can be ready to ask questions during the poster session. Here is a sample:
Poster Session 1 Time: Sunday, May 18, 3:30 PM – 4:25 PM
(5) A Guide for Open Researcher and Contributor ID (ORCID). by Merle Rosenzweig, Caitlin Kelley, and Mari Monosoff-Richards
(19) An Assessment of Doctoral Biomedical Student Research Data Management Needs. by Kate Thornhill and Lisa Palmer.
Poster Session 2 Time: Monday, May 19, 3:30 PM – 4:25 PM
(82) Developing a Model, Library-Based Research Data Management and Curation Service to Help Scientists Archive and Share Research Data. by Richard Jizba and Rose Fredrick
(116) Improving Data Management in Academic Research: Assessment Results for a Pilot Lab. by Heather Coates
Poster Session 3 Time: Tuesday, May 20, 1:00 PM – 1:55 PM
(152) New Measures of Success: Altmetrics and the Changing Face of Scholarly Impact. by Kimberley R. Barker
(171) Putting the I in Team: Informationists on the Inside. by Linda Hasman and Scott McIntosh