Rescuing unloved data

Message of the day

“Data that is mobile, visible and well-loved stands a better chance of surviving” ~ Kurt Bollacker

Things to consider

Legacy, heritage and at-risk data share one common theme: barriers to access. Data that has been recorded by hand (field notes, lab notebooks, handwritten transcripts, measurements or ledgers), on outdated technology, or in proprietary formats is at risk. Born-digital files can be at risk too, since they are susceptible to poor management, bit rot, or even direct attempts at reducing access.

Securing legacy data takes time, resources and expertise, but it is well worth the effort: old data can enable new research, and the loss of data could impede future research. So how should you approach reviving legacy or at-risk data?

How do you eat an elephant? One bite at a time.

  1. Recover and inventory the data
    • Format, type
    • Accompanying material–codebooks, notes, marginalia
  2. Organize the data
    • Depending on discipline/subject: date, variable, content/subject
  3. Assess the data
    • Are there any gaps or missing information?
    • Triage–consider nature of data along with ease of recovery
  4. Describe the data
    • Assign metadata at the collection/file level
  5. Digitize/normalize the data:
    • Digitization is not preservation. Choose a file format that will retain its functionality (and accessibility!) over time: “Which file formats should I use?”
  6. Review
    • Confirm there are no gaps or indicate where gaps exist
  7. Deposit and disseminate
    • Make the data open and available for re-use
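
For born-digital files, the first step (recover and inventory) can be partly automated. The following is a minimal sketch, not a prescribed workflow: it assumes the recovered files sit under a single folder, and the function names `inventory` and `formats_summary` are made up for illustration. It records each file's format (taken from the file extension), its size, and a SHA-256 checksum that can be recomputed later to detect bit rot.

```python
import hashlib
from collections import Counter
from pathlib import Path


def inventory(root):
    """Return one record per file under root: path, format, size, checksum."""
    records = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        records.append({
            "path": str(path.relative_to(root)),
            # The extension is only a rough guess at the real format.
            "format": path.suffix.lower() or "(none)",
            "bytes": path.stat().st_size,
            # Reads the whole file into memory; fine for a sketch,
            # but very large files would need chunked reading.
            "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
        })
    return records


def formats_summary(records):
    """Count files per format, which can help with triage in step 3."""
    return Counter(record["format"] for record in records)
```

Calling `inventory("legacy_data")` (a hypothetical folder name) returns a list that can be saved alongside the data; running it again later and comparing checksums shows whether any file has changed.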




There are many opportunities to rescue at-risk or legacy data. Locally, as faculty retire, reach out to departments to assist in curating existing yet inaccessible data. Regionally and nationally, partner with other stakeholders to revitalize at-risk data. Think: Citizen Science.

Get involved with the #datarefuge project

Share a tweet about today’s message with #LYD17, or use #WhyILYD17 and you’ll be entered in a raffle for a book from Facet.

The 2017 Love Your Data Week is February 13 – 17, 2017. Monday Tuesday Wednesday Thursday Friday
Adapted with permission from the international Love Your Data Week 2017 materials.
Posted in Data cleanup, Data rescue

Finding the right data

Message of the day

Need to find the right data? Have a clear question and know how to locate quality data sources.

Things to consider

In a 2004 Science Daily News article, the National Science Foundation used the phrase “here there be data” to highlight the exploratory nature of traversing the “untamed” scientific data landscape. The phrase harkens back to older maps of the world, where unexplored territories bore the warning ‘here, there be [insert mythical/fantastical creatures]’ to alert explorers to the dangers of the unknown. While the research data landscape is (slightly) less foreboding, there’s still an adventurous quality to looking for research data.



1. Formulate a question

The data you find is only as good as the question you ask. Think of the age-old “who, what, where, when” criteria when putting together a question – specifying these elements helps to narrow the map of data available and can help direct where to look!

  • WHO (population)
  • WHAT (subject, discipline)
  • WHERE (location, place)
  • WHEN (longitudinal, snapshot)

This page from Michigan State University Libraries’ “How to find data & statistics” guide does a great job of further articulating these key elements to forming a question and putting together a data search strategy.

2. Locate data source(s)

After you’ve identified the question, then you can begin the scavenger hunt that is locating relevant source(s) of research data. One way to find data is to think about what organization, government, industry, discipline, etc., might gather and/or disseminate data relevant to your question.

Below are some good suggestions. You might also want to check out the UO Libraries guide to locating data.

  • There are an increasing number of city or state-wide data portals – some examples: New York City, Hawaii, and Illinois – that provide access to regional data on everything from traffic patterns to restaurant inspection results.
  • Science data tend to be distributed among a vast array of repositories, usually by specific discipline. See this page for some recommended repositories, or go to an Open Access Data Repositories list.

Check out this post from Nathan Yau, data viz whiz and creator of FlowingData — his post includes some of the sources listed above, but also highlights tips like scraping data from websites and using APIs to access data.

3. Cite accordingly  

Data can only be reused if its quality is good, and it can only be found if it is discoverable. As a producer of data, that means following many of the practices articulated in earlier posts. As a consumer of data, that means being a good citizen and citing your data sources.

In general, citing data follows the same template as any other citation — include pieces like author, title, year of publication, edition/version, persistent identifier (e.g., Digital Object Identifier, Uniform Resource Name). Check with your data source as well – they may provide guidance on how they want to be cited!

See DataONE and ICPSR pages on data citation for examples and more guidance.
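
As a rough illustration of the citation elements listed above, here is a minimal sketch that assembles them into one string. The function name `format_data_citation`, the element order, and the example values (including the DOI) are all made up for illustration; they do not follow any particular style guide, so defer to whatever format your data source or publisher asks for.

```python
def format_data_citation(author, title, year, identifier, version=None):
    """Assemble a data citation from author, year, title, version and identifier."""
    parts = [f"{author} ({year}).", f"{title}."]
    if version:
        parts.append(f"Version {version}.")
    # The persistent identifier (e.g. a DOI or URN) goes last.
    parts.append(identifier)
    return " ".join(parts)


# Placeholder values only; this is not a real dataset or DOI.
example = format_data_citation(
    author="Doe, J.",
    title="Example Stream Temperature Dataset",
    year=2017,
    identifier="https://doi.org/10.0000/example",
    version="2",
)
print(example)
# Doe, J. (2017). Example Stream Temperature Dataset. Version 2. https://doi.org/10.0000/example
```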


BYODM: build your own (research) data map!

Ask yourself:

  • What data sources are most relevant to my research?
  • Are there relevant data sets generated or held locally that I have access to?
  • What information do I need to retrace my steps back to these data (e.g., contact information, URLs, etc.)?
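
One way to keep such a map is as a small structured file that travels with your project. The sketch below is only one possible layout: the `DataSource` fields and the helper names `save_data_map` and `load_data_map` are assumptions chosen to mirror the three questions above, not an established standard.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class DataSource:
    name: str
    relevance: str       # why this source matters to your research
    held_locally: bool   # True if generated or held at your own institution
    url: str = ""        # where to find the data again
    contact: str = ""    # who to ask about access


def save_data_map(sources, path):
    """Write the data map to a JSON file."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump([asdict(source) for source in sources], handle, indent=2)


def load_data_map(path):
    """Read a data map previously written by save_data_map."""
    with open(path, encoding="utf-8") as handle:
        return [DataSource(**entry) for entry in json.load(handle)]
```

A plain spreadsheet with the same columns would work just as well; the point is to record enough to retrace your steps.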

Share a tweet about today’s message with #LYD17, or use #WhyILYD17 and you’ll be entered in a raffle for a book from Facet.

Adapted with permission from the international Love Your Data Week 2017 materials.
Image credits:
Unagar, Pravin. (n.d.) “Romantic Location.” The Noun Project.
Sáenz, D. (n.d.) “Map and compass.” The Noun Project.
Posted in Data centers & repositories, Data quality, Sharing / publishing

Why I Love Data Raffle

#WhyILYD17 Raffle!

This year’s Love Your Data week is Feb. 13 – 17, 2017, and Facet Publishing is donating titles from their research data management series, which will be raffled off during the week.

To enter, please share a tweet about why you (or your institution) are participating in Love Your Data Week 2017 using #WhyILYD17

Follow @facetpublishing

Love Your Data Week 2017: Monday | Tuesday | Wednesday | Thursday | Friday

Posted in News

Data Carpentry Workshop

Are you looking for:

  • better ways to organize spreadsheet data
  • tools to speed up cleaning up tabular (spreadsheet) data
  • an alternative to commercial statistics software (R)
  • ways to create data visualizations in R
  • ways to use relational databases to manage data

If any of these are of interest to you, then a Data Carpentry workshop may be what you are looking for. Edward Davis (Geology) and the UO Libraries are hosting a remote broadcast of a session taking place at the University of California Museum of Paleontology this week.

What: Data Carpentry workshop
When: this week, March 3 – 4, from 8 am to 4 pm each day
Where: Knight Library (limited seating; registration required). This will be a remote broadcast of a session taking place at the University of California Museum of Paleontology. Assistants will be available to provide onsite support at the University of Oregon.


Data Carpentry workshops are for any researcher who has data they want to analyze, and no prior computational experience is required. This hands-on workshop teaches basic concepts, skills and tools for working more effectively with data. We will cover data organization in spreadsheets, data cleaning, SQL, and R for data analysis and visualization. Participants should bring their laptops and plan to participate actively. By the end of the workshop learners should be able to more effectively manage and analyze data and be able to apply the tools and approaches directly to their ongoing research.

More about Data Carpentry

In many domains of research the rapid generation of large amounts of data is fundamentally changing how research is done. The deluge of data presents great opportunities, but also many challenges in managing, analyzing and sharing data.

Data Carpentry is designed to teach basic concepts, skills and tools for working more effectively with data.  The workshop is aimed at researchers at all career stages and is designed for learners with little to no prior knowledge of programming, shell scripting, or command line tools.

More information on the workshop:

Local contact: Prof. Edward Davis (

Posted in Analysis / statistics, Data cleanup, Data visualization, Workshops & Events

Think Big-Transforming, Extending, Reusing Data

This is Love Your Data week, and each day we’ll be sharing a post about one or more fundamental data management practices that you can use. Part 5 of 5 (parts 1, 2, 3, 4)


While best practices for sharing your data are still evolving, there are some things to keep in mind when choosing to share your data:

  • When archiving your data choose an appropriate venue for your discipline. If you have any questions about choosing an appropriate data archive, contact your librarian.
  • Share ethically. Make certain that all sensitive information is redacted before submitting your data to an appropriate archive.
  • When sharing your data, include the metadata. Metadata, in part, documents your data. It tells others about your data: how it was created, who created it, and potentially, any stipulations for use of the data. For more information about metadata, consult the UO Libraries page on Metadata & Data Documentation.
  • Before depositing your data be aware of any associated intellectual property rights. While copyright is not applicable to most research data in the U.S., licensing can apply. Want to learn more? Check out this guide from the University of Minnesota Libraries for a more thorough explanation of intellectual property, licenses, and research data.
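
For tabular data, one small part of redaction can be scripted. The sketch below assumes the data is CSV text and that you already know which columns are sensitive; the function name `redact_columns` and the column names in the usage note are hypothetical. Dropping columns is not a complete de-identification method, since free-text fields and combinations of remaining columns can still identify people, so treat this as a starting point only.

```python
import csv
import io


def redact_columns(csv_text, sensitive_columns):
    """Return csv_text with the named columns removed."""
    reader = csv.DictReader(io.StringIO(csv_text))
    kept = [name for name in (reader.fieldnames or [])
            if name not in sensitive_columns]
    output = io.StringIO()
    # extrasaction="ignore" silently skips the columns we dropped.
    writer = csv.DictWriter(output, fieldnames=kept,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return output.getvalue()
```

For example, `redact_columns(text, {"name", "email"})` would return the same table without the `name` and `email` columns, leaving the original text untouched.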

Need more information? Make sure to consult the UO Libraries RDM page on Sharing Data.


What will future generations do with your data? How will it change the world? Think about ways in which your data can be used by scholars, change-makers, and everyday citizens to make a difference in the world.


How do you share your data? How do you make it accessible and intelligible for future users? What are some of your concerns about sharing data? How can we make sharing data easier for data producers? And of course, what would make reusing data easier for all levels of consumers out there?

Twitter: #LYD16 Instagram: #LYD16 Facebook: #LYD16


For additional information, check out the resources board and the changing face of data on Pinterest, and consult the UO Libraries Research Data Management page on Sharing Data.

Source: materials adapted from the LYD website


Posted in News