Message of the day
Good data are FAIR – Findable, Accessible, Interoperable, Re-usable
Things to consider
What makes data good?
- Data have to be readable and documented well enough for others (and a future you) to understand them.
- Data have to be findable to keep them from being lost. Information scientists have started to call such data FAIR: Findable, Accessible, Interoperable, Re-usable. One of the most important things you can do to keep your data FAIR is to deposit them in a trusted digital repository. Do not use your personal website as your data archive.
- Tidy data are good data. Messy data are hard to work with.
- Data quality is a process, starting with planning, all the way through to curation of the data for deposit.
Example: This dataset is still around and usable more than 50 years after the data were collected and more than 40 years after it was last used in a publication.
Counterexample: This article (http://www.sciencedirect.com/science/article/pii/S1751157709000881) promises “Statistical scripts and the raw dataset are included as supplemental data and are also available at http://www.researchremix.org.”
There are a number of guides to tidy data, from this blog post about tabular data, to these more detailed instructions about preparing data for archiving and sharing, and Hadley Wickham’s writeup on tidy data.
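To make the idea of tidy data concrete, here is a minimal sketch in Python of reshaping a "wide" table (one column per year) into tidy form, where each row holds exactly one observation. The table contents, the column names, and the helper `wide_to_tidy` are all hypothetical and exist only for illustration; the guides above cover many more cases.

```python
import csv
import io

# Hypothetical "messy" table: one row per site, one column per year.
WIDE_CSV = """site,2014,2015,2016
A,10,12,15
B,7,,9
"""


def wide_to_tidy(text, id_column, variable_name, value_name):
    """Reshape a wide CSV into tidy rows: one observation per row."""
    reader = csv.DictReader(io.StringIO(text))
    tidy_rows = []
    for row in reader:
        for column, value in row.items():
            if column == id_column:
                continue
            # Keep missing values explicit as None instead of dropping them.
            tidy_rows.append({
                id_column: row[id_column],
                variable_name: column,
                value_name: int(value) if value else None,
            })
    return tidy_rows


tidy = wide_to_tidy(WIDE_CSV, "site", "year", "count")
for record in tidy:
    print(record)
```

In the tidy version, the year becomes a value in its own column instead of being hidden in the header, and the missing count for site B stays visible as an explicit empty value. That tends to make filtering, plotting, and documenting the data easier.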
Have questions, or want to learn more? The UO data librarians can assist you.
If you want to learn on your own, Project TIER teaches undergraduate students how to structure data for reproducible research: http://www.projecttier.org/tier-protocol/specifications/
The UK Data Archive has great instructions on how to document your data: http://www.data-archive.ac.uk/create-manage/document
If you want to go all in, look at the instructions for documenting data in ICPSR’s Guide to Social Science Data Preparation and Archiving.
Example: Data can take many forms. This compilation of “Morale and Intelligence Reports” collected by the UK Government during and after the war is a great example of qualitative historical data: https://discover.ukdataservice.ac.uk/catalogue/?sn=7465
What is your favorite data set? How/why is it good for your project? Try out the FAIR Principles to describe and share examples of good data for your discipline. Tell us on Twitter (#loveyourdata) or in the comments section below!