This is ‘Love Your Data’ week, and each day we’ll be sharing a post about one or more fundamental data management practices that you can use. This is Part 1 of 5.
Follow the 3-2-1 Rule:
- Keep 3 copies of any important file (1 primary, 2 backup copies)
- Store files on at least 2 different media types (e.g., 1 copy on an internal hard drive and a second on your department or college’s server or in secure cloud storage)
- Keep at least 1 copy offsite (i.e., not at your home or in the campus lab — check with your department or college about offsite or secure cloud storage)
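The three conditions above are easy to check mechanically. Here is a minimal sketch in Python, assuming you describe each copy by hand; the `Copy` class, the `check_321` function, and the media labels are hypothetical names made up for this illustration, not part of any tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One copy of an important file (all fields are filled in by hand)."""
    location: str   # human-readable label, e.g. "laptop"
    media: str      # media type, e.g. "internal-disk", "department-server", "cloud"
    offsite: bool   # True if the copy lives away from your home and campus lab


def check_321(copies):
    """Return a list of 3-2-1 violations; an empty list means the plan passes."""
    problems = []
    if len(copies) < 3:
        problems.append(f"only {len(copies)} copies; need at least 3")
    if len({c.media for c in copies}) < 2:
        problems.append("all copies share one media type; need at least 2")
    if not any(c.offsite for c in copies):
        problems.append("no offsite copy; need at least 1")
    return problems


# An example plan: 1 primary copy plus 2 backups.
plan = [
    Copy("laptop", "internal-disk", offsite=False),
    Copy("department share", "department-server", offsite=False),
    Copy("secure cloud", "cloud", offsite=True),
]
print(check_321(plan) or "plan satisfies 3-2-1")
```

A plan with only the laptop copy would fail all three checks, which is exactly the situation the rule is meant to catch.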
If possible, set up an automated system to back up your files, whether you work alone or as part of a research team. For example, a tool such as Syncthing can keep copies of your files synchronized automatically.
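If you would rather script a backup yourself, the idea can be sketched in a few lines of Python. This is an illustration only, not how Syncthing works: the `back_up` function and its arguments are hypothetical, it copies in one direction, and it assumes the destination folder is not inside the source folder.

```python
import hashlib
import shutil
from pathlib import Path


def sha256(path):
    """Return the SHA-256 checksum of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source, destination):
    """Copy new or changed files from source to destination.

    Returns the relative paths that were copied. Files deleted from the
    source are deliberately left alone in the destination.
    """
    source, destination = Path(source), Path(destination)
    copied = []
    for src in sorted(p for p in source.rglob("*") if p.is_file()):
        rel = src.relative_to(source)
        dst = destination / rel
        if dst.exists() and sha256(dst) == sha256(src):
            continue  # backup copy is already identical
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy contents and timestamps
        if sha256(dst) != sha256(src):
            raise OSError(f"verification failed for {rel}")
        copied.append(rel.as_posix())
    return copied
```

Calling `back_up("path/to/active", "path/to/backup")` from a scheduled job (cron, Task Scheduler, or similar) is what makes it automated; the checksum comparison is there so that a silently corrupted copy does not pass for a backup.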
Things to avoid:
- Storing the only copy of your data on your laptop or flash drive
- Storing critical data on an unencrypted laptop or flash drive
- Saving copies of your files haphazardly across 3 or 4 places
- Sharing the password to your laptop or cloud storage account
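Data snapshots or data locks are great for tracking your data from collection through analysis and write-up. Librarians call this provenance: a record of where your data came from and how it has changed, which matters whenever you need to trace a result back to its source.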
Errors are inevitable. Data snapshots can save you lots of time when you make a mistake in cleaning or coding your data. Taking periodic snapshots of your data, especially before the next phase begins (collection, processing, or analysis), can keep you from losing crucial data and time if you need to make corrections. Archive these snapshots somewhere safe (not where you store active files) in case you need them. If something goes wrong, copy the files you need back to your active storage location, keeping the original snapshot in your archival location.
How often should you take snapshots? For a 5-year longitudinal study, you might take one every quarter. If you will be collecting all the data for your study in a 2-week period, you will want to take snapshots more often, probably every day. Ask yourself: how much data can you afford to lose?
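The snapshot-and-restore routine described above can be sketched in Python as follows. This is one possible approach under stated assumptions: `take_snapshot`, `restore_file`, the folder naming scheme, and the `MANIFEST.json` file are all hypothetical choices for this example, and each file is read fully into memory to compute its checksum, so very large files would need chunked reading.

```python
import hashlib
import json
import shutil
from datetime import datetime, timezone
from pathlib import Path


def take_snapshot(active_dir, archive_dir, label):
    """Copy active_dir into a new timestamped folder under archive_dir.

    Also writes MANIFEST.json with a SHA-256 checksum per file, so you can
    later confirm that the snapshot has not changed.
    """
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    snapshot = Path(archive_dir) / f"{stamp}_{label}"
    # copytree raises FileExistsError rather than overwrite an old snapshot.
    shutil.copytree(active_dir, snapshot)
    manifest = {
        f.relative_to(snapshot).as_posix(): hashlib.sha256(f.read_bytes()).hexdigest()
        for f in sorted(p for p in snapshot.rglob("*") if p.is_file())
    }
    (snapshot / "MANIFEST.json").write_text(json.dumps(manifest, indent=2))
    return snapshot


def restore_file(snapshot, relative_path, active_dir):
    """Copy one file from a snapshot back to active storage.

    The snapshot itself is only read, never modified.
    """
    src = Path(snapshot) / relative_path
    dst = Path(active_dir) / relative_path
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dst)
    return dst
```

For example, you might call `take_snapshot("project/data", "archive/snapshots", "before-cleaning")` just before a cleaning step, and `restore_file(...)` only for the files you need if the step goes wrong.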
Oh, and (almost) always keep the raw data! The only time you might not is when it’s easier and less expensive to recreate the data than to keep it around.
Instructions: Draw a quick workflow diagram of the data lifecycle for your project (check out our examples on Instagram and Pinterest). Think about when major data transformations happen in your workflow. Taking a snapshot of your data just before and after the transformation can save you from heartache and confusion if something goes wrong.
Where do you store your data? Why did you choose those platform(s), locations, or devices?
Source for this page: LYD website.