This is the second post in a series on future-proofing your work (part 1, part 3). Today’s topic is choosing a file format when you click through the “Save As…” box in whatever program you use to do your work.
It pays to make sure that your files are saved in formats that you’ll be able to open in the future. This applies to data as well as manuscript drafts, especially if you often leave files untouched for months or years (e.g., while they’re being reviewed or are in press, but before anyone’s come along asking to see the data or for you to explain it).
Short version: “Open,” preferably plain-text file formats such as .csv, .txt, .R, etc., are better for long-term storage than “closed” formats such as .doc, .xls, .sav, etc. If in doubt, try to open a file in a plain-text editor such as Notepad or TextEdit — as a rule of thumb, if you can read the contents of the file in a program like that, you’re in good shape.
In general, digital files (from data to manuscript drafts) can be saved in two types of formats: open and closed.
- Open formats are those that
a) can be opened anywhere, anytime, and/or
b) have clear directions for how to build software to open them.
A file saved in an open format could be saved in Excel but then just as easily be opened in OpenOffice Calc, SPSS, or even just a basic text editor such as Notepad or TextEdit. If a file can be opened in a basic text editor (as with .txt, .R, and .csv files, even if it’s hard to read when opened there), or is in a format that has specifically been built to be openly understood by software developers (as with OpenOffice .odt, .ods, etc. files), you’re helping to future-proof your work.
- Closed or proprietary formats, on the other hand, require that you have a specific program to open them, or else reverse-engineer what that specific program is doing. SPSS .sav files, Photoshop .psd files, and, to some extent, Microsoft Office (.docx, .xlsx, etc.) files, among many others, are like this. How can you know if you’re using a proprietary file format? One rule of thumb is that if you try to open the file in a basic text editor and it looks like gibberish (not even recognizable characters), there’s a good chance that the file is in a closed format. This isn’t always the case, but it’s usually a good way to quickly check (1).
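If you have many files to check, the text-editor rule of thumb can be roughly automated. The sketch below is only a heuristic under stated assumptions: it treats a file as plain text if the first few kilobytes contain no NUL bytes and decode as UTF-8. The function name looks_like_plain_text and the 8192-byte sample size are made up for this illustration.

```python
import codecs


def looks_like_plain_text(path, sample_size=8192):
    """Heuristic check: does the start of the file look like UTF-8 text?"""
    with open(path, "rb") as f:
        sample = f.read(sample_size)
    # NUL bytes are common in binary formats and rare in plain text.
    if b"\x00" in sample:
        return False
    # An incremental decoder tolerates a multi-byte character that is cut
    # off at the end of the sample, but still rejects invalid byte sequences.
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(sample, final=False)
    except UnicodeDecodeError:
        return False
    return True
```

Calling looks_like_plain_text("mydata.csv") on a typical .csv file should return True, while a zipped or binary file such as a .sav file will usually return False. Like the rule of thumb itself, this can be wrong in both directions: a text file saved in UTF-16 or an older single-byte encoding may be flagged as binary, and a file that fails the check may still be in an openly documented format.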
For an easy-to-reference table with file format recommendations, see our Data Management page on the topic.
It is important to note that, for example, even if R can read SPSS files, it doesn’t mean that SPSS files are “open.” They’re still closed, but have been reverse-engineered by the people who make R. SPSS, as a proprietary program using a proprietary file format, could change the format in its next version and break this reverse-engineering, or require that all users upgrade in order to open any files created in the new version.
So you’ve got a data file, and you’re willing to try out saving it in .csv format instead of .xlsx or .sav or whatever else your local proprietary vendor would suggest. Great! “But,” you say, “will I lose any information when I re-save my data file? Will I lose labels? Will I lose analysis steps?” This, inquisitive reader, is an excellent question. In some cases, there is a trade-off between convenience now (using closed formats and some of the extra features they carry) vs. convenience later (finding that you can re-open a file that you created years ago with software that’s since upgraded versions or has stopped being developed).
In these types of cases, you could simply periodically save a copy of your files in an open format, and then keep on using the closed format that you’re more familiar with. Even doing something as simple as that could help you in the future. If you want to go a step further, however, read on in our next post, which will be published soon…
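As one possible illustration of keeping an open-format copy, here is a minimal sketch that writes a data table to a .csv file and stores variable labels, which a plain .csv has no place for, in a separate plain-text .json file next to it. It assumes you have already gotten the rows and labels out of your usual program. The function name save_open_copy and the file naming scheme are hypothetical choices for this example, not a standard.

```python
import csv
import json


def save_open_copy(rows, columns, labels, stem):
    """Write rows to <stem>.csv and variable labels to <stem>_labels.json."""
    # The data itself goes into a plain-text CSV file with a header row.
    with open(f"{stem}.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(columns)
        writer.writerows(rows)
    # Labels go into a separate plain-text file so they are not lost.
    with open(f"{stem}_labels.json", "w", encoding="utf-8") as f:
        json.dump(labels, f, indent=2, ensure_ascii=False)
```

For example, save_open_copy([[1, 3.5], [2, 4.0]], ["id", "score"], {"score": "Mean rating, 1 to 5"}, "study1") would produce study1.csv and study1_labels.json, both of which can be read in any text editor. Analysis steps are not captured by this sketch.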