Thoughts on Data Management and Data Ethics
Earlier this year, Brian Westra and I gave a brief presentation on Data Management issues in an annual seminar that the Department of Psychology holds for its first-year graduate students. The seminar’s larger topic was data ethics. One question that came up was how data management and data ethics relate to one another.
This post has two parts:
1) A few points on how data management and ethics relate. It can be useful to think about this topic explicitly because discussions about it can help to guide research and data management decisions.
2) A list of resources on these topics for new graduate students. Some of the links relate specifically to Psychology, but they all apply in principle across disciplines.
Short version: Data ethics is built on data management. Both are more about one’s frame of mind than about any specific tools one chooses to use. Having said that, it’s important to give oneself exposure to ideas and tools around these topics, in order to push one’s own thinking forward. Some useful resources are listed below to help you get started.
Thoughts on Data Management and Ethics
- Data ethics is built on data management. Ethical questions and potential problems arise from data management decisions made in the past. These range from topics as seemingly trivial as where data are stored (on a passwordless thumb drive? Or in a place where disgruntled employees or Research Assistants can access or abscond with them?) to which data are kept at all (it’s hard to re-use something, for good or ill, if you didn’t record it in the first place).
This doesn’t just apply to ethically problematic topics, though. Things that look like bad ideas at first (such as the ability of RAs to remove data from the lab) may turn out to be reasonable in certain situations, just as ideas that seem good at first may come to seem bad later. The larger point is that ethical questions about both legitimate and illegitimate uses of research data need to be considered and addressed as they come up, and that DM decisions can help one to predict which questions are most likely to come up during the data lifecycle.
- Talking about both data management and ethics is more than talking about tools. Because data ethics questions come up based on data management decisions, discussions about data ethics sometimes require at least a minimum level of technical understanding. Basic technical knowledge can help to answer questions beyond whether certain data should be kept, such as which ways of storing and offering access to those data would be acceptable. This knowledge matters for many ethical discussions because it can steer the conversation toward more nuanced topics.
The take-home message here is not that someone who lacks technical understanding shouldn’t engage in conversations about data ethics. Rather, I think that making education on these topics easily accessible is important and necessary, as is taking advantage of that education. Here at the UO, for instance, it is available through our own DM workshops, workshops from the College of Arts and Sciences Scientific Programming office, and resources such as the Digital Scholarship Center.
- Data Management (and, thus, data ethics) is about having a certain frame of mind, even at a superficial level. Data Management often has to do with thinking about decisions up-front rather than reactively. This frame of mind can also apply to talking about data ethics. Even if some ethical issues haven’t come up yet, having good DM in place can help one to understand and respond more quickly to new issues that do come up.
Resources for New Students (especially in Psychology)
Issues of data management are not going away; indeed, their relevance to individual researchers will likely increase — the White House, for example, recently issued new guidelines requiring Data Management Plans (and encouraging data sharing) for all federal grant-funded research. Below is a list of resources to prompt further thought and discussion among new grad students.
These are listed here with a focus on Psychology; having said that, many of them have relevance beyond the social sciences:
- A useful summary of tools that are available for graduate students to organize their work (including data), from Kieran Healy, a Sociologist at Duke University.
- An overview of the new “pre-registration” movement in Psychology: “Pre-registration” is when researchers register their hypotheses and methods before gathering any data. In addition to increasing transparency around research projects, this practice can increase how believable results seem, since it can decrease researchers’ incentives to go “fishing” for results in the data. This practice could also presumably be used to build a culture in which all aspects of a project, from methods to data, are shared.
- Especially relevant for social scientists, readings on several cases that deal with data management and the de-identification of data:
  - A summary of several cases in which de-identified data were re-identified by other researchers, from the Electronic Privacy Information Center (EPIC).
  - A more nuanced, conceptual reply to (and criticism of) focusing on cases such as those in the summary above from EPIC.
  A take-home message from these readings is that data can sometimes be re-identified in very creative ways that are not immediately apparent to researchers. Other sites, such as this page from the American Statistical Association, summarize techniques that can be used to share sensitive data. Of special note is that “sensitive data” can include not only medical records but also answers to survey questions about morality or political affiliation, if those answers were re-identified.
- For students at the UO: Sanjay Srivastava, Professor of Psychology, often includes commentary on data analysis and transparency issues on his blog, The Hardest Science.
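
To make the re-identification point from the de-identification readings above a little more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not a method taken from any of the resources listed: the records are made up, and the column names (`zip`, `birth_year`, `gender`, `political_affiliation`) and helper functions are hypothetical. The idea is simply to count how many records share each combination of seemingly harmless fields. A record whose combination is unique could, in principle, be matched against an outside source that includes names.

```python
from collections import Counter

# Made-up survey records with names already removed (hypothetical data).
records = [
    {"zip": "97401", "birth_year": 1990, "gender": "F", "political_affiliation": "A"},
    {"zip": "97401", "birth_year": 1990, "gender": "F", "political_affiliation": "B"},
    {"zip": "97401", "birth_year": 1985, "gender": "M", "political_affiliation": "A"},
    {"zip": "97403", "birth_year": 1992, "gender": "F", "political_affiliation": "C"},
    {"zip": "97403", "birth_year": 1992, "gender": "M", "political_affiliation": "B"},
]

# Assumption: these are the fields an outsider might also find elsewhere.
QUASI_IDENTIFIERS = ("zip", "birth_year", "gender")


def group_sizes(rows, keys):
    """Count how many rows share each combination of values for `keys`."""
    return Counter(tuple(row[k] for k in keys) for row in rows)


def unique_rows(rows, keys):
    """Return the rows whose combination of `keys` appears exactly once."""
    sizes = group_sizes(rows, keys)
    return [row for row in rows if sizes[tuple(row[k] for k in keys)] == 1]


def smallest_group(rows, keys):
    """Size of the smallest group; 1 means at least one row stands alone."""
    return min(group_sizes(rows, keys).values())


at_risk = unique_rows(records, QUASI_IDENTIFIERS)
print(len(at_risk), "of", len(records), "records are unique on", QUASI_IDENTIFIERS)
print("smallest group size:", smallest_group(records, QUASI_IDENTIFIERS))
```

A check like this does not show that data are safe to share. It only flags the most obvious risk, and the readings above describe re-identification routes that are far less direct. Coarsening a field (for example, keeping only `zip` here) makes the groups larger at the cost of detail, which is the kind of trade-off worth discussing before data are collected rather than after.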