Message of the day:
Data quality is the degree to which data meets the purposes and requirements of its use. It may refer to completeness, accuracy, credibility, timeliness, accessibility, consistency, or other factors.
Things to consider
Data quality reflects on you as a researcher, but it can also have an impact beyond your individual project. One estimate, published under the title “Bad Data Costs the U.S. $3 Trillion Per Year,” explains: “The reason bad data costs so much is that decision makers, managers, knowledge workers, data scientists, and others must accommodate it in their everyday work. And doing so is both time-consuming and expensive.”
- Data quality is the responsibility of both data providers and data curators:
  - Data providers ensure the quality of their datasets through their research design choices and through how they review, document, manage, and share those datasets.
  - Data curators work with the research community to address consistency, coverage, and metadata.
- How does your discipline define and address data quality?
- What tools and methods can you use in support of data quality?
- How can we distinguish between good and bad data?
“Care and Quality are internal and external aspects of the same thing. A person who sees Quality and feels it as he works is a person who cares. A person who cares about what he sees and does is a person who’s bound to have some characteristic of quality.”
― Robert M. Pirsig, Zen and the Art of Motorcycle Maintenance: An Inquiry Into Values
So how does your research discipline address data quality? Here are some examples that might serve as food for thought:
- Responsible conduct in data management guidelines from the Office of Research Integrity discuss integrity and quality assurance.
- Social science data preparation and archiving guide from ICPSR on quantitative and qualitative data.
- Veracity as a data quality issue in Big Data
- Digital humanities data curation practices and their impact on data quality.
- Cell culture research and applying best practices to ensure scientific reproducibility.
- Data quality assessment (provides a table of various quality dimensions and their definitions) from Communications of the ACM, 45(4), 211.
- How Do We Define Clinical Trial Data Quality if No Guidelines Exist?
- CDISC (Clinical Data Interchange Standards Consortium) Standards
Some examples of what NOT to do:
Getting started – activities
- Show your most recent dataset (or part of it) to your colleague and ask their opinion of its quality (exchanging datasets with a colleague makes this activity more fun).
- Use criteria for good data (e.g., completeness, accuracy, fitness for use, documentation) to assess where your data stands.
- Discuss your approaches to data collection and measures you took / could take to ensure integrity and completeness of your data.
- Discuss steps to address missing or incomplete data in the context of your research. Does it matter? How much does missing data affect the validity, reliability, or trustworthiness of your conclusions?
- Check out the Calling Bull**** Course Syllabus (e.g., Food Stamp Fraud or the Musician Mortality Case Study). What can we learn about data quality from these stories?
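As one possible starting point for the completeness check and the missing-data discussion above, here is a minimal Python sketch. The toy dataset, the field names, and the helper functions `completeness` and `complete_case_share` are hypothetical and exist only for illustration. The sketch assumes missing values are recorded as `None`; your own data may mark them differently (empty strings, sentinel codes such as -99, and so on).

```python
# Hypothetical toy dataset: None marks a missing value.
records = [
    {"id": 1, "age": 34, "income": 52000},
    {"id": 2, "age": None, "income": 48000},
    {"id": 3, "age": 29, "income": None},
    {"id": 4, "age": 41, "income": 61000},
]


def completeness(rows, fields):
    """Return the share of non-missing values for each field (0.0 to 1.0)."""
    if not rows:
        raise ValueError("cannot assess an empty dataset")
    total = len(rows)
    return {
        field: sum(1 for row in rows if row.get(field) is not None) / total
        for field in fields
    }


def complete_case_share(rows, fields):
    """Return the share of rows with no missing value in any listed field."""
    if not rows:
        raise ValueError("cannot assess an empty dataset")
    complete = sum(
        1 for row in rows if all(row.get(field) is not None for field in fields)
    )
    return complete / len(rows)


fields = ["id", "age", "income"]
report = completeness(records, fields)
share = complete_case_share(records, fields)

print(report)  # per-field completeness
print(share)   # share of rows usable in a complete-case analysis
```

Comparing the two numbers can be a useful discussion prompt: each field may look fairly complete on its own, while the share of fully complete rows is noticeably lower. Whether that gap matters depends on your research question and on why the values are missing.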