Content is adapted from the Love Your Data Website.
Data quality is the degree to which data meets the purposes and requirements of its use. Depending on the uses, good quality data may refer to complete, accurate, credible, consistent or “good enough” data.
Things to consider
What is data quality and how can we distinguish between good and bad data? How are the issues of data quality being addressed in various disciplines?
- The most straightforward definition of data quality is the quality of the content (values) in a dataset. For example, if a dataset contains the names and addresses of customers, all names and addresses must be recorded (the data is complete), they must correspond to the actual names and addresses (the data is accurate), and all records must be up to date (the data is current).
- The most common characteristics of data quality are completeness, validity, consistency, timeliness, and accuracy. In addition, data has to be useful (fit for purpose), documented, and reproducible/verifiable.
- At least four activities affect the quality of data: modeling the world (deciding what to collect and how), collecting or generating data, storage/access, and formatting/transformation.
- Assessing data quality requires disciplinary knowledge and is time-consuming.
- Open data quality issues include: how to measure quality, how to track the lineage of data (provenance), when data is “good enough”, what happens when data is mixed and triangulated (especially high-quality with low-quality data), and how to crowdsource quality checks.
- Data quality is the responsibility of both data providers and data curators: data providers ensure the quality of their individual datasets, while curators help the community with consistency, coverage, and metadata.
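To make dimensions like completeness and currency concrete, here is a minimal sketch of how they could be measured for the customer-records example above. The record layout, field names, and cutoff date are illustrative assumptions, not part of any standard tool:

```python
from datetime import date

# Hypothetical customer records; the field names are illustrative assumptions.
records = [
    {"name": "Ada Lovelace", "address": "12 St James Sq", "updated": date(2024, 5, 1)},
    {"name": "", "address": "4 Privet Drive", "updated": date(2015, 3, 2)},
]

def completeness(records, fields):
    """Fraction of records with no missing (empty) values in the given fields."""
    ok = sum(all(r.get(f) for f in fields) for r in records)
    return ok / len(records)

def currency(records, cutoff):
    """Fraction of records updated on or after the cutoff date."""
    ok = sum(r["updated"] >= cutoff for r in records)
    return ok / len(records)

print(completeness(records, ["name", "address"]))  # 0.5: one record has no name
print(currency(records, date(2020, 1, 1)))         # 0.5: one record is stale
```

Accuracy, by contrast, cannot be computed from the dataset alone: it requires comparison against an external source of truth, which is one reason assessing data quality demands disciplinary knowledge and time.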
Read more about data quality or join the conversation on Twitter and Facebook using #LYD17 or #loveyourdata.
Have questions? Contact Jill Krefft, Institutional Repository Coordinator at email@example.com