Wednesday, September 16, 2015

Data Governance: Data Quality Measures

Data Quality (DQ) is a specialized area of data management that protects the integrity of data by closing gaps that routine operations leave open. It is one of the key functions supporting data governance: it monitors data to find exceptions that current data management operations have not detected. Data quality checks can be defined at the attribute level, which gives full control over the remediation steps for each attribute.

The following nine measures (metrics) can be used to assess the quality of any data source.
They are basic principles for ensuring data quality, so they apply regardless of the tool or technology in use.

Accuracy -
The degree to which data is consistent with authoritative sources of the truth (e.g., Customer ID must conform to an authorized government-issued document or database). Metric/results will be % Accuracy, Failure Count.
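
As a rough illustration, the accuracy check could be sketched in Python as follows. The function name `accuracy_metrics`, the `customer_id` field, and the sample IDs are all hypothetical; the authoritative source is assumed to be available as a simple set of valid IDs.

```python
def accuracy_metrics(records, authoritative_ids, key="customer_id"):
    """Return (% accuracy, failure count) against an authoritative set of IDs."""
    failures = [r for r in records if r.get(key) not in authoritative_ids]
    total = len(records)
    pct_accuracy = 100.0 * (total - len(failures)) / total if total else 100.0
    return pct_accuracy, len(failures)


# Hypothetical sample data: one record is unknown to the authoritative source.
authoritative_ids = {"C000000001", "C000000002", "C000000003"}
records = [
    {"customer_id": "C000000001"},
    {"customer_id": "C000000002"},
    {"customer_id": "C000000009"},
    {"customer_id": "C000000003"},
]
pct_accuracy, accuracy_failures = accuracy_metrics(records, authoritative_ids)
print(pct_accuracy, accuracy_failures)
```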

Completeness -
The degree to which required data is populated with a value (e.g., a Customer ID is required for all customers but not for prospects). Metric/results will be % of Failure, Failure Count.
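
A minimal sketch of the completeness check, assuming each record carries a hypothetical `party_type` field that distinguishes customers from prospects. Only customers are required to have a Customer ID, so prospects are left out of the denominator.

```python
def completeness_metrics(records):
    """Return (% failure, failure count) for customers missing a Customer ID."""
    required = [r for r in records if r["party_type"] == "customer"]
    # None and empty strings both count as "not populated" in this sketch.
    failures = [r for r in required if not r.get("customer_id")]
    pct_failure = 100.0 * len(failures) / len(required) if required else 0.0
    return pct_failure, len(failures)


# Hypothetical sample data: the prospect is allowed to have no Customer ID.
parties = [
    {"party_type": "customer", "customer_id": "C000000001"},
    {"party_type": "customer", "customer_id": None},
    {"party_type": "prospect", "customer_id": None},
    {"party_type": "customer", "customer_id": ""},
    {"party_type": "customer", "customer_id": "C000000004"},
]
pct_incomplete, incomplete_count = completeness_metrics(parties)
print(pct_incomplete, incomplete_count)
```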

Comprehensiveness -
The degree to which all expected records are contained in a data store. Metric/results will be % Comprehensiveness Ratio (records found vs. records expected).
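
The ratio of records found to records expected might be computed as below. This is only a sketch: it assumes records can be matched by a key, and the function name and sample keys are made up for illustration.

```python
def comprehensiveness_ratio(found_keys, expected_keys):
    """Return the % of expected record keys that are present in the data store."""
    expected = set(expected_keys)
    found = set(found_keys) & expected  # ignore records that were not expected
    return 100.0 * len(found) / len(expected) if expected else 100.0


# Hypothetical sample data: one expected record never arrived.
expected_keys = ["R1", "R2", "R3", "R4"]
found_keys = ["R1", "R2", "R4", "R7"]
ratio = comprehensiveness_ratio(found_keys, expected_keys)
print(ratio)
```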

Coverage -
The degree to which data includes all supported business functions required to produce a comprehensive view for a specific business purpose (e.g., Average Revenue per User reporting for the enterprise should include revenue data from all business areas where revenue is generated). Metric/results will be % of Data Sources Available.
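
One possible way to express the coverage metric, assuming the required and available data sources are tracked as sets of names. The business-area names here are invented for the example.

```python
def coverage_metrics(required_sources, available_sources):
    """Return (% of required data sources available, sorted list of missing ones)."""
    required = set(required_sources)
    missing = sorted(required - set(available_sources))
    pct_available = 100.0 * (len(required) - len(missing)) / len(required) if required else 100.0
    return pct_available, missing


# Hypothetical business areas that generate revenue.
required_sources = {"retail", "wholesale", "online", "services"}
available_sources = {"retail", "online", "services"}
pct_available, missing_sources = coverage_metrics(required_sources, available_sources)
print(pct_available, missing_sources)
```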

Integrity -
The degree to which data retains consistent content across data stores (e.g., Customer ID contains the same value for a Customer across databases). Metric/results will be % Different, Count of Differences.
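
A cross-store comparison could look roughly like this. The sketch assumes each data store can be reduced to a mapping from a shared record key to the Customer ID it holds, and compares only keys present in both stores; the store names and keys are hypothetical.

```python
def integrity_metrics(store_a, store_b):
    """Return (% different, count of differences) over keys present in both stores."""
    shared_keys = store_a.keys() & store_b.keys()
    differences = [k for k in shared_keys if store_a[k] != store_b[k]]
    pct_different = 100.0 * len(differences) / len(shared_keys) if shared_keys else 0.0
    return pct_different, len(differences)


# Hypothetical sample data: "acct-2" holds different Customer IDs in the two stores.
crm = {
    "acct-1": "C000000001",
    "acct-2": "C000000002",
    "acct-3": "C000000003",
    "acct-4": "C000000004",
}
billing = {
    "acct-1": "C000000001",
    "acct-2": "C000000020",
    "acct-3": "C000000003",
    "acct-4": "C000000004",
    "acct-5": "C000000005",  # not in CRM, so it is not compared here
}
pct_different, difference_count = integrity_metrics(crm, billing)
print(pct_different, difference_count)
```

Records that exist in only one store are better treated as a comprehensiveness problem than as an integrity difference, which is why they are excluded here.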

Logic/Reasonableness -
The degree to which data conforms to tests of reasonableness based on real-world scenarios (e.g., a policy/account holder’s birth date must prove that they are at least 13 years old). Metric/results will be % of Failure, Failure Count.
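
The minimum-age test from the example can be sketched as follows. The reference date is passed in explicitly so the result is reproducible; the function names and sample birth dates are illustrative assumptions.

```python
from datetime import date


def age_on(birth_date, as_of):
    """Return age in whole years on the given date."""
    years = as_of.year - birth_date.year
    # Subtract one year if the birthday has not yet occurred in the as_of year.
    if (as_of.month, as_of.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


def reasonableness_metrics(birth_dates, as_of, min_age=13):
    """Return (% failure, failure count) for holders younger than min_age."""
    failures = [d for d in birth_dates if age_on(d, as_of) < min_age]
    pct_failure = 100.0 * len(failures) / len(birth_dates) if birth_dates else 0.0
    return pct_failure, len(failures)


# Hypothetical sample data, evaluated as of 16 September 2015.
birth_dates = [
    date(1980, 5, 1),   # adult: passes
    date(2002, 9, 16),  # turns 13 on the reference date: passes
    date(2002, 9, 17),  # still 12 on the reference date: fails
    date(2010, 1, 1),   # 5 years old: fails
]
pct_unreasonable, unreasonable_count = reasonableness_metrics(birth_dates, date(2015, 9, 16))
print(pct_unreasonable, unreasonable_count)
```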

Timeliness -
The degree to which data is consistent with the most recent business event (e.g., Customer ID must be updated in all systems within XX hours of a change made to a Customer record). Metric/results will be % of Failure, Failure Count.
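
A timeliness check might be sketched like this. The allowed lag is left as a parameter because the threshold is not specified here; the 24-hour value used in the sample call is an arbitrary placeholder, and the field names `changed_at` and `propagated_at` are hypothetical.

```python
from datetime import datetime, timedelta


def timeliness_metrics(changes, max_lag_hours):
    """Return (% failure, failure count) for changes propagated later than allowed."""
    limit = timedelta(hours=max_lag_hours)
    failures = [c for c in changes if c["propagated_at"] - c["changed_at"] > limit]
    pct_failure = 100.0 * len(failures) / len(changes) if changes else 0.0
    return pct_failure, len(failures)


# Hypothetical sample data: propagation lags of 2, 24, 30 and 48 hours.
changed_at = datetime(2015, 9, 16, 8, 0)
changes = [
    {"changed_at": changed_at, "propagated_at": changed_at + timedelta(hours=h)}
    for h in (2, 24, 30, 48)
]
# 24 is an illustrative threshold only, not a recommended value.
pct_late, late_count = timeliness_metrics(changes, max_lag_hours=24)
print(pct_late, late_count)
```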

Uniqueness -
The degree to which data is free of unintended duplicates (e.g., two unrelated customers cannot have the same Customer ID/Party ID). Metric/results will be % Duplicated, Duplicate Count.
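
A duplicate count could be derived as in the sketch below. It assumes one particular definition, namely that every occurrence of an ID beyond the first counts as a duplicate; other definitions (for example, counting every record that shares an ID) are equally plausible.

```python
from collections import Counter


def uniqueness_metrics(ids):
    """Return (% duplicated, duplicate count), counting occurrences beyond the first."""
    counts = Counter(ids)
    duplicate_count = sum(n - 1 for n in counts.values() if n > 1)
    pct_duplicated = 100.0 * duplicate_count / len(ids) if ids else 0.0
    return pct_duplicated, duplicate_count


# Hypothetical sample data: "C000000002" is assigned twice.
customer_ids = ["C000000001", "C000000002", "C000000002", "C000000003"]
pct_duplicated, duplicate_count = uniqueness_metrics(customer_ids)
print(pct_duplicated, duplicate_count)
```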

Validity -
The degree to which data conforms to defined business rules for acceptable content (e.g., Customer ID must be 10 characters long). Metric/results will be % of Failure, Failure Count.
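
The length rule from the example can be checked with a few lines of Python. This is a sketch only: the function name and sample IDs are hypothetical, and a real rule set would usually cover format and allowed characters as well as length.

```python
def validity_metrics(ids, required_length=10):
    """Return (% failure, failure count) for IDs that are not exactly required_length long."""
    failures = [i for i in ids if len(i) != required_length]
    pct_failure = 100.0 * len(failures) / len(ids) if ids else 0.0
    return pct_failure, len(failures)


# Hypothetical sample data: one ID is too short and one is too long.
candidate_ids = ["C000000001", "C12", "C0000000022", "C000000004"]
pct_invalid, invalid_count = validity_metrics(candidate_ids)
print(pct_invalid, invalid_count)
```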