Once upon a time, there was a magical kingdom called Data Land. It was filled with all sorts of crazy creatures who spoke in numbers and code. The kingdom was ruled by the wise Queen Metadata, who had a serious obsession with keeping things tidy and accurate.
You see, Data Land was basically one gigantic warehouse stocked up with mountains of data. And Queen Metadata knew that if the warehouse was a total mess, her data-hungry subjects would go crazy trying to find what they needed. So she established some strict rules and frameworks to keep everything in pristine condition.
The first rule was all about the "VICE" framework. It stated that any new data being brought into the warehouse had to be Validated, ensuring it followed all the right formats and rules. Then it needed to be Integrated with all the existing data sets in a consistent way. Any Contradictions or conflicts had to be caught and resolved. And finally, everything was run through rigorous Error-handling checks.
But VICE was just the start. The Queen also rolled out the "DAMA" framework, which focused on ensuring the quality of data already living in the warehouse. The D stood for Deduplication - getting rid of any redundant copies lying around. The first A was for Accuracy, double-checking that values were correct and up-to-date. M stood for Monitoring, putting systems in place to continuously measure and report on quality. And the second A was all about having clear Audit trails to track where data came from.
As if that wasn't enough, the Queen was also a stickler for the "PDCA" cycle. This meant data quality efforts had to constantly Plan by defining goals and processes. Then Do by implementing those processes. Check by evaluating outcomes. And Act by identifying improvements for the next cycle.
Thanks to all these quirky frameworks, Data Land became famous for having the most pristine, high-quality data warehouse around. And Queen Metadata's subjects lived happily data-munching ever after!
Let's break down those quirky data quality frameworks from the Data Land story:
VICE Framework:
V - Validation: Make sure any new data entering the kingdom (aka your systems) follows all the defined rules, formats, and constraints. That means things like checking for null or missing values, verifying data types and applying business rules.
I - Integration: After validation, the new data needs to be properly integrated and made consistent with existing data sets. This could involve transformation steps, deduplication, applying master data rules and so on.
C - Contradiction/Conflict Detection: As new data is integrated, you need mechanisms to identify and resolve any contradictions or integrity violations across data sets. Catching conflicts early keeps bad data from spreading.
E - Error Handling: Put robust processes in place to handle any errors that emerge during the validation, integration or loading stages, so that errors are trapped, logged and remediated appropriately.
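To make the four VICE steps a bit more concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the record layout (`id`, `email`, `age`), the in-memory `warehouse` dict, the two exception classes and the function names are hypothetical and not taken from any real library.

```python
class ValidationError(Exception):
    """Raised when an incoming record breaks a format or business rule."""


class ConflictError(Exception):
    """Raised when an incoming record contradicts data already stored."""


def validate(record: dict) -> dict:
    # V: check for missing values, data types and a simple business rule.
    for key in ("id", "email", "age"):
        if record.get(key) is None:
            raise ValidationError(f"missing value for {key!r}")
    if not isinstance(record["email"], str) or "@" not in record["email"]:
        raise ValidationError("email must be a string containing '@'")
    if not isinstance(record["age"], int) or record["age"] < 0:
        raise ValidationError("age must be a non-negative integer")
    return record


def integrate(record: dict) -> dict:
    # I: transform to the warehouse convention (here: trimmed, lowercase email).
    return {**record, "email": record["email"].strip().lower()}


def detect_conflict(record: dict, warehouse: dict) -> None:
    # C: the same id already stored with a different email is a contradiction.
    existing = warehouse.get(record["id"])
    if existing is not None and existing["email"] != record["email"]:
        raise ConflictError(
            f"id {record['id']} already has email {existing['email']!r}"
        )


def load(records: list, warehouse: dict) -> list:
    # E: trap and collect each error so one bad record does not stop the load.
    rejected = []
    for record in records:
        try:
            clean = integrate(validate(record))
            detect_conflict(clean, warehouse)
            warehouse[clean["id"]] = clean
        except (ValidationError, ConflictError) as exc:
            rejected.append((record, str(exc)))
    return rejected
```

Calling `load(new_records, warehouse)` stores the records that pass and returns the rejected ones with a reason, which is the "trapped, logged and remediated" part in miniature. A real pipeline would likely write rejects to a quarantine table rather than a list.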
DAMA Framework:
D - Deduplication: Eliminate redundant, duplicate records and data instances across the warehouse. Deduping prevents conflicts and inconsistencies.
A - Accuracy: Verify that data values are error-free, precise and faithfully represent reality. This relies on accuracy checks, data lineage and authentication processes.
M - Monitoring: Implement continuous data quality monitoring processes and checks across the warehouse to measure and report on quality metrics.
A - Audit Trail: Maintain audit trails to systematically track data lineage, origin, movement and all changes or updates applied to data entities over time.
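The four DAMA checks can be sketched in a few lines of Python too. This is only an illustration under stated assumptions: the row fields (`name`, `email`, `country`), the trusted `reference` lookup and all function names are hypothetical, and a real warehouse would run these checks in SQL or a data quality tool rather than on in-memory lists.

```python
from datetime import datetime, timezone


def deduplicate(rows: list) -> list:
    # D: keep the first row seen for each normalized (name, email) pair.
    seen = set()
    unique = []
    for row in rows:
        key = (row["name"].strip().lower(), row["email"].strip().lower())
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def is_accurate(row: dict, reference: dict) -> bool:
    # A: compare a stored value against a trusted reference source.
    return reference.get(row["email"].strip().lower()) == row["country"]


def quality_metrics(rows: list, reference: dict) -> dict:
    # M: numbers a scheduled job could report on every run.
    unique = deduplicate(rows)
    accurate = sum(is_accurate(row, reference) for row in unique)
    return {
        "rows": len(rows),
        "duplicates": len(rows) - len(unique),
        "accuracy": accurate / len(unique) if unique else 1.0,
    }


def record_change(audit_log: list, entity_id, field_name, old, new, source) -> None:
    # A: append-only log of what changed, when, and where the change came from.
    audit_log.append({
        "entity_id": entity_id,
        "field": field_name,
        "old": old,
        "new": new,
        "source": source,
        "at": datetime.now(timezone.utc).isoformat(),
    })
```

The dedup key here is deliberately simple. Matching on a normalized name and email catches trivial duplicates, while fuzzy matching would be needed for near-duplicates.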
PDCA Cycle:
P - Plan: Define data quality goals, requirements, processes, roles and metrics to plan quality initiatives.
D - Do: Execute those defined processes and practices, and assign ownership. The "doing" part.
C - Check: Continuously evaluate outcomes to verify whether processes are effective and goals are being met.
A - Act: Based on the evaluation, identify opportunities to course-correct processes and introduce improvements for the next cycle.
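PDCA is a process loop rather than an algorithm, but one pass through it can still be sketched in Python. The `run_cycle` function, the `targets` dict and the `measure` callback below are hypothetical names chosen for this example, and the "improve ..." strings simply stand in for whatever follow-up work a team would actually schedule.

```python
def run_cycle(targets: dict, measure) -> list:
    # Plan: `targets` maps each metric name to its minimum acceptable value.
    # Do: run the quality process and collect the metrics it produced.
    observed = measure()
    # Check: find every metric that fell short of its target.
    shortfalls = [
        name for name, minimum in targets.items()
        if observed.get(name, 0.0) < minimum
    ]
    # Act: turn each shortfall into an improvement item for the next cycle.
    return [f"improve {name}" for name in sorted(shortfalls)]
```

An empty result means every target was met in this cycle. A non-empty one feeds the next Plan step.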
So in essence - VICE for new data, DAMA for existing data, and PDCA to ensure it's an ongoing cyclical effort! The Queen had her bases covered in Data Land.
Sash Barige
Oct-15-2023