Consider a typical data movement. A user logs into system A and uses input forms to create data. The data is then distributed to several consuming systems (system B, system C, ... system N) via ETLs, web services, or custom data export/import solutions. Some checks are implemented along the way as the data moves from system A to the consuming systems. Everything works smoothly until it doesn't. Bad data starts creeping into consuming systems and applications.
Enter data quality! A team is convened to address these quality issues. Quality definitions are made and data rules are implemented. Data quality profiling jobs are set up on system A, and fixes to the data are made either automatically or manually. This approach works in the short term, until the data quality problems show up again.
If you have worked in the data quality space for a long time (as we have), you will notice some issues with this approach. Who determined the quality definitions? How were the data rules agreed upon? Who are the consumers? What is their view on quality? The last question is the most important: the consumer's point of view on data quality. Often we see this question missing from data quality projects. In the example above, if system A provides all 4 of the data fields that consuming system B requires, then quality for system B is 100%. However, if system C is getting only 2 of its 4 required fields, then quality for system C could be at 50%. So what is the quality of information in system A? It depends. Consumers define the quality of the information they receive. It is of great quality if it fits their needs.
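The arithmetic in this example can be sketched in a few lines of Python. This is only an illustration under assumptions: the field names and the `consumer_quality` function are hypothetical, and "quality" is reduced here to the share of a consumer's required fields that it actually receives.

```python
from typing import Dict, Set


def consumer_quality(delivered: Set[str], required: Set[str]) -> float:
    """Share of a consumer's required fields that it actually receives."""
    if not required:
        # A consumer with no stated requirements cannot be let down.
        return 1.0
    return len(delivered & required) / len(required)


# Hypothetical field names; the original example only says "4 fields".
required: Dict[str, Set[str]] = {
    "system_b": {"customer_id", "name", "email", "country"},
    "system_c": {"customer_id", "name", "email", "country"},
}

# What system A actually sends to each consumer (also hypothetical).
delivered: Dict[str, Set[str]] = {
    "system_b": {"customer_id", "name", "email", "country"},
    "system_c": {"customer_id", "name"},
}

scores = {
    consumer: consumer_quality(delivered[consumer], required[consumer])
    for consumer in required
}

for consumer, score in scores.items():
    print(f"{consumer}: {score:.0%}")
```

The same source system scores differently depending on which consumer is asking, which is the point: there is no single quality number for system A.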
As a data manager or data steward for system A, one would want to satisfy the data quality requirements of all the consuming systems. To do that, we need to define and document the consumers' requirements on quality. These definitions can be grouped with the rest of the definitions that a system should satisfy and can be implemented at the same time. In addition to the overall system-level quality metrics and dashboards that data stewards and managers have access to, dashboards and reports that track the consumers' quality requirements should also be developed and made available. These dashboards and reports make it easier for consumers to grade the quality of the information and help improve it for collective benefit. It might take a little additional effort, but the benefits are huge.
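One way to picture documented consumer requirements feeding a per-consumer report is the rough sketch below. Everything in it is an assumption made for illustration: the `Rule` class, the `not_empty` helper, the sample records, and the choice to express each requirement as a pass rate over records in system A. A real dashboard would sit on top of whatever profiling tool is already in place.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, object]


@dataclass
class Rule:
    """One documented consumer requirement (hypothetical structure)."""
    name: str
    check: Callable[[Record], bool]


def not_empty(field: str) -> Rule:
    """Requirement: the field must be present and not blank."""
    return Rule(
        name=f"{field} is populated",
        check=lambda record: record.get(field) not in (None, ""),
    )


def consumer_report(
    records: List[Record], requirements: Dict[str, List[Rule]]
) -> Dict[str, Dict[str, float]]:
    """Pass rate of every rule, grouped by the consumer that asked for it."""
    report: Dict[str, Dict[str, float]] = {}
    for consumer, rules in requirements.items():
        report[consumer] = {
            rule.name: (
                sum(1 for r in records if rule.check(r)) / len(records)
                if records
                else 1.0
            )
            for rule in rules
        }
    return report


# Sample data in system A (made up for this sketch).
records: List[Record] = [
    {"customer_id": 1, "email": "a@example.com", "country": "DE"},
    {"customer_id": 2, "email": "", "country": "US"},
    {"customer_id": 3, "email": "c@example.com", "country": None},
    {"customer_id": 4, "email": "d@example.com", "country": "FR"},
]

# Requirements documented per consumer (also made up).
requirements: Dict[str, List[Rule]] = {
    "system_b": [not_empty("customer_id")],
    "system_c": [not_empty("email"), not_empty("country")],
}

report = consumer_report(records, requirements)
for consumer, results in report.items():
    for rule_name, pass_rate in results.items():
        print(f"{consumer} | {rule_name}: {pass_rate:.0%}")
```

Keeping the rules keyed by consumer means the same profiling run can serve both the steward's overall view and each consumer's own view of quality.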