There is no doubt that healthcare data is a treasure trove waiting to be tapped. The ETL (Extract, Transform, Load) process extracts data collected from myriad sources, transforms it, and loads it into a target destination where sense can be made of it. From there, the data can serve countless applications: chronic care management, population health insights for risk mitigation, fraud detection by payers, payments processing by payment providers, use as Real World Data (RWD) in clinical trials by research organizations, and many more.
However, despite the numerous standards for data interchange and the maturity of their implementations, the fact remains that data coming from disparate systems continues to pose interoperability challenges. Quality control therefore becomes a vital part of the ETL process: it determines the integrity and reliability of data that will, in most cases, be used to make decisions concerning patient health. In this blog, I will discuss a few pointers for a methodical Quality Assurance approach that helps guarantee the quality of the ETL process.
Before you start …
a). Gain a thorough understanding of the business requirements and how the current system is programmed to achieve the requirements.
b). Identify the project scope and outline the expected outcomes, thereby getting the data model and relevant business rules in place.
The ingredients for the “secret sauce”
Data validation, completeness checks, and compliance with business rules are the key elements of ETL testing. Analyzing and validating the data during each phase of the ETL process is paramount to the success of the outcome.
The integration of enterprise healthcare data projects and the expansion of technology capabilities have meant that the role of the ETL tester has also evolved. From my experience, here’s what constitutes the ingredients of a “secret sauce” for success in ETL testing:
Know the source: It’s critical to have a good knowledge of the source data and what happens to it during the ETL process. A clear understanding of mapping and transformations, as well as a full analysis of the state of source data, can prevent many problems.
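Analyzing the state of source data usually starts with simple profiling. Here is a minimal sketch of that idea; the column names and extract records are hypothetical examples, not from any particular system:

```python
def profile(rows):
    """Profile a source extract: for each column, report the null rate
    and the number of distinct non-empty values."""
    report = {}
    for col in rows[0].keys():
        values = [r.get(col) for r in rows]
        nulls = sum(v in (None, "") for v in values)
        report[col] = {
            "null_rate": nulls / len(rows),
            "distinct": len({v for v in values if v not in (None, "")}),
        }
    return report

# A tiny, made-up source extract with one suspiciously empty payer field.
extract = [
    {"mrn": "1001", "payer": "Acme Health"},
    {"mrn": "1002", "payer": ""},
    {"mrn": "1003", "payer": "Acme Health"},
]
print(profile(extract)["payer"])  # one distinct payer; a third of the rows are empty
```

Profiles like this, captured before any transformation runs, give the tester a baseline to compare against once mapping and transformation questions arise.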
The mapping document: The mapping document is the blueprint of the ETL solution. It determines how data traverses from source to destination, providing an understanding of the business priorities and how the key data elements are linked to them. It’s important for ETL testers to validate that the transformation rules (driven by the mapping document) are correctly applied to the source data.
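To make this concrete, here is a minimal sketch of mapping-driven validation. The mapping rules, field names, and records below are all hypothetical stand-ins for what a real mapping document would specify:

```python
# Hypothetical mapping rules: each entry maps a source field to a target
# field along with the transformation the mapping document specifies.
MAPPING = {
    "patient_name": ("PATIENT_NM", lambda v: v.strip().upper()),
    "gender": ("GENDER_CD", lambda v: {"male": "M", "female": "F"}.get(v.lower(), "U")),
}

def validate_mapping(source_row, target_row):
    """Return the target fields whose loaded value does not match what
    the mapping document's transformation rule should have produced."""
    failures = []
    for src_field, (tgt_field, transform) in MAPPING.items():
        expected = transform(source_row[src_field])
        if target_row.get(tgt_field) != expected:
            failures.append(tgt_field)
    return failures

# One source record and the record the ETL job actually produced.
source = {"patient_name": " jane doe ", "gender": "Female"}
target = {"PATIENT_NM": "JANE DOE", "GENDER_CD": "F"}
print(validate_mapping(source, target))  # [] means every rule checks out
```

Expressing the mapping as data rather than hard-coded checks means the test suite can grow with the mapping document instead of being rewritten alongside it.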
Automate. Automate. Automate: Since ETL testing involves large data sets that keep changing form, manual testing becomes complex, and manual monitoring of large data sets is error-prone. Beyond the obvious savings in time spent, automating ETL tests allows them to run frequently without any manual intervention.
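Two checks that almost every automated ETL suite starts with are row-count reconciliation and content checksums between source and target. A minimal sketch, using in-memory rows in place of real database queries and hypothetical field names:

```python
import hashlib

def checksum(rows, key_fields):
    """Order-independent checksum over selected fields, so the same
    logical data set hashes identically regardless of load order."""
    digests = sorted(
        hashlib.sha256("|".join(str(r[f]) for f in key_fields).encode()).hexdigest()
        for r in rows
    )
    return hashlib.sha256("".join(digests).encode()).hexdigest()

def reconcile(source_rows, target_rows, key_fields):
    """Assert that row counts match and the data content matches."""
    assert len(source_rows) == len(target_rows), "row count mismatch"
    assert checksum(source_rows, key_fields) == checksum(target_rows, key_fields), \
        "content mismatch"
    return True

# Same logical records, loaded in a different order by the ETL job.
src = [{"mrn": "1001", "dob": "1980-01-01"}, {"mrn": "1002", "dob": "1975-06-30"}]
tgt = [{"mrn": "1002", "dob": "1975-06-30"}, {"mrn": "1001", "dob": "1980-01-01"}]
print(reconcile(src, tgt, ["mrn", "dob"]))  # True: counts and checksums agree
```

Hooked into a scheduler or CI pipeline, checks like these run after every load without anyone watching the data by hand.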
Create multiple test data sets: At every stage of the ETL pipeline, testers must create multiple test data sets covering both full and incremental data loads. It’s important to verify that the loaded data has undergone the correct transformations; if it hasn’t, errors cascade downstream and disrupt the continuity of the pipeline. Hence, every stage should be tested to confirm the transformation has been achieved.
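A useful invariant to test with such data sets: applying an incremental load on top of a full load should reproduce the current snapshot. A minimal sketch, with hypothetical record ids and diagnosis codes:

```python
def apply_incremental(full_load, delta):
    """Apply an incremental load (upserts keyed by record id) on top of
    a previously loaded full data set."""
    merged = {row["id"]: row for row in full_load}
    for row in delta:
        merged[row["id"]] = row  # insert new rows, overwrite changed ones
    return sorted(merged.values(), key=lambda r: r["id"])

# Day-1 full load, then a day-2 incremental with one update and one insert.
full_load = [{"id": 1, "dx_code": "E11.9"}, {"id": 2, "dx_code": "I10"}]
delta = [{"id": 2, "dx_code": "I10.1"}, {"id": 3, "dx_code": "J45"}]

result = apply_incremental(full_load, delta)
print([r["dx_code"] for r in result])  # ['E11.9', 'I10.1', 'J45']
```

Building both the full set and the delta as explicit test fixtures makes it easy to assert the merged result at each stage, rather than only at the end of the pipeline.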
Clean bad data in the source system: When data is brought together from multiple sources, quality issues naturally occur. The ETL tester needs to diagnose these issues and work with data custodians to clean the data at the source. Remember, your outcomes are only as good as the quality of your source data. In many cases, however, the source data is difficult to control, and quality checks must then be performed when the raw data is received.
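When the source cannot be fixed upstream, intake checks flag bad records as they arrive. A minimal sketch; the field names and validation rules are hypothetical examples of the kind of checks a healthcare feed might need:

```python
import re

def quality_check(record):
    """Return a list of quality issues found in one raw patient record."""
    issues = []
    if not record.get("mrn"):
        issues.append("missing MRN")
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", record.get("dob", "")):
        issues.append("malformed date of birth")
    if record.get("gender") not in {"M", "F", "U"}:
        issues.append("invalid gender code")
    return issues

raw = [
    {"mrn": "1001", "dob": "1980-01-01", "gender": "F"},
    {"mrn": "", "dob": "01/02/1975", "gender": "X"},
]
for record in raw:
    print(record.get("mrn") or "<no mrn>", quality_check(record))
```

A report like this gives the tester something concrete to take back to the data custodians, instead of a vague complaint that "the feed is dirty."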
Data correctness is nothing without performance: Given the nature of ETL processes, it’s essential to execute multiple test cycles on the given data set to monitor performance and understand where the bottlenecks are. It’s also crucial to optimize scripts or upgrade systems whenever necessary.
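Repeated timed cycles can be scripted in a few lines. A minimal sketch; the `transform` step below is a stand-in, since a real run would invoke the actual ETL job:

```python
import statistics
import time

def run_cycles(etl_step, data, cycles=5):
    """Execute the same ETL step repeatedly and report timing statistics,
    so slowly creeping bottlenecks show up across test cycles."""
    timings = []
    for _ in range(cycles):
        start = time.perf_counter()
        etl_step(data)
        timings.append(time.perf_counter() - start)
    return {
        "min": min(timings),
        "max": max(timings),
        "mean": statistics.mean(timings),
    }

# A stand-in transformation step operating on made-up records.
def transform(rows):
    return [{**r, "name": r["name"].upper()} for r in rows]

stats = run_cycles(transform, [{"name": "jane"}] * 10_000)
print(sorted(stats))  # ['max', 'mean', 'min']
```

Tracking these numbers across releases is what turns "the load feels slow" into an actionable case for optimizing scripts or upgrading systems.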
Setting up the ETL process can be highly complex, and ETL testing might seem daunting. However, the ingredients above will certainly help create that perfect sauce, making life easier and ensuring high quality.
Armed with these pointers, ETL testers can go well beyond being mere “gatekeepers” who regulate data through the ETL process, and contribute much more to overall ETL success.
Have something to say on ETL testing? Any best practices you would like to share? Please write to email@example.com
With inputs from Bhupesh Nadkarni, VP – Sales & Marketing