Sunday, October 4, 2009

Data Analysis/Validation

Data is the heart of every Business Intelligence project, yet data analysis/validation is often the most overlooked step. Most often it is performed in the testing phase, after the design and build are already complete. When the new dashboards and reports are then compared to the existing reports, the values on the two frequently differ. A long, detailed data analysis/validation process follows, which often causes the project timeline and budget to be exceeded.
A better approach is to start the data analysis/validation in the project requirements gathering phase. Once access to the application database is provided, it is very beneficial to write queries against the source system to validate the measures on the existing reports. Writing these queries also allows the data calculations behind the reports to be checked and validated. This matters because many of the users' existing reports may have been built by someone who is no longer with the organization, so the logic behind them may be undocumented. Additionally, by doing the data analysis/validation early in the project you can get a feel for the data quality. Small data quality errors in a transaction system are magnified many times in the data warehouse: an error that looks minor on a single record becomes significant when it is repeated across millions of records.
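
As a minimal sketch of what such an early validation query might look like, the Python example below compares a total computed from the source system with the figure shown on an existing report, and counts missing values in a column as a basic data quality check. Everything specific here is an assumption made for illustration: the `orders` table, its columns, the sample rows, the reported value of 350.5, the tolerance, and the helper names `reconcile_measure` and `count_missing`. An in-memory SQLite database stands in for the real application database.

```python
import sqlite3


def reconcile_measure(conn, query, reported_value, tolerance=0.01):
    """Run a single-value query against the source system and compare
    the result to the value shown on the existing report."""
    (source_value,) = conn.execute(query).fetchone()
    source_value = source_value or 0.0  # SUM() over no rows returns NULL
    difference = source_value - reported_value
    return {
        "source": source_value,
        "reported": reported_value,
        "difference": difference,
        "matches": abs(difference) <= tolerance,
    }


def count_missing(conn, table, column):
    """Count NULLs in one column as a simple data quality check.
    Table and column names are interpolated into the SQL, so pass
    only trusted, hard-coded names."""
    sql = f"SELECT COUNT(*) FROM {table} WHERE {column} IS NULL"
    (missing,) = conn.execute(sql).fetchone()
    return missing


# Hypothetical stand-in for the application database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "East", 100.0), (2, "East", 250.5), (3, "West", 75.0), (4, None, 20.0)],
)

# Hypothetical measure taken from an existing report: East sales = 350.5
result = reconcile_measure(
    conn,
    "SELECT SUM(amount) FROM orders WHERE region = 'East'",
    reported_value=350.5,
)
missing_regions = count_missing(conn, "orders", "region")

print(result)
print("Orders with no region:", missing_regions)
```

With this sample data the East total reconciles with the reported figure, and one order turns out to have no region. In a real project the same pattern would be pointed at the actual source tables, with one query per measure on the existing reports.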

There is often an argument that there is no time in the project requirements phase to perform the data analysis/validation steps. However, if they are not performed in the early phases of the project, the time and cost to perform them later are much greater. There is an old project truism that puts data analysis/validation in perspective: if it is performed in the requirements phase it costs $1, if it is performed in the design phase it costs $10, if it is performed in the build phase it costs $100, if it is performed in the testing phase it costs $500, and if it is done in the production phase it costs $1000. From this perspective it is more effective and cost efficient to perform the data analysis/validation in the early part of the project, where the time and cost involved are lowest.