
Friday, October 9, 2009

Data Quality

How often, after the implementation of a Business Intelligence (BI) Project, have you heard that the business users do not feel the data is reliable, credible, and consistent enough to meet their analysis and reporting needs? Unfortunately, that is too often the response from a client after a BI Project is implemented. As Ralph Kimball points out in his book “The Data Warehouse Toolkit”, the business community must accept the data warehouse for it to be deemed successful. The other goals of the data warehouse are:

1. The data warehouse must make an organization’s information easily accessible
2. The data warehouse must present the organization’s information consistently
3. The data warehouse must be adaptive and resilient to change
4. The data warehouse must be a secure bastion that protects our information
5. The data warehouse must serve as the foundation for improved decision making

He further states that you can have the most technically sound data warehouse, but if the business community does not accept it as adding value, it is a failed program. One of the major reasons a data warehouse is not accepted by the business community is the perception that the data the users access is of poor quality. There are many reasons why data quality may be perceived as poor, but one way to discover the actual quality of the data is to conduct a data analysis in the early phase of the project – I discussed this briefly last week.

So where does data quality begin? Many point to the Database Administrator or the Information Technology staff as the cause of poor data quality. However, since data is a corporate asset, responsibility for its quality belongs to the whole organization. Many corporations are starting to recognize this, as Master Data Management and Customer Data Integration processes have been launched within some organizations. Some of the most prominent causes of poor data quality are:

1. Movement from centralized data systems to distributed data systems
2. Poor implementation of purchased packaged data systems
3. Silo implementations of purchased packaged data systems
4. Lack of data edits for data entered into data systems
5. Lack of a system of record for corporate entities
6. Not viewing data entities from a corporate perspective

So what can be done in the short term to help corporations implement Business Intelligence until Master Data Management and data quality processes are in place? Some of the short-term steps that can be taken:

1. Begin the data analysis process early in the program to help determine the quality and consistency of the data
2. Work with the business users to find short term solutions to the data quality issues
3. Work with the business users to determine data edits for the data elements
4. Work with the business users to determine the definition and calculation of major data metrics
5. Work with the business users to determine a system of record for the major entities
6. Work with the business users to determine major data hierarchies
7. Work with the business users to determine an acceptable level of data quality for the project within the current BI Program
8. Work with the business users to develop a data repository for a corporate definition of data elements and metrics
9. Work with the business users to determine an acceptable level of analysis and reporting requirements
10. Keep the business users involved in all phases of the project development lifecycle
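
The data-profiling and data-edit steps above (items 1 and 3) can be sketched as a simple quality report. This is a minimal illustration in Python; the column names, sample records, and edit rules are hypothetical assumptions, not taken from any real system:

```python
# Minimal data-profiling sketch: count nulls, edit-rule violations, and
# duplicate keys in a hypothetical customer extract. All field names and
# rules here are illustrative assumptions.

records = [
    {"customer_id": "C001", "state": "NJ", "revenue": 120.0},
    {"customer_id": "C002", "state": "XX", "revenue": -5.0},   # fails two edits
    {"customer_id": "C002", "state": "NY", "revenue": None},   # duplicate id, null
]

VALID_STATES = {"NJ", "NY", "PA"}  # assumed reference list of valid codes

def profile(rows):
    """Return counts of rows and of each kind of data-quality issue."""
    report = {"rows": len(rows), "null_revenue": 0, "negative_revenue": 0,
              "bad_state": 0, "duplicate_ids": 0}
    seen = set()
    for r in rows:
        if r["revenue"] is None:
            report["null_revenue"] += 1
        elif r["revenue"] < 0:
            report["negative_revenue"] += 1
        if r["state"] not in VALID_STATES:
            report["bad_state"] += 1
        if r["customer_id"] in seen:
            report["duplicate_ids"] += 1
        seen.add(r["customer_id"])
    return report

print(profile(records))
# → {'rows': 3, 'null_revenue': 1, 'negative_revenue': 1, 'bad_state': 1, 'duplicate_ids': 1}
```

Running a report like this early in the program gives the business users concrete numbers to react to when agreeing on data edits and an acceptable quality level.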

Data quality is an issue facing all data warehouse and Business Intelligence programs. It should be addressed early in the BI Program and resolved in the short term with input from the business users. The business users are the ones who have to perceive the BI Program as adding value; otherwise it will not be successful. They have to be involved in all phases of the BI Program to understand the data issues and to develop short-term solutions to the data quality and definition problems until a corporate Master Data Management and Data Quality program can be implemented.

2 comments:

Paula said...

Simply put, I'd have to say that the number one reason why any BI implementation comes under criticism is because the information produced does not match the expected results.

In cases where the data is reliable, it is because careers may hinge on perpetuating comfortable myths. For example, an Atlantic City casino's business development management group may have advised the marketing department to invest the enterprise's capital on luxury weekend hotel promotional programs for out-of-town high roller gamblers, ...only to discover from the BI reports produced, that the casino's core revenue actually comes from the daily small stakes gambling done by retirees living 35 miles or less from the casino Monday through Thursday.

In cases where the data is unsound, it is because the stewards of the most reliable sources were not involved at the design phase of the data gathering process.

In the first case, the politics that surround the project may well kill the BI project--in the second case, once the more reliable source data is discovered, a project plan is usually put into place to first reconcile the data then assure going forward...that the most reliable sources will be used for populating the repository in the future.

It is rare these days that I actually see poorly crafted conversion processes like I did when BI implementations first came into play.

Joseph Naujokas said...

I think this issue is directly related to validation - and the importance of a thoughtfully-constructed auditing "dashboard" that compares warehouse data against source data and thus unequivocally establishes the validity of the warehouse data to all users on a real-time basis.
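
The kind of audit the comment describes can be reduced to a very small reconciliation check. The sketch below (in Python, with sample figures and a tolerance that are purely illustrative assumptions) compares a row count and a summed measure between source and warehouse:

```python
# Sketch of a source-vs-warehouse reconciliation check, the core of an
# auditing dashboard: flag any row-count mismatch or any difference in a
# key total beyond a tolerance. The tolerance value is an assumption.

def reconcile(source_count, warehouse_count,
              source_total, warehouse_total, tolerance=0.01):
    """Return a list of discrepancy messages; an empty list means the audit passes."""
    issues = []
    if source_count != warehouse_count:
        issues.append(f"row count mismatch: {source_count} vs {warehouse_count}")
    if abs(source_total - warehouse_total) > tolerance:
        issues.append(f"total mismatch: {source_total} vs {warehouse_total}")
    return issues

print(reconcile(1000, 1000, 52340.50, 52340.50))  # → []  (audit passes)
print(reconcile(1000, 998, 52340.50, 52100.00))   # two discrepancies reported
```

A dashboard built on checks like this, refreshed after each load, is one way to establish the validity of the warehouse data to users on an ongoing basis.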