OracleBIBlog

Tuesday, November 11, 2008

Data Modeling Continued... Kimball vs. Inmon: The Basics

As I discussed last time, many in the BI field side strongly with the methodology of either Ralph Kimball or Bill Inmon. As I also mentioned, the two approaches have more similarities than differences, but today I'll simply outline the main differences between their philosophies for anyone unfamiliar with them.

The main difference is that Kimball's architecture, also known as the Bus Architecture, is based on loading individual data marts directly from the operational systems through a data staging area, using conformed dimensions (dimensions such as product or customer that are defined once and shared identically by every mart). An operational data store or other intermediate data structure may or may not be needed, depending on the existing data sources and business requirements. In this design, what is called the data warehouse is really just the collection of data marts. Kimball's basic architecture is shown in the diagram to the left. Inmon argues that, without a centralized warehouse, this approach is inflexible and cannot absorb changes as gracefully as his own approach, which is explained below.
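As a minimal sketch of what conformed dimensions buy you, the following uses Python's built-in sqlite3 module. Every table name, column name, and sample row here (stg_orders, stg_stock, dim_product, fact_sales, fact_inventory) is hypothetical. Two data marts, sales and inventory, are loaded straight from the staging area with no central warehouse in between, and because both reference the same product dimension, they can be queried together ("drilled across") by product.

```python
import sqlite3


def build_bus_marts() -> sqlite3.Connection:
    """Load two hypothetical data marts that share one conformed dimension."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    # Staging area: raw rows extracted from the operational systems (hypothetical data).
    cur.execute("CREATE TABLE stg_orders (product_code TEXT, product_name TEXT, qty INTEGER)")
    cur.execute("CREATE TABLE stg_stock (product_code TEXT, on_hand INTEGER)")
    cur.executemany("INSERT INTO stg_orders VALUES (?, ?, ?)",
                    [("P1", "Widget", 5), ("P2", "Gadget", 3), ("P1", "Widget", 2)])
    cur.executemany("INSERT INTO stg_stock VALUES (?, ?)", [("P1", 40), ("P2", 10)])

    # Conformed dimension: a single product dimension reused by every mart on the bus.
    cur.execute("""CREATE TABLE dim_product (
                       product_key  INTEGER PRIMARY KEY,
                       product_code TEXT UNIQUE,
                       product_name TEXT)""")
    cur.execute("""INSERT INTO dim_product (product_code, product_name)
                   SELECT DISTINCT product_code, product_name FROM stg_orders""")

    # Sales mart: loaded directly from staging, keyed by the shared dimension.
    cur.execute("CREATE TABLE fact_sales (product_key INTEGER REFERENCES dim_product, qty INTEGER)")
    cur.execute("""INSERT INTO fact_sales
                   SELECT d.product_key, s.qty
                   FROM stg_orders s JOIN dim_product d USING (product_code)""")

    # Inventory mart: a separate load, but it uses the same product keys.
    cur.execute("CREATE TABLE fact_inventory (product_key INTEGER REFERENCES dim_product, on_hand INTEGER)")
    cur.execute("""INSERT INTO fact_inventory
                   SELECT d.product_key, s.on_hand
                   FROM stg_stock s JOIN dim_product d USING (product_code)""")
    conn.commit()
    return conn


def drill_across(conn: sqlite3.Connection):
    """Aggregate each mart separately, then line the results up on the conformed dimension."""
    return conn.execute("""
        SELECT d.product_name,
               (SELECT SUM(qty) FROM fact_sales f WHERE f.product_key = d.product_key) AS units_sold,
               (SELECT SUM(on_hand) FROM fact_inventory i WHERE i.product_key = d.product_key) AS units_on_hand
        FROM dim_product d
        ORDER BY d.product_name
    """).fetchall()


if __name__ == "__main__":
    for row in drill_across(build_bus_marts()):
        print(row)
```

Aggregating each fact table on its own before combining them avoids double-counting when facts of different grain meet on the same dimension row. Without the shared dim_product, the two marts would have no common key to line up on.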

Inmon's Corporate Information Factory (CIF) architecture is based on the idea that a complete enterprise data warehouse should first be built in third normal form. Data marts are then created separately, using the warehouse as their source, and can be denormalized as the designers see fit, often into a star schema. This architecture is depicted in the diagram below. Those in Kimball's camp argue that designing, implementing, and maintaining this central warehouse, along with the additional ETL processes it requires, is often unnecessary and takes much longer to get off the ground than a project built on the Bus Architecture.
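A minimal sketch of the CIF flow, again using Python's sqlite3 with hypothetical tables and sample rows: the enterprise warehouse is stored in third normal form (regions, customers, products, orders, and order lines each in their own table), and a dependent data mart is then populated from the warehouse, not from the operational systems, as a small star schema in which the region is flattened into the customer dimension. This illustrates the general pattern under those assumptions; it is not Inmon's own design or a production ETL process.

```python
import sqlite3

# Third-normal-form enterprise warehouse (hypothetical schema): each fact is stored once.
WAREHOUSE_DDL = """
CREATE TABLE region      (region_id INTEGER PRIMARY KEY, region_name TEXT NOT NULL);
CREATE TABLE customer    (customer_id INTEGER PRIMARY KEY, customer_name TEXT NOT NULL,
                          region_id INTEGER NOT NULL REFERENCES region);
CREATE TABLE product     (product_id INTEGER PRIMARY KEY, product_name TEXT NOT NULL,
                          unit_price INTEGER NOT NULL);
CREATE TABLE sales_order (order_id INTEGER PRIMARY KEY, order_date TEXT NOT NULL,
                          customer_id INTEGER NOT NULL REFERENCES customer);
CREATE TABLE order_line  (order_id INTEGER NOT NULL REFERENCES sales_order,
                          line_no INTEGER NOT NULL,
                          product_id INTEGER NOT NULL REFERENCES product,
                          qty INTEGER NOT NULL,
                          PRIMARY KEY (order_id, line_no));
"""

# Dependent data mart built FROM the warehouse: a denormalized star schema.
MART_ETL = """
CREATE TABLE dim_customer AS
    SELECT c.customer_id, c.customer_name, r.region_name  -- region flattened into the dimension
    FROM customer c JOIN region r ON r.region_id = c.region_id;
CREATE TABLE dim_product AS
    SELECT product_id, product_name FROM product;
CREATE TABLE fact_sales AS
    SELECT o.order_date, o.customer_id, l.product_id, l.qty,
           l.qty * p.unit_price AS amount                 -- derived measure precomputed for the mart
    FROM order_line l
    JOIN sales_order o ON o.order_id = l.order_id
    JOIN product p     ON p.product_id = l.product_id;
"""


def build_cif() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(WAREHOUSE_DDL)
    # Hypothetical sample rows loaded into the normalized warehouse.
    conn.executemany("INSERT INTO region VALUES (?, ?)", [(1, "East"), (2, "West")])
    conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", [(10, "Acme", 1), (11, "Globex", 2)])
    conn.executemany("INSERT INTO product VALUES (?, ?, ?)", [(100, "Widget", 3), (101, "Gadget", 10)])
    conn.executemany("INSERT INTO sales_order VALUES (?, ?, ?)",
                     [(1000, "2008-11-01", 10), (1001, "2008-11-02", 11)])
    conn.executemany("INSERT INTO order_line VALUES (?, ?, ?, ?)",
                     [(1000, 1, 100, 4), (1000, 2, 101, 1), (1001, 1, 101, 3)])
    # Second ETL step: populate the data mart from the warehouse, not from the source systems.
    conn.executescript(MART_ETL)
    return conn


def revenue_by_region(conn: sqlite3.Connection):
    # Typical star-join query against the mart: the fact table joined to a flattened dimension.
    return conn.execute("""
        SELECT d.region_name, SUM(f.amount)
        FROM fact_sales f JOIN dim_customer d ON d.customer_id = f.customer_id
        GROUP BY d.region_name
        ORDER BY d.region_name
    """).fetchall()


if __name__ == "__main__":
    print(revenue_by_region(build_cif()))
```

The MART_ETL step is the kind of additional ETL that Kimball's camp points to; in exchange, every mart in this pattern draws from one integrated, normalized source.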

The differences and arguments between these two approaches go far beyond what I've covered here, but this should help explain the basic split between the methodologies. I've read many of the arguments for both sides, and although there are plenty of hard-liners in both camps, the consensus seems to be that which architecture is better depends on the situation. Boring, I know, but I've read many comments from designers who say they have used hybrids, or have used each approach successfully at different times depending on the existing architecture and business requirements.

For every opinion I've read advocating one or the other, I've read another praising the merits of both. I even read one claiming that Richard (not Ralph) Kimball's methodology is superior, which made me laugh, because I made the same mistake once in conversation shortly after learning his name. My colleagues were somehow skeptical that Dr. Richard Kimble, the fictional character from the movie "The Fugitive", had his own data warehouse methodology.

Hopefully this helps explain the main differences to anyone new to these methodologies. As I mentioned in the preceding post, I encourage anyone involved with a BI project at any level to pick up a book by each author to fully understand their ideas. And although this topic has already been argued at length, I welcome comments from anyone with significant experience using either or both methodologies.