What is Dimensional Modeling (introduction)
Dimensional modeling is a data modeling technique used to model the presentation layer of a data warehouse. It focuses on delivering simplicity and query performance for the end user. It allows users to easily understand and navigate the data available for reporting and ad hoc analysis. A dimensional model supports high-performance aggregated queries and performs well over many different analytics use cases.
Dimensional modelling is broadly accepted across the industry. It was first popularized by Ralph Kimball. Ralph Kimball’s data warehouse methodology remains relevant, even today, with the advent of Data Lakes and Data Lakehouses. In the Kimball methodology, a dimensional model is delivered using a modelling technique known as the “star” schema.
The 3 essential building blocks of Dimensional Data Modeling
A dimensional data model consists of 2 types of tables (Facts and Dimensions) and their relationships. Thus it’s simplicity. Facts and Dimensions are related together and arranged in a “star” schema. The image below is an example star schema. It depicts an imaginary video streaming company, GetFlix, and a “Shows Viewed” fact table. The “Shows Viewed” fact table is at the centre of the Star Schema and surrounded by its Dimensions.
The following is a brief introduction to these concepts.
Facts tables
A Fact table represents a business process or an event within a business process. It contains the measures/numbers you want to analyse. For example, the “GetFlix” “Shows Viewed” fact table depicted in the star schema example above contains one row for every time a customer views an Episode of a Show. “Customer views an Episode of a Show” is the business event we are measuring.
Dimensions
A Dimension represents a business entity, which is the people, places, and things that come together to perform business activities. A Dimension contains the attributes (i.e. fields) of the business entity, e.g., Customer Gender, Age, Category, etc. The attributes are used to filter and group fact data when performing data warehousing queries.
Star Schema
A Star Schema refers to how Facts and Dimensions are related in a Data Warehouse. A Star Schema is organized around a central fact table that is related to its Dimension tables using foreign keys in the fact table. The name “star” schema refers to the star-like pattern that emerges when you present the Fact table and its Dimensions in an entity-relationship diagram. See the example above.
How to Build a Dimensional Data Model
Using the 3 essential dimensional data modeling building blocks discussed above a data warehouse modeler can build an entire data warehouse data model.
Very briefly, the process is:
- Identify the business process.
- Identify the grain (i.e. unique key) of the fact or fact(s) of the business process.
- Identify the Who, What, When, Where and Why are involved in the business process. The Dimensions of your star schema.
- Identify the sources of data. Write the code to extract the data into the data warehouse.
- Write code to transform the data into a form suitable to be loaded into the presentation layer of the data warehouse.
- Write code to load the data into the presentation layer of the data warehouse modeled as a star schema.
Obviously, there is a lot more to it than that. Data warehouse data modeling and architecture is a specialized field that benefits from training and years of experience. This is just a quick overview.
Star Schema vs Enterprise Data Warehouse Data Model
So far, we have discussed using a star schema. However, a star schema only describes a single Fact table. An Enterprise data warehouse should model many different business processes/events, each with its own Fact table and related Dimensions joined in a star schema.
The question is, how do we prevent duplication of the Dimensions used by Facts and facilitate cross-business process analysis?
The answer is “Conformed” Dimensions. Essentially, a conformed Dimension is a Dimension shared by many facts. Typical conformed Dimensions are key business entities like Employees, Customers, Products, Cost Centres, Dates, and Times.
When two Fact tables share a Dimension, you can produce a report with measures side-by-side from both Fact tables, categorized or filtered by attributes of their conformed (i.e. shared) Dimensions. In this way, dimensional modeling with conformed Dimensions supports enterprise-wide cross-business process analysis.
Another way of representing conformed dimensions is through a data warehouse matrix.