Modeling Many to Many Relationships in a Star Schema
Many to Many relationships are defined in a Star Schema Dimensional model using a “Group” Dimension and a “Bridge” Fact. Take the example below:
In the above example, an insurance company receives medical claims. The medical claims are the business event represented by the Fact table. Each medical claim can have more than one associated diagnosis. These groups of associated diagnoses are modelled in a “Group” dimension. Individual diagnoses are associated with the “Group” Dimension via a “Bridge” fact. The “Bridge” fact makes the association between the “Group” and individual diagnosis.
In this scenario, the most challenging thing to build is the “Group”. It doesn’t exist in your source and must be derived. To do this, you need to select all combinations of diagnoses that exist in medical claims. The source of the target fact will usually be the source of your “Group” as well. For example, Let’s say you had these groups of diagnoses that are used on claims.
- A,B
- C,D
- A,D
- B,E
Then, there would be one row in the Group Dimension for each Group. In fact, the concatenation of diagnosis keys is an excellent candidate to use as the business key of the Dimension, i.e. DiagnosisGroup_Id. You may also want to include a concatenation of the diagnosis names as an attribute if you want to analyze the impact of certain diagnosis combinations. For example, by choosing the “C,D” “Group” Dimension member, an analyst can quickly get the claims measures for claims with both C and D, but only C & D diagnoses.
The Bridge is simply another Fact, usually a Fact-less Fact, that just records the existence of individual diagnoses within the group. However, it is possible to put measures on this Bridge. For example, a ratio of the contribution of each individual Diagnosis in the Group to the overall value of the claim. A measure on a bridge is usually some ratio between the individuals. Another example is the ratio of ownership of joint buyers on a Sale.
A Group isn’t always derived, as shown in the example below:
In the example above, a Bank Account has joint Account Holders (Customers). There is no need to derive the Group as it exists as an entity in your source.
Another example might be if you had a Fact that represented the process of a Customer viewing a show on a streaming service. A show is related to many Actors, but the Group is readily available in the source system as a Cast entity that already groups the Actors. In this case, the “Cast” entity can be used as the source of the “Group” Dimension.
Why are Many to Many relationships modelled this way? One reason is that OLAP Cubes and other semantic layer software recognize this format and faithfully produce the correct aggregations when analyzing the Fact by the Individual Dimension. For example, if you want to sum the Medical Claim amounts for a given Diagnosis, you would select the Diagnosis from the Diagnosis Dimension, drag in your Claim Amount measure, and the semantic software would navigate the Bridge and Group relationship to produce the Total Claim Amount for that one Diagnosis.