What is a Persistent Layer?
The persistent layer contains as set of persistent tables that record the full history of changes to the data of the table/query that is the source of the Persistent table.
The source could a source table/file, a source query, another staging table or a view/materialized view in the transform layer.
In a persistent table there may be multiple row versions for each row found in the source. Each row version has an effective date and end date marking the date range of when that row version was effective (or in existence).
Technically speaking a persistent table is a bi-temporal table. A bi-temporal table permit queries over two timelines: valid time and transaction time. Valid time is the time when a row is effective (i.e row effective and end date-time). Transaction time denotes the time when the row version was recorded in the database. The persistent table supports transaction time by tagging each row version with an inserted and updated batch execution Id. The batch execution is associated with a date time in the batch database. Note that the Dimodelo Data Warehouse Studio version of the bi-temporal table goes one step further by identifying a last updated transaction date time also.
How is a Persistent layer different to an ODS, Inmon EDW, Data Vault?
A Persistent Layer has some similarities to all of these techniques, in that, it is a precursor to a Dimensional layer. It could be described as a temporal ODS. However there are some key differences:
- The temporal nature of a Persistent layer is different. A Data Vault will contain some temporal information, but not effective dates. The other techniques don’t traditionally hold history.
- An ODS and EDW, and to a certain extent, a Data Vault, are designed to be queried by end users. A Persistent layer’s primary purpose is to support the higher layers in the Data Warehouse. You could query a Persistent layer directly, but, because of the temporal nature of the data, it becomes difficult to write the temporal joins. Luckily Dimodelo also set a Is_Latest flag for every row version, so it’s easy to query current state,and thus emulate an ODS.
- The extent to which these layers are modeled is different. The EDW and Data Vault techniques expect to be extensively modeled. This takes considerable effort on the behalf of the developer, first to model the layer, and second to transform the data from the source into the new model. In a persistent layer, you can model this layer as much or as little as you like. Again its purpose is support higher layers. If you need, you can create entities representing aggregations, allocations and new relationships etc in this layer. You might do this to support the Dimensional layer, or for a specific reporting purpose. We do recommend that the persistent layer be at least organised, into source systems, or entity domains, or both.