Build your First Persistent Staging tables
In this lesson, you will learn how to define a Staging table. You will also learn how to define the Extract from either a Source table or query.
Dimodelo Data Warehouse Studio implements 4 Extract patterns.
- Full Extract. The data from the source table/query is extracted in full. Dimodelo runs an internal change data capture routine to identify what data was added, updated or deleted and then applies that changeset to the target persistent staging table.
- Incremental Pattern. Extracts only the data that has been added or updated since the last time the extract ran for this source. E.g. you can use this pattern to extract only transactions from a large transaction table that have been inserted or updated since the last time the extract was executed. For the pattern to work, you need to identify a column or multiple columns that the source table uses to track change, like a modified date or sequence number.
- Date Range Pattern. The date range pattern is used to extract a subset of the source data based on a date range. The end of the date range is the current batch execution date. The start of the date range is a number of days specified prior to the current batch effective date. All data within the date range is extracted, regardless of wether, it has changed since the last time the extract ran.
- Change Tracking Pattern. The change tracking pattern uses the change tracking feature of SQL Server to identify changes (insert, update, delete) in a source database and extract only the data that changed since the last time the extract ran. It only supports SQL Server table sources (not queries or views). To use the change tracking pattern, it’s necessary to turn change tracking on for the source SQL Server table.
- Multi Wild Card File Pattern. This pattern is specific to files. The developer defines a file name pattern (with wild cards), and Dimodelo Data Warehouse Studio will extract the data from all files, that match that naming pattern, into the staging table. Dimodelo Data Warehouse Studio will also manage the files, archiving them in an archive folder.
To create a Staging table, you must first define a Connection Manager and Source System of the Source table(s) in the Dimodelo Data Warehouse Studio project. A Staging table can only stage data from a single Source System but can stage data from multiple tables in that Source system, although this is not a recommended approach (according to Kimball et al) in most cases. If you do need to combine data from more than one Source System in the Staging database, to perhaps aggregate or allocate, then you can you a Transform, which is covered in another lesson.