Perform complex transformations to prepare data for the Dimensional Layer
A Dimodelo project contains a Transform layer where Views and Materialized Views can be created to perform more complex ETL in preparation for loading into the Dimensional layer.
The Views/Materialized Views are complex SQL queries over data on in the Staging layer. It’s useful for creating aggregations, or allocations etc of data prior to the transform and load process into the Dimensional layer.
- Materialized View. A Materialized View is a table created during the execution of the ETL and populated with the result of the defined query. It is dropped and re-created every time the ETL runs. On deployment, the Materialized View won’t exist. It is only created during ETL execution.
- View. A view is created as a view in the database. There is no ETL, it is just a simple view that can be utilized by the Dimensional layer as a source. Be careful about how views are used. If more than 1 view is used as a source of a Dimension or Fact, it can badly affect the performance. We recommend Materialized views in most cases unless you know that the View will be the single source of a Dimension/Fact.
Note: When you use a View or Materialized View as one of the sources of a Dimension/Fact Transform, then the pattern to load the Dimension/Fact, generated by Dimodelo, changes. Instead of being incremental, and only operating on the data that changed, a full comparison between the View/Materialized View and the Dimension/Fact is executed to determine what changed, before loading the changes to the Dimension/Fact. This is because Dimodelo can’t guarantee that the data presented by the View/Materialized View is just the delta changeset. In-fact most views/Materialized views are not written that way.
Create a new “TimeSheetBalance” Transform Materialized View
- Expand the Transform folder, right-click the child Transform folder and click Add New.
- Select “Materialized View” in the “Table Type” dropdown.
- In the Staging Table Name field type “TimeSheetBalance“. The name is used to define the name of the Materialized View table in the Staging database. The table name in the Staging database will be Schema.Source System Abbreviation_Staging Table Name. i.e. tfm.Stage_TimeSheetBalance.
- Enter the “This is the TimesSheetBalance transform materialized view” Description in the Description field. This is optional. The Description is used as an overview for the table in the generated Documentation.
Columns Tab
- Add a MonthStartDate column with a date datatype. Tick Business Key.
- Add a TotalHours column with a decimal(9,2) datatype.
View Tab
- Click Add to add a query for the Materialized View.
- On the Extract dialog, Source tab, add the following query:
SELECT *
FROM
(
SELECT
MIN ( CAST(Date AS DATE)) OVER (PARTITION BY CAST(YEAR(Date) AS VARCHAR(4)) + CAST(MONTH(DAte) AS VARCHAR(2))) AS MonthStartDate,
SUM ([Hours]) OVER (PARTITION BY CAST(YEAR(Date) AS VARCHAR(4)) + CAST(MONTH(DAte) AS VARCHAR(2))) AS Total,
ROW_NUMBER() OVER (PARTITION BY CAST(YEAR(Date) AS VARCHAR(4)) + CAST(MONTH(DAte) AS VARCHAR(2)) ORDER BY CAST(YEAR(Date) AS VARCHAR(4)) + CAST(MONTH(DAte) AS VARCHAR(2))) AS RowNum
FROM psg.[SV_TimeSheet]
) S
WHERE RowNum = 1
This query sums up the timesheet monthly total hours and associates it with the first day of each of the months (ready for a calendar dimension lookup).
- Click Mappings. Map columns as you would with any other staging table. Expressions are not supported.
- Click Ok.
- Click Save.
TroubleShoot
- I can’t Map columns, and my query is invalid.
You can’t save a Transform until the staging tables it uses have been generated and deployed to the staging database. Validation will fail.
- I used a View in a Dimension/Fact transform, and it’s really slow.
Be careful about how Views are used. If more than 1 view is used as a source of a Dimension or Fact, it can badly affect the performance. We recommend Materialized views in most cases unless you know that the View will be the single source of a Dimension/Fact.