Data Warehouse Automation in Visual Studio 2015
Dimodelo Data Warehouse Studio is a ‘Data Warehouse Automation’ development tool that targets Microsoft SQL Server. Simply design a data warehouse within Dimodelo Data Warehouse Studio and it will generate the SQL code to maintain your Staging and Data Warehouse databases, along with the SSIS packages. Dimodelo Data Warehouse Studio also manages the deployment of code to multiple server environments and the execution of ETL batch runs. Dimodelo Data Warehouse Studio is a plug-in to Visual Studio 2015. It is familiar to developers and lets you keep your entire solution in one place (i.e. Dimodelo Data Warehouse Studio, SSAS and SSRS projects side by side in the same Visual Studio solution).
Data Warehouse Designer
Designing a Data Warehouse is as simple as drag and drop. Dimodelo Data Warehouse Studio’s visual design wizard allows you to design your Star Schema. As you drag and drop, Dimodelo automatically determines relationships in your source system and suggests Dimension tables. Dimodelo allows you to refine your design before the design metadata is generated. Step through detail pages to define included columns, types, relationships and ETL patterns, and then generate the design. Once complete, you can further refine the design by opening each Staging, Dimension or Fact table through the Solution Explorer.
Rapidly design staging tables by importing table and column schema information from source connections. Support for any type of Source data:
- Database – SQL Server, Oracle, DB2, MySQL etc.
- ODBC – SharePoint, Dynamics, Salesforce, QuickBooks, Hadoop, etc.
- File – Delimited files, Excel.
- XML.
Extract Patterns: Dimodelo is “Pattern” driven. Simply choose the Pattern you want to apply to the Extract, and the appropriate code is generated (an illustrative sketch of an incremental extract follows this list).
- Full Extract.
- Incremental (Delta) Extract.
- Date Range (Last N Days) Extract.
- File Extract. Imports files that match a file name pattern and manages those files.
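As an illustration of the incremental pattern only (this is not the code Dimodelo generates, and the table, column and watermark names are hypothetical), a watermark-based delta extract typically looks like this:

```sql
-- Hypothetical watermark-based incremental (delta) extract.
-- etl.ExtractWatermark, staging.Orders and Sales.Orders are illustrative names.
DECLARE @LastExtract datetime2 =
    (SELECT MAX(LastSuccessfulExtract)
     FROM etl.ExtractWatermark
     WHERE TableName = 'Sales.Orders');

-- Pull only rows changed since the previous successful run.
INSERT INTO staging.Orders (OrderID, CustomerID, OrderDate, Amount, ModifiedDate)
SELECT o.OrderID, o.CustomerID, o.OrderDate, o.Amount, o.ModifiedDate
FROM Sales.Orders AS o
WHERE o.ModifiedDate > @LastExtract;

-- Record the new watermark for the next run.
UPDATE etl.ExtractWatermark
SET LastSuccessfulExtract = SYSUTCDATETIME()
WHERE TableName = 'Sales.Orders';
```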
Advanced Features:
- Define ColumnStore indexes (an illustrative DDL sketch follows this list).
- Define Custom indexes.
- Define multiple extracts for a staging table (from multiple sources) and append data.
- Tables are created within schemas in the database. Configure the schema used for each entity type.
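For context, the kind of DDL implied by these options looks roughly like the following; the schema, table and index names are hypothetical, and this is not the exact DDL Dimodelo emits:

```sql
-- Illustrative only: a staging table created inside a dedicated schema,
-- with a clustered columnstore index and a custom nonclustered index.
CREATE TABLE staging.Orders
(
    OrderID    int           NOT NULL,
    CustomerID int           NOT NULL,
    OrderDate  date          NOT NULL,
    Amount     decimal(18,2) NOT NULL
);

-- Clustered columnstore index for fast analytic scans.
CREATE CLUSTERED COLUMNSTORE INDEX CCI_staging_Orders ON staging.Orders;

-- Custom nonclustered index for business-key lookups
-- (combining both on one table requires SQL Server 2016 or later).
CREATE NONCLUSTERED INDEX IX_staging_Orders_OrderID ON staging.Orders (OrderID);
```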
Views
- Create Views in the Staging layer. Use those Views as the source of Dimensions and Facts.
Extracts
- Select either a table or a query as the source of the extract.
- Define complex source queries.
- Auto map Source to Target columns.
- Use SQL expressions in mappings.
- Define either Full, Incremental, Multi File or Date range extracts.
- Handle complex extract logic (allocation, aggregation etc.) using derived staging tables, i.e. staging tables derived from other staging tables (see the sketch after this list).
- Define multiple extracts for a staging table (from multiple sources) and append data.
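To illustrate the derived-staging-table idea (a sketch only; the table and column names are hypothetical and this is not Dimodelo’s generated code), a second staging table can be loaded by aggregating one that has already been extracted:

```sql
-- Hypothetical derived staging table: summarize staged order lines into a
-- daily aggregate that then feeds a Fact table.
INSERT INTO staging.DailyOrderSummary (OrderDate, CustomerID, LineCount, TotalAmount)
SELECT  ol.OrderDate,
        ol.CustomerID,
        COUNT(*)       AS LineCount,
        SUM(ol.Amount) AS TotalAmount
FROM    staging.OrderLines AS ol
GROUP BY ol.OrderDate, ol.CustomerID;
```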
Dimensions
- Import the column schema for a Dimension from a source system table or view, including column names and data types.
- Define business keys and attributes for the Dimension including attribute descriptions, data types, slowly changing types, parent child hierarchies etc.
- Further customize your dimension by defining custom surrogate key names and data types, and surrogate key generation rules. Dimodelo Data Warehouse Studio supports smart keys (like you might use in a calendar dimension), where the surrogate key is set from a column in the Staging source (see the sketch after this list).
- Add custom meta data for more complex requirements or custom code generation.
- Add descriptions to the dimension and attributes. Descriptions are included in the generated documentation.
- Define the Dimension as a ColumnStore index.
- Define Custom indexes.
- Use an expression to reject rows in staging to enforce data quality.
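As a sketch of the smart key idea for a calendar dimension (illustrative only; dw.DimDate and staging.Calendar are hypothetical names, and the generated ETL may differ), the surrogate key is derived from the date value itself rather than an identity column:

```sql
-- Hypothetical smart surrogate key: an integer YYYYMMDD key derived from the
-- staging date column instead of an IDENTITY value.
INSERT INTO dw.DimDate (DateKey, FullDate, [Year], [Month], [Day])
SELECT  CAST(CONVERT(char(8), s.CalendarDate, 112) AS int) AS DateKey,  -- e.g. 20240315
        s.CalendarDate,
        YEAR(s.CalendarDate),
        MONTH(s.CalendarDate),
        DAY(s.CalendarDate)
FROM    staging.Calendar AS s
WHERE   NOT EXISTS (SELECT 1
                    FROM dw.DimDate AS d
                    WHERE d.DateKey = CAST(CONVERT(char(8), s.CalendarDate, 112) AS int));
```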
Facts
- Import the schema for a Fact from a source system table, including column names and data types.
- Identify the business keys and add/delete/modify measures.
- Add your own descriptions and references back to requirements documentation. Descriptions are included in generated documentation.
- Add custom meta data for more complex design requirements.
- Define the Fact table as a ColumnStore index.
- Define Custom indexes.
- Use an expression to reject rows in staging to enforce data quality.
- Select the Dimensions to associate with a Fact.
- A Fact can be associated with the same Dimension multiple times to easily define role-playing dimensions.
- Associating a Fact with a Dimension means the Fact table is generated with a foreign key to the Dimension, and the generated ETL looks up the appropriate Dimension member for each Fact row, as shown in the sketch below.
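The lookup described above amounts to joining each staging row to its dimension members on the business key. A hedged sketch (hypothetical table and column names, not Dimodelo’s generated code) with a role-playing DimDate joined twice:

```sql
-- Hypothetical fact load: look up surrogate keys for each fact row, with
-- DimDate playing two roles (order date and ship date). -1 = unknown member.
INSERT INTO dw.FactSales (OrderDateKey, ShipDateKey, CustomerKey, Quantity, SalesAmount)
SELECT  ISNULL(od.DateKey, -1)    AS OrderDateKey,
        ISNULL(sd.DateKey, -1)    AS ShipDateKey,
        ISNULL(c.CustomerKey, -1) AS CustomerKey,
        s.Quantity,
        s.SalesAmount
FROM staging.Sales AS s
LEFT JOIN dw.DimDate     AS od ON od.FullDate = s.OrderDate
LEFT JOIN dw.DimDate     AS sd ON sd.FullDate = s.ShipDate
LEFT JOIN dw.DimCustomer AS c  ON c.CustomerBusinessKey = s.CustomerID
                               AND c.IsCurrent = 1;
```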
Mappings
- Define the mapping between staging source columns and target Dimension attributes or Fact measures.
- Use one-to-one column mappings or SQL expressions (see the sketch after this list).
- Auto match based on name.
- Under the advanced tab you can define the mapping between a staging column and a custom ‘smart’ surrogate key.
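To make the expression mappings concrete (hypothetical column names; these are ordinary T-SQL expressions of the kind a mapping can use):

```sql
-- Hypothetical expression mappings from staging columns to target columns.
SELECT  s.Quantity * s.UnitPrice           AS SalesAmount,   -- derived measure
        UPPER(LTRIM(RTRIM(s.CountryCode))) AS CountryCode,   -- cleansed attribute
        ISNULL(s.MiddleName, '')           AS MiddleName     -- default for NULLs
FROM    staging.Sales AS s;
```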
Generate and Deploy
No coding required. Dimodelo will generate all the code to create and maintain the Staging and Data Warehouse databases, and ETL SSIS packages. SQL Server 2008, 2012, 2014 and 2016 SSIS packages can be generated. Generation is executed from the development environment or via MSBUILD. Dimodelo Data Warehouse Studio generates:
- Staging Tables DDL synchronization.
- Data Warehouse Tables DDL synchronization.
- Extract, Transform and Load (ETL) SSIS Packages.
- Data Warehouse Documentation.
Dimodelo makes maintaining a Data Warehouse far easier. Simply change a column, data type, mapping or expression and regenerate; your change is propagated throughout the code.
Generate – SSIS Package Generation
Dimodelo Data Warehouse Studio generates high-performance, readable, industry best practice SSIS packages to implement the ETL for SQL Server 2008, 2012, 2014 and 2016. Our SSIS package patterns are battle hardened and real-world tested.
- We use techniques to ensure high performance, including performing only bulk updates, checksums, early elimination, and little-known query hints for bulk inserts (a checksum sketch follows this list).
- Real-world testing has shown performance of approximately 1 million rows per minute (more than 16,000 rows per second) on large fact tables with a heavy insert load.
- Our patterns address every conceivable ETL scenario, including soft deletions, reinstatement, and late-arriving dimensions.
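As a hedged illustration of the checksum technique (the generated packages may implement it differently, and the table and column names here are hypothetical), comparing a hash of the incoming attributes with the stored hash avoids updating rows that have not actually changed:

```sql
-- Hypothetical checksum-based change detection: update only dimension rows
-- whose attribute hash differs from the incoming staging row.
UPDATE  d
SET     d.CustomerName = s.CustomerName,
        d.City         = s.City,
        d.RowHash      = s.RowHash
FROM    dw.DimCustomer AS d
JOIN   (SELECT CustomerID,
               CustomerName,
               City,
               HASHBYTES('SHA2_256', CONCAT(CustomerName, '|', City)) AS RowHash
        FROM   staging.Customer) AS s
       ON s.CustomerID = d.CustomerBusinessKey
WHERE   d.RowHash <> s.RowHash;   -- unchanged rows are skipped entirely
```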
It is important that data is not destroyed when the Data Warehouse schema is updated. Other schema synchronization tools implement a column name change as a drop-and-create column pair, destroying the data in the existing column as they do so. Dimodelo’s advanced DDL generation preserves data, recognizes name changes, and generates dynamic scripts that test the existing state of your data warehouse, creating and modifying tables and columns as required. Dimodelo uses soft deletes to remove columns and tables, so data is never deleted unless you explicitly do so. A sketch of this state-testing style of DDL appears after this section.
Deploy
Dimodelo Data Warehouse Studio makes code deployment simple. One click is all it takes to deploy your Data Warehouse and ETL code from the UI. You can choose to deploy Data Warehouse code, ETL code or both. Through project configurations you can deploy to multiple environments, with prominent warnings if you are deploying to a non-development environment. You can also use MSBUILD to automate deployments via TFS. Dimodelo Data Warehouse Studio will even deploy custom code with the solution. Dimodelo eases the pain of migrating code through multiple Development, Test and Production environments. Through configuration files, you can define different connection strings and other variables for each environment. You can also automate Data Warehouse deployment to multiple environments using MSBUILD and the Dimodelo Data Warehouse Studio compiler.
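The state-testing style of DDL referred to above can be pictured with a generic sketch like the following (illustrative only, not the script Dimodelo generates; the table and column names are hypothetical):

```sql
-- Generic state-testing DDL: add the column only if it does not already exist,
-- so existing data is never dropped or recreated.
IF NOT EXISTS (SELECT 1
               FROM sys.columns
               WHERE object_id = OBJECT_ID(N'dw.DimCustomer')
                 AND name = N'LoyaltyTier')
BEGIN
    ALTER TABLE dw.DimCustomer ADD LoyaltyTier varchar(20) NULL;
END;
```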
ETL batch execution, logging and management
Dimodelo Data Warehouse Studio records ETL batch task execution information to a SQL Server database. Dimodelo Data Warehouse Studio comes with pre-configured Power BI Desktop dashboards (you don’t need PowerBI.com) that you can use to monitor and analyze your ETL batch processes. Dashboards include:
- Latest batch run. Typically you would look at this dashboard every morning to determine the success of the overnight batch.
- The Batch query dashboard. Allows you to query details of any batch by date and type of workflow.
- Batch Performance dashboard. Allows you to analyze the trend in batch duration over time, and the average duration of phases and tasks over all batches.
- The batch database captures per-table insert/update statistics for each batch execution (a sample query follows this list).
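For example, a query along these lines could pull per-table insert/update counts for the most recent batch; the schema, table and column names are purely hypothetical and will differ from the actual batch database:

```sql
-- Hypothetical query over a batch logging database: rows inserted/updated per
-- target table for the latest batch. All object names are illustrative.
SELECT  s.TargetTable,
        SUM(s.RowsInserted) AS RowsInserted,
        SUM(s.RowsUpdated)  AS RowsUpdated
FROM    batchlog.BatchTaskStatistics AS s
WHERE   s.BatchID = (SELECT MAX(BatchID) FROM batchlog.Batch)
GROUP BY s.TargetTable
ORDER BY RowsInserted DESC;
```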
No more tiresome master SSIS packages to maintain every time a new SSIS package is added to the ETL batch. Dimodelo Data Warehouse Studio uses a simple XML file to define the phases and tasks within the ETL batch workflow. Tasks within a phase run in parallel. Dimodelo Management Console uses the workflow file to orchestrate batch execution. Task names are wild-carded, so newly generated SSIS packages are automatically included in the batch. You can define multiple workflows for your regular daily, weekly and monthly ETL jobs. Workflows allow you to run SSIS packages and stored procedures. You can even have your own custom code execute as part of the workflow.
Dimodelo Management Console is a server-side batch execution application which executes and manages ETL batches on a server, saving you from having to develop your own ETL batch execution and logging system. It orchestrates ETL tasks on your Microsoft SQL Server, scheduling tasks in the correct order (based on the workflow file) and managing status updates and failures. The batch execution framework uses a SQL Server database to record batch progress. Dimodelo Management Console will:
- Execute the ETL in order, according to the workflow defined in a workflow file.
- Employ dynamic Server utilization. Choose the number of tasks (SSIS Packages) to execute concurrently according to your server capacity.
- Log task status.
- Gather Batch statistics.
- Handle multiple batch schedules, e.g. daily, weekly, monthly.
Dimodelo Management Console can be scheduled to run via either a SQL Server Agent job or Windows Task Scheduler; a sketch of the SQL Server Agent approach follows.
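As a sketch only (the command line is a placeholder; substitute however Dimodelo Management Console is actually invoked in your environment), a SQL Server Agent job with a CmdExec step could run the nightly batch:

```sql
-- Hypothetical SQL Server Agent job that launches the nightly ETL batch.
-- The @command path below is a placeholder, not the real console invocation.
USE msdb;

EXEC dbo.sp_add_job       @job_name = N'DW Nightly ETL Batch';

EXEC dbo.sp_add_jobstep   @job_name  = N'DW Nightly ETL Batch',
                          @step_name = N'Run ETL workflow',
                          @subsystem = N'CmdExec',
                          @command   = N'"C:\Dimodelo\ManagementConsole.exe"';  -- placeholder path

EXEC dbo.sp_add_schedule  @schedule_name     = N'Nightly 2am',
                          @freq_type         = 4,        -- daily
                          @freq_interval     = 1,
                          @active_start_time = 020000;   -- 02:00:00

EXEC dbo.sp_attach_schedule @job_name      = N'DW Nightly ETL Batch',
                            @schedule_name = N'Nightly 2am';

EXEC dbo.sp_add_jobserver   @job_name = N'DW Nightly ETL Batch';
```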