Data Vault 2.0 is an advancement of Data Vault 1.0 and extends that standard with a reference architecture and a common, agile methodology. In addition, it provides best practices for the (automated) implementation of the ETL processes.
Dörffler & Partner GmbH offers Data Vault Consulting and Implementation Services.
The next sections present the “pillars” of Data Vault 2.0. If you have further questions, please contact us via DataVault@doerffler.com.
DATA VAULT 2.0 ARCHITECTURE
The Data Vault 2.0 architecture addresses typical requirements for an enterprise data warehouse:
- a highly flexible data model that allows evolutionary growth
- a three-layer architecture (Raw Vault, Business Vault, Information Marts)
- a strict separation of loading and business logic
- early integration of the data using the business keys
- historization in the Raw Data Vault
- provision of multiple information marts to address the needs of different information consumers
- iterative addition of new data sources and business rules to meet new and changing requirements
- easy deployment of (virtual) Raw Marts to interact early with information consumers
- managed self-service BI
- option to virtualize Business Vault and marts
- support for heterogeneous and unstructured data sources, including NoSQL (and NewSQL)
- support for real-time data (from SOA/ESB) without changing the data warehouse architecture
- highly scalable architecture, suitable for MPP (massively parallel processing) and VLDW (very large data warehouses)
The Raw Vault (the lowest layer of the data warehouse) integrates the data using the business keys and historizes it. Data processing (such as aggregation or recalculation), the application of complex business rules, and data corrections are performed only when the data moves into the Business Vault or the interpretation layer (data access layer). New marts can be added easily. Typically, these marts are created using a dimensional model (star schema), but relational models in third normal form can also be used to support relational reporting or data mining.
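To make the separation between loading and business logic more concrete, here is a minimal Python sketch, assuming a deliberately simplified Raw Vault with a single hub and a single satellite shared by all sources (real implementations typically use one satellite per source). The names `RawVault`, `SatelliteRow`, `business_vault_customers`, and the sample customer data are hypothetical. Loading only integrates on the business key and appends a new version when the attributes change; the country-code correction happens afterwards, in a separate Business Vault step, so the raw history stays untouched.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class SatelliteRow:
    """One historized version of a business key's descriptive attributes."""
    business_key: str
    load_date: datetime
    record_source: str
    attributes: Tuple[Tuple[str, str], ...]


class RawVault:
    """Hypothetical, simplified Raw Vault: one hub plus one satellite for all sources."""

    def __init__(self) -> None:
        # business key -> (first load date, record source that first delivered it)
        self.hub: Dict[str, Tuple[datetime, str]] = {}
        self.satellite: List[SatelliteRow] = []

    def latest(self, business_key: str) -> Optional[SatelliteRow]:
        rows = [r for r in self.satellite if r.business_key == business_key]
        return max(rows, key=lambda r: r.load_date) if rows else None

    def load(self, business_key: str, attributes: Dict[str, str],
             record_source: str, load_date: datetime) -> None:
        # Integration: every source delivering the same business key
        # lands on the same hub entry.
        if business_key not in self.hub:
            self.hub[business_key] = (load_date, record_source)
        # Historization: append a new version only if the attributes changed.
        # Data is stored exactly as delivered; nothing is corrected or updated.
        new_attrs = tuple(sorted(attributes.items()))
        current = self.latest(business_key)
        if current is None or current.attributes != new_attrs:
            self.satellite.append(
                SatelliteRow(business_key, load_date, record_source, new_attrs))


def business_vault_customers(vault: RawVault) -> Dict[str, Dict[str, str]]:
    """Hypothetical business rule applied downstream of the Raw Vault:
    take the latest version per key and normalize the country code."""
    result: Dict[str, Dict[str, str]] = {}
    for key in vault.hub:
        row = vault.latest(key)
        if row is None:
            continue
        attrs = dict(row.attributes)
        attrs["country"] = attrs.get("country", "").strip().upper()
        result[key] = attrs
    return result


vault = RawVault()
vault.load("C-100", {"name": "Acme", "country": " de"}, "crm", datetime(2024, 1, 1))
vault.load("C-100", {"name": "Acme", "country": " de"}, "crm", datetime(2024, 1, 2))
vault.load("C-100", {"name": "Acme GmbH", "country": "de"}, "erp", datetime(2024, 1, 3))
print(len(vault.hub), len(vault.satellite))  # 1 2
print(business_vault_customers(vault)["C-100"])  # {'country': 'DE', 'name': 'Acme GmbH'}
```

The unchanged second delivery produces no new satellite row, and the raw value `" de"` is still present in the first version; only the Business Vault output shows the corrected `"DE"`.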
DATA VAULT 2.0 METHODOLOGY
Many teams want an agile methodology that is tailored to the unique needs of data warehouse development. Experience shows that pure agile approaches from software development, such as Scrum, are suboptimal in data warehouse projects. Dan Linstedt addresses these needs by introducing a common methodology. The Data Vault 2.0 methodology unites concepts from Scrum, CMMI, traditional software development life-cycles, total quality management (TQM), Six Sigma, PMP, and function point analysis (FPA).
Similar to the agile practices based on Scrum, the goal is to deploy functionality to business users within a small number of sprints. For that reason, the data warehouse is not built layer-by-layer in a horizontal fashion. Instead, new functionality is deployed vertically through all layers. By using Data Vault 2.0 modeling, development teams are able to extend the data warehouse easily, a core requirement for agile development.
DATA VAULT 2.0 MODELING
A core requirement for any agile methodology, such as the Data Vault 2.0 methodology, is the ability to extend the existing data model with new functionality. The Data Vault 2.0 model is the key to meeting this requirement. It separates business keys (hubs), relationships between business keys (links), and context information (satellites) into their own entities. This separation makes the model highly flexible: because it supports evolutionary growth, it can be extended easily without affecting existing functionality.
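The separation into hubs, links, and satellites can be sketched in Python as follows. This is an illustration under stated assumptions, not a reference implementation: the entity classes, the `compute_hash_key` helper, and the MD5-over-normalized-business-keys convention are assumptions chosen for the example, as are the sample keys. The point it shows is that a new source can add context as a new satellite that references the existing hub, while the hub, the link, and the existing satellite remain unchanged.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Tuple


def compute_hash_key(*business_keys: str) -> str:
    """Assumed convention: MD5 over trimmed, upper-cased, ';'-joined business keys."""
    normalized = ";".join(k.strip().upper() for k in business_keys)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class Hub:
    """Holds a business key and nothing else."""
    hash_key: str
    business_key: str
    load_date: datetime
    record_source: str


@dataclass(frozen=True)
class Link:
    """A relationship between business keys, referencing the hubs by hash key."""
    hash_key: str
    hub_hash_keys: Tuple[str, ...]
    load_date: datetime
    record_source: str


@dataclass
class Satellite:
    """Context information attached to a hub (or link) via its hash key."""
    parent_hash_key: str
    load_date: datetime
    record_source: str
    attributes: Dict[str, str]


loaded = datetime(2024, 1, 1)
customer = Hub(compute_hash_key("C-100"), "C-100", loaded, "crm")
product = Hub(compute_hash_key("P-7"), "P-7", loaded, "erp")
purchase = Link(compute_hash_key("C-100", "P-7"),
                (customer.hash_key, product.hash_key), loaded, "erp")
customer_details = Satellite(customer.hash_key, loaded, "crm", {"name": "Acme"})

# A later sprint adds a new (hypothetical) scoring source: its context becomes
# a new satellite. The hub, the link and the existing satellite are untouched.
customer_rating = Satellite(customer.hash_key, loaded, "scoring", {"rating": "A"})
```

Because every entity refers to the hub only through its hash key, adding `customer_rating` required no change to any existing structure, which is the kind of non-disruptive extension the paragraph above describes.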