The choice of where and how to store the data for the data warehousing system is a critical architectural question. Part of the issue is the data warehousing architecture pattern, which is explained in Table 1. Data warehousing patterns include:
- Independent Data Marts
- Coordinated Data Marts
- Centralized Data Warehouse
- Hub and Spoke
Table 1 DW Architecture Patterns
|Independent Data Marts||Multiple databases containing analytic data are created and maintained by different organizational units. The databases tend to be inconsistent with each other, having different dimensions, measures and semantics. There is “no single version of the truth” and it is a challenge to perform cross data mart analysis. These weaknesses make the independent data mart the least desirable of the architecture patterns.|
|Coordinated Data Marts||Data is harmonized across multiple analytic databases or areas of databases. Each database meets a specific need such as customer segmentation or product analysis. There are shared dimensions and semantics between the databases. For example, customer and product dimensions are consistent across the databases. Consistent dimensions and semantics support a logical enterprise view.|
|Centralized Data Warehouse||An Enterprise Data Warehouse (EDW) contains atomic data, summarized data and dimensional data. This is provided through a single database. Logical data marts are built from this database to support specific business requirements. Analysis between subject areas is facilitated by data being stored in a single database.|
|Hub and Spoke||The hub and spoke architecture is similar to the centralized data warehouse architecture except that there are physical data marts instead of logical data marts. A cost and complexity element is added due the need to copy data from the central data warehouse to the distributed data marts.|
|Federated||The federated data warehouse architecture makes data that is distributed across multiple databases look to users like a single data source. Common keys such as customer and product identifiers are used to tie the data together. This approach reduces the expense and time needed for data movement. The approach may have a high overhead and slow response time because queries between data sources in multiple locations can be inefficient.|
Data Model Patterns for Data Warehousing
A data model is a graphical view of data created for analysis and design purposes. While architecture does not include designing data warehouse databases in detail, it does include defining principles and patterns for modeling specialized parts of the data warehouse system.
Areas that require specialized patterns are:
- Staging / landing area – looks like source systems
- Atomic Data Warehouse – uses normalized ERD (Entity Relationship Diagram)
- Data mart – uses dimensional modeling – the star schema, the snowflake schema or the cube.
In addition to these specialized patterns, the architecture should include other pattern descriptions for:
- Naming of tables and columns
- Assignment of keys
- Relational Integrity (RI)
- Audit history.
- Defining Scope and Objectives
- Finding the Right Sponsor
- Producing the Project Roadmap and Plans
- Organizing the Team
- Executing the Plan
- Finishing the Project
- Avoiding Major Data Warehouse Mistakes.
Scope specifies the boundaries of the project. It tells what is in and what is out. The scope definition started in the business case will be expanded, if needed, when the project is underway. This effort includes:
- Overview of the project (Mission, Scope, Goals, Objectives, Benefits)
- Scope plan
- Scope definition
- Alternative development.
Defining the correct scope and setting realistic objectives are critical to any project’s success, and a data warehouse project is no exception. Scope defines project boundaries including:
- Business requirements addressed
- Anticipated/planned users
- Subject Areas such as inventory transactions or customer service interactions
- Project success criteria, including quantified planned benefits.
Defining an overly large project scope and letting scope grow in an uncontrolled fashion (scope creep) are certain to cause project failure. Remember you cannot please everyone:
I cannot give you a formula for success,
but I can give you a formula for failure: try to please everybody.
Herbert Bayard Swope
The choice of Enterprise Data Warehouse vs. Departmental Data Mart is critical to the success of data warehousing projects. This choice is a major component of project scope. Examples of factors that arise with each focus, based on my experience, are shown in Table 1.
Table 1: Enterprise vs. Departmental Focus
|Factor||Enterprise Focus||Department / Functional Focus|
|Organizational Scope||Enterprise Wide||Business Unit or Business Process Focused|
|Time to Build||Multi-year phased effort||Single Year effort|
|Sponsorship Required||Executive Sponsor||Management Sponsor|
|Typical Cost||Often a multimillion dollar effort||Often less than $1 million effort|
The project may require both an Enterprise Data Warehouse and one or more Data Marts. The future Technical Architecture blog article will explain more about this choice.