Data Warehouse Architecture Patterns

Data Warehousing Architecture Patterns

The choice of where and how to store the data for the data warehousing system is a critical architectural question. Part of the issue is the data warehousing architecture pattern, which is explained in Table 1. Data warehousing patterns include:



  • Independent Data Marts
  • Coordinated Data Marts
  • Centralized Data Warehouse
  • Hub and Spoke
  • Federated / Logical Data Warehouse
  • Big Data / Data Lake.

Table 1 DW Architecture Patterns

Pattern Description
Independent Data Marts Multiple databases containing analytic data are created and maintained by different organizational units. The databases tend to be inconsistent with each other, having different dimensions, measures and semantics. There is “no single version of the truth” and it is a challenge to perform cross data mart analysis. These weaknesses make the independent data mart the least desirable of the architecture patterns.
Coordinated Data Marts Data is harmonized across multiple analytic databases or areas of databases. Each database meets a specific need such as customer segmentation or product analysis. There are shared dimensions and semantics between the databases. For example, customer and product dimensions are consistent across the databases. Consistent dimensions and semantics support a logical enterprise view.
Centralized Data Warehouse An Enterprise Data Warehouse (EDW) contains atomic data, summarized data and dimensional data. This is provided through a single database. Logical data marts are built from this database to support specific business requirements. Analysis between subject areas is facilitated by data being stored in a single database.
Hub and Spoke The hub and spoke architecture is similar to the centralized data warehouse architecture except that there are physical data marts instead of logical data marts. A cost and complexity element is added due the need to copy data from the central data warehouse to the distributed data marts.
Federated / Logical Data Warehouse The federated/logical data warehouse makes data that is distributed across multiple databases look to users like a single data source.  Data may be stored in traditional relational databases as well as Hadoop or NoSQL formats. Data Virtualization software is used to create a logical view of the data and to optimize query of data across databases.

Common keys such as customer and product identifiers are used to tie the data together. This approach reduces the expense and time needed for data movement. The approach may have a high overhead and slow response time because queries between data sources in multiple locations can be inefficient.

Big Data / Data Lake The big data/data lake architecture stores data in the Hadoop environment which is an economical and flexible way to store large volumes of data including unstructured data.

Some organizations are shifting away from traditional relational databases and moving to big data environments.

Data Model Patterns for Data Warehousing

A data model is a graphical view of data created for analysis and design purposes. While architecture does not include designing data warehouse databases in detail, it does include defining principles and patterns for modeling specialized parts of the data warehouse system.

Areas that require specialized patterns are:

  • Staging / landing area – looks like source systems
  • Atomic Data Warehouse – uses normalized ERD (Entity Relationship Diagram)
  • Data mart – uses dimensional modeling – the star schema, the snowflake schema or the cube.

In addition to these specialized patterns, the architecture should include other pattern descriptions for:

  • Naming of tables and columns
  • Assignment of keys
  • Indexing
  • Relational Integrity (RI)
  • Audit history.

Project Management for Profitable Analytics – Part 1

thumb_project_schedule_wpclipart_600x435In this blog tutorial series, you will learn about the management of successful data warehousing / business intelligence projects. Topics include:

  • Defining Scope and Objectives
  • Finding the Right Sponsor
  • Producing the Project Roadmap and Plans
  • Organizing the Team
  • Executing the Plan
  • Finishing the Project
  • Avoiding Major Data Warehouse Mistakes.

Defining Scope and Objectives

Scope specifies the boundaries of the project. It tells what is in and what is out. The scope definition started in the business case will be expanded, if needed, when the project is underway. This effort includes:

  • Overview of the project (Mission, Scope, Goals, Objectives, Benefits)
  • Scope plan
  • Scope definition
  • Alternative development.

Defining the correct scope and setting realistic objectives are critical to any project’s success, and a data warehouse project is no exception. Scope defines project boundaries including:

  • Business requirements addressed
  • Anticipated/planned users
  • Subject Areas such as inventory transactions or customer service interactions
  • Project success criteria, including quantified planned benefits.

Defining an overly large project scope and letting scope grow in an uncontrolled fashion (scope creep) are certain to cause project failure. Remember you cannot please everyone:

I cannot give you a formula for success,

but I can give you a formula for failure: try to please everybody.

Herbert Bayard Swope

Enterprise vs. Departmental Focus

The choice of Enterprise Data Warehouse vs. Departmental Data Mart is critical to the success of data warehousing projects. This choice is a major component of project scope. Examples of factors that arise with each focus, based on my experience, are shown in Table 1.

Table 1: Enterprise vs. Departmental Focus

Factor Enterprise Focus Department / Functional Focus
Organizational Scope Enterprise Wide Business Unit or Business Process Focused
Time to Build Multi-year phased effort Single Year effort
Sponsorship Required Executive Sponsor Management Sponsor
Complexity High Medium
Typical Cost Often a multimillion dollar effort Often less than $1 million effort

The project may require both an Enterprise Data Warehouse and one or more Data Marts. The future Technical Architecture blog article will explain more about this choice.