Data Warehouse Architecture
Right Choices for Data Warehousing and Business
Intelligence Success
Technical architecture is all about making the right choices
for the data warehousing and business intelligence
effort. This article will help you set the foundation
for a successful data warehouse.
According to IEEE standard 1471-2000, "Software
architecture is the fundamental organization of a system,
embodied in its components, their relationships to each other
and the environment, and the principles governing its design
and evolution".
Data warehousing technical architecture includes:
- Functional and Non-functional
Requirements
- Architectural Principles
- Buy, Build or Re-Use
- Metadata
- Data Sources
- Extracting
- Physical Storage and Operation
- Data Model Patterns
- Mapping, Transforming, Enriching, and
Loading
- Analyzing and Presenting
- Managing, Operating, and Securing
- Architecture Roadmaps
This is a large topic, so there are many references to
supporting data warehousing and business intelligence articles
at this and other websites.
At another level, data warehousing architecture builds on
the classic system pattern of input, process and output.
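As a rough illustration of the input, process and output pattern, the sketch below chains three small Python functions. The sample rows and the function names (extract, transform, present) are assumptions made for this example only; a real data warehouse would read from source systems and write to a database or reporting tool.

```python
from typing import Iterable

# Hypothetical source rows; real input would come from source systems.
SOURCE_ROWS = [
    {"customer": "Acme", "amount": "120.50"},
    {"customer": "Bolt", "amount": "80.00"},
    {"customer": "Acme", "amount": "30.25"},
]


def extract(rows: Iterable[dict]) -> list[dict]:
    """Input: read the rows supplied by a source."""
    return list(rows)


def transform(rows: list[dict]) -> dict[str, float]:
    """Process: convert amounts to numbers and total them per customer."""
    totals: dict[str, float] = {}
    for row in rows:
        name = row["customer"]
        totals[name] = totals.get(name, 0.0) + float(row["amount"])
    return totals


def present(totals: dict[str, float]) -> list[str]:
    """Output: format the totals as report lines, sorted by customer."""
    return [f"{name}: {total:.2f}" for name, total in sorted(totals.items())]


report = present(transform(extract(SOURCE_ROWS)))
for line in report:
    print(line)
```

Each stage only depends on the output of the previous one, which is the property the pattern is meant to convey.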

Functional and Non-functional Data Warehousing
Requirements
The recommendation "Begin with the end in mind" is very true
for data warehousing and business intelligence. The end
that we have in mind is a system that satisfies both functional
and non-functional requirements, that is, a system that does
what it is supposed to do.
Functional requirements (business requirements) are needs
identified by the business relating to data and business
processes. For example, we may be looking for a system
that provides information about customers, territories and
products and that supports business processes of selling and
customer support. See the article, Requirements for Data Warehousing and
Business Intelligence, for guidance on how to gather and
organize business requirements.
Non-functional requirements are needs about performance and
IT-chosen practices. Performance includes issues
such as required system availability and recoverability.
Performance requirements depend on the volume of data and number of users expected
for the data warehouse. IT-chosen practices include
selected technologies (the "tech stack") and
standards.
Data Warehousing Architectural Principles
Data warehousing architectural
principles, largely complementing enterprise architecture,
set a framework for decision making. One set of
architectural principles is the "ilities":
- Flexibility - systems should be adaptable to changing
conditions
- Scalability - systems should be expandable
Some additional principles might include:
- Re-use before buy and buy before build
- Develop in manageable steps
- Don't boil the ocean
Buy, Build or Re-Use Data Warehouse Components?
A critical question for data warehousing efforts is
how to obtain the resources that make up the data warehousing
system. There are three options:
- Re-use existing resources
- Buy a new resource
- Build a resource
Re-using existing resources can often save money and
deliver a superior and more maintainable solution. If we
buy or build new components every time that there is a new
project, then the portfolio of resources will soon become
bloated and expensive to maintain. Re-use can have
drawbacks, however. Existing resources may not meet current
functional or non-functional requirements.
Buying a new data warehousing resource can save time and
money over building a resource. Buying is a good choice
when products are available for less than the cost of
building and meet a large percentage of
requirements. For example, purchased software may have more features
and fewer problems than home-grown software.
Building a resource can be a good answer when there are no
existing resources to re-use and purchased resources that meet
requirements are not available for a reasonable price.
Building a solution or part of a solution can result in a
competitive advantage where your organization has a capability
that is not readily duplicated by competitors. Cost is
also a factor. Purchased software often has a per user or
per computer charge while in-house developed software can be
made available to internal users without additional licensing
fees.
In general, we recommend "Re-use before buy and buy before
build". Some combination is likely. Create a list
of needed resources and specifying the type of sourcing for
each item.
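The sourcing list can be sketched in a few lines of Python. This is only an illustration of applying "Re-use before buy and buy before build" in order; the Resource fields, the choose_sourcing function and the sample resources are hypothetical, and a real assessment would weigh cost and requirements coverage in more detail.

```python
from dataclasses import dataclass
from enum import Enum


class Sourcing(Enum):
    REUSE = "re-use"
    BUY = "buy"
    BUILD = "build"


@dataclass
class Resource:
    name: str
    # Assumption: an existing in-house resource already meets requirements.
    existing_meets_requirements: bool
    # Assumption: a product meets requirements at a reasonable price.
    product_meets_requirements: bool


def choose_sourcing(resource: Resource) -> Sourcing:
    """Check the options in order: re-use, then buy, then build."""
    if resource.existing_meets_requirements:
        return Sourcing.REUSE
    if resource.product_meets_requirements:
        return Sourcing.BUY
    return Sourcing.BUILD


# Hypothetical list of needed resources.
needed = [
    Resource("ETL tool", existing_meets_requirements=True,
             product_meets_requirements=True),
    Resource("Reporting tool", existing_meets_requirements=False,
             product_meets_requirements=True),
    Resource("Pricing engine", existing_meets_requirements=False,
             product_meets_requirements=False),
]

sourcing_list = {r.name: choose_sourcing(r).value for r in needed}
for name, sourcing in sourcing_list.items():
    print(f"{name}: {sourcing}")
```

The order of the checks encodes the principle: building is chosen only when neither re-use nor purchase fits.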
Metadata for Data Warehousing and Business
Intelligence
Metadata is often defined as "data about data".
In practice, data warehousing metadata is any data that
describes or controls the system that is not procedural
programming code. Examples of metadata include:
- Data definitions
- Data models
- Data mapping specifications
Defining data once through metadata and then re-using those
data definitions can save considerable development and support
time while resulting in more consistent data warehousing
solutions.
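To make the idea of defining data once and re-using it more concrete, here is a minimal sketch in Python. The ColumnDef structure, the customer columns and the two helper functions are assumptions for illustration, not the format of any particular data modeling or ETL tool. The same definitions drive both a CREATE TABLE statement and a simple row check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnDef:
    name: str
    sql_type: str
    nullable: bool
    description: str


# Hypothetical data definitions, written once and re-used below.
CUSTOMER_COLUMNS = [
    ColumnDef("customer_id", "INTEGER", False, "Surrogate key for the customer"),
    ColumnDef("customer_name", "VARCHAR(100)", False, "Name of the customer"),
    ColumnDef("territory_code", "CHAR(3)", True, "Sales territory, if assigned"),
]


def create_table_sql(table: str, columns: list[ColumnDef]) -> str:
    """Generate a CREATE TABLE statement from the column definitions."""
    lines = [
        f"  {c.name} {c.sql_type}{'' if c.nullable else ' NOT NULL'}"
        for c in columns
    ]
    return f"CREATE TABLE {table} (\n" + ",\n".join(lines) + "\n);"


def missing_required(row: dict, columns: list[ColumnDef]) -> list[str]:
    """Return names of non-nullable columns that are absent or None in a row."""
    return [c.name for c in columns if not c.nullable and row.get(c.name) is None]


ddl = create_table_sql("customer", CUSTOMER_COLUMNS)
problems = missing_required({"customer_id": 1, "customer_name": None},
                            CUSTOMER_COLUMNS)
print(ddl)
print(problems)
```

Because the table definition and the validation both read from the same list, a change to a column is made in one place only.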
Metadata is typically created in tools such as the data
modeling tool and the ETL tool. It may then be stored in
a metadata repository that manages and coordinates this
information.
See the article, Metadata for
Data Warehousing and Business Intelligence, for further
insights.