Data Sources for the Data Warehouse
The information provided by the data warehouse and business
intelligence is only as good as its inputs. Finding,
understanding, selecting and improving data sources are
critical to the success of any data warehousing project.
A leading cause of data warehousing project failures
is a lack of understanding of data sources and poor data
quality of data sources. We recommend gaining an
understanding of the data by use of data profiling tools and
the improvement of its data quality through data cleansing
approaches.
The article Data Sources for Data
Warehousing and Business Intelligence provides further
information.
Data Warehouse Data Extracts
|
|
The data warehousing extract process
pulls data out of data sources so that is
available for later transformation and then
load into the data warehouse and other
databases. Architectural choices include
choice of extract tool and timing of
extracts.
Data warehouse extracts are further
explained the article ETL -
Extract Transform Load for Data Warehousing and
Business Intelligence.
|
Database Selection and Physical Storage for Data
Warehouse
The choice of where and how to store the data for the
data warehousing system is a critical architectural
question. Part of the issue is data warehousing
"style":
- Enterprise data warehouse
- Basic data warehouse system
- Specialized data warehouse / data marts
- Federated data warehouse
- Virtual data warehouse
- ROLAP vs MOLAP
The basic data warehouse system calls for the creation of
the following types of databases:
- Data source staging (Kimball's "Landing Area")
- Data warehouse (Kimbal's "Backroom")
- Data marts (Kimball's "Frontroom")
The article, Database Choices
for Data Warehousing and Business Intelligence, describes
each of these alternatives in detail.
Data Model Patterns for Data Warehousing
A data model is a graphical view of data created for
analysis and design purposes. While architecture does
not include designing data warehouse database in detail,
it does include defining principles and patterns modeling
specialized parts of the data warehouse system.
Areas that require specialized patterns are:
- Staging / Landing Area - looks like source system
- Data warehouse / Backroom - uses normalized ERD
- Data Mart / Frontroom - uses dimension modeling - the
ROLAP star schema or the MOLAP cube
In addition to these specialized patterns, the architecture
should include other pattern descriptions for:
- Naming of tables and columns
- Assignment of keys
- Indexing
- Relational Integrity (RI)
- Audit history
These patterns are described in the article
Data Models for Data Warehousing and Business
Intelligence in greater detail.
Mapping, Transforming, Enriching, and Loading Data
Warehousing Data
After data has been extracted and the physical storage areas
created, it is time to pump the data through the data
warehousing system - from data sources to staging to data
warehouse to data mart to BI query to the business
user.
These key activities are needed to support this
process:
- Mapping - Data sources are aligned with data
targets. We decide what data goes where.
- Transforming - Data is modified to meet
requirements.
- Enriching - Additional data may be added such as
geocoding.
- Loading - Data is inserted into databases
The article ETL - Extract Transform
Load for Data Warehousing and Business
Intelligence provides further information on this
subject.
Business Intelligence Analysis and Presentation
Business
Intelligence is the part of the data warehousing system where
business users analyze the data and prepare
presentations. The
data warehouse architecture must provide for the needs of the
business people who will access the system.
Business
people are likely to act like farmers who harvest a crop of
known information or explorers who are seeking new patterns.
Both types of access must be supported.
Data
warehouse architecture includes the selection and use of the
following types of tools:
- Query
Tools
- Reporting
Tools
- Fixed
Reports
- Analysis
Tools
- Data
Mining Tools
The
article BI -
Business Intelligence supplies further information about
each tool category.
Managing, Operating, and Securing the Data Warehouse
To achieve benefits from business intelligence,
it is important to manage the data warehouse to make sure that
it continues to provide value and avoids risk of loss.
This includes activities like:
- Obtaining feedback from BI users
- Assuring the quality of data
- Monitoring the performance of the data warehouse
- Securing the data warehouse from threats
- Enabling governance of the overall system
The management of data warehousing is further described in
the article Operations.
Data Warehousing Architecture Roadmaps
Data warehousing architecture helps to answer critical
questions:
- What is the current state of data warehousing and
business intelligence?
- What is the desired future state of data warehousing
and business intelligence?
- What are the high level steps that must be taken to
move from the current state to the future state?
Roadmaps identify actions that must be taken to achieve the
desired future state. In addition, roadmaps specify
intermediate future states that must be passed through to
achieve the future state.
|