Data Warehousing and Business Intelligence at the Data Management Center 

Data Warehousing Tutorial

Learn Data Warehousing and Business Intelligence.
Jump start your data warehousing efforts.
Bookmark and Share

 
<< Previous    [1]  2    Next >>

Data Sources for Data Warehousing

Learn How to Manage Data Sources for Input to Data Warehousing and Business Intelligence

David Haertzen David Haertzen, Principal Enterprise Architect


A data warehousing and business intelligence effort is only as good as the data that is put into it. The saying "Garbage In, Garbage Out" is all too true. A leading cause of data warehousing and business intelligence project failures is to obtain the wrong or poor quality data.

This article will provide an efficient and effective workflow for obtaining the right source data and using it in the data warehousing and business intelligence project.

Managing data warehouse input sources includes a number of steps organized into two phases. In the first phase the following activities are undertaken:

  • Manage the Data Source Identification Process
  • Identify Subject Matter Experts (SMEs)
  • Identify Dimension Data Sources
  • Identify Fact Data Sources

When the major data sources have been identified it is time to quickly gain detailed understanding of each one:

  • Obtain Existing Documentation
  • Model and Define the Input
  • Profile the Input
  • Improve Data Quality
  • Save Results for Further Reuse

Manage the Data Warehousing Data Source Identification Process

The source identification process is critical to the success of data warehousing and business intelligence projects. It is important to move through this effort quickly, obtaining enough information about the data sources without being bogged down in excess detail while still obtaining the needed information.

Start out with a list of the entities planned for the data warehouse / data mart. This can be managed with a spreadsheet containing these columns:

  • Entity name
  • Data mart role (Fact, Dimension, Bridge, etc.)
  • Subject Area
  • Data Source(s)
  • Analyst Name(s)
  • Subject Matter Expert(s)
  • Status

Data Warehousing Data Source Entity Plan

Complete the entity name, data mart role and subject area entries. Assign an analyst to each entity who will find data sources and subject matter experts for each entity.

Identify Data Warehousing Data Source Subject Matter Experts

Consider the following questions when determining the sources and costs of data for the Data Warehouse:

  • Where does the data come from?
  • What processes are used to obtain the data?
  • What does it cost to obtain the data?
  • What does it cost to store the data?
  • What does it cost to maintain the data?

Identify Dimension Data Sources for the Data Mart

Dimensions enable business intelligence users to put information in context  They focus on questions of: who, when, where and what.  Typical dimensions include:

  • Time period / calendar
  • Product
  • Customer
  • Household
  • Market Segment
  • Geographic Area

Master data is a complementary concept and may provide the best source of dimensional data for the data warehouse.  Master data is data shared between systems that describes entities like: product, customer and household.  Master data is managed using a Master Data Management (MDM) system and stored in an MDM-Hub.  Benefits of this approach include:

  • It is less expensive to access data from a single source (MDM-Hub) than extracting from multiple sources.
  • MDM data is rationalized.
  • MDM data is of high quality

If an MDM-Hub does not exist consider creating one.  It will have many uses beyond supporting the data warehouse and business intelligence.

If no MDM-Hub is available, you will need to examine source systems and determine which system contains the data most suitable for dimensions.  If the data is not stored in a managed database, you may need to define the data locally, in a spreadsheet or desktop database, and then provide to the data warehousing system.

Identify Fact Data Sources for the Data Mart

The Fact contains quantitative measurements while the Dimension contains classification information.  The data sources for Fact tends to be transactional software systems.  For example:

System

Example Fact Data

Sales Order Entry
  • Sales Transaction
  • Return Transaction
Customer Service
  • Service Episode
  • Service Result
Accounts Payable
  • Payment Transaction
Sales Campaign
  • Sales Campaign Event

Larger enterprises may have multiple systems for the same kind data.   In that case, you will need to determine the best source of data - the System of Record (SOR) as the source of data warehousing data.

<< Previous    [1]  2    Next >>