ETL Tool Selection for Data Warehousing
Learn How To Select ETL Tools Your Data Warehouse
The selection of tools for ETL is key to the successful. At
a high level there is a choice between programmer written
applications and off the shelf tools. There is also the
selection of Commercial Off The Shelf (COTS) versus Open
Source. There is also the option of using "Tools At
Hand".
Consideration factors in tool selection include:
- Business requirements
- Security requirements
- Data latency requirements
- Complexity
- Data volume
- Staff skill sets
- Standards
- Existing tools
- Technology of data sources and data targets
- Data quality level of data sources
- Budget
- Schedule
|
|
|
ETL tool functions include:
- Extract Utilities
- DB Utilities
- Extract Manager
- Controls How and When Extracts Are
Run
- Records Activity
- Reports on Activity
- Extract Builder
- Metadata Manager
- Code Generator
|
Programmer Written ETL Applications
Programmer written applications may be useful because they
provide capabilities beyond standard tools. They are good
for complex logic that off the shelf solutions may not provide.
In addition, existing development staff is often skilled
with development languages such as: SQL, COBOL, C#, Java
and PERL. The following chart shows some of the pros and
cons of this approach.
|
Factor
|
Pro
|
Con
|
| Budget |
Additional software licensing
fees are avoided. |
More development labor may be
required. |
| Metadata |
|
Metadata is not automatically
included. |
| Performance |
Standard languages have low
overhead. |
Standard languages may not
support parallelism in processing like COTS ETL
tools. |
| Skill
Sets |
Current staff tends to have
skills in language use. |
Data warehousing requires
specialized techniques for populating facts and
dimensions that staff must
learn. |
| Solution
Complexity |
New tools are not introduced
to the environment. |
Total solution may consist of
many "moving parts" which must be managed and
maintained. |
| Vendor
Relations |
Standard languages are
available from multiple vendors, avoiding
vendor lock in. |
Vendor support is not
available. |
Data Warehousing ETL Tools At Hand
Your organization may already have tools which could be used
to perform ETL functions. Desktop tools like Microsoft
Access and Excel include data movement functionality.
Database software like Microsoft SQL Server and Oracle include
ETL functionality competitive with COTS Dedicated ETL
Tools.
|
Vendor
|
Software
|
Description
|
| Microsoft |
Access |
Provides an import and
export capability for both flat files and
databases. Jobs can be built using
macros. Data can be transformed using SQL
and Visual Basic scripting language.
Suitable for small scale systems or proof of
value demonstrations. |
| Microsoft |
SSIS |
SSIS is included as
part of SQL Server 2005 and 2008. It has
features competitive with COTS Dedicated ETL
Tools. Many file types and database types
are supported. |
Fast load utilities are included with many databases:
|
Vendor
|
Software
|
Description
|
| IBM |
DB2
Bulk |
Provides for rapid
loading of IBM DB2
databases. |
| Microsoft |
SQL Server
BCP |
Provides for rapid loading
of Microsoft SQL
Server databases. |
| Oracle |
Oracle Data
Pump |
Provides for rapid loading and
unloading of Oracle databases. |
| Sybase |
BCP |
Provides for rapid loading
of SYBASE Adaptive Server databases. |
COTS Dedicated ETL Tools
If you are part of a large organization with a big budget,
Commercial Off the Shelf (COTS) software is probably the way to
go for the bulk of ETL work. Here are some toolsets that
you should consider:
|
Vendor
|
Software
|
Description
|
| IBM |
DataStage |
Multiplatform software
enables visual specification of ETL.
Integrates with suite of products including
data quality and metadata management.
Supports numerous data sources, including IBM
mainframe legacy
databases. |
| Informatica |
PowerCenter |
Flexible software
supports numerous data sources and
targets. Design through point and click
visual approach. Many transformations are
built in. Supports both batch and
real time. Large product family with many
options and components. |
| SAS |
ETL
Studio |
Graphical software
enables development of ETL applications.
Over 300 transformation types along with
numerous source and target
types. |
Open Source ETL
Open Source ETL tools have been steadily improving and are
in use by many organizations, large and small. The tools
are graphical and metadata driven. They tend to be a bit
simpler than the COTS Dedicated ETL Tools. The base
systems are available without licensing fees. Support and
other services are available for a fee.
|
Vendor
|
Software
|
Description
|
| Jitterbit |
Jitterbit |
Graphical systems supports
real-time and batch integration. Strong
support for web services and XML. User
and vendor community supplies JitterPaks,
specialized interfaces to systems such as
salesforce.com and SugarCRM. |
| Pentaho |
Data Integrator |
Visual development of ETL.
Supports many database and file formats.
Integrates with other products in Pentaho
suite. |
| Talen |
Open Studio |
Combines graphical design with a
metadata driven approach. This product
and approach are very flexible. |
|