Upflows
Data warehouses provide an Upflow of summaries rolled-up from the detailed data.
Averages and summaries are pre-calculated and stored as a separate unit to
answer common queries raised in decision-making.
Companies offering products to perform ETL (Extraction, Transformation, and Loading) include:
- Ascential, acquired by IBM
- Oracle Warehouse Builder (OWB), which take advantage of Oracle database set-based and row-based operations, PL/SQL bulk processing and table functions, foreign key constraint manipulation, use of inline views to speed loading, partition exchange loading, external table support, multi-table insert, merge, direct path insert, and parallel operations.
-
Sunopsis (acquired by Oracle in late 2006) favors the "E-LT" approach.
- Informatica COGNOS
These tools are faster and less error prone than manual scripting because they generate SQL code
based on metadata about changing sources or targets.
NOTE: The OWB repository can be exported as a metadata loader (.mdl) file type.
ETL tools also provide a visual record of the process that can be adjusted when sources or targets change.
A variant of ETL is ELT (Extract, Load, and then Transform).
Outflows
The greatest value (payback) from a Data warehouse is in the way it provide quick and flexible
Outflows
of transformed data that leads its users to actionable insights
useful to forecasting and planning decisions.
This is why such systems are called "Business Intelligence".
Zyga consultants say:
Data Warehousing with OLAP tools also enables companies to manage by exception.
Managers today are deluged with status reports on company operations.
Often this information either comes too late or does not require managers to take any corrective action.
By using threshold analysis and intelligent agents
that trigger exception alarms,
a Data Warehouse provides managers timely access to only the critical information
they require in order to take action.
The heart of a data warehouse is its planning and analysis applications
called On-Line Analytical Processing (OLAP). The term "MOLAP" for Multi-dimensional OLAP is also used
because, unlike OLTP entity-relationship models consisting of two-dimensional tables,
data warehouses use a multi-dimensional model for storing data.
Information in a Data Warehouse is organized into various dimensions.
For example,
- a sales analysis database is organized by product, time, territory, and other dimensions.
- an invoice database could use time, customer, product, and supplier dimensions.
- A flat dimension have dimension members who are
equivalent (such as a Category of Actual, Budget, Forecast, What-if).
- A Hierarchical dimension have dimension members
that are parts of a whole, such as PreTaxIncome and Taxes
totaling to NetIncome.
The most common basis for summarization are the dimensions:
Measures allow changes to the view of data, such as
Periodic, Week To Date (WTD), Month To Date (MTD), Quarter to date (QTD), Year to date (YTD).
Daily transactions are aggregated to provide consolidated weekly or a monthly comparisons viewed using a Calendar interface.
Dimensions such as
Service or Product and their properties
(such as COLOR and SIZE) can totaled into a hierarchy of BRAND, MANUFACTURER, CATEGORY, or other aggregate.
Currency (InputCurrency and RptCurrency).
Location (Geography) each retail store's data is summarized into CITY, REGION, STATE, or COUNTRY levels of granularity.
To group applications such as Finance, Sales, etc. which can share common dimensions,
SAP's Business Planning and Consolidation (BPC) application uses the concept of Application Sets, equivalent to a single MS Analysis Services database.
Each dimension, such as time, can be structured in a hierarchy of consolidation levels
-- years, quarters, months, weeks, individual days, or other level of data granularity.
But "day of the week" is an extended attribute.
The lower (finer, more detailed) the level of granularity available for analysis,
the more costly it is to store and process the data.
Other dimensions depend on business needs:
Customer Entity who purchased can totaled into a hierarchy of INDIVIDUALBUYER, MARKET CHANNEL, LIFESTAGE, or other aggregate.
Diagnosis / Need: (e.g., medical DCD codes).
Entity Organization providing the server/product,
which can be totaled into a hierarchy of
DEPARTMENT, STORE, DISTRICT, DIVISION, CORPORATION, or other grouping.
Supplier
Account (Income Statement, Balance Sheet, Cash Flow, KPI)
Product Inventory
Product Returns
Event
DataSrc (for Data Source).
IntCo Inter-Company Eliminations
At the center of the data model, measures (numeric attributes such as sales dollars,
Invoice Amount, etc.) are stored in a fact table.
To make access information multi-dimensionally, fact tables also contain several
foreign keys used to join facts to several dimension tables.
Dimension tables organize and index the data stored in a fact table.
A visual representation of this connection between fact tables and dimension tables appears as a star.
|
|
Larry Greenfield's
Data Warehousing Information Center is the ultimate portal on
data warehousing, decision support, and data mining. Included is a "rant and rave" on
the definition of data warehousing.
Neil Raden offers
Star Schema 101,
Data, Data Everywhere a white paper excerpted in the
October 30, 1995 issue of Information Week on selecting the OLAP technology, and
ocsqlug-unsubscribe@yahoogroups.com
Ralph Kimball, Data Warehouse Consultant.
Chuo-Han's Data Wharehousing resource site
The Data Warehousing Institute.
Data Warehousing.com.
Data Warehousing.org.
Data-Warehouse.com.
Microsoft's Technet
Directory of Data Warehouse, Data Mining, and Decision Support Resources
Technology Guides for Data Warehousing Professionals
Seth Grimes'
OLAP Pages has links to resources specifically related to On-Line Analytical Processing (OLAP).
TeraCLIN provides the application of star schemas to health care.
High-end databases:
- Mercury from
Business Objects
- PowerPlay from Cognos
- PaBLO from Andyne
- SAS
- Microsoft
High-end database engines:
- Express from Oracle,
- Acumate ES from Kenan,
- Gentium from Planning Sciences
- Holos from Holistic Systems
MDD engines:
- Brio
- Fusion from Information Builders
- Essbase "hypercubes"
- Seagate Software
- LightShip Server from D&B/Pilot
- TM/1 from Sinper
|
|