UNIT – II
DATA WAREHOUSE
Data Warehouse:
A data warehouse is a subject-oriented, integrated, time-variant, and
non-volatile collection of data in support of management's decision-making
process.
Subject-Oriented:
A data warehouse can be used to analyze a particular subject area. For
example, "sales" can be a particular subject.
Integrated:
A data warehouse integrates data from multiple data sources. For example,
source A and source B may have different ways of identifying a product, but in a
data warehouse, there will be only a single way of identifying a product.
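The idea of a single product identifier can be sketched in a few lines of Python. This is only an illustration: the source records, the field names (sku, product_code) and the PRODUCT_KEY_MAP lookup are all hypothetical, and a real warehouse would typically hold such a mapping in a table maintained by the load process.

```python
# Hypothetical source records: each source identifies the same product differently.
source_a = [{"sku": "A-100", "name": "Widget"}]
source_b = [{"product_code": 7001, "title": "Widget"}]

# Assumed mapping from (source, local identifier) to one warehouse-wide product key.
PRODUCT_KEY_MAP = {
    ("A", "A-100"): 1,
    ("B", 7001): 1,
}


def integrate(source_name, records, id_field, name_field):
    """Translate source rows into rows that carry the single warehouse product key."""
    return [
        {
            "product_key": PRODUCT_KEY_MAP[(source_name, r[id_field])],
            "product_name": r[name_field],
            "source": source_name,
        }
        for r in records
    ]


warehouse_rows = (
    integrate("A", source_a, "sku", "name")
    + integrate("B", source_b, "product_code", "title")
)
```

Both rows in warehouse_rows end up with the same product_key, even though source A uses a text code and source B uses a number.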
Time-Variant:
Historical data is kept in a data warehouse. For example, one can retrieve
data from 3, 6, or 12 months ago, or even older data, from a data
warehouse. This contrasts with a transaction system, where often only the most
recent data is kept. For example, a transaction system may hold only the most
recent address of a customer, whereas a data warehouse can hold all addresses
associated with a customer.
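The address example can be sketched as follows. The names current_address, address_history, change_address and address_as_of are made up for this illustration, and the customer and dates are invented sample data; the point is only that the transaction system overwrites while the warehouse keeps every version together with the date it became valid.

```python
from datetime import date

# Transaction system: one current address per customer, overwritten on change.
current_address = {}

# Warehouse: every address is kept, together with the date it became valid.
address_history = []


def change_address(customer_id, address, valid_from):
    current_address[customer_id] = address                      # old value is lost
    address_history.append((customer_id, address, valid_from))  # old values are kept


def address_as_of(customer_id, on_date):
    """Return the address that was valid on on_date, or None if none is recorded."""
    candidates = [
        (valid_from, address)
        for cid, address, valid_from in address_history
        if cid == customer_id and valid_from <= on_date
    ]
    return max(candidates)[1] if candidates else None


change_address(42, "12 Oak Street", date(2020, 1, 1))
change_address(42, "7 Elm Road", date(2023, 6, 15))
```

After these two changes, current_address[42] holds only "7 Elm Road", while address_as_of(42, date(2021, 3, 1)) still returns "12 Oak Street".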
Non-volatile:
Once data is in the data warehouse, it will not change. So, historical data in a
data warehouse should never be altered.
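One way to picture non-volatility is a table that can only be loaded and read. The AppendOnlyTable class below is a hypothetical sketch, not a feature of any particular warehouse product: it offers no update or delete operation, and it hands out copies so that readers cannot alter the stored rows.

```python
class AppendOnlyTable:
    """A toy table that supports loading and reading, but no update or delete."""

    def __init__(self):
        self._rows = []

    def load(self, row):
        # Store a copy so later changes to the caller's dict do not leak in.
        self._rows.append(dict(row))

    def rows(self):
        # Return copies so callers cannot alter what is stored.
        return tuple(dict(r) for r in self._rows)


table = AppendOnlyTable()
table.load({"order_id": 1, "amount": 250.0})

snapshot = table.rows()
snapshot[0]["amount"] = 0.0  # changes only the caller's copy
```

The stored row still shows an amount of 250.0 when table.rows() is called again.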
Data Warehouse Design Process:
A data warehouse can be built using a top-down approach, a bottom-up
approach, or a combination of both.
The top-down approach starts with the overall design and planning. It is
useful in cases where the technology is mature and well known, and where the
business problems that must be solved are clear and well understood.
The bottom-up approach starts with experiments and prototypes. This is
useful in the early stage of business modeling and technology development. It
allows an organization to move forward at considerably less expense and to
evaluate the benefits of the technology before making significant commitments.
In the combined approach, an organization can exploit the planned and
strategic nature of the top-down approach while retaining the rapid implementation
and opportunistic application of the bottom-up approach.
The warehouse design process consists of the following steps:
1. Choose a business process to model, for example, orders, invoices,
shipments, inventory, account administration, sales, or the general ledger. If the
business process is organizational and involves multiple complex object
collections, a data warehouse model should be followed. However, if the process is
departmental and focuses on the analysis of one kind of business process, a data
mart model should be chosen.
2. Choose the grain of the business process. The grain is the fundamental,
atomic level of data to be represented in the fact table for this process, for example,
individual transactions, individual daily snapshots, and so on.
3. Choose the dimensions that will apply to each fact table record. Typical
dimensions are time, item, customer, supplier, warehouse, transaction type, and
status.
4. Choose the measures that will populate each fact table record. Typical
measures are numeric additive quantities like dollars sold and units sold.
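The design steps above can be sketched with Python's built-in sqlite3 module. Everything specific here is an assumption made for illustration: the business process is sales, the grain is one row per item per day, the dimensions are time and item, and the measures are dollars_sold and units_sold. The table names and sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Dimensions (time, item) and a fact table whose grain is one row per item per day.
conn.executescript("""
CREATE TABLE dim_time (time_key INTEGER PRIMARY KEY, day TEXT);
CREATE TABLE dim_item (item_key INTEGER PRIMARY KEY, item_name TEXT);
CREATE TABLE fact_sales (
    time_key     INTEGER REFERENCES dim_time(time_key),
    item_key     INTEGER REFERENCES dim_item(item_key),
    dollars_sold REAL,     -- measure
    units_sold   INTEGER   -- measure
);
""")

conn.executemany("INSERT INTO dim_time VALUES (?, ?)",
                 [(1, "2024-01-01"), (2, "2024-01-02")])
conn.executemany("INSERT INTO dim_item VALUES (?, ?)",
                 [(10, "Widget"), (11, "Gadget")])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                 [(1, 10, 20.0, 2), (1, 11, 15.0, 1), (2, 10, 30.0, 3)])

# Because the measures are additive, they can be summed along any dimension.
totals = conn.execute("""
SELECT i.item_name, SUM(f.dollars_sold), SUM(f.units_sold)
FROM fact_sales AS f
JOIN dim_item AS i ON f.item_key = i.item_key
GROUP BY i.item_name
ORDER BY i.item_name
""").fetchall()
```

Here totals holds one row per item, with dollars and units summed over both days. Grouping by dim_time.day instead would give daily totals from the same fact table.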
A Three Tier Data Warehouse Architecture: