What is Data Warehousing?
In computing, a data warehouse (DW,
DWH), or an enterprise data warehouse (EDW), is a database used for
reporting and data analysis. Integrating data from one or more
disparate sources creates a central repository of data, a data
warehouse (DW). Data warehouses store current and historical data
and are used to create trend reports for senior management, such as
annual and quarterly comparisons.
The data stored in the warehouse is
uploaded from the operational systems (such as marketing and sales,
shown in the figure to the right). The data may pass through an
operational data store for additional operations before it is used in
the DW for reporting.
The typical extract-transform-load
(ETL)-based data warehouse uses staging, data integration, and access
layers to house its key functions. The staging layer or staging
database stores raw data extracted from each of the disparate source
data systems. The integration layer integrates the disparate data
sets by transforming the data from the staging layer, often storing
the transformed data in an operational data store (ODS) database.
The integrated data is then moved to yet another database, often
called the data warehouse database, where it is arranged into
hierarchical groups, often called dimensions, and into facts and
aggregate facts. The combination of facts and dimensions is sometimes
called a star schema. The access layer helps users retrieve data.
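As a rough illustration of the layers described above, here is a minimal sketch using Python's built-in sqlite3 module. The table names (staging_sales, dim_product, fact_sales), the column layout, and both function names are assumptions made up for this example; a real warehouse would use a dedicated database and many more dimensions.

```python
import sqlite3


def build_warehouse(raw_sales):
    """Run a toy ETL pass over (source, product, sold_on, amount) rows."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    # Staging layer: raw rows stored exactly as extracted from each source.
    cur.execute(
        "CREATE TABLE staging_sales "
        "(source TEXT, product TEXT, sold_on TEXT, amount TEXT)"
    )
    cur.executemany("INSERT INTO staging_sales VALUES (?, ?, ?, ?)", raw_sales)

    # Warehouse database: one dimension and one fact table (a tiny star schema).
    cur.execute(
        "CREATE TABLE dim_product "
        "(product_id INTEGER PRIMARY KEY, name TEXT UNIQUE)"
    )
    cur.execute(
        "CREATE TABLE fact_sales "
        "(product_id INTEGER REFERENCES dim_product(product_id), "
        "sold_on TEXT, amount REAL)"
    )

    # Integration layer: normalize product names and cast amounts to numbers
    # so that rows from different sources line up.
    cur.execute(
        "INSERT OR IGNORE INTO dim_product (name) "
        "SELECT DISTINCT lower(trim(product)) FROM staging_sales"
    )
    cur.execute(
        """
        INSERT INTO fact_sales (product_id, sold_on, amount)
        SELECT d.product_id, s.sold_on, CAST(s.amount AS REAL)
        FROM staging_sales AS s
        JOIN dim_product AS d ON d.name = lower(trim(s.product))
        """
    )
    conn.commit()
    return conn


def sales_by_product(conn):
    """Access layer: an aggregate report joining facts to a dimension."""
    return conn.execute(
        """
        SELECT d.name, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_product AS d USING (product_id)
        GROUP BY d.name
        ORDER BY d.name
        """
    ).fetchall()
```

Calling build_warehouse with rows from two hypothetical sources that spell the same product differently (for example " Widget" and "widget ") and then calling sales_by_product returns a single total for that product, which is the point of the integration step.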
A data warehouse constructed from
integrated data source systems does not require ETL, staging
databases, or operational data store databases. The integrated data
source systems may be considered to be a part of a distributed
operational data store layer. Data federation methods or data
virtualization methods may be used to access the distributed
integrated source data systems to consolidate and aggregate data
directly into the data warehouse database tables. Unlike the
ETL-based data warehouse, the integrated source data systems and the
data warehouse are all integrated since there is no transformation
of dimensional or reference data. This integrated data warehouse
architecture supports drill-down from the aggregate data
of the data warehouse to the transactional data of the integrated
source data systems.
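The federated approach above can be imitated on a small scale with SQLite's ATTACH DATABASE, which lets one connection query several databases at once. This is only a sketch: the two source systems (east and west), their orders table, the sample rows, and the function names are all invented for illustration, and real data federation or virtualization products work quite differently under the hood. What it does show is that when the sources already share the same product codes, aggregates can be computed without a transformation step and each total can be traced back to the transactions behind it.

```python
import sqlite3

# One query that reads both source systems in place (no staging copy).
UNION_SQL = (
    "SELECT 'east' AS source, order_id, product_code, amount FROM east.orders "
    "UNION ALL "
    "SELECT 'west' AS source, order_id, product_code, amount FROM west.orders"
)


def open_federated_sources():
    """Attach two hypothetical source systems to a single connection."""
    conn = sqlite3.connect(":memory:")
    for name in ("east", "west"):
        conn.execute(f"ATTACH DATABASE ':memory:' AS {name}")
    for name in ("east", "west"):
        conn.execute(
            f"CREATE TABLE {name}.orders "
            "(order_id INTEGER, product_code TEXT, amount REAL)"
        )
    # Invented sample transactions; both systems use the same product codes.
    conn.executemany(
        "INSERT INTO east.orders VALUES (?, ?, ?)",
        [(1, "P1", 10.0), (2, "P2", 5.0)],
    )
    conn.executemany("INSERT INTO west.orders VALUES (?, ?, ?)", [(7, "P1", 2.5)])
    conn.commit()
    return conn


def totals_by_product(conn):
    """Aggregate view: total amount per product across all sources."""
    return conn.execute(
        f"SELECT product_code, SUM(amount) FROM ({UNION_SQL}) "
        "GROUP BY product_code ORDER BY product_code"
    ).fetchall()


def drill_down(conn, product_code):
    """Drill down from one aggregate to the source transactions behind it."""
    return conn.execute(
        f"SELECT source, order_id, amount FROM ({UNION_SQL}) "
        "WHERE product_code = ? ORDER BY source, order_id",
        (product_code,),
    ).fetchall()
```

A report would start from totals_by_product and call drill_down for any product whose total needs explaining.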
A data mart is a small data warehouse
focused on a specific area of interest. Data warehouses can be
subdivided into data marts for improved performance and ease of use
within that area. Alternatively, an organization can create one or
more data marts as first steps towards a larger and more complex
enterprise data warehouse.
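One simple way to picture a data mart is as a subject-specific copy of part of the warehouse. The sketch below assumes a hypothetical warehouse_sales table with department, sold_on, and amount columns; the helper names and sample rows are made up for the example and are not taken from any particular product.

```python
import sqlite3


def sample_warehouse():
    """Build a tiny stand-in warehouse table with invented rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE warehouse_sales (department TEXT, sold_on TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO warehouse_sales VALUES (?, ?, ?)",
        [
            ("marketing", "2024-01-05", 100.0),
            ("sales", "2024-01-05", 250.0),
            ("marketing", "2024-02-01", 40.0),
        ],
    )
    conn.commit()
    return conn


def create_data_mart(conn, department):
    """Copy one department's rows into its own, smaller mart table."""
    # The table name is built from the argument, so restrict it to a
    # plain identifier before placing it in the SQL text.
    if not department.isidentifier():
        raise ValueError("department must be a simple identifier")
    mart = f"mart_{department}"
    conn.execute(f"DROP TABLE IF EXISTS {mart}")
    conn.execute(f"CREATE TABLE {mart} (sold_on TEXT, amount REAL)")
    conn.execute(
        f"INSERT INTO {mart} (sold_on, amount) "
        "SELECT sold_on, amount FROM warehouse_sales WHERE department = ?",
        (department,),
    )
    conn.commit()
    return mart
```

Queries against the mart table touch only that department's rows, which is where the performance and ease-of-use benefit mentioned above comes from.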