Notes and Study Materials

The Data Warehouse and Data Mart

 

 

Data Warehouse is “an integrated, subject-oriented, time-variant, nonvolatile collection of data that provides support for decision making.” To understand that definition, let’s take a more detailed look at its components.

 

• Integrated:

 

The data warehouse is a centralized, consolidated database that integrates data derived from the entire organization and from multiple sources with diverse formats. Data integration implies that all business entities, data elements, data characteristics, and business metrics are described in the same way throughout the enterprise.

You May Also Like:

Business Intelligence and Its Architecture
Operational Data and Decision Support Data
OLAP and Its Characteristics
Relational OLAP

 

Although this requirement sounds logical, you would be amazed to discover how many different measurements for “sales performance” can exist within an organization; the same scenario holds true for any other business element. For instance, the status of an order might be indicated with text labels such as “open,” “received,” “canceled,” and “closed” in one department and as “1,” “2,” “3,” and “4” in another department.

 

• Subject-Oriented:

 

Data warehouse data are arranged and optimized to provide answers to questions coming from diverse functional areas within a company. Data warehouse data are organized and summarized by topic, such as sales, marketing, finance, distribution, and transportation. For each topic, the data warehouse contains specific subjects of interest—products, customers, departments, regions, promotions, and so on. This form of data organization is quite different from the more functional or process-oriented organization of typical transaction systems.

 

 

For example, an invoicing system designer concentrates on designing normalized data structures (relational tables) to support the business process by storing invoice components in two tables: INVOICE and INVLINE. In contrast, the data warehouse has a subject orientation. Data warehouse designers focus specifically on the data rather than on the processes that modify the data.

 

• Time-Variant

In contrast to operational data, which focus on current transactions, warehouse data represent the flow of data through time. The data warehouse can even contain projected data generated through statistical and other models. It is also time-variant in the sense that once data are periodically uploaded to the data warehouse, all time- dependent aggregations are recomputed. For example, when data for previous weekly sales are uploaded to the data warehouse, the weekly, monthly, yearly, and other time-dependent aggregates for products, customers, stores, and other variables are also updated.

 

• Non-Volatile

Once data enter the data warehouse, they are never removed. Because the data in the warehouse represent the company’s history, the operational data, representing the near-term history, are always added to it. Because data are never deleted and new data are continually added, the data warehouse is always growing. That’s why the DBMS must be able to support multi gigabyte and even multi-terabyte or greater databases, operating on multiprocessor hardware.

 

Data Mart:

Although the centralized and integrated data warehouse can be a very attractive proposition that yields many benefits, managers may be reluctant to embrace this strategy. Creating a data warehouse requires time, money, and considerable managerial effort. Therefore, it is not surprising that many companies begin their foray into data warehousing by focusing on more manageable data sets that are targeted to meet the special needs of small groups within the organization. These smaller data stores are called data marts.

 

 

A data mart is a small, single-subject data warehouse subset that provides decision support to a small group of people. In addition, a data mart could also be created from data extracted from a larger data warehouse with the specific function to support faster data access to a target group or function. That is, data marts and data warehouses can coexist within business intelligence environment.

• Some organizations choose to implement data marts not only because of the lower cost and shorter implementation time but also because of the current technological advances and inevitable “people issues” that make data marts attractive.

• Data marts can serve as a test vehicle for companies exploring the potential benefits of data warehouses.

 

• Information technology (IT) departments also benefit from this approach because their personnel have the opportunity to learn the issues and develop the skills required to create a data warehouse.

The only difference between a data mart and a data warehouse is the size and scope of the problem being solved. Therefore, the problem definitions and data requirements are essentially the same for both. To be useful, the data warehouse must conform to uniform structures and formats to avoid data conflicts and to support decision making.

You May Also Like:

Star Schema and Its Components

Issues In Data Warehouse Implimentation     

Data Mining

Find Other DBMS Questions