Data Warehousing Architecture

11/01/2021 1 By indiafreenotes

Data Warehouse Architecture is complex as it’s an information system that contains historical and commutative data from multiple sources. There are 3 approaches for constructing Data Warehouse layers: Single Tier, Two tier and Three tier.

The basic concept of a Data Warehouse is to facilitate a single version of truth for a company for decision making and forecasting. A Data warehouse is an information system that contains historical and commutative data from single or multiple sources. Data Warehouse Concepts simplify the reporting and analysis process of organizations.

A data-warehouse is a heterogeneous collection of different data sources organised under a unified schema. There are 2 approaches for constructing data-warehouse:

Top-down approach:

The essential components are discussed below:

  1. External Sources:
    External source is a source from where data is collected irrespective of the type of data. Data can be structured, semi structured and unstructured as well.
  2. Stage Area:
    Since the data, extracted from the external sources does not follow a particular format, so there is a need to validate this data to load into datawarehouse. For this purpose, it is recommended to use ETL

    • E(Extracted):Data is extracted from External data source.
    • T(Transform):Data is transformed into the standard format.
    • L(Load):Data is loaded into datawarehouse after transforming it into the standard format.
  3. Data-warehouse:
    After cleansing of data, it is stored in the data warehouse as central repository. It actually stores the meta data and the actual data gets stored in the data marts. Notethat data warehouse stores the data in its purest form in this top-down approach.
  4. Data Marts:
    Data mart is also a part of storage component. It stores the information of a particular function of an organisation which is handled by single authority. There can be as many number of data marts in an organisation depending upon the functions. We can also say that data mart contains subset of the data stored in datawarehouse.
  5. Data Mining:
    The practice of analysing the big data present in data warehouse is data mining. It is used to find the hidden patterns that are present in the database or in data warehouse with the help of algorithm of data mining.

This approach is defined by Inmon as data warehouse as a central repository for the complete organisation and data marts are created from it after the complete data warehouse has been created.

Advantages of Top-Down Approach:

  1. Since the data marts are created from the data warehouse, provides consistent dimensional view of data marts.
  2. Also, this model is considered as the strongest model for business changes. That’s why, big organisations prefer to follow this approach.
  3. Creating data mart from data warehouse is easy.

Disadvantages of Top-Down Approach:

  1. The cost, time taken in designing and its maintenance is very high.

Bottom-up approach:

  1. First, the data is extracted from external sources (same as happens in top-down approach).
  2. Then, the data go through the staging area (as explained above) and loaded into data marts instead of data warehouse. The data marts are created first and provide reporting capability. It addresses a single business area.
  3. These data marts are then integrated into data warehouse.

This approach is given by Kinball as data marts are created first and provides a thin view for analyses and data warehouse is created after complete data marts have been created.

Advantages of Bottom-Up Approach:

  1. As the data marts are created first, so the reports are quickly generated.
  2. We can accommodate more number of data marts here and in this way Datawarehouse can be extended.
  3. Also, the cost and time taken in designing this model is low comparatively.

Disadvantage of Bottom-Up Approach:

  1. This model is not strong as top-down approach as dimensional view of data marts is not consistent as it is in above approach.

Characteristics of Data warehouse

Subject-Oriented

A data warehouse is subject oriented as it offers information regarding a theme instead of companies’ ongoing operations. These subjects can be sales, marketing, distributions, etc.

A data warehouse never focuses on the ongoing operations. Instead, it put emphasis on modeling and analysis of data for decision making. It also provides a simple and concise view around the specific subject by excluding data which not helpful to support the decision process.

Integrated

In Data Warehouse, integration means the establishment of a common unit of measure for all similar data from the dissimilar database. The data also needs to be stored in the Datawarehouse in common and universally acceptable manner.

A data warehouse is developed by integrating data from varied sources like a mainframe, relational databases, flat files, etc. Moreover, it must keep consistent naming conventions, format, and coding.

This integration helps in effective analysis of data. Consistency in naming conventions, attribute measures, encoding structure etc. have to be ensured.

Time-variant

The time horizon for data warehouse is quite extensive compared with operational systems. The data collected in a data warehouse is recognized with a particular period and offers information from the historical point of view. It contains an element of time, explicitly or implicitly.

One such place where Datawarehouse data display time variance is in in the structure of the record key. Every primary key contained with the DW should have either implicitly or explicitly an element of time. Like the day, week month, etc.

Another aspect of time variance is that once data is inserted in the warehouse, it can’t be updated or changed.

Non-volatile

Data warehouse is also non-volatile means the previous data is not erased when new data is entered in it.

Data is read-only and periodically refreshed. This also helps to analyze historical data and understand what & when happened. It does not require transaction process, recovery and concurrency control mechanisms.

Activities like delete, update, and insert which are performed in an operational application environment are omitted in Data warehouse environment. Only two types of data operations performed in the Data Warehousing are

  • Data loading
  • Data access