This article was automatically translated from the original Turkish version.
A data warehouse is an information management system that collects large volumes of data from different enterprise systems into a single centralized repository for analysis. These systems provide critical information, especially for decision support systems and business intelligence applications.
Several key features distinguish data warehouses from traditional databases:
The primary purpose of data warehouses is to integrate data from disparate systems to enable organizations to make more accurate and informed decisions. Key use cases include:
Data warehouses have become an important tool for many organizations, from large enterprises to small businesses. Today, cloud-based solutions have made data warehouses more accessible and scalable.
1. Data Source Layer: Data from various departments of the organization is collected at this layer. This data originates from operational databases (Oracle, MySQL), CRM systems (Salesforce, HubSpot), websites (Google Analytics), and sensors, among other sources. This layer represents the initial point of data ingestion from diverse sources.
2. Data Ingestion Layer: At this layer, collected data is cleaned, transformed, and loaded into the data warehouse. These processes are typically performed using ETL tools (Informatica, Talend). This layer ensures data is prepared for analysis.
3. Data Storage Layer: Data loaded into the warehouse is stored at this layer. This layer ensures secure and organized data retention.
4. Data Access Layer: This layer enables users to access data in the warehouse. Access is typically provided through SQL queries, reporting tools (Tableau, Power BI), and analytic tools (R, Python). This layer ensures easy and fast data access.
5. Metadata Layer: This layer stores information about the data warehouse itself, including the meaning, origin, and usage of data. Metadata management tools (IBM InfoSphere, Informatica Metadata Manager) are used. This layer is essential for understanding and tracing data meaning and source.
6. Data Processing Layer: At this layer, collected data is analyzed and transformed. Machine learning and artificial intelligence models (TensorFlow, PyTorch) are commonly used here. This layer is necessary for extracting insights from data.
7. Presentation Layer: In this final layer, data from the warehouse is presented visually to users through charts, graphs, and dashboards built with tools such as Tableau and Power BI. This is where data reaches end users in a form they can interpret.
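The ingestion, storage, and access layers described above can be sketched in a few lines of Python. This is a minimal illustration, not a production pipeline: the `RAW_ORDERS` records, the `transform` and `load` helpers, and the `orders` table are hypothetical, and an in-memory SQLite database stands in for the warehouse instead of a dedicated ETL tool.

```python
import sqlite3

# Hypothetical raw records, standing in for rows pulled from source systems.
RAW_ORDERS = [
    {"order_id": "1", "customer": " Alice ", "amount": "19.90"},
    {"order_id": "2", "customer": "bob", "amount": "5.00"},
    {"order_id": "2", "customer": "bob", "amount": "5.00"},   # duplicate
    {"order_id": "3", "customer": "Carol", "amount": "n/a"},  # invalid amount
]


def transform(rows):
    """Clean raw rows: normalise names, parse amounts, drop duplicates and bad rows."""
    seen, cleaned = set(), []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # reject rows whose amount cannot be parsed
        order_id = int(row["order_id"])
        if order_id in seen:
            continue  # skip duplicate order ids
        seen.add(order_id)
        cleaned.append((order_id, row["customer"].strip().title(), amount))
    return cleaned


def load(conn, rows):
    """Load cleaned rows into the storage layer (an in-memory SQLite table here)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(RAW_ORDERS))

# Access layer: a plain SQL query over the loaded data.
total = conn.execute("SELECT COUNT(*), SUM(amount) FROM orders").fetchone()
print(total)
```

Keeping the transform step separate from the load step means the cleaning rules can be tested without a database connection.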
A data warehouse is a heterogeneous collection of data sources organized under a unified schema. Two main approaches are used in building data warehouses: the Inmon Approach (Top-Down Approach) and the Kimball Approach (Bottom-Up Approach).
The Inmon approach is based on an enterprise data model and typically involves constructing a single, comprehensive data warehouse. First, a unified data warehouse is built that encompasses all data elements and reflects the overall operations of the organization, so that it can meet all data needs of the enterprise. Subsequently, specialized data marts are created for specific functional areas or business processes. These data marts use data extracted from the central warehouse to serve the specific analytical needs of their respective domains.
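The top-down ordering can be illustrated with a small sketch: the central store is built first, and the marts are derived from it afterwards. The `warehouse_facts` table, its columns, and the `sales_mart` and `hr_mart` views are hypothetical names chosen for illustration, and SQLite views stand in for what would normally be separately managed data marts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Central warehouse table covering several business areas (hypothetical schema).
conn.execute(
    "CREATE TABLE warehouse_facts "
    "(area TEXT, entity TEXT, metric TEXT, value REAL)"
)
conn.executemany(
    "INSERT INTO warehouse_facts VALUES (?, ?, ?, ?)",
    [
        ("sales", "store_1", "revenue", 1200.0),
        ("sales", "store_2", "revenue", 800.0),
        ("hr", "store_1", "headcount", 12.0),
        ("hr", "store_2", "headcount", 7.0),
    ],
)

# Data marts are derived afterwards, each exposing only one area's slice.
# The area names come from a fixed tuple, so formatting them into SQL is safe here.
for area in ("sales", "hr"):
    conn.execute(
        f"CREATE VIEW {area}_mart AS "
        f"SELECT entity, metric, value FROM warehouse_facts WHERE area = '{area}'"
    )

sales_rows = conn.execute(
    "SELECT entity, value FROM sales_mart ORDER BY entity"
).fetchall()
hr_rows = conn.execute(
    "SELECT entity, value FROM hr_mart ORDER BY entity"
).fetchall()
print(sales_rows)
print(hr_rows)
```

Because every mart reads from the same central table, the marts cannot disagree about the underlying figures, which is the main consistency argument for building the warehouse first.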
The Kimball approach starts by creating specialized data marts that serve specific business areas. Each data mart organizes and prepares data for analysis focused on a particular domain. Later, these data marts are integrated to form a comprehensive enterprise data warehouse. This method typically delivers results faster and increases overall system flexibility. It is a practical choice for organizations seeking to realize value quickly and gain rapid access to the information they need.
Data warehouses are used across various industries as follows:
Traditional data warehouse architectures are optimized for structured data. However, with the emergence of Big Data, there has been a growing need to analyze unstructured and semi-structured data. Data warehouses have been integrated with Big Data technologies such as Hadoop and Spark to expand their data processing capabilities.
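To make the structured versus semi-structured distinction concrete, the sketch below maps JSON events with varying fields onto a fixed set of columns, which is the shape a traditional warehouse table expects. It uses only the Python standard library and does not use Hadoop or Spark; the `RAW_EVENTS` records, the `COLUMNS` tuple, and the `flatten` helper are hypothetical.

```python
import json

# Hypothetical semi-structured events: fields vary from record to record.
RAW_EVENTS = [
    '{"user": "u1", "action": "click", "meta": {"page": "/home"}}',
    '{"user": "u2", "action": "purchase", "meta": {"page": "/cart", "total": 42.5}}',
    '{"user": "u3", "action": "click"}',
]

COLUMNS = ("user", "action", "page", "total")


def flatten(line):
    """Map one JSON event onto the fixed column set, using None for missing fields."""
    event = json.loads(line)
    meta = event.get("meta", {})
    return (
        event.get("user"),
        event.get("action"),
        meta.get("page"),
        meta.get("total"),
    )


rows = [flatten(line) for line in RAW_EVENTS]
for row in rows:
    print(dict(zip(COLUMNS, row)))
```

Missing fields become `None`, so every row has the same shape and can be loaded into a table with a fixed schema.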
Data warehouse technology is generally categorized into three main types, each fulfilling a specific function and complementing the others. Taking an e-commerce company as an example, the types of data warehouses can be defined as follows:
Operational Data Store (ODS): The purpose of an ODS is to accelerate daily operations. The ODS captures and stores data from daily transactions of an e-commerce platform. This data includes critical information such as orders, customers, products, and payments. With an ODS, we can quickly perform the following queries:
Enterprise Data Warehouse (EDW): The EDW stores all data of the company and enables analysis of the company’s past performance and forecasting of future trends. Compared to the ODS, the EDW can store more data and perform more complex analyses. This helps the company evaluate its operations with a broader perspective and make strategic decisions. The EDW supports complex queries and analyses such as:
Data Mart: A data mart extracts and optimizes subsets of data from the EDW focused on a specific subject. For example, a marketing data mart may contain data on customers, purchase histories, and demographic information. This enables the marketing department to:
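The difference between the three types shows up in the kind of query each one serves. The sketch below is a simplified illustration under stated assumptions: a single hypothetical `orders` table in an in-memory SQLite database stands in for all three stores, whereas in practice the ODS, EDW, and data marts would be separate systems with their own schemas.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, "
    "order_date TEXT, amount REAL, status TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?, ?)",
    [
        (1, "alice", "2023-01-15", 30.0, "delivered"),
        (2, "bob", "2023-02-03", 45.0, "delivered"),
        (3, "alice", "2024-01-20", 60.0, "delivered"),
        (4, "carol", "2024-02-11", 25.0, "pending"),
    ],
)

# ODS-style query: look up the current state of a single order.
status = conn.execute(
    "SELECT status FROM orders WHERE order_id = ?", (4,)
).fetchone()[0]

# EDW-style query: aggregate revenue by year across the full history.
revenue_by_year = conn.execute(
    "SELECT substr(order_date, 1, 4) AS year, SUM(amount) "
    "FROM orders GROUP BY year ORDER BY year"
).fetchall()

# Data-mart-style query: a subject-specific view (purchase totals per customer).
customer_totals = conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()

print(status)
print(revenue_by_year)
print(customer_totals)
```

The ODS query touches one recent row, the EDW query scans the whole history, and the data mart query restricts attention to one subject area.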
Key Features of a Data Warehouse
Purpose of Data Warehouses
Data Warehouse Components
Data Warehouse Architectural Approaches
Inmon Approach (Top-Down Approach)
Kimball Approach (Bottom-Up Approach)
Data Warehouse Use Cases
Relationship Between Data Warehouse and Big Data
Differences Between Data Warehouse and Big Data
Data Warehouse Types