SQL Data Warehouse

Too many SQL projects just stop at SELECT * FROM cleaned_data and call it a day. I wanted to build a real SQL data warehouse — one that captures the actual lifecycle of data: from raw ingestion, through transformation, into a dimensional model ready for analytics.

This project simulates the development of a modern, modular, and production-style data warehouse using SQL Server, structured around the Medallion Architecture (Bronze → Silver → Gold). It’s designed for engineers who want to understand how data moves, evolves, and ultimately drives decision-making.

Architecture Overview

The project follows the Medallion Architecture, breaking down the data journey into three logical layers:

(Diagram: Data Architecture)
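
In SQL Server terms, a simple way to realize these layers is one schema per layer. Below is a minimal sketch, assuming that is roughly what scripts/init_database.sql sets up; the exact DDL in the repo may differ.

-- Create the warehouse database and one schema per medallion layer.
-- Assumed setup; see scripts/init_database.sql for the project's actual DDL.
CREATE DATABASE DataWarehouse;
GO
USE DataWarehouse;
GO
CREATE SCHEMA bronze;
GO
CREATE SCHEMA silver;
GO
CREATE SCHEMA gold;
GO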

Bronze Layer – Raw Data

Raw CSV exports from ERP and CRM systems (simulated). These are ingested as-is into SQL Server, with no transformation or filtering. This layer acts as the immutable source of truth.

(Diagram: Bronze Layer Data Flow)

Silver Layer – Cleaned & Standardized

The real work begins here: data cleaning, deduplication, type harmonization, and joins across sources. The goal is to produce reliable, consistent data that is ready for modeling.

(Diagram: Silver Layer Data Flow)

Gold Layer – Business-Focused Star Schema

Here’s where we implement dimensional modeling. Fact and dimension tables are built for analytical use cases like customer segmentation, sales trend analysis, and product performance.

(Diagram: Gold Layer Data Flow)

Workflow

This isn’t just a folder with a few scripts — it’s a structured pipeline. Here’s how data moves through the system:

1. Ingestion (Bronze)

CSV files from the datasets/ folder are loaded into SQL Server using simple bulk import techniques. No cleaning, no changes — just persistence.

📂 Example: scripts/bronze/proc_load_bronze.sql
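
A minimal sketch of such a load, assuming a bronze.crm_cust_info table that mirrors the raw cust_info.csv columns; the table name and file path below are illustrative, not necessarily what the repo's load procedure uses.

-- Illustrative Bronze load: persist one raw CRM extract exactly as exported.
-- No cleaning or type coercion happens at this stage.
TRUNCATE TABLE bronze.crm_cust_info;

BULK INSERT bronze.crm_cust_info
FROM 'C:\datasets\source_crm\cust_info.csv'   -- placeholder path
WITH (
    FIRSTROW = 2,            -- skip the CSV header row
    FIELDTERMINATOR = ',',   -- comma-delimited source
    TABLOCK                  -- allow a fast, minimally logged load
);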

2. Transformation (Silver)

We apply SQL logic to:

  • Remove nulls
  • Resolve duplicates
  • Normalize inconsistent formats
  • Join ERP and CRM datasets where needed

📂 Example: scripts/silver/proc_load_silver.sql

Each table in the Silver layer has a direct lineage back to a Bronze source — and can be rebuilt deterministically.
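
A rough sketch of that kind of rebuild, using illustrative column names for the CRM customer data; the project's actual rules live in the Silver load script.

-- Illustrative Silver load: keep the latest record per customer,
-- trim text fields, and standardize coded values (TRIM needs SQL Server 2017+).
INSERT INTO silver.crm_cust_info (cst_id, cst_firstname, cst_lastname, cst_marital_status)
SELECT
    cst_id,
    TRIM(cst_firstname),
    TRIM(cst_lastname),
    CASE UPPER(TRIM(cst_marital_status))
        WHEN 'S' THEN 'Single'
        WHEN 'M' THEN 'Married'
        ELSE 'n/a'
    END
FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY cst_id ORDER BY cst_create_date DESC) AS rn
    FROM bronze.crm_cust_info
    WHERE cst_id IS NOT NULL               -- drop rows without a key
) AS latest
WHERE rn = 1;                              -- resolve duplicates: latest record wins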

3. Modeling & Analytics (Gold)

This is where we implement a Star Schema:

  • Fact Tables: fact_sales
  • Dimension Tables: dim_customer, dim_product, dim_date, dim_region

These are built for fast aggregation, flexible slicing, and compatibility with BI tools. It’s clean, powerful, and makes analysts happy.

📂 Example: scripts/gold/ddl_gold.sql
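
A minimal sketch of one Gold-layer object, assuming facts are published as views over Silver tables and dimensions carry surrogate keys; the names below are illustrative rather than the repo's exact definitions.

-- Illustrative Gold fact view: attach surrogate keys from the dimensions
-- so BI tools can slice sales by customer and product.
CREATE VIEW gold.fact_sales AS
SELECT
    sd.sls_ord_num   AS order_number,
    dc.customer_key,                    -- surrogate key from dim_customer
    dp.product_key,                     -- surrogate key from dim_product
    sd.sls_order_dt  AS order_date,
    sd.sls_sales     AS sales_amount,
    sd.sls_quantity  AS quantity,
    sd.sls_price     AS price
FROM silver.crm_sales_details sd
LEFT JOIN gold.dim_customer dc ON sd.sls_cust_id = dc.customer_id
LEFT JOIN gold.dim_product  dp ON sd.sls_prd_key = dp.product_number;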

Analytics: Insights, Not Just Tables

Once the Gold layer is in place, it’s time to extract value. The project includes a set of business-ready queries to answer real questions:

  • What are the top-selling products by region?
  • Which customers are most engaged?
  • How do seasonal trends impact revenue?

These aren’t toy queries — they’re structured to support dashboarding, reporting, and executive decisions.

📂 Bonus: Queries designed to integrate with Tableau or Power BI.
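
As an example, the first question could be answered along these lines, reusing the illustrative star-schema columns sketched above (joining regions through dim_customer is an assumption):

-- Illustrative: top 3 products per region by total revenue.
WITH product_sales AS (
    SELECT
        dr.region_name,
        dp.product_name,
        SUM(fs.sales_amount) AS total_revenue,
        ROW_NUMBER() OVER (
            PARTITION BY dr.region_name
            ORDER BY SUM(fs.sales_amount) DESC
        ) AS revenue_rank
    FROM gold.fact_sales   fs
    JOIN gold.dim_product  dp ON fs.product_key  = dp.product_key
    JOIN gold.dim_customer dc ON fs.customer_key = dc.customer_key
    JOIN gold.dim_region   dr ON dc.region_key   = dr.region_key
    GROUP BY dr.region_name, dp.product_name
)
SELECT region_name, product_name, total_revenue
FROM product_sales
WHERE revenue_rank <= 3
ORDER BY region_name, revenue_rank;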

Data Quality Testing

You can’t have good data without good tests.

This project includes automated SQL tests that validate:

  • Null constraints
  • Primary/foreign key relationships
  • Expected value ranges (e.g. revenue > 0)
  • Duplicate prevention

📂 tests/quality_checks_silver.sql
📂 tests/quality_checks_gold.sql

Think of it like dbt tests — but hand-coded, so you understand how they work.
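
Two representative checks in that spirit, with illustrative table and column names; each query is written so that returning zero rows means the check passes.

-- Check 1: the Silver customer key must be unique and non-null.
SELECT cst_id, COUNT(*) AS occurrences
FROM silver.crm_cust_info
GROUP BY cst_id
HAVING COUNT(*) > 1 OR cst_id IS NULL;

-- Check 2: business rule on the fact grain: revenue must be positive
-- and consistent with quantity * price.
SELECT *
FROM gold.fact_sales
WHERE sales_amount <= 0
   OR sales_amount <> quantity * price;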

Tools Used

All tools used are free and accessible:

Tool          Purpose
SQL Server    Core data warehouse engine
SSMS          Development IDE
Draw.io       Architecture & ER diagrams
Notion        Project documentation
CSV Datasets  Simulated ERP & CRM sources

📁 All datasets are available in the datasets/ folder in the repo.

Project Structure

📦
├─ README.md
├─ assets
│  ├─ Bronze Layer Data Flow.png
│  ├─ Data Architecture.png
│  ├─ Gold Layer Data Flow.png
│  └─ Silver Layer Data Flow.png
├─ datasets
│  ├─ source_crm
│  │  ├─ cust_info.csv
│  │  ├─ prd_info.csv
│  │  └─ sales_details.csv
│  └─ source_erp
│     ├─ CUST_AZ12.csv
│     ├─ LOC_A101.csv
│     └─ PX_CAT_G1V2.csv
├─ docs
│  └─ data_catalog.md
├─ scripts
│  ├─ bronze
│  │  ├─ ddl_bronze.sql
│  │  └─ proc_load_bronze.sql
│  ├─ gold
│  │  └─ ddl_gold.sql
│  ├─ init_database.sql
│  └─ silver
│     ├─ ddl_silver.sql
│     └─ proc_load_silver.sql
└─ tests
   ├─ quality_checks_gold.sql
   └─ quality_checks_silver.sql
