homeexamplesdbt-practical-guide

DBT: Modern Data Transformation for Analysts

Published Jan 9, 2025
2 minutes read

Data Build Tool (DBT) has revolutionized how we handle data transformations in modern data stacks. Having worked with various ETL tools over the years, I've seen firsthand how DBT has simplified complex data transformations while bringing software engineering best practices to data analytics.

Why DBT is Gaining Traction

The rise of DBT isn't just another tech trend - it addresses real pain points in data workflows. Traditional ETL tools often require specialized knowledge and create a bottleneck where data engineers become gatekeepers of the transformation process. DBT changes this by enabling analysts to build reliable data transformations using familiar SQL skills.

Modern Data Stack with DBT
DBT's position in the modern data stack, transforming raw data into analytics-ready models

Core Concepts

Models and Materializations

Models in DBT are essentially SQL SELECT statements that transform your data. Here's a simple example:

-- models/marts/core/dim_customers.sql
WITH customers AS (
    SELECT * FROM {{ ref('stg_customers') }}
),
 
orders AS (
    SELECT * FROM {{ ref('stg_orders') }}
),
 
customer_orders AS (
    SELECT
        customer_id,
        count(*) as number_of_orders,
        min(order_date) as first_order_date,
        max(order_date) as most_recent_order_date
    FROM orders
    GROUP BY 1
)
 
SELECT
    customers.customer_id,
    customers.first_name,
    customers.last_name,
    customer_orders.number_of_orders,
    customer_orders.first_order_date,
    customer_orders.most_recent_order_date
FROM customers
LEFT JOIN customer_orders USING (customer_id)

You can control how DBT materializes these models:

# dbt_project.yml
models:
  my_project:
    marts:
      core:
        +materialized: table
    staging:
      +materialized: view
DBT Dependency Graph
Visualization of model dependencies in DBT's DAG

Testing and Documentation

One of DBT's strongest features is its built-in testing framework. Here's how you can add tests to your models:

# models/schema.yml
version: 2
 
models:
  - name: dim_customers
    description: "Cleaned customer dimension table"
    columns:
      - name: customer_id
        description: "Primary key"
        tests:
          - unique
          - not_null
      - name: email
        description: "Customer email address"
        tests:
          - not_null
          - unique

Advanced Features: Incremental Models

For large datasets, incremental models are crucial for performance:

-- models/marts/core/fct_daily_sales.sql
{{ config(
    materialized='incremental',
    unique_key='order_date'
) }}
 
SELECT
    order_date,
    sum(amount) as daily_total
FROM {{ ref('stg_orders') }}
{% if is_incremental() %}
    WHERE order_date > (SELECT max(order_date) FROM {{ this }})
{% endif %}
GROUP BY 1

Real-World Example: Marketing Analytics

Let's look at a practical example of using DBT for marketing analytics:

-- models/marts/marketing/fct_campaign_performance.sql
WITH campaign_costs AS (
    SELECT * FROM {{ ref('stg_campaign_costs') }}
),
 
campaign_conversions AS (
    SELECT * FROM {{ ref('stg_conversions') }}
),
 
daily_metrics AS (
    SELECT
        campaign_costs.campaign_id,
        campaign_costs.date,
        campaign_costs.spend,
        count(distinct campaign_conversions.conversion_id) as conversions,
        sum(campaign_conversions.revenue) as revenue
    FROM campaign_costs
    LEFT JOIN campaign_conversions 
        ON campaign_costs.campaign_id = campaign_conversions.campaign_id
        AND campaign_costs.date = campaign_conversions.conversion_date
    GROUP BY 1, 2, 3
)
 
SELECT
    *,
    revenue - spend as profit,
    CASE 
        WHEN conversions > 0 THEN spend / conversions 
        ELSE NULL 
    END as cost_per_conversion
FROM daily_metrics
Marketing Analytics Models
Example of marketing analytics models and their dependencies

Best Practices

From my experience working with DBT, here are some key practices:

  1. Model Organization
models/
├── marts/
│   ├── core/
│   ├── marketing/
│   └── finance/
├── staging/
│   ├── salesforce/
│   └── google_analytics/
└── intermediate/
  1. Testing Strategy
# models/staging/schema.yml
version: 2
 
models:
  - name: stg_orders
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
      - name: status
        tests:
          - accepted_values:
              values: ['placed', 'shipped', 'completed', 'returned']
  1. Documentation
# models/marts/marketing/schema.yml
version: 2
 
models:
  - name: fct_campaign_performance
    description: "Daily campaign performance metrics including costs and conversions"
    columns:
      - name: campaign_id
        description: "Unique identifier for the campaign"
      - name: date
        description: "Date of the metrics"
      - name: spend
        description: "Total spend for the campaign on this date"
        tests:
          - not_null
          - positive_values

Looking Ahead

DBT continues to evolve, with features like the semantic layer and metrics definitions becoming increasingly important. The key is starting simple - begin with basic models and gradually incorporate more advanced features as your needs grow.

Remember, DBT isn't just about transforming data; it's about building a reliable, testable, and maintainable data transformation workflow that grows with your organization.