SQL CTEs vs Temp Tables: A Practical Guide
Working with complex SQL queries often means breaking down the problem into smaller, manageable chunks. SQL Server offers two powerful tools for this: Common Table Expressions (CTEs) and temporary tables. Having used both extensively in marketing data analysis, I want to share when and how to use each effectively.
Note: The examples in this post use SQL Server (T-SQL) syntax, though the concepts apply to other SQL flavors with minor modifications.
Understanding the Basics
Let's start with simple definitions:
CTEs are like named subqueries that exist only for the duration of a single query:
WITH CustomerSegments AS (
SELECT
customer_id,
CASE
WHEN total_spend > 1000 THEN 'High Value'
WHEN total_spend > 500 THEN 'Medium Value'
ELSE 'Low Value'
END AS segment
FROM customer_transactions
)
SELECT segment, COUNT(*) as customer_count
FROM CustomerSegments
GROUP BY segment;
Temp tables are physical tables that exist in the tempdb and persist until dropped or the session ends:
CREATE TABLE #CustomerSegments (
customer_id INT,
segment VARCHAR(20)
);
INSERT INTO #CustomerSegments
SELECT
customer_id,
CASE
WHEN total_spend > 1000 THEN 'High Value'
WHEN total_spend > 500 THEN 'Medium Value'
ELSE 'Low Value'
END AS segment
FROM customer_transactions;
Real-World Example: Marketing Campaign Analysis
Let's look at a practical example using marketing campaign data. We'll analyze customer engagement across multiple campaigns:
-- Using CTEs for campaign analysis
WITH CampaignEngagement AS (
SELECT
c.campaign_id,
c.campaign_name,
COUNT(DISTINCT e.customer_id) as unique_customers,
SUM(e.interaction_count) as total_interactions
FROM marketing_campaigns c
LEFT JOIN customer_interactions e ON c.campaign_id = e.campaign_id
GROUP BY c.campaign_id, c.campaign_name
),
CustomerValue AS (
SELECT
customer_id,
SUM(purchase_amount) as total_spend,
COUNT(order_id) as order_count
FROM orders
WHERE order_date >= DATEADD(month, -6, GETDATE())
GROUP BY customer_id
)
SELECT
ce.campaign_name,
ce.unique_customers,
ce.total_interactions,
AVG(cv.total_spend) as avg_customer_value
FROM CampaignEngagement ce
JOIN customer_interactions ci ON ce.campaign_id = ci.campaign_id
JOIN CustomerValue cv ON ci.customer_id = cv.customer_id
GROUP BY ce.campaign_name, ce.unique_customers, ce.total_interactions
ORDER BY avg_customer_value DESC;
The same analysis using temp tables:
-- Create temp table for campaign engagement
CREATE TABLE #CampaignEngagement (
campaign_id INT,
campaign_name VARCHAR(100),
unique_customers INT,
total_interactions INT
);
INSERT INTO #CampaignEngagement
SELECT
c.campaign_id,
c.campaign_name,
COUNT(DISTINCT e.customer_id),
SUM(e.interaction_count)
FROM marketing_campaigns c
LEFT JOIN customer_interactions e ON c.campaign_id = e.campaign_id
GROUP BY c.campaign_id, c.campaign_name;
-- Create temp table for customer value
CREATE TABLE #CustomerValue (
customer_id INT,
total_spend DECIMAL(10,2),
order_count INT
);
INSERT INTO #CustomerValue
SELECT
customer_id,
SUM(purchase_amount),
COUNT(order_id)
FROM orders
WHERE order_date >= DATEADD(month, -6, GETDATE())
GROUP BY customer_id;
-- Final analysis
SELECT
ce.campaign_name,
ce.unique_customers,
ce.total_interactions,
AVG(cv.total_spend) as avg_customer_value
FROM #CampaignEngagement ce
JOIN customer_interactions ci ON ce.campaign_id = ci.campaign_id
JOIN #CustomerValue cv ON ci.customer_id = cv.customer_id
GROUP BY ce.campaign_name, ce.unique_customers, ce.total_interactions
ORDER BY avg_customer_value DESC;
-- Clean up
DROP TABLE #CampaignEngagement;
DROP TABLE #CustomerValue;
When to Use Each
After working with both approaches, here's when I prefer each:
CTEs are better when:
- You need improved query readability
- The intermediate results are only used once
- You're working with recursive queries (like organizational hierarchies)
- The dataset is relatively small
Temp tables are better when:
- You need to reference the intermediate results multiple times
- You're working with large datasets
- You need to create indexes on intermediate results
- You want to persist results across multiple queries
Advanced CTE Example: Recursive Customer Referrals
Here's a practical example of using recursive CTEs to track customer referral chains:
WITH ReferralChain AS (
-- Base case: Initial referrers
SELECT
referred_customer_id,
referring_customer_id,
1 as referral_level
FROM customer_referrals
WHERE referring_customer_id IN (
SELECT customer_id
FROM customers
WHERE customer_type = 'Ambassador'
)
UNION ALL
-- Recursive case: Find subsequent referrals
SELECT
cr.referred_customer_id,
rc.referring_customer_id,
rc.referral_level + 1
FROM customer_referrals cr
INNER JOIN ReferralChain rc
ON cr.referring_customer_id = rc.referred_customer_id
WHERE rc.referral_level < 5 -- Limit to 5 levels deep
)
SELECT
referring_customer_id as ambassador_id,
referral_level,
COUNT(DISTINCT referred_customer_id) as referrals_at_level
FROM ReferralChain
GROUP BY referring_customer_id, referral_level
ORDER BY referring_customer_id, referral_level;
Best Practices
From my experience, here are some key tips:
- Always use meaningful names for CTEs and temp tables
- Clean up temp tables when you're done with them
- Consider using table variables for very small datasets
- Add comments to complex CTEs explaining the transformation
- Be careful with recursive CTEs - always include a stopping condition
Conclusion
Both CTEs and temp tables are valuable tools in SQL Server. CTEs excel at improving query readability and handling recursive scenarios, while temp tables shine when working with large datasets or when you need to reuse intermediate results. The key is understanding their strengths and choosing the right tool for your specific use case.