CTE in SQL
In a formal sense, a Common Table Expression (CTE), is a temporary result set that can be used in a SQL query. You can use CTEs to break up complex queries into simpler blocks of code that can connect and build on each other. In a less formal, more human-sense, you can think of a CTE as a separate, smaller query within the larger query you’re building up. Creating a CTE is essentially like making a temporary view that you can access throughout the rest of the query you are writing.
There are two-types of CTEs: recursive and non-recursive. This glossary focuses on non-recursive CTEs.
Why you should care about CTEs
Have you ever read through a query and thought:
- “What does this part of the query do?”
- “What are all the sources referenced in this query? Why did I reference this dependency?”
- “My query is not producing the results I expect and I’m not sure which part of the query is causing that.”
These thoughts often arise when we’ve written SQL queries and models that utilize complex business logic, references and joins multiple upstream dependencies, and are not outputting expected results. In a nutshell, these thoughts can occur often when you’re trying to write data models!
How can you make these complexities in your code more digestible and usable? CTEs to the rescue!
CTE Syntax: How it works
To use CTEs, you begin by defining your first CTE using the WITH
statement followed by a SELECT
statement.
Let’s break down this example involving a rename_columns
CTE below:
with rename_columns as (
select
id as customer_id,
lower(first_name) as customer_first_name,
lower(last_name) as customer_last_initial
from {{ ref('raw_customers') }}
)
select * from rename_columns
In this query above, you first create a CTE called rename_columns
where you conduct a
simple SELECT
statement that renames and lower cases some columns from a raw_customers
table/model. The final select * from rename_columns
selects all results from the rename_columns
CTE.
While you shouldn't always think of CTEs as having classical arguments like SQL functions, you’ve got to call the necessary inputs for CTEs something.
- CTE_EXPRESSION_NAME: This is the name of the CTE you can reference in other CTEs or SELECT statements. In our example,
rename_columns
is the CTE_EXPRESSION_NAME. If you are using multiple CTEs in one query, it’s important to note that each CTE_EXPRESSION_NAME must be unique. - CTE_QUERY: This is the
SELECT
statement whose result set is produced by the CTE. In our example above, theselect … from {{ ref('raw_customers') }}
is the CTE_QUERY. The CTE_QUERY is framed by parenthesis.