
Quickstart for dbt Core from a manual install

dbt Core
Quickstart
Beginner

    Introduction

    When you use dbt Core, you edit files locally using a code editor and run projects using a command line interface (CLI).

    If you want to edit files and run projects using the web-based dbt Integrated Development Environment (IDE), refer to the dbt Cloud quickstarts. You can also develop and run dbt commands using the dbt Cloud CLI — a dbt Cloud powered command line.

    Prerequisites

    • To use dbt Core, it's important that you know some basics of the Terminal. In particular, you should understand cd, ls, and pwd so you can navigate your computer's directory structure easily (see the short example after this list).
    • Install dbt Core using the installation instructions for your operating system.
    • Complete appropriate Setting up and Loading data steps in the Quickstart for dbt Cloud series. For example, for BigQuery, complete Setting up (in BigQuery) and Loading data (BigQuery).
    • Create a GitHub account if you don't already have one.
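    For example, a short Terminal session to get oriented and confirm your tooling might look like the following sketch. The install command shown is one common approach for BigQuery and assumes you already have Python and pip available; follow the installation instructions for your operating system if your setup differs.

    $ pwd                                  # print the directory you are currently in
    > /Users/BBaggins
    $ ls                                   # list the files and folders in that directory
    $ cd dbt-tutorial                      # move into a folder named dbt-tutorial
    $ python -m pip install dbt-bigquery   # installs dbt Core together with the BigQuery adapter
    $ dbt --version                        # confirm dbt is available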

    Create a starter project

    After setting up BigQuery to work with dbt, you are ready to create a starter project with example models, before building your own models.

    Create a repository

    The following steps use GitHub as the Git provider for this guide, but you can use any Git provider. You should have already created a GitHub account.

    1. Create a new GitHub repository named dbt-tutorial.
    2. Select Public so the repository can be shared with others. You can always make it private later.
    3. Leave the default values for all other settings.
    4. Click Create repository.
    5. Save the commands from "…or create a new repository on the command line" to use later in Commit your changes.

    Create a project

    Use a series of commands in the Terminal to create your project. dbt Core includes an init command that helps scaffold a dbt project.

    To create your dbt project:

    1. Make sure you have dbt Core installed and check the version using the dbt --version command:
    dbt --version
    2. Initialize the jaffle_shop project using the init command:
    dbt init jaffle_shop
    3. Navigate into your project's directory:
    cd jaffle_shop
    4. Use pwd to confirm that you are in the right spot:
    $ pwd
    > Users/BBaggins/dbt-tutorial/jaffle_shop
    5. Use a code editor like Atom or VSCode to open the project directory you created in the previous steps, which we named jaffle_shop. The contents include folders and .sql and .yml files generated by the init command.
    The starter project in a code editor
    6. dbt provides the following values in the dbt_project.yml file:
    dbt_project.yml
    name: jaffle_shop # Change from the default, `my_new_project`

    ...

    profile: jaffle_shop # Change from the default profile name, `default`

    ...

    models:
      jaffle_shop: # Change from `my_new_project` to match the previous value for `name:`
        ...

    Connect to BigQuery

    When developing locally, dbt connects to your data warehouse using a profile, which is a YAML file with all the connection details to your warehouse.

    1. Create a file in the ~/.dbt/ directory named profiles.yml.
    2. Move your BigQuery keyfile into this directory.
    3. Copy the following and paste into the new profiles.yml file. Make sure you update the values where noted.
    profiles.yml
    jaffle_shop: # this needs to match the profile in your dbt_project.yml file
      target: dev
      outputs:
        dev:
          type: bigquery
          method: service-account
          keyfile: /Users/BBaggins/.dbt/dbt-tutorial-project-331118.json # replace this with the full path to your keyfile
          project: grand-highway-265418 # Replace this with your project id
          dataset: dbt_bbagins # Replace this with dbt_your_name, e.g. dbt_bilbo
          threads: 1
          timeout_seconds: 300
          location: US
          priority: interactive
    4. Run the debug command from your project to confirm that you can successfully connect:
    $ dbt debug
    > Connection test: OK connection ok
    A successful dbt debug command

    FAQs

    My data team uses a different data warehouse. What should my profiles.yml file look like for my warehouse?
    Why are profiles stored outside of my project?
    What should I name my profile?
    What should I name my target?
    Can I use environment variables in my profile?

    Perform your first dbt run

    Our sample project has some example models in it. We're going to check that we can run them to confirm everything is in order.

    1. Enter the run command to build example models:
    dbt run

    You should have an output that looks like this:

    A successful dbt run command

    Commit your changes

    Commit your changes so that the repository contains the latest code.

    1. Link the GitHub repository you created to your dbt project by running the following commands in Terminal. Make sure you use the correct git URL for your repository, which you should have saved from step 5 in Create a repository.
    git init
    git branch -M main
    git add .
    git commit -m "Create a dbt project"
    git remote add origin https://github.com/USERNAME/dbt-tutorial.git
    git push -u origin main
    2. Return to your GitHub repository to verify your new files have been added.

    Build your first models

    Now that you set up your sample project, you can get to the fun part — building models! In the next steps, you will take a sample query and turn it into a model in your dbt project.

    Check out a new git branch

    Check out a new git branch to work on new code:

    1. Create a new branch by using the checkout command and passing the -b flag:
    $ git checkout -b add-customers-model
    > Switched to a new branch `add-customers-model`

    Build your first model

    1. Open your project in your favorite code editor.
    2. Create a new SQL file in the models directory, named models/customers.sql.
    3. Paste the following query into the models/customers.sql file.
    with customers as (

        select
            id as customer_id,
            first_name,
            last_name

        from `dbt-tutorial`.jaffle_shop.customers

    ),

    orders as (

        select
            id as order_id,
            user_id as customer_id,
            order_date,
            status

        from `dbt-tutorial`.jaffle_shop.orders

    ),

    customer_orders as (

        select
            customer_id,

            min(order_date) as first_order_date,
            max(order_date) as most_recent_order_date,
            count(order_id) as number_of_orders

        from orders

        group by 1

    ),

    final as (

        select
            customers.customer_id,
            customers.first_name,
            customers.last_name,
            customer_orders.first_order_date,
            customer_orders.most_recent_order_date,
            coalesce(customer_orders.number_of_orders, 0) as number_of_orders

        from customers

        left join customer_orders using (customer_id)

    )

    select * from final
    4. From the command line, enter dbt run.
    A successful run with the dbt Core CLI

    When you return to the BigQuery console, you can select from this model.
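    For example, using the placeholder project and dataset from the profiles.yml example above (substitute your own project ID and dataset name), a quick check in the BigQuery console could look like this:

    select * from `grand-highway-265418`.dbt_bbagins.customers
    limit 10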

    FAQs

    How can I see the SQL that dbt is running?
    How did dbt choose which schema to build my models in?
    Do I need to create my target schema before running dbt?
    If I rerun dbt, will there be any downtime as models are rebuilt?
    What happens if the SQL in my query is bad or I get a database error?

    Change the way your model is materialized

    One of the most powerful features of dbt is that you can change the way a model is materialized in your warehouse simply by changing a configuration value. Instead of writing the data definition language (DDL) yourself, you switch between tables and views by changing a keyword, and dbt handles the DDL behind the scenes.

    By default, everything is created as a view. You can override this at the directory level so that everything in a given directory uses a different materialization.

    1. Edit your dbt_project.yml file.

      • Update your project name to:

        dbt_project.yml
        name: 'jaffle_shop'
      • Configure jaffle_shop so everything in it is materialized as a table, and configure example so everything in it is materialized as a view. Update your models config block to:

        dbt_project.yml
        models:
          jaffle_shop:
            +materialized: table
            example:
              +materialized: view
      • Click Save.

    2. Enter the dbt run command. Your customers model should now be built as a table!

      info

      To do this, dbt had to first run a drop view statement (or API call on BigQuery), then a create table as statement (see the sketch after this list).

    3. Edit models/customers.sql to override the dbt_project.yml for the customers model only by adding the following snippet to the top, and click Save:

      models/customers.sql
      {{
        config(
          materialized='view'
        )
      }}

      with customers as (

          select
              id as customer_id
              ...

      )

    4. Enter the dbt run command. Your model, customers, should now build as a view.

      • BigQuery users need to run dbt run --full-refresh instead of dbt run to fully apply materialization changes.
    5. Enter the dbt run --full-refresh command for this to take effect in your warehouse.
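    Roughly speaking, the two statements dbt issues behind the scenes look like the following simplified sketch. The exact DDL varies by warehouse, on BigQuery the drop happens through an API call, and dbt_bbagins is just the example dataset from profiles.yml:

    drop view if exists dbt_bbagins.customers;

    create table dbt_bbagins.customers as (
        -- the compiled SQL from models/customers.sql goes here
        select ...
    );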

    FAQs

    What materializations are available in dbt?
    Which materialization should I use for my model?
    What model configurations exist?

    Delete the example models

    You can now delete the files that dbt created when you initialized the project:

    1. Delete the models/example/ directory.

    2. Delete the example: key from your dbt_project.yml file, and any configurations that are listed under it.

      dbt_project.yml
      # before
      models:
        jaffle_shop:
          +materialized: table
          example:
            +materialized: view

      dbt_project.yml
      # after
      models:
        jaffle_shop:
          +materialized: table
    3. Save your changes.

    FAQs

    How do I remove deleted models from my data warehouse?
    I got an "unused model configurations" error message, what does this mean?

    Build models on top of other models

    As a best practice in SQL, you should separate logic that cleans up your data from logic that transforms your data. You have already started doing this in the existing query by using common table expressions (CTEs).

    Now you can experiment by separating the logic out into separate models and using the ref function to build models on top of other models:

    The DAG we want for our dbt project
    1. Create a new SQL file, models/stg_customers.sql, with the SQL from the customers CTE in our original query.
    2. Create a second new SQL file, models/stg_orders.sql, with the SQL from the orders CTE in our original query.
    models/stg_customers.sql
    select
        id as customer_id,
        first_name,
        last_name

    from `dbt-tutorial`.jaffle_shop.customers

    models/stg_orders.sql
    select
        id as order_id,
        user_id as customer_id,
        order_date,
        status

    from `dbt-tutorial`.jaffle_shop.orders
    3. Edit the SQL in your models/customers.sql file as follows:
    models/customers.sql
    with customers as (

        select * from {{ ref('stg_customers') }}

    ),

    orders as (

        select * from {{ ref('stg_orders') }}

    ),

    customer_orders as (

        select
            customer_id,

            min(order_date) as first_order_date,
            max(order_date) as most_recent_order_date,
            count(order_id) as number_of_orders

        from orders

        group by 1

    ),

    final as (

        select
            customers.customer_id,
            customers.first_name,
            customers.last_name,
            customer_orders.first_order_date,
            customer_orders.most_recent_order_date,
            coalesce(customer_orders.number_of_orders, 0) as number_of_orders

        from customers

        left join customer_orders using (customer_id)

    )

    select * from final

    4. Execute dbt run.

    This time, when you performed a dbt run, separate views/tables were created for stg_customers, stg_orders and customers. dbt inferred the order to run these models. Because customers depends on stg_customers and stg_orders, dbt builds customers last. You do not need to explicitly define these dependencies.
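    If you want to rebuild only part of this DAG, dbt's node selection syntax lets you pick models by name and pull in their ancestors with the + operator. For example:

    dbt run --select customers     # build only the customers model
    dbt run --select +customers    # build customers plus the models it depends on (stg_customers and stg_orders)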

    FAQs

    How do I run one model at a time?
    Do ref-able resource names need to be unique?
    As I create more models, how should I keep my project organized? What should I name my models?

    Next steps

    Before moving on from building your first models, make a change and see how it affects your results:

    • Write some bad SQL to cause an error — can you debug the error?
    • Run only a single model at a time. For more information, see Syntax overview.
    • Group your models with a stg_ prefix into a staging subdirectory. For example, models/staging/stg_customers.sql.
      • Configure your staging models to be views (see the sketch after this list).
      • Run only the staging models.
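    If you try the staging-directory exercise above, one possible configuration (assuming you move the stg_ models into models/staging/) is to add a staging key under your project in dbt_project.yml:

    dbt_project.yml
    models:
      jaffle_shop:
        +materialized: table
        staging:
          +materialized: view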

    You can also explore:

    • The target directory to see all of the compiled SQL. The target/run directory shows the create or replace table statements that are being run, which are your select statements wrapped in the correct DDL.
    • The logs file to see how dbt Core logs everything happening within your project. It shows the select statements that are running and the Python logging that happens when dbt runs (see the example below).
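    For example, from your project directory you can inspect these artifacts directly (exact paths can vary slightly between dbt versions):

    $ ls target/compiled/jaffle_shop/models/   # the compiled select statements
    $ ls target/run/jaffle_shop/models/        # the same statements wrapped in DDL
    $ tail -n 50 logs/dbt.log                  # recent log output from dbt Core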

    Add tests to your models

    Adding tests to a project helps validate that your models are working correctly.

    To add tests to your project:

    1. Create a new YAML file in the models directory, named models/schema.yml.

    2. Add the following contents to the file:

      models/schema.yml
      version: 2

      models:
        - name: customers
          columns:
            - name: customer_id
              tests:
                - unique
                - not_null

        - name: stg_customers
          columns:
            - name: customer_id
              tests:
                - unique
                - not_null

        - name: stg_orders
          columns:
            - name: order_id
              tests:
                - unique
                - not_null
            - name: status
              tests:
                - accepted_values:
                    values: ['placed', 'shipped', 'completed', 'return_pending', 'returned']
            - name: customer_id
              tests:
                - not_null
                - relationships:
                    to: ref('stg_customers')
                    field: customer_id

    3. Run dbt test, and confirm that all your tests passed.

    When you run dbt test, dbt iterates through your YAML files, and constructs a query for each test. Each query will return the number of records that fail the test. If this number is 0, then the test is successful.
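    For example, the unique test on customer_id compiles to a query along these lines (a simplified sketch, not the exact SQL dbt generates):

    select
        customer_id,
        count(*) as n_records

    from dbt_bbagins.customers -- your schema/dataset name will differ

    group by customer_id
    having count(*) > 1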

    FAQs

    What tests are available for me to use in dbt? Can I add my own custom tests?
    How do I test one model at a time?
    One of my tests failed, how can I debug it?
    Does my test file need to be named `schema.yml`?
    Why do model and source yml files always start with `version: 2`?
    What tests should I add to my project?
    When should I run my tests?

    Document your models

    Adding documentation to your project allows you to describe your models in rich detail, and share that information with your team. Here, we're going to add some basic documentation to our project.

    1. Update your models/schema.yml file to include some descriptions, such as those below.

      models/schema.yml
      version: 2

      models:
        - name: customers
          description: One record per customer
          columns:
            - name: customer_id
              description: Primary key
              tests:
                - unique
                - not_null
            - name: first_order_date
              description: NULL when a customer has not yet placed an order.

        - name: stg_customers
          description: This model cleans up customer data
          columns:
            - name: customer_id
              description: Primary key
              tests:
                - unique
                - not_null

        - name: stg_orders
          description: This model cleans up order data
          columns:
            - name: order_id
              description: Primary key
              tests:
                - unique
                - not_null
            - name: status
              tests:
                - accepted_values:
                    values: ['placed', 'shipped', 'completed', 'return_pending', 'returned']
            - name: customer_id
              tests:
                - not_null
                - relationships:
                    to: ref('stg_customers')
                    field: customer_id
    2. Run dbt docs generate to generate the documentation for your project. dbt introspects your project and your warehouse to generate a JSON file with rich documentation about your project.

    3. Run the dbt docs serve command to launch the documentation in a local website.

    FAQs

    How do I write long-form explanations in my descriptions?
    How do I access documentation in dbt Explorer?

    Next steps

    Before moving on from testing, make a change and see how it affects your results:

    • Write a test that fails, for example, omit one of the order statuses in the accepted_values list. What does a failing test look like? Can you debug the failure?
    • Run the tests for one model only. If you grouped your stg_ models into a directory, try running the tests for all the models in that directory.
    • Use a docs block to add a Markdown description to a model (see the sketch below).
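    For example, you could add a models/docs.md file containing a named docs block and reference it from models/schema.yml (the block name customers_doc is just an illustration):

    models/docs.md
    {% docs customers_doc %}
    One record per customer, including the date of each customer's first and most recent order.
    {% enddocs %}

    models/schema.yml
    models:
      - name: customers
        description: '{{ doc("customers_doc") }}'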

    Commit updated changes

    You need to commit the changes you made to the project so that the repository has your latest code.

    1. Add all your changes to git: git add -A
    2. Commit your changes: git commit -m "Add customers model, tests, docs"
    3. Push your changes to your repository: git push
    4. Navigate to your repository, and open a pull request to merge the code into your main branch.

    Schedule a job

    We recommend using dbt Cloud as the easiest and most reliable way to deploy jobs and automate your dbt project in production.

    For more info on how to get started, refer to create and schedule jobs.

    Overview of a dbt Cloud job run, which includes the job run details, trigger type, commit SHA, environment name, detailed run steps, logs, and more.

    For more information about using dbt Core to schedule a job, refer to the dbt and Airflow blog post.
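    If you stay on dbt Core, the simplest possible scheduler is cron on a machine that has access to your project and profiles directory. A minimal, illustrative sketch (the paths are placeholders):

    # run the project every morning at 6:00 AM
    0 6 * * * cd /path/to/jaffle_shop && dbt run --profiles-dir /path/to/.dbt >> /path/to/dbt_cron.log 2>&1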
