In this three-part series, we will go through the anatomy of a dbt™ command and how analytics engineers can use them to power their data transformation pipelines. Every dbt™ command has its own options, parameters, and syntax that you can apply.
In the first article, we will cover the basics, followed by graph operators in the second, and finally selector methods in the last article. So, let's get started 🤘.
dbt run
The bread and butter of dbt™ is the run command. It's like hitting the "Go" button on your data transformations. The dbt run command is the most complex and can be broken down into four parts: the dbt program itself, the run command, an option such as --select or --exclude, and that option's arguments. In this article we will cover only the most important options analytics engineers need to know; in the following articles of this series we will go into the details of method and graph selectors.
dbt run --select <method-and-graph-operators>
But wait, there's more! Add more power with these options:
--select:
Run specific models
dbt run --select cool_waffle
--exclude:
Skip certain models
dbt run --exclude boring_jaffle
--full-refresh:
Rebuild everything from scratch (you can blow up your CFO's data budget if you do this without fully understanding the consequences 😛)
dbt run --full-refresh
--vars:
Pass variables into your models
dbt run --vars '{"my_var": "value"}'
--threads:
Speed up the runs with multiple threads
dbt run --threads 4
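The --vars option pairs with Jinja's var() function inside your models. Here is a minimal sketch — the cool_waffle model, the raw_orders reference, and the "emea" default are all hypothetical:

```sql
-- models/cool_waffle.sql (hypothetical model)
-- var() reads the value passed via --vars, with an optional default
select *
from {{ ref('raw_orders') }}
where region = '{{ var("my_var", "emea") }}'
```

Running `dbt run --vars '{"my_var": "apac"}'` would compile the filter to `region = 'apac'`.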
Don't let bad data crash your party. Use dbt test to keep your transformations in check and apply data quality best practices to your dbt™ transformation pipelines:
dbt test
Get selective with:
--select:
Test specific models
dbt test --select critical_data
Run schema tests only
dbt test --select "test_type:generic"
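Generic tests — the kind `test_type:generic` selects — are declared in a YAML schema file alongside your models. A minimal sketch (the model and column names below are made up):

```yaml
# models/schema.yml
version: 2
models:
  - name: critical_data
    columns:
      - name: id
        tests:
          - unique
          - not_null
      - name: status
        tests:
          - accepted_values:
              values: ['open', 'closed']
```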
Source freshness in dbt™ is like a built-in data freshness checker. To check the freshness of all your defined sources, run:
dbt source freshness
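Freshness checks are driven by the freshness and loaded_at_field properties on your sources. A sketch, assuming a hypothetical raw source whose tables carry an _loaded_at timestamp column:

```yaml
# models/sources.yml
version: 2
sources:
  - name: raw
    loaded_at_field: _loaded_at
    freshness:
      warn_after: {count: 12, period: hour}
      error_after: {count: 24, period: hour}
    tables:
      - name: orders
```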
Use dbt compile to convert all your dbt™ models with their Jinja references into raw SQL. This is the SQL dbt™ will run against your data warehouse. It's like X-ray vision for your SQL:
dbt compile
When your dbt™ models fail to run, start by inspecting the compiled SQL.
Convert all your schema and table descriptions into static HTML files and then serve them from a server or a cloud bucket like AWS S3.
dbt docs generate
dbt docs serve
When you can’t make head or tail of the errors you are seeing during development or production runs, use the --debug option. This will generate additional logs in your terminal to help triage the situation. It is most useful for diagnosing warehouse connection errors.
dbt run --debug
Capture data changes over time:
dbt snapshot
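A snapshot is a SQL file in your snapshots/ directory wrapped in a snapshot block. A minimal sketch using the timestamp strategy — the source, key, and column names are assumptions:

```sql
{% snapshot orders_snapshot %}

{{
    config(
      target_schema='snapshots',
      unique_key='id',
      strategy='timestamp',
      updated_at='updated_at'
    )
}}

-- dbt tracks row changes over time based on updated_at
select * from {{ source('raw', 'orders') }}

{% endsnapshot %}
```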
The all-in-one command for the impatient:
dbt build
It runs, tests, seeds, and snapshots your project in one go 🚀
dbt seed
Convert CSV files to tables
dbt seed
View and lint CSV like a pro in Paradime Code IDE.
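Once seeded, a CSV behaves like any other model and can be joined via ref(). A sketch, assuming a hypothetical country_codes.csv in your seeds/ directory and a made-up orders model:

```sql
-- models/orders_with_country.sql (hypothetical model)
select
    o.*,
    c.country_name
from {{ ref('raw_orders') }} as o
left join {{ ref('country_codes') }} as c  -- the seeded CSV
    on o.country_code = c.code
```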
dbt ls
List your models
dbt ls
# list the most important resources
dbt ls --select tag:important
dbt show
Preview your model's output:
dbt show --select cool_waffle
Oops, something failed? Try again:
dbt retry
dbt run-operation
Run custom macros:
dbt run-operation crazy_macro
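A macro invoked with run-operation is just a Jinja macro in your macros/ directory, and arguments are passed with the --args option. The grant statement below is a hypothetical example:

```sql
-- macros/crazy_macro.sql (hypothetical macro)
{% macro crazy_macro(table_name) %}
    {% set sql %}
        grant select on {{ table_name }} to role reporter
    {% endset %}
    {% do run_query(sql) %}
    {{ log("Granted select on " ~ table_name, info=true) }}
{% endmacro %}
```

Invoke it with: `dbt run-operation crazy_macro --args '{"table_name": "analytics.orders"}'`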
Clone your production environment faster than you can say "duplicate":
dbt clone --state path/to/artifacts
There you have it, folks! These dbt™ commands and options will get you started. Mix and match them to suit your needs, and chain multiple commands together to perform more complex tasks.