Deferral using dbt™ - a definitive guide
A definitive guide to deferral in dbt™, with code examples and pro tips. Save time and cost with this powerful technique.
Kaustav Mitra
Jul 29, 2024
What is dbt™ Deferral?
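Deferral in dbt™ lets you run a subset of models without first rebuilding everything upstream of them. When you pass --defer, dbt™ resolves references to unselected parent models that are not built in your development schema against an existing environment (typically production), as described by that environment's state artifacts. Combined with state-based selection, this means you rebuild only what changed. It's a game-changer for speeding up your workflow, especially when working with large projects.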
When to Use It
Use deferral when:
You're making incremental changes
You want faster development cycles
You're working on a subset of models
Your project has a long build time
CLI Options
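The main command is dbt run --defer, which is used together with --state. Several options control what runs:
--state: Specify the path to your state artifacts (the directory containing manifest.json from a previous run)
--select (--models in older dbt™ versions): Select specific models to run
--exclude: Exclude certain models
--selector: Use YAML selectors for complex selections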
Examples and Scenarios
Scenario 1: Basic Deferral
You've made changes to one model. Run only that model and its downstream dependencies:
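A minimal sketch of this scenario, assuming the production manifest.json has already been copied into a local directory (the name prod-artifacts is an assumption). The helper name run_modified is illustrative and not part of dbt™:

```shell
# Hypothetical helper: build modified models plus everything downstream of them.
# $1 = directory holding the production manifest.json (assumed: prod-artifacts)
run_modified() {
  # state:modified+ -> models changed relative to the manifest, plus descendants
  # --defer         -> refs to unselected parents that are not built in the dev
  #                    schema resolve to the relations recorded in the manifest
  dbt run --select "state:modified+" --defer --state "$1"
}

# Usage: run_modified prod-artifacts
```

Wrapping the command in a function keeps the state path in one place, so the same call works locally and in a pipeline.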
Scenario 2: Exclude Specific Models
You've updated several models but want to exclude some of them based on their materialization type. This is useful during testing, for example, when rebuilding an incremental model is not needed:
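One way this could look, using the config selector method to exclude incremental models. The helper name and the prod-artifacts directory are assumptions made for illustration:

```shell
# Hypothetical helper: run modified models and their descendants,
# but skip anything materialized as incremental.
# $1 = directory holding the production manifest.json
run_modified_skip_incremental() {
  dbt run --select "state:modified+" \
          --exclude "config.materialized:incremental" \
          --defer --state "$1"
}

# Usage: run_modified_skip_incremental prod-artifacts
```

Because of --defer, downstream models that reference an excluded incremental model can read the production version of it when no development copy exists.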
Scenario 3: Using Graph Selectors
Run modified models and their direct children:
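A short sketch, assuming the same hypothetical prod-artifacts directory. The number after the plus sign limits the graph operator to one level of children:

```shell
# Hypothetical helper: run modified models and only their first-degree children.
# $1 = directory holding the production manifest.json
run_modified_direct_children() {
  # "+1" stops the downstream traversal after one step
  dbt run --select "state:modified+1" --defer --state "$1"
}

# Usage: run_modified_direct_children prod-artifacts
```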
Scenario 4: Combining Selectors
Run modified models in the 'marketing' folder and their descendants:
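A possible approximation, assuming the folder lives at models/marketing. A comma between selectors means intersection, so this picks every model that is both (modified or downstream of a modification) and (inside or downstream of the marketing folder). That covers the scenario, but it can include a few extra models, such as a modified model outside the folder that sits downstream of a marketing model:

```shell
# Hypothetical helper: intersect "modified and descendants" with
# "marketing folder and descendants".
# $1 = directory holding the production manifest.json
# Assumption: marketing models live under models/marketing
run_modified_marketing() {
  dbt run --select "state:modified+,path:models/marketing+" \
          --defer --state "$1"
}

# Usage: run_modified_marketing prod-artifacts
```

It is worth previewing this selection with dbt ls before relying on it, since intersections of graph selections are easy to misread.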
Best Practices
Keep your state artifacts up-to-date
Use version control for your dbt™ project
Combine deferral with other dbt™ commands like dbt test
Understand your project's dependency graph
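Keeping state artifacts current can be as simple as copying the manifest after each successful production run. The sketch below assumes dbt™ wrote manifest.json to its default target directory; the helper name save_state and the directory names are hypothetical:

```shell
# Hypothetical helper: copy the manifest from a finished run into a state directory.
# $1 = dbt target directory (dbt writes manifest.json there; "target" by default)
# $2 = directory that will later be passed to --state
save_state() {
  if [ ! -f "$1/manifest.json" ]; then
    echo "no manifest.json in $1; run dbt first" >&2
    return 1
  fi
  mkdir -p "$2" && cp "$1/manifest.json" "$2/manifest.json"
}

# Usage after a successful production run: save_state target prod-artifacts
```

In practice the destination is often shared storage rather than a local folder, so that developers and CI jobs defer to the same manifest.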
Potential Pitfalls
Outdated state artifacts can lead to inconsistent results
Overusing deferral might miss important model interactions
Complex selector combinations can be hard to debug
Advanced Tips
Use dbt ls with similar selectors to preview which models will run
Integrate deferral into your CI/CD pipeline for efficient testing
Create custom YAML selectors for common deferral scenarios
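The preview tip can be sketched as follows. The helper name preview_modified and the prod-artifacts directory are assumptions; dbt ls only lists resources and builds nothing:

```shell
# Hypothetical helper: list the models a deferred run would select.
# $1 = directory holding the production manifest.json
preview_modified() {
  # --resource-type model hides tests, seeds and snapshots from the listing
  dbt ls --select "state:modified+" --resource-type model --state "$1"
}

# Usage: preview_modified prod-artifacts
```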
Using dbt™ deferral to save costs
dbt™ deferral can be a powerful tool for cost savings, especially in development and production environments. Here's how it can help:
Reduced compute time: By running only modified models and their dependencies, you're using fewer computing resources. This directly translates to lower costs on platforms like Snowflake or BigQuery.
Efficient CI/CD: Implement deferral in your CI/CD pipeline to run only necessary models during testing. This speeds up pipelines and reduces cloud compute costs.
Development efficiency: Faster development cycles mean less time spent waiting for full runs. This can lead to significant labor cost savings over time.
Optimized warehouse usage: By running fewer queries, you're reducing the load on your data warehouse. This can lead to downsizing warehouse clusters or reducing uptime.
Selective full refreshes: Use deferral to refresh only specific data streams, avoiding costly full refreshes of your entire data warehouse.
Cloud storage optimization: Less data movement and fewer intermediate tables can result in reduced cloud storage costs.
Resource allocation: By using fewer resources for routine tasks, you can allocate more to critical, resource-intensive jobs without increasing overall spend.
Scalability without proportional cost increase: As your dbt™ project grows, deferral allows you to scale without a linear increase in processing time and associated costs.
Remember, while deferral can lead to significant savings, it's crucial to balance cost-cutting with maintaining data quality and consistency. Regular full runs and comprehensive testing should still be part of your workflow.
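As a rough illustration of the CI/CD point, a pull request check might build and then test only what changed. The helper name ci_check is hypothetical, and the sketch assumes the CI job has already downloaded the production manifest into the directory passed as the first argument:

```shell
# Hypothetical CI step: build and test only modified models and their descendants.
# $1 = directory holding the production manifest.json
ci_check() {
  # Stop at the first failing step so broken models are not tested
  dbt run  --select "state:modified+" --defer --state "$1" || return 1
  dbt test --select "state:modified+" --defer --state "$1"
}

# Usage inside a pipeline step: ci_check prod-artifacts
```

The function returns a non-zero status when either step fails, which is what most CI systems use to mark a check as failed.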
When should teams do regular full runs?
Teams should perform regular full runs of their dbt™ projects to ensure data integrity and catch any potential issues. Here's when and why to do full runs:
Release cycles: Do a full run before major releases or deployments to production.
Weekly/monthly checks: Schedule regular full runs (e.g., every Sunday night) to verify overall project health.
After significant changes: Run everything when making substantial changes to core models or introducing new data sources.
Data reconciliation: Perform full runs when reconciling data with external systems or reports.
Troubleshooting: When encountering unexpected results, a full run can help isolate issues.
Performance benchmarking: Periodically run everything to track overall project performance over time.
Before major business events: Ensure all data is up-to-date before critical business periods (e.g., financial reporting, peak sales seasons).
Periodic data quality checks: Use full runs in conjunction with comprehensive testing to maintain high data quality standards.
Team onboarding: When new team members join, a full run helps them understand the entire project scope.
The frequency of full runs depends on project size, team velocity, and business criticality. Smaller teams might do weekly full runs, while larger enterprises may opt for daily or even more frequent complete refreshes.
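For a scheduled job in a development or staging environment, the weekly cadence mentioned above could be sketched like this. The helper name scheduled_run, the choice of Sunday, and the prod-artifacts directory are all assumptions:

```shell
# Hypothetical scheduler hook: full run on Sunday, deferred run on other days.
# $1 = day of week as printed by `date +%u` (1 = Monday ... 7 = Sunday)
# $2 = directory holding the production manifest.json
scheduled_run() {
  if [ "$1" -eq 7 ]; then
    # Weekly full run of every model
    dbt run
  else
    # Other days: only modified models and their descendants
    dbt run --select "state:modified+" --defer --state "$2"
  fi
}

# Usage: scheduled_run "$(date +%u)" prod-artifacts
```

Passing the weekday as an argument, rather than calling date inside the function, makes the branch easy to exercise on any day.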
Top issues analysts face when working with dbt™ deferral
Here are the top issues analysts often face when working with dbt™ deferral:
Outdated state artifacts: Using old or incorrect manifest files can lead to unexpected results. Analysts might miss running crucial models.
Incomplete dependency understanding: Not fully grasping model dependencies can result in incomplete or inaccurate data when using deferral.
Overreliance on deferral: Analysts might lean too heavily on deferral, missing important interactions between models.
Complex selector syntax: Crafting the right selector commands can be tricky, especially for complex project structures.
Inconsistent environments: Differences between development and production environments can cause deferral to behave unexpectedly.
Version control conflicts: Merging changes from multiple team members can lead to conflicts in deferred runs.
Testing gaps: Deferral might skip certain tests, leading to undetected data quality issues.
CI/CD integration challenges: Implementing deferral in automated pipelines can be complex and prone to errors.
Debugging difficulties: When issues arise, it can be harder to trace the problem due to partial runs.
To mitigate these issues:
Keep state artifacts updated
Thoroughly document model dependencies
Use deferral judiciously
Practice with selector syntax
Ensure environment consistency
Implement strong version control practices
Maintain comprehensive testing alongside deferral
Benchmark performance regularly
Carefully design CI/CD pipelines
Develop robust debugging strategies
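The first mitigation, keeping state artifacts updated, can be guarded with a small freshness check before any deferred run. This is only a sketch: the helper name state_is_fresh and the age threshold are assumptions, and file modification time is a rough proxy for how current the manifest really is:

```shell
# Hypothetical guard: succeed only if the manifest exists and is recent enough.
# $1 = directory holding manifest.json
# $2 = maximum age in days
state_is_fresh() {
  [ -f "$1/manifest.json" ] || return 1
  # find prints the path only when the file is older than roughly $2 days,
  # so empty output means the manifest is recent enough
  [ -z "$(find "$1/manifest.json" -mtime +"$2")" ]
}

# Usage: state_is_fresh prod-artifacts 1 || echo "refresh state artifacts first" >&2
```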
Guidelines for teams to maximize the impact of dbt™ deferral
The most important guidelines teams should have in place to maximize the impact of dbt™ deferral:
Maintain up-to-date state artifacts
Guideline: Automate the process of generating and storing state artifacts after each successful production run.
Why: Ensures deferral decisions are based on the most recent production state, preventing missed model runs and inconsistencies.
Implement a clear branching strategy
Guideline: Use a Git flow or feature branch workflow, with deferral integrated into the development process.
Why: Helps manage parallel development efforts and ensures deferral is used consistently across the team.
Establish standard deferral commands
Guideline: Create a set of pre-approved deferral commands or YAML selectors for common scenarios.
Why: Reduces errors, improves consistency, and makes it easier for team members to use deferral effectively.
Integrate deferral into CI/CD pipelines
Guideline: Use deferral in CI/CD to run only modified models and their dependencies during pull request checks.
Why: Speeds up the feedback loop, reduces compute costs, and catches issues early in the development process.
These guidelines help teams leverage dbt™ deferral effectively, balancing speed and efficiency with data integrity and quality assurance. They promote consistent practices across the team and integrate deferral into the broader development workflow.
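A standard, pre-approved selection can live in the project's selectors.yml. The sketch below writes one such file from the shell; the selector name modified_skip_incremental and its definition are hypothetical examples rather than a recommended standard:

```shell
# Hypothetical helper: write a selectors.yml into a dbt project directory.
# $1 = path to the dbt project root
write_selector() {
  cat > "$1/selectors.yml" <<'EOF'
selectors:
  - name: modified_skip_incremental
    description: Modified models and descendants, excluding incremental models
    definition:
      union:
        - method: state
          value: modified
          children: true
        - exclude:
            - method: config.materialized
              value: incremental
EOF
}

# Usage:
#   write_selector .
#   dbt run --selector modified_skip_incremental --defer --state prod-artifacts
```

Team members then only need to remember the selector name, which keeps local runs and pipeline runs consistent. In a real project the file would normally be committed to version control rather than generated.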
Conclusion
Deferral is a powerful tool in the dbt™ toolkit. It can significantly speed up your development process, but use it wisely. Always ensure you're working with up-to-date state artifacts and understand your project's dependencies.
Remember: With great power comes great responsibility. Deferral can supercharge your workflow, but it's not a substitute for comprehensive testing and validation of your entire dbt™ project.
Now go forth and defer like a pro!