Paradime | Deferral using dbt™



Deferral using dbt™ - a definitive guide

A definitive guide to deferral in dbt™, with code examples and pro tips. Save time and money with this powerful technique.

Kaustav Mitra

Jul 29, 2024


What is dbt™ Deferral?

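Deferral in dbt™ lets you skip rebuilding unmodified models during development. With --defer, dbt can resolve ref() calls to upstream models that are not part of the current run against the objects recorded in another environment's manifest (typically production), so you only build what you changed. It's a game-changer for speeding up your workflow, especially when working with large projects.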

When to Use It

Use deferral when:

You're making incremental changes
You want faster development cycles
You're working on a subset of models
Your project has a long build time

CLI Options

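The main command is dbt run --defer. It needs a state manifest to defer to, usually supplied with --state. Several options are commonly used alongside it:

--state: Specify the path to your state artifacts
--select: Select specific models to run (--models is an older alias)
--exclude: Exclude certain models
--selector: Use a named YAML selector for complex selections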

Examples and Scenarios

Scenario 1: Basic Deferral

You've made changes to one model. Run only that model and its downstream dependencies:
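A minimal sketch of this scenario, assuming the production manifest.json has already been downloaded to a local folder. The ./prod-artifacts path and the function name are hypothetical; adjust them to your project:

```shell
# Assumption: ./prod-artifacts holds the manifest.json from the last production run
STATE_DIR="${STATE_DIR:-./prod-artifacts}"

run_modified() {
  # state:modified+ selects every changed model plus everything downstream of it;
  # --defer lets refs to models outside the selection resolve to the state manifest
  dbt run --select "state:modified+" --defer --state "$STATE_DIR"
}
```

Call run_modified from the project root after editing a model.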

Scenario 2: Exclude Specific Models

You've updated several models but want to exclude some based on their materialization type. This is useful during testing, when rebuilding an incremental model is unnecessary:
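One way this might look, using dbt's config selector method to exclude incremental models. The ./prod-artifacts path and the function name are assumptions for illustration:

```shell
# Assumption: ./prod-artifacts holds the manifest.json from the last production run
STATE_DIR="${STATE_DIR:-./prod-artifacts}"

run_modified_skip_incremental() {
  # Select changed models and their descendants, then drop anything
  # materialized as incremental from the run
  dbt run --select "state:modified+" \
    --exclude "config.materialized:incremental" \
    --defer --state "$STATE_DIR"
}
```

Because the run is deferred, refs to the excluded incremental models can resolve to the versions recorded in the state manifest instead of being rebuilt.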

Scenario 3: Using Graph Selectors

Run modified models and their direct children:
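A sketch using the "+1" graph operator, which limits the downstream selection to first-degree children. The ./prod-artifacts path and the function name are hypothetical:

```shell
# Assumption: ./prod-artifacts holds the manifest.json from the last production run
STATE_DIR="${STATE_DIR:-./prod-artifacts}"

run_modified_and_children() {
  # "+1" after the selector means: include descendants, but only one level deep
  dbt run --select "state:modified+1" --defer --state "$STATE_DIR"
}
```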

Scenario 4: Combining Selectors

Run modified models in the 'marketing' folder and their descendants:
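Graph operators apply to each comma-separated part of an intersection separately, so "modified AND in marketing, plus descendants" is not a single selector string. The sketch below is one possible two-step approach, not necessarily the author's original command: list the intersection first, then run each result with a trailing "+". The models/marketing folder, the ./prod-artifacts path, and the function name are assumptions:

```shell
# Assumption: ./prod-artifacts holds the manifest.json from the last production run
STATE_DIR="${STATE_DIR:-./prod-artifacts}"

run_modified_marketing() {
  # Step 1: names of models that are both modified and under models/marketing
  names="$(dbt --quiet ls \
    --select "state:modified,path:models/marketing" \
    --resource-type model --output name --state "$STATE_DIR")"

  # Step 2: append "+" to each name so its descendants are included
  selectors=""
  for n in $names; do
    selectors="$selectors $n+"
  done

  if [ -z "$selectors" ]; then
    echo "No modified marketing models found; nothing to run"
    return 0
  fi

  # $selectors is intentionally unquoted so each name becomes its own argument
  dbt run --select $selectors --defer --state "$STATE_DIR"
}
```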

Best Practices

Keep your state artifacts up-to-date
Use version control for your dbt™ project
Combine deferral with other dbt™ commands like dbt test
Understand your project's dependency graph
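Keeping state artifacts fresh can be as simple as copying the manifest after each successful production run. A hedged sketch, assuming a target named prod and a local ./prod-artifacts folder (both hypothetical; many teams would upload to cloud storage instead):

```shell
# Assumption: ./prod-artifacts is where deferred runs look for state
STATE_DIR="${STATE_DIR:-./prod-artifacts}"

refresh_state() {
  # Only publish a new manifest if the production run succeeded
  dbt run --target prod || return 1
  mkdir -p "$STATE_DIR"
  # dbt writes manifest.json to the target/ directory by default
  cp target/manifest.json "$STATE_DIR/manifest.json"
}
```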

Potential Pitfalls

Outdated state artifacts can lead to inconsistent results
Overusing deferral might miss important model interactions
Complex selector combinations can be hard to debug

Advanced Tips

Use dbt ls with similar selectors to preview which models will run
Integrate deferral into your CI/CD pipeline for efficient testing
Create custom YAML selectors for common deferral scenarios
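To preview a deferred run before executing it, the same selector can be passed to dbt ls. A small sketch, with ./prod-artifacts and the function name as assumed placeholders:

```shell
# Assumption: ./prod-artifacts holds the manifest.json from the last production run
STATE_DIR="${STATE_DIR:-./prod-artifacts}"

preview_modified() {
  # Lists the models that "state:modified+" would select, without building anything
  dbt ls --select "state:modified+" --resource-type model --state "$STATE_DIR"
}
```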

Using dbt™ deferral to save costs

dbt™ deferral can be a powerful tool for cost savings, especially in development and production environments. Here's how it can help:

Reduced compute time: By running only modified models and their downstream dependents, you're using less computing resources. This directly translates to lower costs on platforms like Snowflake or BigQuery.
Efficient CI/CD: Implement deferral in your CI/CD pipeline to run only necessary models during testing. This speeds up pipelines and reduces cloud compute costs.
Development efficiency: Faster development cycles mean less time spent waiting for full runs. This can lead to significant labor cost savings over time.
Optimized warehouse usage: By running fewer queries, you're reducing the load on your data warehouse. This can lead to downsizing warehouse clusters or reducing uptime.
Selective full refreshes: Use deferral to refresh only specific data streams, avoiding costly full refreshes of your entire data warehouse.
Cloud storage optimization: Less data movement and fewer intermediate tables can result in reduced cloud storage costs.
Resource allocation: By using fewer resources for routine tasks, you can allocate more to critical, resource-intensive jobs without increasing overall spend.
Scalability without proportional cost increase: As your dbt™ project grows, deferral allows you to scale without a linear increase in processing time and associated costs.

Remember, while deferral can lead to significant savings, it's crucial to balance cost-cutting with maintaining data quality and consistency. Regular full runs and comprehensive testing should still be part of your workflow.

When should teams do regular full runs?

Teams should perform regular full runs of their dbt™ projects to ensure data integrity and catch any potential issues. Here's when and why to do full runs:

Release cycles: Do a full run before major releases or deployments to production.
Weekly/monthly checks: Schedule regular full runs (e.g., every Sunday night) to verify overall project health.
After significant changes: Run everything when making substantial changes to core models or introducing new data sources.
Data reconciliation: Perform full runs when reconciling data with external systems or reports.
Troubleshooting: When encountering unexpected results, a full run can help isolate issues.
Performance benchmarking: Periodically run everything to track overall project performance over time.
Before major business events: Ensure all data is up-to-date before critical business periods (e.g., financial reporting, peak sales seasons).
Periodic data quality checks: Use full runs in conjunction with comprehensive testing to maintain high data quality standards.
Team onboarding: When new team members join, a full run helps them understand the entire project scope.

The frequency of full runs depends on project size, team velocity, and business criticality. Smaller teams might do weekly full runs, while larger enterprises may opt for daily or even more frequent complete refreshes.

Top issues analysts face when working with dbt™ deferral

Here are the issues analysts most often face when working with dbt™ deferral:

Outdated state artifacts: Using old or incorrect manifest files can lead to unexpected results. Analysts might miss running crucial models.
Incomplete dependency understanding: Not fully grasping model dependencies can result in incomplete or inaccurate data when using deferral.
Overreliance on deferral: Analysts might lean too heavily on deferral, missing important interactions between models.
Complex selector syntax: Crafting the right selector commands can be tricky, especially for complex project structures.
Inconsistent environments: Differences between development and production environments can cause deferral to behave unexpectedly.
Version control conflicts: Merging changes from multiple team members can lead to conflicts in deferred runs.
Testing gaps: Deferral might skip certain tests, leading to undetected data quality issues.
CI/CD integration challenges: Implementing deferral in automated pipelines can be complex and prone to errors.
Debugging difficulties: When issues arise, it can be harder to trace the problem due to partial runs.

To mitigate these issues:

Keep state artifacts updated
Thoroughly document model dependencies
Use deferral judiciously
Practice with selector syntax
Ensure environment consistency
Implement strong version control practices
Maintain comprehensive testing alongside deferral
Benchmark performance regularly
Carefully design CI/CD pipelines
Develop robust debugging strategies

Guidelines for teams to maximize the impact of dbt™ deferral

The most important guidelines teams should have in place to maximize the impact of dbt™ deferral:

Maintain up-to-date state artifacts
- Guideline: Automate the process of generating and storing state artifacts after each successful production run.
- Why: Ensures deferral decisions are based on the most recent production state, preventing missed model runs and inconsistencies.
Implement a clear branching strategy
- Guideline: Use a Git flow or feature branch workflow, with deferral integrated into the development process.
- Why: Helps manage parallel development efforts and ensures deferral is used consistently across the team.
Establish standard deferral commands
- Guideline: Create a set of pre-approved deferral commands or YAML selectors for common scenarios.
- Why: Reduces errors, improves consistency, and makes it easier for team members to use deferral effectively.
Integrate deferral into CI/CD pipelines
- Guideline: Use deferral in CI/CD to run only modified models and their downstream dependents during pull request checks.
- Why: Speeds up the feedback loop, reduces compute costs, and catches issues early in the development process.
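For the CI/CD guideline, a pull request check might look roughly like the sketch below. It uses dbt build so that tests run alongside the modified models, and falls back to a full build when no production manifest is available. The ./prod-artifacts path and the function name are assumptions; how the manifest gets there depends on your pipeline:

```shell
# Assumption: the CI job downloads the latest production manifest.json here
STATE_DIR="${STATE_DIR:-./prod-artifacts}"

ci_build() {
  if [ -f "$STATE_DIR/manifest.json" ]; then
    # Build and test only changed models and their descendants,
    # deferring everything else to the state manifest
    dbt build --select "state:modified+" --defer --state "$STATE_DIR"
  else
    # Without a manifest there is nothing to compare against or defer to
    echo "No production manifest found in $STATE_DIR; running a full build" >&2
    dbt build
  fi
}
```

Falling back to a full build is a conservative choice: it costs more, but avoids silently skipping models when the state is missing.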

These guidelines help teams leverage dbt™ deferral effectively, balancing speed and efficiency with data integrity and quality assurance. They promote consistent practices across the team and integrate deferral into the broader development workflow.

Conclusion

Deferral is a powerful tool in the dbt™ toolkit. It can significantly speed up your development process, but use it wisely. Always ensure you're working with up-to-date state artifacts and understand your project's dependencies.

Remember: With great power comes great responsibility. Deferral can supercharge your workflow, but it's not a substitute for comprehensive testing and validation of your entire dbt™ project.

Now go forth and defer like a pro!

Interested in learning more?
Try the free 14-day trial

Start free trial

Product

Accelerating Data Warehouse Migrations with DinoAI: From Redshift to Trino in Minutes


Product

DinoAI: Enforcing Standards Across Your dbt Project with .dinorules


Product

Paradime DinoAI vs dbt™ Copilot: A Comparative Analysis


Test Drive Paradime Today

Start our free 14-day trial and experience the power of AI in analytics

Start for free


Made with ❤️ in San Francisco ・ London

*dbt® and dbt Core® are federally registered trademarks of dbt Labs, Inc. in the United States and various jurisdictions around the world. Paradime is not a partner of dbt Labs. All rights therein are reserved to dbt Labs. Paradime is not a product or service of or endorsed by dbt Labs, Inc.

