Paradime | Movie Challenge Rewind: Hollywood Economics

🚀 RELEASE ALERT 🚀 Paradime DinoAI Agent: Data to insights in seconds
Learn more
How Paradime boosts customer data pipelines by 50% on AWS Graviton at no extra cost
Learn more

Movie Challenge Rewind: Hollywood Economics

From historic blockbusters to inflation-adjusted earnings, Anton Goncharuk reveals the money side of Hollywood.

Anton Goncharuk

Jul 11, 2024

Introduction

Hey there! I'm Anton Goncharuk, Principal Analytics Engineer at HubSpot. I’ve been using dbt™ since early 2019, so when I discovered Paradime’s Movie Data Modeling Challenge, I couldn't resist the chance to participate!

If you aren't already familiar with Paradime’s Movie Data Modeling Challenge, check out this blog for a brief overview!

In this blog, I'll share insights about my challenge submission, including the journey of building my project, the movie insights I uncovered, and how I used Paradime to bring my project to life. Enjoy!

Building My Project

Every data professional knows that uncovering insights is just the tip of the iceberg. Before reaching that point, countless hours are spent brainstorming, building a project plan, overcoming data issues, and hitting dead ends. Here’s a quick summary of my project-building process:

Project Strategy: I started the project by diving deep into the movie datasets. My main goals were to understand the historical movie datasets and build a state-of-the-art data infrastructure that delivers clean, validated data for BI layers. I aimed to create a dbt™ project that could serve multiple use cases, prioritizing data governance and reliability over flashy BI presentations.
Challenges Faced: One of the biggest challenges was reconciling inconsistent data from different sources. For example, TMDB and OMDB reported wildly different revenue figures for the same movies. Normally a business team tells you what to prioritize; without one, I had to make those assumptions and decisions on my own.
Key Learnings: This project reinforced the importance of balancing backend infrastructure with the final presentation layer. While I focused heavily on data governance, I realized the final BI layer often captures more attention. This experience underscored the need for both solid data management and compelling data storytelling.
Execution Process: Throughout the project, I used SQL and dbt™ within Paradime for modeling and Lightdash for visualization. Paradime’s tools, especially the Lineage feature, were crucial in keeping my workflow organized. I used the integrated CLI for quick YAML file generation and the Data Preview for sanity checks, ensuring my models produced accurate results.
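
The TMDB vs. OMDB revenue conflict described above is worth surfacing explicitly before choosing a source. Below is a minimal Python sketch, not the author's code, that flags titles whose two revenue figures disagree by more than a chosen threshold. The `RevenuePair` type, its field names and the 25% threshold are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RevenuePair:
    # Hypothetical shape: one row per title with both sources' box office figures.
    title: str
    tmdb_revenue: Optional[float]
    omdb_revenue: Optional[float]


def relative_gap(a: float, b: float) -> float:
    """Absolute difference expressed as a fraction of the larger figure."""
    larger = max(abs(a), abs(b))
    return 0.0 if larger == 0 else abs(a - b) / larger


def flag_discrepancies(rows: List[RevenuePair], threshold: float = 0.25) -> List[str]:
    """Return titles whose TMDB and OMDB revenue differ by more than `threshold`."""
    flagged = []
    for row in rows:
        if row.tmdb_revenue is None or row.omdb_revenue is None:
            continue  # nothing to compare when either source is missing
        if relative_gap(row.tmdb_revenue, row.omdb_revenue) > threshold:
            flagged.append(row.title)
    return flagged


# Toy rows for illustration only, not real box office data.
rows = [
    RevenuePair("Movie A", 100_000_000, 98_000_000),
    RevenuePair("Movie B", 100_000_000, 40_000_000),
    RevenuePair("Movie C", None, 55_000_000),
]
print(flag_discrepancies(rows))  # ['Movie B']
```

The flagged titles are the ones where an explicit precedence rule, or a manual decision, matters most.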

Overall, this project was a blend of tackling technical challenges, learning new tools, and reinforcing the importance of clean, reliable data in analytics.

Insights Uncovered

Below are some of the key insights I uncovered during the challenge; you can find additional insights and visualizations in my GitHub repo.

Insight #1: The Highest-Grossing Films of All Time

Gone with the Wind (1939) earned $402 million in box office revenue back then. Adjusted for inflation, that’s equivalent to $8.7 billion in 2024. This insight helps contextualize historical data within modern economic conditions, providing a clearer picture of a movie's true financial success.
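
The adjustment behind this figure is a ratio of consumer price indices. The worked check below uses only the two figures quoted above, not official CPI data, to show the implied 1939-to-2024 multiplier:

```latex
V_{2024} = V_{1939} \times \frac{\mathrm{CPI}_{2024}}{\mathrm{CPI}_{1939}},
\qquad
\frac{\mathrm{CPI}_{2024}}{\mathrm{CPI}_{1939}} \approx \frac{\$8.7 \times 10^{9}}{\$4.02 \times 10^{8}} \approx 21.6
```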

Approach: I started by creating int_inflation_adjustments__yearly.sql to compute CPI ratios, allowing me to adjust financial figures for inflation. Next, I built int_tmdb_media.sql to consolidate and enrich TMDB movies. After that, I merged this enriched dataset with OMDB data in media.sql, prioritizing TMDB data for accuracy.
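
The CPI-ratio and source-precedence steps can be sketched in Python as follows. This illustrates the logic only; it is not the SQL in int_inflation_adjustments__yearly.sql or media.sql.

A few assumptions apply:

- The function names are hypothetical.
- The CPI values are toy numbers, not real CPI data.
- A TMDB revenue of 0 is treated as missing, because some TMDB exports store unknown revenue as 0.

```python
from typing import Dict, Optional


def cpi_ratios(cpi_by_year: Dict[int, float], target_year: int) -> Dict[int, float]:
    """Map each year to the multiplier that restates its dollars in target-year dollars."""
    target_cpi = cpi_by_year[target_year]
    return {year: target_cpi / cpi for year, cpi in cpi_by_year.items()}


def merge_revenue(tmdb: Optional[float], omdb: Optional[float]) -> Optional[float]:
    """Prefer TMDB; fall back to OMDB when TMDB is missing or zero (COALESCE-style)."""
    if tmdb is not None and tmdb > 0:
        return tmdb
    return omdb


def adjusted_revenue(year: int, tmdb: Optional[float], omdb: Optional[float],
                     ratios: Dict[int, float]) -> Optional[float]:
    """Pick the preferred revenue figure, then restate it in target-year dollars."""
    revenue = merge_revenue(tmdb, omdb)
    if revenue is None or year not in ratios:
        return None
    return revenue * ratios[year]


# Toy CPI index values for illustration only, NOT real CPI data.
toy_cpi = {2000: 100.0, 2010: 125.0, 2020: 150.0}
ratios = cpi_ratios(toy_cpi, target_year=2020)
print(ratios)  # {2000: 1.5, 2010: 1.2, 2020: 1.0}
print(adjusted_revenue(2000, tmdb=0, omdb=10_000_000, ratios=ratios))  # 15000000.0
```

Precomputing one ratio per year turns the adjustment into a lookup and a multiply. That maps naturally onto a join against a yearly ratios table in SQL.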

Insight #1: The Highest-Grossing Films of All Time | dbt | Paradime.io

Insight #2: Top 10 Most Appearing Actors of All Time

Mel Blanc, known as "The Man of a Thousand Voices," is one of Hollywood's most prolific actors, with over a thousand screen credits. He created and performed nearly 400 distinct character voices, becoming renowned worldwide for his work in radio, television, cartoons, and movies.

Approach: I first processed IMDb principals data in stg_imdb__principals.sql to extract actor roles and characters. Then, I enriched this data with actor details using stg_imdb__names.sql, obtaining full names and notable titles. Finally, in crew.sql, I merged these datasets, ensuring unique actor-role combinations to accurately count appearances and determine the top actors.
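
Below is a minimal Python sketch of the counting step. It assumes rows shaped like IMDb's title.principals file (tconst, nconst, category, characters) and a name lookup built from name.basics. The sketch counts each actor once per title; whether crew.sql deduplicates on exactly this key is an assumption, and `top_actors` is a hypothetical name.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# (tconst, nconst, category, characters): a simplified title.principals row.
Principal = Tuple[str, str, str, str]


def top_actors(principals: Iterable[Principal], names: Dict[str, str],
               n: int = 10) -> List[Tuple[str, int]]:
    """Count distinct titles per actor and return the n most frequent with display names."""
    seen = set()
    counts: Counter = Counter()
    for tconst, nconst, category, _characters in principals:
        if category not in ("actor", "actress"):
            continue  # keep acting credits only
        if (tconst, nconst) in seen:
            continue  # one appearance per title, even across multiple characters
        seen.add((tconst, nconst))
        counts[nconst] += 1
    # Fall back to the raw ID when the name lookup has no matching entry.
    return [(names.get(nconst, nconst), total) for nconst, total in counts.most_common(n)]


# Toy rows for illustration only.
principals = [
    ("tt1", "nm1", "actor", "Character A"),
    ("tt1", "nm1", "actor", "Character B"),  # same title, second character
    ("tt2", "nm1", "actor", "Character C"),
    ("tt2", "nm2", "actress", "Character D"),
    ("tt3", "nm3", "director", ""),
]
names = {"nm1": "Actor One", "nm2": "Actress Two"}
print(top_actors(principals, names, n=2))  # [('Actor One', 2), ('Actress Two', 1)]
```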

Insight #2: Top 10 Most Appearing Actors of All Time | dbt | Paradime.io

Insight #3: Top 10 Highest-Grossing Directors of All Time

Steven Spielberg is a legendary figure in cinema, directing iconic films like "Jaws," "E.T. the Extra-Terrestrial," and "Jurassic Park." His films have grossed immensely over the years, making him one of the highest-grossing directors of all time.

Approach: I followed the same approach as for “Top 10 Most Appearing Actors of All Time”, then joined the crew.sql model with media.sql to identify the “Top 10 Highest-Grossing Directors of All Time.”
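
Continuing the sketch, the director ranking comes down to three steps: filter crew rows to directors, join them to per-title revenue, and sum. The row shapes and the `top_directors` name are hypothetical. Crediting a co-directed film's full revenue to each of its directors is one design choice among several.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (title_id, person_name, role): a simplified row from crew.
CrewRow = Tuple[str, str, str]


def top_directors(crew_rows: Iterable[CrewRow], revenue_by_title: Dict[str, float],
                  n: int = 10) -> List[Tuple[str, float]]:
    """Sum per-title revenue for each director and return the n highest totals."""
    totals: Dict[str, float] = defaultdict(float)
    counted = set()
    for title_id, person, role in crew_rows:
        if role != "director" or (title_id, person) in counted:
            continue
        revenue = revenue_by_title.get(title_id)
        if revenue is None:
            continue  # inner-join semantics: titles without revenue are dropped
        counted.add((title_id, person))
        totals[person] += revenue
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:n]


# Toy data for illustration only.
crew = [
    ("tt1", "Director X", "director"),
    ("tt2", "Director X", "director"),
    ("tt2", "Director Y", "director"),  # co-directed title
    ("tt3", "Director Y", "director"),  # no revenue available
    ("tt1", "Actor Z", "actor"),
]
revenue = {"tt1": 100.0, "tt2": 50.0}
print(top_directors(crew, revenue))  # [('Director X', 150.0), ('Director Y', 50.0)]
```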

Insight #3: Top 10 Highest-Grossing Directors of All Time | dbt | Paradime.io

Insight #4: Top Money-Making Production Companies

Although this doesn't directly relate to Profit or Return On Investment (ROI), Warner Bros. Pictures appears to be a leader in the movie production industry based on gross revenue (box office), the total number of movies produced, and the number of Oscars their movies have received.

Approach: I developed int_tmdb_media.sql to consolidate movie data from TMDB. Next, I used media.sql to adjust financial figures for inflation and produce a comprehensive, unified dataset of movie details.
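
The company comparison can be sketched as an aggregation over movies that may each list several production companies. The `Movie` fields are assumptions about what the unified dataset exposes, in particular the per-movie Oscar count. `company_leaderboard` is a hypothetical helper.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Movie:
    title: str
    companies: List[str]      # assumption: several production companies per movie
    adjusted_revenue: float   # inflation-adjusted box office
    oscars: int = 0           # assumption: Oscar wins per movie are available


def company_leaderboard(movies: List[Movie]) -> List[Tuple[str, Dict[str, float]]]:
    """Aggregate revenue, movie count, and Oscars per company, highest revenue first."""
    stats = defaultdict(lambda: {"revenue": 0.0, "movies": 0, "oscars": 0})
    for movie in movies:
        for company in set(movie.companies):  # ignore repeated names within one movie
            entry = stats[company]
            entry["revenue"] += movie.adjusted_revenue
            entry["movies"] += 1
            entry["oscars"] += movie.oscars
    return sorted(stats.items(), key=lambda item: item[1]["revenue"], reverse=True)


# Toy data for illustration only.
movies = [
    Movie("A", ["Studio 1", "Studio 2"], 300.0, 2),
    Movie("B", ["Studio 1"], 100.0, 0),
    Movie("C", ["Studio 2"], 50.0, 1),
]
board = company_leaderboard(movies)
print(board[0])  # ('Studio 1', {'revenue': 400.0, 'movies': 2, 'oscars': 2})
```

Co-produced movies count in full for every company listed. Per-company totals are therefore useful for ranking, but they add up to more than the overall gross.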

Insight #4: Top Money-Making Production Companies | dbt | Paradime.io

My Paradime Stand-Outs

Paradime was obviously instrumental in my project, offering several features that enhanced my workflow and overall project quality. Here are the three key features that stood out:

Data Lineage Preview: This feature allowed me to see the end-to-end data flow from the source to the BI layer. By tracking how data moved through each transformation and model, I could troubleshoot issues faster and optimize my workflow. It also let me make sure no logic was duplicated across models. This visualization was crucial for maintaining data integrity and keeping the project streamlined.
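
To make the lineage idea concrete, here is a minimal sketch of the project's model graph as a Python dictionary. It computes a build order and an upstream lookup with the standard library's graphlib.

- The model names come from this write-up, but the edges are inferred from its prose.
- `stg_omdb__movies` and `directors_by_gross` are hypothetical placeholders.
- dbt™ itself builds the real graph from `ref()` and `source()` calls.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Model -> set of upstream models it reads from (edges inferred from the write-up).
lineage: Dict[str, Set[str]] = {
    "crew": {"stg_imdb__principals", "stg_imdb__names"},
    "media": {"int_tmdb_media", "int_inflation_adjustments__yearly",
              "stg_omdb__movies"},  # hypothetical OMDB staging model name
    "directors_by_gross": {"crew", "media"},  # hypothetical BI-facing model
}


def build_order(graph: Dict[str, Set[str]]) -> List[str]:
    """Return an order in which every model comes after all of its upstream models."""
    return list(TopologicalSorter(graph).static_order())


def upstream(graph: Dict[str, Set[str]], model: str) -> Set[str]:
    """Collect every model that `model` depends on, directly or transitively."""
    seen: Set[str] = set()
    stack = list(graph.get(model, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, ()))
    return seen


print(build_order(lineage))
print(sorted(upstream(lineage, "crew")))  # ['stg_imdb__names', 'stg_imdb__principals']
```

A cyclic dependency would make `static_order()` raise `graphlib.CycleError`, which is one reason a DAG view is useful for spotting tangled logic.
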
Integrated Terminal (CLI): As a heavy CLI user, I found that having a robust CLI within Paradime made my workflow much smoother. I could quickly generate YAML files, automate repetitive tasks, and seamlessly integrate various Python packages. The CLI let me execute commands efficiently, which sped up development. For example, I used it to create and manage dbt™ models, run tests, and generate documentation. This integration bridged the gap between Paradime and my usual development environment, making the transition almost effortless and letting me leverage my existing skills.
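
As an illustration of the YAML chore mentioned above, this hypothetical helper emits a dbt™ schema.yml stub for one model. It is a sketch, not the command the author ran. In practice, the dbt-codegen package's `generate_model_yaml` macro covers the same need.

```python
from typing import List


def schema_yaml_stub(model: str, columns: List[str], description: str = "") -> str:
    """Build a minimal dbt schema.yml entry with an empty description per column."""
    def quoted(text: str) -> str:
        # Double-quote the value and escape backslashes and quotes so the YAML stays valid.
        return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'

    lines = ["version: 2", "", "models:", f"  - name: {model}",
             f"    description: {quoted(description)}", "    columns:"]
    for column in columns:
        lines.append(f"      - name: {column}")
        lines.append(f"        description: {quoted('')}")
    return "\n".join(lines) + "\n"


print(schema_yaml_stub("media", ["movie_id", "title"]))
```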

Data Preview: This feature was essential for performing quick sanity checks on my models during development. It allowed me to verify that the results matched at every step, ensuring accuracy and reliability. I really like how the Data Explorer provided a user-friendly interface for querying and inspecting data, which was useful when developing and refining my models. I used it to validate transformations and confirm that the output was as expected, which helped catch errors early and maintain high-quality data throughout the project. By providing immediate feedback, the Data Explorer was invaluable in maintaining the project's overall integrity.
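
The step-by-step sanity checks described here can be sketched as a small function that compares a model's input and output rows. The function name and row format are hypothetical. In a dbt™ project, the same checks are usually encoded as the built-in `unique` and `not_null` tests, plus a row-count comparison such as dbt_utils' `equal_rowcount` test.

```python
from collections import Counter
from typing import Dict, List


def sanity_check(before_rows: List[Dict], after_rows: List[Dict], key: str) -> List[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems = []
    keys = [row.get(key) for row in after_rows]
    if any(k is None for k in keys):
        problems.append(f"null {key} values")
    dupes = sorted(k for k, n in Counter(keys).items() if k is not None and n > 1)
    if dupes:
        problems.append(f"duplicate {key} values: {dupes}")
    if len(after_rows) != len(before_rows):
        problems.append(f"row count changed from {len(before_rows)} to {len(after_rows)}")
    return problems


# Toy example: a join that accidentally fans out one movie into two rows.
before = [{"movie_id": 1}, {"movie_id": 2}]
after = [
    {"movie_id": 1, "title": "A"},
    {"movie_id": 2, "title": "B"},
    {"movie_id": 2, "title": "B (OMDB)"},
]
print(sanity_check(before, after, "movie_id"))
# ['duplicate movie_id values: [2]', 'row count changed from 2 to 3']
```

The row-count check assumes the step should preserve one row per input row. Skip it for steps that legitimately filter or aggregate.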

Wrap Up

The Movie Data Modeling Challenge was super fun, balancing data infrastructure and BI presentation. Paradime's features, especially Lineage and CLI, were game-changers. I tackled inconsistent data and learned the importance of data governance.

Thanks to Paradime, Lightdash, and the community for this awesome experience. I'm excited for future challenges and will definitely join again… and so should you!

Schedule a call with the team and learn how to maximize the impact of analytics

Interested to learn more?
Try out the free 14-day trial

Start free trial

Product

Accelerating Data Warehouse Migrations with DinoAI: From Redshift to Trino in Minutes

Product

DinoAI: Enforcing Standards Across Your dbt Project with .dinorules

Product

Paradime DinoAI vs dbt™ Copilot: A Comparative Analysis

Test Drive Paradime Today

Start our free 14-day trial and experience the power of AI in analytics

Start for free

Made with ❤️ in San Francisco ・ London

*dbt® and dbt Core® are federally registered trademarks of dbt Labs, Inc. in the United States and various jurisdictions around the world. Paradime is not a partner of dbt Labs. All rights therein are reserved to dbt Labs. Paradime is not a product or service of or endorsed by dbt Labs, Inc.