Discover the 1st place winner's insights and best practices from the Movie Data Modeling Challenge!
Welcome to the "Movie Challenge Highlight Reel" series 🙌
This blog series will showcase the "best of" submissions from Paradime and Lightdash's Movie Data Modeling Challenges, highlighting the remarkable data professionals behind them.
If you're unfamiliar with the Movie Data Modeling Challenge, enrich your series experience by exploring these essential resources: the challenge introduction video and the winner's announcement blog. They offer valuable background information to help you fully appreciate the insights shared in this series.
In each "Movie Challenge Highlight Reel" blog, you'll discover:
Now let's check out our first installment, exploring Isin Pesch and her submission!
Hey there! My name is Isin Pesch, and I'm a data analytics engineer at Deel. I recently competed in Paradime's Movie Data Modeling Challenge, and I'm proud to say I took first place!
In this blog, I'll start by sharing a few insights I uncovered, and then I'll dive into how I built my project and how I used Paradime to make it all happen.
Below are three of my favorite insights I uncovered, but you can check out the rest in my Github readme.md file.
Approach: I built int_movies_mapping.sql to aggregate individual success metrics like revenue, Rotten Tomatoes rating, IMDb votes, and major awards. I then used int_combined_movie_success.sql to normalize these metrics and combine them into a single success rating.
Approach: Similar to the insight above, I leveraged the combined success metric to identify the best actor-director pairs (minimum two movies together) in actor_director_success.sql.
Approach: With the combined success metric as my starting point, I grouped all movies by release year and calculated their average combined success by year.
To build my project, I began by thoroughly understanding the provided datasets in Snowflake. My initial focus was identifying the primary metric for my dashboard, which centered around the concept of success. I then assessed the data quality, which took significantly more time than anticipated. I did simple quality checks in my staging layer, and solved more complex data issues in the intermediate layer.
With this foundation, I visualized the final structure and key insights needed for my dashboard, which you can see below in my data lineage.
Of course, my project game plan wasn't perfectly linear. I ran into several roadblocks and mishaps along the way, but I tried to stay as disciplined as possible to reach the challenge deadline.
This was my first time using Paradime, and the learning curve was almost non-existent; I was up and running in minutes. It has all the features I have come to expect from cloud-hosted dbt™ platforms. One feature I found particularly useful throughout development was Paradime's natively-supported code linter, SQL fluff.
Within my .sqlfluff file, I defined rules for formatting and coding conventions. For instance, I ensured all SQL keywords were lowercase and established consistent coloring rules. With a single click, I could apply these rules to my entire model, automatically fixing any violations. This feature streamlined my workflow and ensured my code remained clean and readable, contributing to the project's success.
Participating in this challenge was definitely worth my time and energy. Diving into movie franchises and seeing the profitability and ratings trends was super interesting, and it helped me improve my analytics engineering capabilities.
I highly recommend signing up for Paradime’s next dbt™ data modeling challenge.