Işın Pesch analyzes 17M Instagram posts to reveal top travel destinations and engagement patterns in Paradime's dbt™ Data Modeling Challenge.
Welcome to the "Social Media Challenge Highlight Reel" series! 🙌
This blog series showcases the "best of" submissions from Paradime's dbt™ Data Modeling Challenge - Social Media Edition, highlighting the remarkable data professionals behind them.
If you're unfamiliar with the Social Media Data Modeling Challenge, enrich your series experience by exploring these essential resources: the challenge introduction video and the challenge landing page. They offer valuable background information to help you fully appreciate the insights shared in this series.
In each "Social Media Challenge Highlight Reel" blog, you'll discover:
Now, let's dive into our featured submission by Işın Pesch!
Hey there! My name is Işın Pesch, and I'm a Senior Data Analytics Engineer at Deel. I recently participated in Paradime's Social Media Data Modeling Challenge, and I'm excited to share my experience with you!
I'm mainly based in Aachen, Germany, and I've been in the data field for about 2.5 years since graduating with a degree in Electronics Engineering. I won the previous modeling challenge organized by Paradime and couldn't resist participating in this one as well, especially because the topic and the tool stack were quite interesting to me.
In this blog, I'll start by sharing a few insights I uncovered, then I'll dive into how I built my project and how I leveraged Paradime, MotherDuck, and Hex to make it all happen. Let's get started!
To tackle this challenge, I used the following set of tools:
From the moment I heard about the challenge topic, I had the idea to do something related to Instagram travel posts. I love traveling, and so far, Instagram has helped me a lot with my travel planning, which is why this challenge was especially interesting for me.
Here's how I structured my project:
My Hex dashboard consisted of 3 main sections: Most Instagrammable Destinations, Engagement in Travel Posts, and Social Media Buzz Affecting Travel Behavior. You can find my fully published interactive Hex dashboard here! Below are some of the key highlights.
I created a bar chart showing the top 20 countries mentioned in Instagram posts. The United States 🇺🇸 is the top country mentioned. Overall, the Americas and Europe seem to be getting the most attention from Instagram posts. I later dove deeper into whether this is due to tourism or the Instagram user density in those countries.
For this insight, I used an intermediate model, int_country_mentions, that acts as a lookup table for each country name mentioned in the Instagram post description. I then fed this model into the int_instagrammable_destinations model to bring other relevant dimensions about post details together. Finally, I surfaced this in the mart layer as an instagrammable_destinations model which was exposed to the BI layer.
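To illustrate the idea behind a lookup like int_country_mentions, here is a minimal Python sketch. It is not the actual dbt SQL from the project: the shortened country list, the function names, and the sample captions are all hypothetical, and matching whole-word country names is only one assumed way to detect a mention.

```python
import re
from collections import Counter

# Hypothetical, shortened country list; a real lookup would cover every country.
COUNTRIES = ["United States", "Italy", "France", "Japan"]

# One case-insensitive, whole-word pattern per country name.
PATTERNS = {
    name: re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
    for name in COUNTRIES
}

def country_mentions(posts):
    """Return (post_id, country) rows, one per country found in a description."""
    rows = []
    for post_id, description in posts:
        for name, pattern in PATTERNS.items():
            # Posts without a description simply produce no rows.
            if pattern.search(description or ""):
                rows.append((post_id, name))
    return rows

def top_countries(mentions, n=20):
    """Count mentions per country and keep the n most mentioned."""
    return Counter(country for _, country in mentions).most_common(n)

# Made-up sample captions, for illustration only.
sample_posts = [
    (1, "Sunset in Italy, then off to France!"),
    (2, "Road trip across the United States"),
    (3, "Best pasta in italy"),
    (4, None),
]
mentions = country_mentions(sample_posts)
top = top_countries(mentions)
```

Keeping one row per post and country pair is what lets a downstream model join the mentions back to post details and count them per country.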
I created a bar chart showing profile engagement by follower tier. The chart is faceted into posts from the Travel & Adventure and Food & Dining categories. I found that profiles with a very large follower count have a dramatically higher engagement rate, while the remaining follower tiers are rather close to each other.
To uncover this insight, I built a fact table, fact_profile_engagement, on top of the int_instagrammable_destinations model to aggregate engagement metrics for each Instagram profile.
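The aggregation behind a table like fact_profile_engagement can be sketched roughly as follows. This is a minimal Python illustration, not the project's dbt model: the tier boundaries, the tier labels, the field names, and the engagement formula (average likes plus comments per post, divided by followers) are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical tier boundaries: (minimum followers, label), highest first.
TIERS = [(1_000_000, "mega"), (100_000, "macro"), (10_000, "micro"), (0, "nano")]

def follower_tier(followers):
    """Map a follower count to the first tier whose minimum it reaches."""
    for minimum, label in TIERS:
        if followers >= minimum:
            return label
    return "nano"

def engagement_by_tier(posts):
    """posts: dicts with profile_id, followers, likes, and comments keys."""
    # Step 1: aggregate interactions and post counts per profile.
    per_profile = defaultdict(lambda: {"followers": 0, "interactions": 0, "posts": 0})
    for post in posts:
        agg = per_profile[post["profile_id"]]
        agg["followers"] = post["followers"]
        agg["interactions"] += post["likes"] + post["comments"]
        agg["posts"] += 1

    # Step 2: compute an assumed engagement rate per profile, grouped by tier.
    rates = defaultdict(list)
    for agg in per_profile.values():
        if agg["followers"] == 0:
            continue  # avoid division by zero for empty profiles
        rate = agg["interactions"] / agg["posts"] / agg["followers"]
        rates[follower_tier(agg["followers"])].append(rate)

    # Step 3: average the profile-level rates within each tier.
    return {tier: sum(values) / len(values) for tier, values in rates.items()}

# Made-up sample rows, for illustration only.
sample = [
    {"profile_id": "a", "followers": 1_000, "likes": 90, "comments": 10},
    {"profile_id": "a", "followers": 1_000, "likes": 40, "comments": 10},
    {"profile_id": "b", "followers": 2_000_000, "likes": 100_000, "comments": 0},
]
result = engagement_by_tier(sample)
```

Aggregating per profile first, and only then averaging within a tier, keeps a single very active profile from dominating its tier.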
I created two bar charts: one showing the most visited countries in 2019 by looking at the number of international arrivals, and another showing the top 20 countries mentioned in Instagram posts in 2019.
I found that most of the countries appear in both charts. What caught my attention is that the most visited countries include East Asian countries like China 🇨🇳, Hong Kong 🇭🇰, and Macao 🇲🇴; however, they don't show up at all in the Instagram mentions chart. One reason for this could be that Instagram is banned in China, so it is neither a popular platform there nor easy to access, which would explain why there aren't many posts from China.
For this insight, I used the mart model instagrammable_destinations for both charts by using 2 different SQL queries in Hex.
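The comparison between the two charts boils down to a set operation on two ranked lists. Here is a minimal Python sketch of that idea. The lists below are short placeholders rather than the real top-20 results, and the function name is hypothetical.

```python
def compare_rankings(most_visited, most_mentioned):
    """Split two country lists into shared and one-sided entries."""
    mentioned = set(most_mentioned)
    visited = set(most_visited)
    return {
        # Preserve the original ranking order within each group.
        "in_both": [c for c in most_visited if c in mentioned],
        "visited_only": [c for c in most_visited if c not in mentioned],
        "mentioned_only": [c for c in most_mentioned if c not in visited],
    }

# Placeholder lists, for illustration only (not the actual chart data).
most_visited = ["France", "United States", "China", "Hong Kong", "Macao"]
most_mentioned = ["United States", "France", "Brazil"]

comparison = compare_rankings(most_visited, most_mentioned)
```

In this placeholder example, the visited_only group is where destinations that are heavily visited but absent from the mentions chart would surface.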
I found Paradime to be incredibly useful and easy to use for dbt development. The following features were particularly valuable:
Joining two modeling challenges organized by Paradime was incredibly valuable for my career. Not only did I build an amazing portfolio and grow my network, but I also learned a lot about data warehouse connections and data engineering. I already work heavily on data modeling with dbt in my daily job, but building the full data pipeline was definitely very useful for learning more about data engineering basics.
I believe joining a data challenge like this gives people a tremendous opportunity to showcase their skills and build a very strong portfolio. I had people reaching out to me on LinkedIn after seeing my submission. Plus, the prize money is surely encouraging! 🙂
I was thrilled to hear from two winners of this social media challenge who mentioned that my previous submission had inspired their work. It's incredibly motivating to know that my project served as an inspiration for such high-quality submissions. This experience highlights the collaborative nature of these challenges and how we can all learn from and inspire each other in the data community.
If you're considering participating in future challenges, I'd highly recommend it. You'll not only get to apply your skills in a real-world scenario but also learn new tools and techniques. The experience you gain and the connections you make are invaluable for your career growth in the data field.