Social Media Challenge Rewind: Most Instagrammable Destinations
Işın Pesch analyzes 17M Instagram posts to reveal top travel destinations and engagement patterns in Paradime's dbt™ Data Modeling Challenge.
Işın Pesch · Oct 21, 2024 · 4 min read
Welcome to the "Social Media Challenge Highlight Reel" series! 🙌
This blog series showcases the "best of" submissions from Paradime's dbt™ Data Modeling Challenge - Social Media Edition, highlighting the remarkable data professionals behind them.
If you're unfamiliar with the Social Media Data Modeling Challenge, enrich your series experience by exploring these essential resources: the challenge introduction video and the challenge landing page. They offer valuable background information to help you fully appreciate the insights shared in this series.
In each "Social Media Challenge Highlight Reel" blog, you'll discover:
Key Social Media insights: Uncover valuable insights participants derived from social media datasets, revealing scroll-stopping stories about user behavior, engagement patterns, and global trends.
Analytics Engineering best practices: Learn about the participants' approach to project execution, from initial analysis to final insights, including their coding techniques (SQL, dbt™) and innovative use of tools (Paradime, MotherDuck, Hex).
Now, let's dive into our featured submission by Işın Pesch!
Introduction
Hey there! My name is Işın Pesch, and I'm a Senior Data Analytics Engineer at Deel. I recently participated in Paradime's Social Media Data Modeling Challenge, and I'm excited to share my experience with you!
I'm mainly based in Aachen, Germany, and I've been in the data field for about 2.5 years since graduating with a degree in Electronics Engineering. I won the previous modeling challenge organized by Paradime and couldn't resist participating in this one as well, especially because both the topic and the tool stack interested me.
In this blog, I'll start by sharing a few insights I uncovered, then I'll dive into how I built my project and how I leveraged Paradime, MotherDuck, and Hex to make it all happen. Let's get started!
Toolkit for success
To tackle this challenge, I used the following set of tools:
MotherDuck for Data Warehousing and Computation: It was my first time using MotherDuck, but the UI was very easy to get into. I liked the notebook-like structure for running queries. The supported SQL functions were sufficient and intuitive. The only issue I had was with queries that had very long runtimes. Although these appeared to be killed in the UI, they kept running in the background, basically blocking the whole database.
Paradime for dbt Development: Very smooth dbt developer experience and beyond. More details below.
Hex for Data Visualization: Very flexible and incredibly powerful BI tool enabling slick and user-friendly dashboard designs.
Building my project
From the moment I heard about the challenge topic, I had the idea to do something related to Instagram travel posts. I love traveling, and so far, Instagram has helped me a lot with my travel planning, which is why this challenge was specifically interesting for me.
Here's how I structured my project:
Research: Research probably took the most time in this project. Although I had a rather clear objective, getting data wasn't easy. I explored many different options for getting Instagram data in bulk. In the end, I found a Hugging Face dataset with 17 million Instagram posts, and I settled on that even though it only contained data up to 2019.
Identifying key questions: The key area I wanted to focus on was how Instagram affects real-life tourism. In the end, I was able to cover this partially, but the data was not evenly distributed across years, so I couldn't go deep into this question. Instead, I noticed that there were other interesting questions to answer within the dataset I had, such as how follower count affects post engagement or which countries are the most Instagrammable.
Data exploration: After loading my main Instagram dataset into MotherDuck, I explored it directly in the MotherDuck UI to check the quality of the data and whether it was sufficient to answer my main questions.
Data modeling and visualization: Deciding on the main structure of the final dashboard and having an idea of the key insights to be visualized is the key to modeling. The insights dictate the modeling requirements and give structure to the data modeling approach.
Challenges faced: The biggest challenge was not having enough data points to identify some seasonal trends, as I had initially intended. The second challenge was creating a compute-efficient query that would go through all the posts in my dataset and cross-check each word in the post description against the city list I imported.
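To illustrate the word-matching challenge described above, here is a minimal Python sketch. It is not the actual query from the project, which was written in SQL; the function name `find_city_mentions`, the tiny `CITIES` set, and the tokenization rule are all assumptions made for illustration. The idea is that tokenizing each description once and intersecting it with a set of city names avoids comparing every word against every city.

```python
import re

# Hypothetical sample; the real project imported a much larger city list.
CITIES = {"paris", "rome", "tokyo", "lisbon"}

# Runs of letters only, so hashtags and punctuation are stripped from tokens.
TOKEN_RE = re.compile(r"[^\W\d_]+")


def find_city_mentions(description, cities=CITIES):
    """Return the known city names that appear as whole words in a description."""
    tokens = {token.lower() for token in TOKEN_RE.findall(description)}
    # Set intersection is a hash lookup per token, not a scan of the city list.
    return tokens & cities


print(find_city_mentions("Sunset in Paris, then off to Rome! #paris"))
```

One limitation of this single-word approach is that multi-word names such as "New York" would need extra handling, for example matching on pairs of adjacent tokens.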
Insights uncovered
My Hex dashboard consisted of 3 main sections: Most Instagrammable Destinations, Engagement in Travel Posts, and Social Media Buzz Affecting Travel Behavior. You can find my fully published interactive Hex dashboard here! Below are some of the key highlights.
1. Top Countries Mentioned in Travel Posts
I created a bar chart showing the top 20 countries mentioned in Instagram posts. The United States 🇺🇸 is the top country mentioned. Overall, the Americas and Europe seem to be getting the most attention from Instagram posts. I later dove deeper into whether this is due to tourism or the Instagram user density in those countries.
For this insight, I used an intermediate model, int_country_mentions, that acts as a lookup table for each country name mentioned in the Instagram post description. I then fed this model into the int_instagrammable_destinations model to bring other relevant dimensions about post details together. Finally, I surfaced this in the mart layer as an instagrammable_destinations model which was exposed to the BI layer.
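The aggregation behind a "top 20 countries" chart can be sketched in a few lines of Python. This is only an illustration under assumptions: the real models are dbt SQL models, and the `(post_id, country)` row shape and the `top_countries` function are hypothetical stand-ins for what a lookup model like int_country_mentions might produce.

```python
from collections import Counter


def top_countries(mentions, n=20):
    """Rank countries by the number of distinct posts that mention them.

    mentions: iterable of (post_id, country) pairs (assumed row shape).
    """
    # Deduplicate so a post that names a country twice is counted once.
    unique_mentions = set(mentions)
    counts = Counter(country for _, country in unique_mentions)
    # Sort by count descending, then by name, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:n]
```

Counting distinct posts rather than raw mentions is a design choice in this sketch; it keeps a single post that repeats a country name from inflating that country's rank.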
2. Engagement of Instagram profiles
I created a bar chart showing profile engagement by follower tier. The chart is faceted into posts from the Travel & Adventure and Food & Dining categories. I found that having a very large follower count increases the engagement rate dramatically, while the remaining follower tiers are rather close to each other.
To uncover this insight, I used the int_instagrammable_destinations model to build a fact table, fact_profile_engagement, that aggregates engagement metrics for each Instagram profile.
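As a rough illustration of grouping engagement by follower tier, here is a small Python sketch. The tier boundaries, the tier names, and the engagement definition (likes plus comments per post, divided by followers) are all assumptions; the post does not state the definitions used in fact_profile_engagement.

```python
from collections import defaultdict

# Hypothetical tiers as (inclusive lower bound, name), largest first.
TIERS = [(1_000_000, "mega"), (100_000, "macro"), (10_000, "mid"), (0, "micro")]


def follower_tier(followers):
    """Map a follower count to the first tier whose lower bound it reaches."""
    for lower_bound, name in TIERS:
        if followers >= lower_bound:
            return name
    raise ValueError("followers must be non-negative")


def engagement_by_tier(profiles):
    """Average the assumed engagement rate of profiles within each tier.

    profiles: iterable of dicts with 'followers', 'likes', 'comments', 'posts'.
    """
    rates = defaultdict(list)
    for profile in profiles:
        if profile["followers"] <= 0 or profile["posts"] <= 0:
            continue  # skip profiles where the rate is undefined
        interactions = profile["likes"] + profile["comments"]
        rate = interactions / profile["posts"] / profile["followers"]
        rates[follower_tier(profile["followers"])].append(rate)
    return {tier: sum(values) / len(values) for tier, values in rates.items()}
```

Averaging per-profile rates weights every profile equally; a different choice, such as dividing total interactions by total followers per tier, would give more weight to the largest accounts.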
3. Social Media Buzz Affecting Travel Behavior?
I created two bar charts: one showing the most visited countries in 2019 by looking at the number of international arrivals, and another showing the top 20 countries mentioned in Instagram posts in 2019.
I found that most of the countries appear in both charts. What caught my attention is that the most visited list includes East Asian destinations like China 🇨🇳, Hong Kong 🇭🇰, and Macao 🇲🇴, yet they don't show up at all in the Instagram mentions chart. One reason for this could be that Instagram is banned in China, so it is neither a popular platform there nor easy to access, which would explain why there aren't many posts from China.
For this insight, I used the mart model instagrammable_destinations for both charts, with two different SQL queries in Hex.
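The comparison between the two charts comes down to set operations on two ranked lists. The sketch below is a hypothetical Python helper, not part of the project, and the function name and argument names are assumptions. It shows one way to list which entries appear in both rankings and which appear in only one.

```python
def compare_rankings(most_visited, most_mentioned):
    """Split two country lists into shared and one-sided entries."""
    visited, mentioned = set(most_visited), set(most_mentioned)
    return {
        "in_both": sorted(visited & mentioned),
        "visited_only": sorted(visited - mentioned),
        "mentioned_only": sorted(mentioned - visited),
    }
```

Applied to the two top-20 lists, the `visited_only` entry would surface destinations like the East Asian ones noted above, which rank high on arrivals but are missing from the Instagram mentions.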
How I used Paradime
I found Paradime to be incredibly useful and easy to use for dbt development. The following features were particularly valuable:
Integrated Terminal: The integrated terminal within Paradime's Code IDE made working with git and running dbt commands effortless. I could seamlessly stage, commit, and push my changes without switching between multiple tools. This built-in terminal also allowed me to run the commands I needed, such as dbt run, dbt test, and sqlfluff, directly and with instant feedback.
Data Explorer: The Data Explorer feature was a game-changer when it came to verifying model outputs. As I built my models, I could instantly check the data transformations and validate them on the fly. This interactive interface gave me the confidence to iterate quickly and ensure that my analytics pipeline was accurate and up-to-date.
Dino AI: Dino AI acted as an extra pair of hands during the development process. Whenever I needed help with syntax, debugging complex dbt queries, or understanding certain errors, I could rely on Dino to provide quick suggestions and explanations. For example, when I encountered a complex commit issue, Dino helped by giving step-by-step instructions for resolving it.
Reflections and lessons learned
Joining two modeling challenges organized by Paradime was incredibly valuable for my career. Not only did I build an amazing portfolio and grow my network, but I also learned a lot about data warehouse connections and data engineering. I already work heavily on data modeling with dbt in my daily job, but building the full data pipeline was definitely very useful for learning more about data engineering basics.
Looking ahead
I believe joining a data challenge like this gives people a tremendous opportunity to showcase their skills and build a very strong portfolio. I had people reaching out to me on LinkedIn after seeing my submission. Plus, the prize money is surely encouraging! 🙂
I was thrilled to hear from two winners of this social media challenge who mentioned that my previous submission had inspired their work. It's incredibly motivating to know that my project served as an inspiration for such high-quality submissions. This experience highlights the collaborative nature of these challenges and how we can all learn from and inspire each other in the data community.
If you're considering participating in future challenges, I'd highly recommend it. You'll not only get to apply your skills in a real-world scenario but also learn new tools and techniques. The experience you gain and the connections you make are invaluable for your career growth in the data field.