Characterizing Sponsored Content in Facebook and Instagram

Emanuelle Azevedo Martins, Computer Science Department, UFMG, Brazil,
Isadora Salles, Computer Science Department, UFMG, Brazil,
Fabricio Benevenuto, Computer Science Department, UFMG, Brazil,
Olga Goussevskaia, Computer Science Department, UFMG, Brazil,

In this work we present a comparative analysis of influencer marketing evolution on Facebook and Instagram, spanning the pre and post Covid-19 pandemic onset periods. We collected and characterized a large-scale cross-platform dataset, comprised of 9.5 million sponsored posts. We analyzed the relative growth rates of the number of ads and of user engagement within different topics of interest, such as sports, retail, travel, and politics. We discuss which topics have been most impacted by the onset of the pandemic, both in terms of sponsored content supply and demand. With this work we hope to expand the understanding of influence dynamics on social networks and provide support for the development of more contextualized and effective branding strategies.

CCS Concepts: Information systems → Multimedia information systems;

Keywords: Social network analysis, sponsored content, influencer marketing, Instagram, Facebook.

ACM Reference Format:
Emanuelle Azevedo Martins, Isadora Salles, Fabricio Benevenuto, and Olga Goussevskaia. 2022. Characterizing Sponsored Content in Facebook and Instagram. In Proceedings of the 33rd ACM Conference on Hypertext and Social Media (HT '22), June 28-July 1, 2022, Barcelona, Spain. ACM, New York, NY, USA 12 Pages.


In the past decade, having a smartphone has transformed any person into a potential content creator. Cameras have become increasingly sophisticated, processors faster, and networks ever more ubiquitous. Instagram created filters that made ordinary photos look striking. TikTok has made video editing just as effortless, and Facebook provided recording tools that took amateur podcasters to the next level. Influencers are online content creators that cultivate a sense of intimacy among their followers through sharing authentic and lived experiences in the areas in which they claim expertise.

Influencer marketing is a form of online marketing with a global market value that has more than doubled since 2019, standing at around 13.8 billion U.S. dollars as of 2021. In the early days of influencer marketing, back in 2016, close watchers of influencers' accounts started noticing a small addition of “#ad” to the laudatory testimonials about the latest miracle product, as consumer advocacy groups started claiming that undisclosed sponsorship could be deceptive and the FTC published endorsement guides. Since then, even though such ads have typically received less engagement than non-sponsored content [9], the practice of sponsorship disclosure has become widely adopted, either by adding hashtags or, more recently, by classifying such posts as sponsored-partnerships.

Academic research has recently studied several aspects of influencer marketing [2, 4, 7, 9, 15, 17, 19, 22]. Nonetheless, many challenging research questions remain to be addressed on the topic. In particular, much of the previous work relied on survey data with a small number of participants [17] or on datasets limited to a single platform and by restrictions on web crawling [2, 9, 15]. Data and, in particular, ads on online social media platforms are typically not entirely publicly available and notably hard to collect, requiring a long and expensive crawling process [23].

Our work is based on a dataset that has not been collected via web-crawling, but using CrowdTangle [28], a Facebook-owned tool that tracks interactions on public content from Facebook pages, Instagram accounts, and subreddits. Access to the tool has only recently (mid-2020) been granted to selected university researchers.

We collected and analyzed a large-scale cross-platform dataset composed exclusively of public, disclosed sponsored posts (thus of more directly measurable economic value), spanning the pre-pandemic and pandemic periods (2016-2021). From Facebook, we obtained 4.2M sponsored posts from 172K distinct pages with at least 1,000 followers. From Instagram, we collected 5.2M ads from 286K distinct profiles with ≥ 1,000 followers. In contrast, prior work collected 100-300K sponsored posts [9, 15, 19].

First, we present a cross-platform comparative study of influencer marketing evolution on Facebook and Instagram, including advertising volume and influencers' follower bases over time, as well as newcomers and engagement trends. Second, we analyze the impact of the Covid-19 pandemic on the thematic categories of sponsored content on Facebook and also on the topics generated with the hashtags present in Instagram posts. We discuss which topics were most impacted in the pandemic period, both in terms of content supply (volume) and demand (engagement). To the best of our knowledge, this is the first cross-platform large-scale analysis of influencer marketing evolution, spanning the time period from the early days of its advent, all the way through the eruption of the Covid-19 pandemic in 2020, capturing profound social and economic transformations that occurred worldwide in the aftermath of this global crisis.

The rest of this paper is organized as follows. In Section 2 we discuss related work. In Section 3 we describe the data collection process. In Section 4 we characterize the temporal evolution of influencer marketing on Facebook and Instagram. In Sections 5 and 6 we analyze topic trends of sponsored posts before and during the Covid-19 pandemic on Facebook and Instagram, respectively. In Section 7 we present our conclusions.


We first present related work on the emergence of influencer marketing and sponsored content as effective forms of advertising. Then, we discuss studies that explored the controversy about disclosure of sponsored content. Finally, we cover recent efforts to measure sponsored content.

2.1 Emergence of an Influential Marketing

Understanding social connections and influence dynamics on online social networks has been the subject of many research studies for more than a decade [7, 16, 21]. Due to its huge potential for viral marketing, this research field has evolved towards different ramifications including understanding the role that users play on information flow [6, 14], identifying trendsetters and opinion leaders [22], and identifying topic experts for advertising purposes [11, 24].

In practice, a few years later, social platforms provided all the tools for advertisers to target very specific audiences [1, 25, 26], but sponsored content became a relevant strategy as well, in which influencers are paid by brands to post about their products. This new advertising ecosystem surpassed the old-fashioned pop-up ads, considered annoying by most users [13]. Next, we briefly cover research efforts related to sponsored content in social media.

2.2 Sponsored Content Controversy

Often, sponsored content on social media can be hard to differentiate from non-sponsored content, depending on how advertisers and influencers have blended a product with content. For example, when an influencer dedicated to posting recipes on social media says which product brands would favor a certain recipe, it is hard to tell whether that is intentional advertising or not. Indeed, recent studies show that consumers are easily misled by sponsored content that looks like organic content, whether intentionally or unintentionally [30].

Interestingly, a study [27] shows that readers of the New York Times spend roughly the same amount of time on advertiser-sponsored posts as on news stories, suggesting that disclosing that a post is sponsored might negatively affect the exposure and consumption of that content. Not surprisingly, recent efforts have focused on the effects of sponsorship disclosure in advertising and the effects of unethical disclosure practices [10, 18, 20]. In particular, the results in [10] indicate that disclosure language containing text like “paid ad” positively affects ad perception and recognition. Thus, this study suggests that brands can build better relationships with their consumers by clearly identifying to them the intent of sponsored content.

2.3 Measuring Sponsored Content

A few recent studies have focused on measuring the use of social platforms for sponsored content and influencer marketing. In particular, reference [9] shows that an increasing amount of sponsored content can negatively affect audience engagement with an influencer's profile, and the authors propose a few ways to mitigate this negative effect. In [9, 19], the impact of sponsored content on user engagement, in combination with posting attributes and topic trends, is studied. The study in reference [15] uses Instagram sponsored content and deep learning to build an influencer profiling model. Finally, Trevisan et al. [29] studied the activity of Instagram opinion leaders within different topics, such as politics, music, and sports, exploring interactions during political debates.

Despite the importance of the above existing studies, they are not dedicated to providing a better understanding of the evolution of this emergent advertising ecosystem. Thus, there is still a lack of understanding about the adoption and use of sponsored content on different social media platforms. Our effort here aims at filling this research gap.


In this work, we focused on advertisements made by Instagram and Facebook profiles, that is, sponsored posts created by the users themselves. Both platforms provide the option for users to declare a particular post as a sponsored-partnership and include the brand that sponsored it. However, some users still post sponsored content using other forms of disclosure, such as hashtag #ads.

Our dataset was collected using CrowdTangle, a tool provided by Facebook to a limited group of organizations, which aims to facilitate the analysis of public content from their social network platforms. The tool allows searching either for posts published on specified public Facebook pages or Instagram profiles, or posts that contain a specified text.

Our collection focused exclusively on disclosed sponsored posts and consisted of two steps. First, we searched for posts that contained at least one hashtag frequently used to disclose sponsorship (in English and in Portuguese). Table 1 shows the list of hashtags used in this first search step. This initial search returned 5,647,860 Facebook and 7,932,344 Instagram posts, published on 388,219 distinct Facebook pages and 484,550 Instagram profiles.

Second, for each Facebook page and Instagram profile that contained at least one post collected in the first step, we queried all posts of type sponsored-partnership published therein. This type of post has been gaining popularity and is becoming the preferred way to disclose sponsored content, especially on Instagram. Note that posts of type sponsored-partnership may or may not contain hashtags listed in Table 1. After the second search, we obtained a total of 17,217,603 sponsored posts (see Table 2).

Finally, we removed all pages with fewer than 1,000 followers, as well as posts with missing attributes, and obtained a dataset with 4.2M Facebook posts from 172K pages and 5.2M Instagram posts from 286K profiles (see Table 2 b), spanning the time period from 01/01/2016 to 06/30/2021, thus including the pre-pandemic and Covid-19 pandemic periods.
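The final filtering step can be illustrated with a minimal sketch. The record layout below (the `Post` class and its field names) is hypothetical and is not the CrowdTangle export format; we also assume, for illustration only, that the follower threshold is checked against the follower count attached to each post.

```python
from dataclasses import dataclass
from typing import List, Optional

MIN_FOLLOWERS = 1_000  # threshold stated in the text


@dataclass
class Post:
    # Hypothetical record layout; field names are illustrative only.
    account_id: str
    followers: Optional[int]
    interactions: Optional[int]
    created: Optional[str]


def keep_post(post: Post) -> bool:
    """Keep a post only if no attribute is missing and the account
    has at least MIN_FOLLOWERS followers."""
    if post.followers is None or post.interactions is None or post.created is None:
        return False
    return post.followers >= MIN_FOLLOWERS


def filter_posts(posts: List[Post]) -> List[Post]:
    return [p for p in posts if keep_post(p)]


sample = [
    Post("a", 5_000, 120, "2020-05-01"),
    Post("b", 800, 10, "2020-05-02"),       # fewer than 1,000 followers
    Post("c", 2_000, None, "2020-05-03"),   # missing attribute
]
kept = filter_posts(sample)
```

In this toy example only the post from account "a" survives the filter.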

Table 1: Searched hashtags.
Language    Hashtags
English     #sponsored, #ad, #spon, #advertisement, #commercial, #paid, #promotion, #sp
Portuguese  #publi, #oferta, #desconto, #publicidade, #pago, #promo, #promocao, #patrocinado, #comercial, #anuncio, #propaganda

Table 2: Dataset volume.

CrowdTangle sends a file in CSV format with the requested historical data to the requester's email. On both platforms, the following attributes are returned with each post: username and profile, number of profile followers or likes up to the time of posting, date and time of post creation, post type, the total number of interactions, number of likes, number of comments, number of views, post links, content data, such as post description and image text, post performance (measured through interactions) and, if it is a post flagged as an advertisement, the ad id and sponsor's name. For Facebook, the category (topic) of the post is also provided, as well as the page id, country, page description, interactions received in addition to likes, such as love and sad, the number of shares, size of the video, and whether a video was created by the user or shared by another user on the platform.

Data limitations: We would like to point out some limitations of our data collection. First, the restricted set of hashtags used in the initial search probably introduced errors in both directions: sponsored posts disclosed in other ways may be missing from the dataset (false negatives), and there may be posts present in the dataset that do not qualify as sponsored content (false positives). Second, the hashtags were defined in Portuguese and English only. Finally, all analyses presented in this work are based on posts considered sponsored, with no comparison against non-sponsored content.

Facebook page categories: Figure 1 shows the Cumulative Distribution Function (CDF) of the number of distinct Facebook pages in each topic category. We can see that less than 20% of the 2,399 categories encountered in our dataset have been used by at least 100 distinct pages. After removing categories with fewer than 1,000 pages and merging some with similar names, we selected a set of 25 categories (topics), listed in Table 5. Figure 2 shows the number of distinct users in each category. The category with the highest number of users is media news company and the one with the lowest is actor.

This subset of topics covers 82,670 Facebook pages in our dataset. Most of these pages had a single category, 2,667 pages had two categories, and only 4 pages had three. We analyze this set of categories according to pre-pandemic and pandemic trends in Section 5.

Figure 1: Facebook: number of pages per category (before filtering, log scale).
Figure 2: Facebook: number of users per category (after filtering).

Instagram hashtags: Hashtags are extensively used on Instagram. They are keywords used to describe the contents of a post. Users insert one or multiple tags in the body of the description of a picture, preceded by the # symbol. Instagram also gives its users the possibility of searching for hashtags, which yields the most recent and most popular posts with that specific hashtag. In total, our dataset contains 1,634,221 distinct hashtags. Since this is a free-text field, there is a lot of noise in this data. In fact, it has been estimated that typically less than 20% of hashtags in a post describe the actual content of the image they are attached to [3]. Figure 3 shows the CDF of the number of users that have used a particular hashtag. It is possible to see that 70% of all hashtags in our dataset have been used by a single user. To remove hashtags that are too specific or have irregular usage frequency, for the purpose of our analysis, we kept only those hashtags used by more than 100 users in the dataset. The hashtags used in the initial search of the posts, listed in Table 1, were also removed. This filtering resulted in a total of 3,880 (0.28%) distinct hashtags.
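The hashtag filtering described above can be sketched as follows. This is only an illustration under stated assumptions: the input layout (pairs of user id and hashtag list), the lowercasing of tags, and the function name `select_hashtags` are our own choices, and `SEARCH_HASHTAGS` holds just an excerpt of Table 1.

```python
from collections import defaultdict

# Excerpt of the search hashtags from Table 1 (not the full list).
SEARCH_HASHTAGS = {"sponsored", "ad", "spon", "publi"}


def select_hashtags(posts, min_users=100, excluded=SEARCH_HASHTAGS):
    """Return hashtags used by MORE than `min_users` distinct users,
    excluding the hashtags used in the initial search.

    `posts` is an iterable of (user_id, list_of_hashtags) pairs.
    """
    users_per_tag = defaultdict(set)
    for user_id, tags in posts:
        for tag in tags:
            # Normalization (lowercase, no leading '#') is an assumption.
            users_per_tag[tag.lower().lstrip("#")].add(user_id)
    return {
        tag
        for tag, users in users_per_tag.items()
        if len(users) > min_users and tag not in excluded
    }


# Toy data, filtered with a threshold of 2 users instead of 100.
toy = [
    ("u1", ["#Food", "#ad"]),
    ("u2", ["#food", "#ad"]),
    ("u3", ["#food", "#ad", "#rare"]),
    ("u3", ["#food"]),  # repeated use by the same user counts once
]
selected = select_hashtags(toy, min_users=2)
```

Counting distinct users rather than posts is what removes hashtags that a single prolific account uses repeatedly.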

Topic detection on Instagram: Topic modeling is a well-known problem in NLP. One of the most widely used topic modeling techniques is Latent Dirichlet Allocation (LDA), proposed by Blei et al. [5], which is used to find hidden topics in documents. A topic might be a subject, like arts or education, that is discussed in the documents. The original setting in LDA, in which each word has a topic label, may not work well with short texts, such as Instagram posts or tweets. Further efforts have been made to detect topics in short texts, e.g., [3, 8, 12, 31].

To perform the topic analysis on Instagram, we ran LDA several times, varying the number of topics and passing as input the hashtags used by more than 100 different users. We found that the hashtags were best grouped into 15 topics, where a hashtag must appear in at least 100 posts to belong to a topic. We then inspected each topic and assigned it a subject that corresponds to the hashtags it contains. Table 3 presents the top 10 hashtags in each topic and the chosen subject.

Figure 4 shows the size of each topic, by number of hashtags. It is possible to see that the biggest groups are related to photograph, style, and lifestyle, while the smallest ones are related to engagement, fashion, and shopping. Finally, we identified the set of posts belonging to each topic. We say that a post belongs to a topic if it contains at least one hashtag from the set of hashtags comprising that topic. Note that there might be more than one topic per post. Of all distinct posts in our dataset, 1,040,381 were tagged with at least one topic. Table 2 b presents the number of posts left in our dataset after filtering by hashtag and topic usage. Section 6 presents the results obtained with the topics defined by LDA.
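The post-to-topic assignment rule can be sketched as a set intersection. The function name `topics_of_post` is hypothetical, and the two topics shown contain only a few hashtags taken from Table 3, not the full topic definitions.

```python
def topics_of_post(post_hashtags, topic_hashtags):
    """Return every topic that shares at least one hashtag with the post.

    `topic_hashtags` maps a topic name to the set of hashtags comprising it.
    """
    tags = set(post_hashtags)
    return {topic for topic, members in topic_hashtags.items() if tags & members}


# Small excerpt of two topics from Table 3.
topic_hashtags = {
    "Music": {"music", "hiphop", "rap"},
    "Food": {"food", "foodie", "vegan"},
}

both = topics_of_post(["music", "vegan", "sunset"], topic_hashtags)
none = topics_of_post(["sunset"], topic_hashtags)
```

A post with hashtags from several topics is counted in each of them, which is why the per-topic post counts can sum to more than the number of tagged posts.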

Notation: Let us define the set of Facebook pages or Instagram users, U = {u1, …, un}, the set of posts, P = {p1, …, pm}, and the set of categories C = {C1, …, C25} on Facebook, and topics C = {C1, …, C15} on Instagram, where Ci ⊂ P and u(p) ∈ U is the page (or user) to which post p ∈ P belongs. Let Ci(M, Y) ⊆ Ci be the subset of posts in topic category Ci, published in month M and year Y, and inter(p) be the total number of interactions with post p ∈ P.

Metrics: To understand the dynamics between the different themes of the advertisements posted on social networks, we designed metrics that capture changes over time, in particular those associated with the Covid-19 pandemic. We compare two 13-month-long time periods: one before the onset of the pandemic (from May 2018 to May 2019) and one afterward (from May 2020 to May 2021), leaving out a transition period between the two (from June 2019 to April 2020). The transition period excludes the first months of the Covid-19 pandemic, which momentarily disrupted the dynamics of social networks. Each of the compared periods also includes important holidays for online commerce, such as Christmas, exactly once. We define the following metrics for the purpose of our comparative analysis of pre- and post-pandemic topic trends, ∀ Ci ∈ C:

\begin{eqnarray} pageRatio(C_i) &=& \frac{\left| \bigcup _{(5, 20) \le (M,Y)\le (5,21)}{\lbrace u(p) \mid p \in C_i(M,Y)\rbrace } \right|}{\left| \bigcup _{(5, 18) \le (M,Y)\le (5, 19)}{\lbrace u(p) \mid p \in C_i(M,Y)\rbrace } \right|} \end{eqnarray}
\begin{eqnarray} interRatio(C_i) &=& \frac{\sum _{(5, 20) \le (M,Y)\le (5,21)}\sum _{p \in C_i(M,Y)}{inter(p)} }{\sum _{(5, 18) \le (M,Y)\le (5, 19)}\sum _{p \in C_i(M,Y)}{inter(p)} } \end{eqnarray}
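As a minimal sketch of metrics (1) and (2), assuming the posts of one category are available as tuples of page id, posting date, and interaction count, the two ratios can be computed as follows. The tuple layout, the function names, and the choice to return `None` when the pre-pandemic window is empty are illustrative assumptions.

```python
from datetime import date

# Comparison windows stated in the text (inclusive, 13 months each).
PRE_WINDOW = (date(2018, 5, 1), date(2019, 5, 31))
POST_WINDOW = (date(2020, 5, 1), date(2021, 5, 31))


def _in_window(day, window):
    start, end = window
    return start <= day <= end


def page_ratio(posts):
    """Distinct pages posting in the later window divided by distinct
    pages posting in the earlier window. `posts` is a list of
    (page_id, date, interactions) tuples for a single category."""
    pre_pages = {page for page, day, _ in posts if _in_window(day, PRE_WINDOW)}
    post_pages = {page for page, day, _ in posts if _in_window(day, POST_WINDOW)}
    return len(post_pages) / len(pre_pages) if pre_pages else None


def inter_ratio(posts):
    """Total interactions in the later window divided by total
    interactions in the earlier window."""
    pre_total = sum(n for _, day, n in posts if _in_window(day, PRE_WINDOW))
    post_total = sum(n for _, day, n in posts if _in_window(day, POST_WINDOW))
    return post_total / pre_total if pre_total else None


toy_category = [
    ("a", date(2018, 6, 1), 10),
    ("b", date(2019, 1, 1), 30),
    ("d", date(2019, 8, 1), 999),   # transition period, ignored
    ("a", date(2020, 6, 1), 50),
    ("b", date(2020, 7, 1), 20),
    ("c", date(2021, 5, 31), 50),
]
```

In the toy category, two pages are active before and three after, so `page_ratio` is 1.5, while interactions grow from 40 to 120, so `inter_ratio` is 3.0.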

Figure 3: Users per hashtag.
Figure 4: Hashtags per topic.
Table 3: Topics obtained with LDA.
TOPIC Top 10 hashtags per topic
Beauty beauty, makeup, hair, love, beautiful, cute, style, girls, nails, photooftheday
Sale diskon, sale, flashsale, infopromo, discount, msglow, mexico, infodiskon, miami, economia
Food food, foodie, foodporn, instafood, body, vegan, personalizados, bebe, mobile, maternidade
design, home, liketkit, decoracao, decor, homedecor, interiordesign, outlet, arquitetura, casa
Divulgation explorepage, giveaway, medan, viral, diskon, dirumahaja, explore, indonesia, jakarta, iklan
ootd, style, shopping, blackfriday, influencer, fashionblogger, travel, outfit, blogger, energydrink
fitness, canada, workout, gym, health, fit, healthylifestyle, indian, fitnessmotivation, motivation
moda, modafeminina, lookdodia, estilo, skincare, atacado, tendencia, liquidacao, lojaonline, look
Shopping sale, discount, love, murah, like, moda, venezuela, bogor, bekasi, gold
Engagement instagood, art, photooftheday, follow, picoftheday, tattoo, instalike, artist, free, like4like
photography, model, love, photooftheday, instagram, instagood, photoshoot, photo, partner, picoftheday
Brazil saopaulo, brasil, rj, repost, riodejaneiro, amor, brazil, bahia, love, bomdia
Digital Marketing marketing, instagram, marketingdigital, youtube, dubai, influencer, branding, business, uae, socialmedia
fashionista, style, fashionblogger, model, fashionstyle, usa, fashionable, collaboration, nyc, beautiful
Music music, hiphop, rap, artist, producer, trap, newmusic, studio, beats, itunes

4 Sponsored content over time: Facebook × Instagram

In this section, we characterize our data by type, volume, and engagement over time, and perform a comparative analysis of Facebook × Instagram. For better visualization of temporal trends, we plotted the 7-day moving averages, i.e., the averages (e.g. of the number of posts) over the preceding seven days.
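The smoothing step can be sketched as a trailing moving average. We assume here that the window covers the current day and the six days before it, and that no value is emitted until a full window is available; the exact windowing convention used for the figures is not specified in the text.

```python
def moving_average(values, window=7):
    """Trailing moving average over `window` consecutive daily values.

    The i-th output is the mean of the `window` values ending at day
    i + window - 1, so the output is shorter than the input by window - 1.
    """
    averages = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            running -= values[i - window]  # drop the day leaving the window
        if i >= window - 1:
            averages.append(running / window)
    return averages


daily_posts = [1, 2, 3, 4, 5, 6, 7, 8]  # toy daily counts
smoothed = moving_average(daily_posts)
```

For the toy series, the two full windows are days 1 to 7 and days 2 to 8, with means 4.0 and 5.0.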

Figure 5 shows the temporal evolution of the daily number of posts collected from Facebook and Instagram. We note that the number of ads published on Instagram approaches and occasionally exceeds the number of ads on Facebook, and starts to clearly exceed the latter in early 2021. Figure 6 focuses on the subset of sponsored-partnership posts. We can see that this type of posting is more prevalent on Instagram than on Facebook and is increasing in popularity on the former but not the latter platform. We observe a pattern of recurrent surges on both platforms near the end of each year, reflecting the usual marketing intensification around Black Friday and Christmas holidays. We can also observe a sudden drop in the number of ads in early 2020, which coincides with the onset of the Covid-19 pandemic, and a quick recovery thereafter.

Figure 5: Total number of daily sponsored posts.
Figure 6: Number of daily sponsored-partnership posts.
Figure 7: Facebook: number of sponsored posts per country.
Figure 8: Number of daily newcomers (log scale).

Figure 7 shows the five countries that had the highest number of sponsored posts published during the collected period. This analysis was carried out only on Facebook since this information was not available for Instagram. The United States is the country with the highest number of posts, followed by Brazil, where a clear growth pattern can be observed during the entire period. India remained in fifth place most of the time, while Thailand and the UK ranked third and fourth in the number of collected posts. It should be noted that the hashtags used for the collection are in English and Portuguese, so this distribution is not a representative sample of all countries.

To analyze the emergence of new pages and profiles, Figure 8 shows the number of daily newcomers over time. It can be seen that the number of new profiles on Instagram exceeds the number of new Facebook pages during practically the entire period analyzed, and especially since early 2021.

Figures 9a and 9b show the distribution over time of the main types of posts on Facebook and Instagram, respectively. On both platforms, photo posts are the most frequent. On Facebook, the second most frequent type of ad is the link, followed by video ads. On Instagram, in mid-2018, ad posts with albums surpass those with videos. Finally, growing in popularity on Instagram is IGTV, a feature for displaying longer videos.

Figure 9: Number of sponsored posts by type.

Figure 10 shows the CDF of the number of sponsored posts per Facebook page (and Instagram profile). We can see that Instagram profiles contain a greater number of sponsored posts when compared to Facebook pages. While on Instagram about 75% of profiles contain at most 10 ads during the collected period, on Facebook about 82% of pages contain at most that same number of ads.
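Statements such as "75% of profiles contain at most 10 ads" are readings of an empirical CDF. A minimal sketch of how such a value can be obtained from per-profile post counts is given next; the function name `ecdf_at` and the toy counts are illustrative and not taken from the dataset.

```python
from bisect import bisect_right


def ecdf_at(sorted_values, x):
    """Fraction of observations that are less than or equal to x.

    `sorted_values` must be sorted in ascending order and non-empty.
    """
    return bisect_right(sorted_values, x) / len(sorted_values)


# Toy number of sponsored posts for eight profiles.
posts_per_profile = sorted([1, 2, 3, 10, 12, 40, 7, 1])
share_at_most_10 = ecdf_at(posts_per_profile, 10)
```

Six of the eight toy profiles have at most 10 posts, so the CDF evaluated at 10 is 0.75.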

Figure 10: Number of sponsored posts per profile (log scale).
Figure 11: Engagement with each post (Def. (3), log scale).

We define engagement with a post p, published on page u(p) as follows:

\begin{equation} engagement(p) = \frac{num\_interactions(p)}{num\_followers(u(p))}\times 100 \end{equation}

Figure 11 shows the CDF of engagement per post, as defined in (3). We observe that Instagram posts present higher engagement than Facebook posts. Whereas approx. 50% of Instagram posts have engagement greater than 1, only about 20% of Facebook posts achieve this value.
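Definition (3) translates directly into code. The sketch below uses hypothetical function names and toy numbers, and it rejects non-positive follower counts, a guard that is our own assumption.

```python
def engagement(num_interactions, num_followers):
    """Engagement of a post as defined in (3): interactions as a
    percentage of the followers of the posting page or profile."""
    if num_followers <= 0:
        raise ValueError("follower count must be positive")
    return 100 * num_interactions / num_followers


def share_above(posts, threshold=1.0):
    """Fraction of (interactions, followers) pairs whose engagement
    is strictly greater than `threshold`."""
    above = sum(1 for inter, foll in posts if engagement(inter, foll) > threshold)
    return above / len(posts)


toy_posts = [(50, 5_000), (300, 10_000), (5, 2_000), (90, 3_000)]
```

For example, a post with 50 interactions on a profile with 5,000 followers has an engagement of exactly 1, so it does not count toward the share of posts with engagement greater than 1.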

Figure 12: Number of interactions per post (log scale).
Figure 13: Number of views per post (log scale).
Figure 14: Number of followers per post (log scale).
Figure 15: Growth of follower base per profile (Δ(u) × 100, log scale).

Figure 12 shows the CDF of the number of interactions per post. Once again, we observe higher interaction with Instagram posts, compared to Facebook. Whereas less than 10% of Facebook posts received ≥ 1,000 interactions, over 40% of Instagram posts received at least that many interactions. Figure 13 shows the CDF of the number of views of each post, revealing a similar distribution on both platforms. Circa 30% of posts on both platforms receive at least 10,000 views.

Figure 14 shows the CDF of the number of followers (of the profile) of each post at the time of posting. It can be seen that Instagram has a lower percentage (5%) of posts with ≥ 1,000,000 followers than Facebook (15%), possibly because Facebook is an older platform, with a longer history of online social networking.

Figure 15 shows the CDF of the (approximate) follower base growth of each profile/page. Let p1(u) and plast(u) be the first and the last post published on profile u, respectively. We define growth Δ(u) as follows:

\begin{equation} \Delta (u) = \frac{num\_followers(p_{last}(u)) - num\_followers(p_1(u))}{num\_followers(p_1(u))} \end{equation}

Once again we observe higher growth of Instagram profiles, compared to Facebook pages. Whereas 20% of Instagram profiles more than doubled the size of their follower bases, on Facebook less than 10% of pages presented such a growth rate.
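The growth measure Δ(u) can be sketched as follows, assuming each post of a profile is available as a pair of ISO-formatted posting date and follower count at posting time. The pair layout and the function name `follower_growth` are illustrative assumptions.

```python
def follower_growth(posts):
    """Relative follower growth between the first and the last post
    of one profile.

    `posts` is a non-empty list of (iso_date, followers) pairs; ISO
    dates sort chronologically when compared as strings.
    """
    ordered = sorted(posts)
    first_followers = ordered[0][1]
    last_followers = ordered[-1][1]
    return (last_followers - first_followers) / first_followers


profile_posts = [
    ("2021-03-01", 2_500),
    ("2018-06-01", 1_000),
    ("2019-09-15", 1_800),
]
growth = follower_growth(profile_posts)
```

Under this definition, a profile has more than doubled its follower base when Δ(u) > 1; the toy profile grows from 1,000 to 2,500 followers, giving Δ(u) = 1.5.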


In this section, we focus on the categories (topics) information, available for Facebook pages. We look into topic trends before and after the onset of the Covid-19 pandemic. The following analysis is based on the subset of 82,670 Facebook pages, covered by the 25 topics (with at least 1,000 distinct pages), listed in Table 5.

In Table 5 we list the names of the 25 topic categories, in alphabetic order, along with the total numbers of pages and interactions, before (from May 2018 to May 2019) and after (from May 2020 to May 2021) the onset of the pandemic. Note that the pageRatio metric measures the relative growth of the supply side of influencer marketing, in terms of the number of distinct pages in each category, whereas the interRatio metric addresses the demand side, in terms of the total number of interactions with the posts in each category.

In Table 4 a we list the top-3 and bottom-3 topic categories according to the pageRatio and interRatio metrics. In Figures 16a and 16b we show side-by-side the pre and post pandemic yearly totals of pages and interactions of each category, respectively.

Table 4: Top-3 and bottom-3 topic categories by pageRatio and interRatio, as defined in (1) and (2), resp.
Figure 16: Facebook: top-3 and bottom-3 topic categories by pageRatio and interRatio, as defined in (1) and (2), resp.

By analyzing the results according to the pageRatio metric, we observe that the highest growth rates in terms of the yearly number of pages (content creators) belong to RETAIL (with an almost 10× increase in the number of posts, 3× in the number of pages, and 2× in the number of interactions) and CLOTHING (with an almost 3× increase in the number of posts, 2.5× in the number of pages, and 1.5× in the number of interactions), which is expected, considering the lockdowns of local shops all around the world. They are followed by the NON-PROFIT category (with an almost 2× increase in the number of posts, 2.5× in the number of pages, and 25% in the number of interactions), which might point to increasing activity in social and political causes. The bottom-3 topics were INTEREST, ATHLETE, and PUBLIC-FIGURE.

By analyzing the results according to the interRatio metric, we observe that the highest growth rates in terms of the yearly number of interactions (engagement) belong to POLITICIAN (with an almost 5× increase in the number of posts, 2× in the number of pages, and 10× in the number of interactions) and GOVERNMENT ORGANIZATION (with an almost 2× increase in the number of posts, 2× in the number of pages, and 2.5× in the number of interactions), which again points to increasing interest and engagement in social and political causes. They are followed by the ARTIST category (with an almost 3× increase in the number of interactions, but no increase on the supply side in terms of the number of posts and pages), which might point to an increasing engagement of artists with their audiences through online social platforms during lockdowns and bans on group activities, such as concerts and exhibitions. The bottom-3 topics were MAGAZINE, RADIO, and MEDIA-NEWS.

Table 5: Topic trends: before (May 2018 to May 2019) and after (May 2020 to May 2021) Covid-19 pandemic onset.


In this section, we characterize the distribution of hashtags, topics, and topic trends on Instagram.

Table 6 lists the ten most popular hashtags, by number of users and number of posts. In Table 6 a, we can see that the most popular hashtags are related to fashion, beauty, and music. In fact, these topics also occur in the largest number of posts (Table 6 b). There are also some quite generic hashtags in this list, such as #love, #instagood, and #repost. These hashtags are extensively used together with others, possibly to boost a post's reach, since they are always popular.

Table 6: Top 10 hashtags.
Figure 17: Hashtags per post.
Figure 18: Topics per post.

Figure 17 presents the CDF of the number of hashtags per post. We can see that 80% of posts use up to 10 hashtags, while only 10% use more than 15 hashtags per post. Figure 18 shows the CDF of the number of topics per post. About 40% of posts belong to only 1 topic, about 80% of posts are in up to 4 topics, and about 5% of posts belong to more than 7 topics.

In Table 4 b we list the top-3 and bottom-3 topics according to the pageRatio and interRatio metrics. In Figures 19a and 19b we show side-by-side the pre and post pandemic yearly totals of pages and interactions of each category, respectively.

Under the pageRatio metric, the topic with the greatest growth is ENGAGEMENT, which also ranks second under the interRatio metric. This topic includes hashtags used primarily to boost posts. SHOPPING is in second place, reinforcing the increase in online commerce, and the FASHION topic had the third biggest growth in the period. When analyzing the results of the interRatio metric, we observed that the highest growth rate in terms of the annual number of users belongs to hashtags related to BRAZILIAN CITIES, with an increase close to 6× in the number of users using these hashtags. Lastly, CLOTHING saw a 5× increase in the number of interactions, which is expected, considering the lockdowns of local stores around the world.

The topics with the lowest growth under the two metrics were PHOTOGRAPH, MUSIC, LIFESTYLE, and STYLE, although no topic shrank in this period on Instagram. The lower growth of PHOTOGRAPH, MUSIC, and LIFESTYLE is understandable given that, during the pandemic, there were no face-to-face events, travel was restricted, and people spent most of their time at home.

Figure 19: Instagram: top-3 and bottom-3 topics by pageRatio and interRatio, as defined in (1) and (2), resp.


In this work we collected and characterized a large-scale cross-platform dataset, comprised of 9.5 million sponsored posts, a tenfold increase compared to prior work. We performed a comparative analysis of influencer marketing evolution on Facebook and Instagram, spanning five years of posting activity, including the pre and post Covid-19 pandemic periods. We observed a period of instability lasting several months right at the beginning of the pandemic. We performed topic modeling with Latent Dirichlet Allocation (LDA) using the hashtags attached to sponsored Instagram posts and looked into the topic categories of sponsored Facebook pages. We analyzed the growth dynamics in terms of number of posts and user engagement in different topics of interest. In particular, we compared the volume of ads before and after the onset of the pandemic. On Facebook, for example, a greater number of posts related to non-profit organizations was observed after the onset of the pandemic, suggesting a greater interest in social causes. Overall, the growth of online commerce was visible on both social networks, and especially on Instagram, which presented a much faster growth of influencer marketing activity, compared to Facebook. A comprehensive model to explain associations between sponsored posting strategies and consumer brand engagement has yet to be uncovered. Social media analytics allow us to find hidden patterns in datasets of unprecedented volumes. By characterizing the dynamics of influencer marketing, we hope to provide support for the development of more contextualized and effective branding strategies.


This work was partially supported by research grants from CNPq, FAPEMIG, FAPESP, and CAPES.






HT '22, June 28–July 01, 2022, Barcelona, Spain

© 2022 Association for Computing Machinery.
ACM ISBN 978-1-4503-9233-4/22/06…$15.00.