A Look at Weibo / WeChat as Musk Re-Ignites Spam Bot Debates

Elon Musk’s multi-billion-dollar fight for Twitter has taken several unprecedented turns in recent weeks. In the latest development, the deal has been put on hold over a dispute about the percentage of spam bots that Twitter discloses in its own reporting. The reported figure, which Twitter CEO Parag Agrawal put at “well under 5%”, has been heavily challenged by Musk. Musk also happens to be one of the most prolific users of the platform, with more than 93 million followers. He is also the one about to spend $44 billion to realize his vision of a digital town hall with over 229 million monetizable daily active users (mDAU). Whether the percentage of ‘spam bots’ is 5% or 5.01% has become a multi-million-dollar question, as Musk is paying over $192 per user in this acquisition.
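The arithmetic behind that claim can be checked in a few lines. This is only a back-of-the-envelope sketch: it uses the figures quoted above and assumes, for illustration, that every account is valued at the same average price.

```python
# Figures quoted in the article
deal_value_usd = 44_000_000_000   # $44 billion offer
mdau = 229_000_000                # monetizable daily active users

price_per_user = deal_value_usd / mdau

# Illustrative assumption: the bot share moves from 5% to 5.01%,
# i.e. by 0.01 percentage points, and every user is worth the average price.
extra_bot_share = 0.0001
extra_bot_accounts = mdau * extra_bot_share
value_at_stake = extra_bot_accounts * price_per_user

print(f"Price per user: ${price_per_user:,.2f}")
print(f"Accounts reclassified as bots: {extra_bot_accounts:,.0f}")
print(f"Value at stake: ${value_at_stake:,.0f}")
```

Under these assumptions, a shift of 0.01 percentage points corresponds to roughly 22,900 accounts, or about $4.4 million of the purchase price.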

As the debate rages on, one can easily come across spam bots on any casual scroll through Twitter. The bots are deployed for everything from crypto scams to disinformation campaigns by the likes of Russia and China. While the debate is not new, and this is unlikely to be the last time the issue makes headlines, the question of fake vs. real users has a material impact not just on the user experience of these services, but also on advertising dollars and, ultimately, on how much these companies are worth.

In late 2021, Craft Associates, together with Han Zhang from the Hong Kong University of Science and Technology (HKUST), looked at the prevalence of bots on China’s Sina Weibo and WeChat and their impact on content engagement. Our study encompassed 576 Weibo and WeChat accounts from online media and bloggers in the tech space and their accompanying 17,000+ posts on TMT topics such as 5G and smart devices. Using statistical techniques, we concluded that post engagement did not follow a normal distribution among the WeChat accounts in our study. Similarly, on Weibo, where bots (also known as the ‘water army’) are prevalent and ‘retweet and win’ campaigns are a common tactic to boost promotion, these suspected network effects did have an impact on post engagement within our subset of data.

The date range of the dataset spans from August 2020 to August 2021.

WeChat Content Analysis

We calculated the distribution of engagement for each account to assess account quality. The following analysis covers the 74 WeChat accounts that have more than 30 posts. Using the Shapiro-Wilk normality test (or simply by visualizing the data distribution), we can safely say that barely any account has normally distributed engagement. Heavy tails, skewness and large extreme values are present in almost every account. Below are some characteristics commonly exhibited in their distributions.

 

1) Right-skewness

Heavy right-skewness indicates that most of an account’s posts have engagement below the mean. However, some posts do perform exceptionally well, reaching over 30,000 engagements.

By calculating the skewness, we found that, out of the 74 accounts, 72 are right-skewed.
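The normality check and the skewness count described above can be sketched as follows. This is a minimal illustration, not the study’s actual code: the function name `describe_account`, the 0.05 significance level and the synthetic log-normal data are all assumptions made for the example.

```python
import numpy as np
from scipy import stats

def describe_account(engagements, alpha=0.05):
    """Summarize one account's engagement distribution (hypothetical helper)."""
    x = np.asarray(engagements, dtype=float)
    _, p_value = stats.shapiro(x)   # Shapiro-Wilk test of normality
    skewness = stats.skew(x)        # > 0 means a longer right tail
    return {
        "rejects_normality": bool(p_value < alpha),
        "right_skewed": bool(skewness > 0),
        "below_mean_share": float(np.mean(x < x.mean())),
    }

# Synthetic engagement counts standing in for real post data
rng = np.random.default_rng(0)
accounts = {
    "account_a": rng.lognormal(mean=7.0, sigma=1.2, size=200),
    "account_b": rng.lognormal(mean=6.0, sigma=1.0, size=25),
}

# Keep only accounts with more than 30 posts, as in the analysis above
eligible = {name: x for name, x in accounts.items() if len(x) > 30}
summary = {name: describe_account(x) for name, x in eligible.items()}

for name, result in summary.items():
    print(name, result)
```

The `below_mean_share` field makes the point about right-skewness concrete: in a right-skewed account, well over half of the posts sit below the mean.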



 

2) Inverse Gaussian distribution

The Inverse Gaussian distribution is commonly used to model non-negative, right-skewed, long-tailed data, which makes it a good model for our WeChat post data. A previous study found that Facebook social media data closely follows the Inverse Gaussian distribution; we therefore also use it to study our data.

Using the Shapiro-Wilk test, we found that, out of the 74 accounts tested, 35 follow the Inverse Gaussian distribution. Among the accounts that follow this distribution, the mean engagement is around 11,000. This indicates that at least half of the posts are concentrated towards the lower end of engagement, as in the example shown in the graph on the left.
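One way to check whether an account’s engagement is compatible with an Inverse Gaussian distribution is sketched below. The text names the Shapiro-Wilk test but does not say how it was applied to a non-normal distribution, so this example swaps in a different technique: it fits `scipy.stats.invgauss` by maximum likelihood and runs a Kolmogorov-Smirnov test against the fitted distribution. The helper name and the synthetic data are assumptions. Because the parameters are estimated from the same sample, the resulting p-value is optimistic and should be read as a rough indication only.

```python
import numpy as np
from scipy import stats

def fit_inverse_gaussian(values):
    """Fit an Inverse Gaussian (location fixed at 0) and run a KS check."""
    x = np.asarray(values, dtype=float)
    mu, loc, scale = stats.invgauss.fit(x, floc=0)
    ks_stat, p_value = stats.kstest(x, "invgauss", args=(mu, loc, scale))
    return {
        "mu": mu,
        "loc": loc,
        "scale": scale,
        "fitted_mean": mu * scale,   # mean of scipy's invgauss when loc = 0
        "ks_stat": ks_stat,
        "p_value": p_value,
    }

# Synthetic engagement data drawn from an Inverse Gaussian with mean 10,000
rng = np.random.default_rng(1)
synthetic = stats.invgauss.rvs(mu=0.5, scale=20_000, size=300, random_state=rng)

result = fit_inverse_gaussian(synthetic)
print(result)
```

A large p-value means the test found no evidence against the Inverse Gaussian shape; it does not prove that the data follow it.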

 

3) Large Outliers

Some accounts do exhibit large outliers, like the example on the right, meaning that a number of posts may have unexpectedly large engagement.

Using the IQR method to detect outliers, we found that all accounts have outliers. Posts with exceptionally high engagement account for 25% of all posts; that is, we can expect nearly a quarter of posts to perform unexpectedly well.
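The IQR method mentioned above can be sketched as follows. The conventional multiplier of 1.5 is an assumption, since the text does not state which threshold was used, and the engagement figures are made up for illustration.

```python
import numpy as np

def iqr_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    x = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return x[(x < lower) | (x > upper)]

# Hypothetical engagement counts for one account
posts = [90, 110, 120, 130, 150, 160, 170, 200, 4_000, 12_000]
outliers = iqr_outliers(posts)

print(outliers)                      # the two unusually large posts
print(len(outliers) / len(posts))    # share of posts flagged as outliers
```

Because engagement cannot be negative and the distributions are right-skewed, the flagged posts are in practice almost always on the high side.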


Weibo Content Analysis

To estimate whether bots and low-quality followers are present in an account, we selected the ratio of engagement to followers as an indicator. We postulate that if the engagement ratio is high, the account is more likely to be healthy and to follow a more natural distribution of low to high engagement based on the quality of the content or topic. Conversely, if an account has many followers but its engagement is consistently low, then bots and low-quality followers likely make up much of its following. Below are our observations:

Example 1

The blogger afeishuo’s engagement/follower ratio is ~0.17, the highest among all accounts. Most of its posts have an engagement of over 1,500, indicating that this account has a stable base of real followers.

 

Example 2

This account, which belongs to the tech media site cnBeta, has 1.2 million+ followers. However, engagement on all of its posts is below 10, implying that some of its followers are likely to be bots or, at the very least, that this is a rather ineffective channel for the publisher to reach its audience.
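The indicator used in these two examples can be sketched as below. The text does not specify how per-post engagement was aggregated, so taking the median across posts is an assumption, as are the function name and the sample accounts (they are made-up figures, not data from the study).

```python
from statistics import median

def engagement_ratio(post_engagements, followers):
    """Median engagement per post divided by the follower count."""
    if followers <= 0:
        raise ValueError("followers must be positive")
    return median(post_engagements) / followers

# Hypothetical accounts for illustration only
accounts = {
    "small_but_engaged": {"followers": 10_000, "posts": [1_500, 1_700, 1_600, 2_100]},
    "large_but_quiet": {"followers": 1_200_000, "posts": [3, 5, 8, 2]},
}

ratios = {
    name: engagement_ratio(account["posts"], account["followers"])
    for name, account in accounts.items()
}

# Rank accounts from the healthiest-looking ratio to the weakest
for name, ratio in sorted(ratios.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {ratio:.6f}")
```

A low ratio on its own does not prove that followers are bots; it only flags accounts whose audience size and engagement are out of line with each other.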

 

Additional Observations

To better understand what makes the engagement of some posts exceptionally high, we formed several hypotheses.

Retweeting the post can win the follower a prize.

For example:

“... Repost, follow and like: 1 follower will be drawn to win a NIO Day gift pack (1 blind box / 1 plush toy / 1 hotpot soup base) ...”

This post gained the account 5,000+ engagements, much higher than its ordinary level of ~1,000.

The regression analysis, which includes whether the post contains a lottery as a factor, shows that having a lottery in a post has a positive effect on its engagement.

Mentioning celebrities.

For example:

“... #DigitalNews# The Honor 50 series and the Huawei P50 series share the same design lineage; the only difference is the rear camera module. Today Honor announced the brand ambassador for the #Honor50# series: Gong Jun. ...”

This post gained the account 1,600+ engagements, much higher than its ordinary level of a few hundred.

However, more analysis is needed to show that mentioning celebrities has such a positive effect.


Implications

From our study, engagement on WeChat media accounts either forms a long tail, where the majority of posts are low-performing with a few outliers, or follows something closer to the Pareto principle, where 80% of the engagement originates from 20% of the content. On closer observation, that 20% of content tends to be exclusive stories or breaking news, such as a scandal within a tech firm, rather than branded content like a product launch. Given these extremes, evaluating an account by its median engagement may yield a more accurate picture of its overall performance.
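A small numerical illustration shows why the median is the safer summary for this kind of data and how the share of engagement coming from the top 20% of posts can be measured. The helper name `top_share` and the engagement figures are hypothetical.

```python
import numpy as np

def top_share(values, top_fraction=0.2):
    """Share of total engagement contributed by the top `top_fraction` of posts."""
    x = np.sort(np.asarray(values, dtype=float))[::-1]   # largest first
    n_top = max(1, int(np.ceil(top_fraction * len(x))))
    return x[:n_top].sum() / x.sum()

# Hypothetical account: eight ordinary posts and two breakout stories
posts = [100, 120, 130, 150, 160, 180, 200, 220, 5_000, 9_000]

mean_engagement = np.mean(posts)
median_engagement = np.median(posts)
share = top_share(posts)

print(f"mean:   {mean_engagement:.0f}")
print(f"median: {median_engagement:.0f}")
print(f"top 20% of posts deliver {share:.1%} of engagement")
```

In this made-up account the mean is roughly nine times the median, because two breakout posts dominate the total; the median is much closer to what a typical post achieves.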

Similar to Twitter, Weibo is more affected by network effects such as celebrity mentions or incentives for sharing, both of which can drive large spikes in engagement in a short time. As expected, we noticed that those effects are short-lived and unlikely to be sustained. We also observed that accounts boasting millions of followers seemingly have engagement far below what would be expected, a sign that the followers are inactive or artificial and are only activated to win prizes or to support a celebrity.


Methodology

We analyzed social media posts by technology media accounts between August 2020 and August 2021. The accounts totaled 576 and published over 17,000 posts during this period.

For our data analysis, we used R and Python to run statistical analyses on the post metadata and look for causal relationships between post performance and factors such as post timing.

Different regression analyses were used to evaluate whether a relationship exists between post variables.

OLS

OLS regression is used to estimate the marginal effect of brand, weekday, account type and number of followers.

For Weibo posts, the measure of engagement could be the number of comments, likes, and retweets, or the sum of those measures.

For WeChat posts, the measure of engagement could be the number of clicks, likes or the sum of the two measures.
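The OLS setup described above can be sketched with plain NumPy. Everything in this example is an assumption made for illustration: the variable names, the log transformation of engagement and followers, the inclusion of a lottery dummy alongside a weekend dummy, and the synthetic data (including the "true" coefficients used to generate it). It is not the study’s model specification.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500

# Synthetic post-level features
has_lottery = rng.integers(0, 2, size=n)        # 1 if the post offers a prize
is_weekend = rng.integers(0, 2, size=n)         # 1 if posted on a weekend
log_followers = rng.normal(12.0, 1.0, size=n)   # log of the account's followers

# Synthetic outcome generated from known coefficients plus noise
log_engagement = (
    1.0
    + 1.5 * has_lottery
    - 0.2 * is_weekend
    + 0.4 * log_followers
    + rng.normal(0.0, 0.5, size=n)
)

# Design matrix with an intercept column, then ordinary least squares
X = np.column_stack([np.ones(n), has_lottery, is_weekend, log_followers])
coef, *_ = np.linalg.lstsq(X, log_engagement, rcond=None)

for name, value in zip(["intercept", "has_lottery", "is_weekend", "log_followers"], coef):
    print(f"{name:>14}: {value: .3f}")
```

Each coefficient is the estimated marginal effect of that factor with the others held fixed; with a log outcome, the dummy coefficients read as approximate proportional changes in engagement.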

Binomial regression

Notably, 100k clicks is a benchmark for whether a WeChat post is popular: reaching 100k clicks is a strong signal that the post has spread widely. To capture this effect, binomial models, such as logistic regression, are used to estimate the likelihood of reaching 100k clicks.

Combined, the OLS and binomial models yield a comprehensive analysis of social media performance across different measures of account performance.
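A logistic regression of this kind can be sketched as follows. The fitting routine is a plain Newton-Raphson implementation written for this example, and the single standardized predictor, the synthetic data and the coefficients used to generate it are all assumptions rather than the study’s specification.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Fit logistic regression by Newton-Raphson; X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))             # predicted probabilities
        gradient = X.T @ (y - p)
        hessian = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hessian, gradient)
    return beta

rng = np.random.default_rng(7)
n = 2000

# Synthetic predictor: standardized log follower count of the posting account
followers_z = rng.normal(0.0, 1.0, size=n)

# Synthetic outcome: 1 if the post reached 100k clicks
true_logit = -2.0 + 1.2 * followers_z
hit_100k = (rng.random(n) < 1.0 / (1.0 + np.exp(-true_logit))).astype(float)

X = np.column_stack([np.ones(n), followers_z])
beta = fit_logistic(X, hit_100k)

# Estimated probability of reaching 100k for an account 2 SD above average
prob = 1.0 / (1.0 + np.exp(-(beta[0] + beta[1] * 2.0)))
print(beta, prob)
```

The fitted coefficients are on the log-odds scale, so a positive coefficient means the factor raises the odds of a post reaching the 100k benchmark.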
