It has become apparent that human accounts are not the sole actors on social media. The expanding role of social media in the consumption and diffusion of information has been accompanied by attempts to influence public opinion. Researchers have reported several instances in which social bots, automated accounts designed to impersonate humans, were deployed for this purpose. More recently, platforms such as Twitter have provided evidence that governments create and use fake accounts for this kind of abuse. These accounts are known as state-backed trolls. Although these different actors have been widely studied, there is little understanding of how they differ when examined together. In this paper, we contribute to understanding the characteristics of the different types of accounts and increase awareness of Twitter’s state-backed trolls, which so far have received limited attention from quantitative researchers. We propose a large-scale quantitative analysis that relies on datasets released in recent years by both Twitter and researchers to characterize the different actors that take part in the social network scenario. In particular, we represent each account with a large number of features, each assigned to one of three distinct traits (credibility, initiative, and adaptability) according to the underlying aspect it best fits. We then conduct a series of experiments, using the features of each trait in isolation and all features together. First, we apply dimensionality reduction to project accounts onto the same two-dimensional space and visualize how they are distributed across it. Then, we experiment with different combinations of two parameters that affect dimensionality reduction and clustering to find which trait is best suited to distinguishing the different actors.
In our most effective combination, we obtain high-quality clusters with a purity score of 0.9, meaning that the clusters are largely homogeneous and accounts of the same category are grouped together. Beyond that, we visualize and study the clustering results to determine the differences between account categories. Using our defined traits, we show that it is possible to distinguish the different accounts through clustering, obtaining the best results when leveraging the three traits simultaneously. An additional analysis shows that features related to retweeting patterns and URL sharing are effective in differentiating trolls from humans. At the same time, social bot accounts suffer from recall degradation in cross-domain evaluation. Moreover, we show that accounts belonging to the same dataset are not necessarily similar in the defined traits. Finally, we perform a feature importance analysis using SHAP to gain insights into which features best differentiate the account categories when examined in pairs.
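As an aid to reading the purity score reported above, the following minimal sketch computes cluster purity under the commonly used definition: each cluster is credited with its most frequent true category, and the credited counts are summed and divided by the total number of accounts. The function name `purity`, the cluster assignments, and the category labels are hypothetical toy values chosen for illustration; they are not the data or the code used in this work, and we assume rather than confirm that this is the exact definition applied.

```python
from collections import Counter


def purity(cluster_ids, true_labels):
    """Cluster purity: fraction of items that belong to the majority
    true category of the cluster they were assigned to."""
    if len(cluster_ids) != len(true_labels):
        raise ValueError("cluster_ids and true_labels must have the same length")
    if len(true_labels) == 0:
        raise ValueError("purity is undefined for an empty input")

    # Count how often each true category occurs inside each cluster.
    per_cluster = {}
    for cluster, label in zip(cluster_ids, true_labels):
        per_cluster.setdefault(cluster, Counter())[label] += 1

    # Credit each cluster with the size of its majority category.
    majority_total = sum(max(counts.values()) for counts in per_cluster.values())
    return majority_total / len(true_labels)


# Hypothetical toy assignment (not data from this study):
# ten accounts, three clusters, one bot placed in the "human" cluster.
clusters = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
labels = ["human", "human", "human", "bot",
          "bot", "bot", "bot",
          "troll", "troll", "troll"]

example_purity = purity(clusters, labels)
print(example_purity)
```

Purity rewards homogeneous clusters but does not penalize a large number of clusters: assigning every account to its own cluster yields a purity of 1.0. For that reason it is most informative when the number of clusters is fixed or kept small.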