COMPARATIVE BOT DETECTION: RANDOM FOREST VERSUS XGBOOST IN SOCIAL NETWORKS

Avdhesh Ghuraiya
Moscow Institute of Physics and Technology (National Research University), Dolgoprudny, Moscow region, Russia;
Lupus Technology, Gayatri Colony, Morena, Madhya Pradesh, India;
ORCID: 0000-0002-3638-3467
gkhuraiia.a@phystech.edu, lupustechnology.research@gmail.com

DOI: 10.36724/2664-066X-2025-11-6-12-22

SYNCHROINFO JOURNAL. Volume 11, Number 6 (2025). P. 12-22.

Abstract

Relevance and Objective: This study investigates the detection of automated accounts (bots) on social networks by comparing the behavior and performance of Random Forest and XGBoost classifiers under realistic, minimally tuned conditions.

Materials and Method: A dedicated Twitter dataset was compiled, comprising human-operated and automated accounts described by profile-level, activity, and network-based metadata. Pre-processing consisted of median and mode imputation for missing values and normalization of numeric features. No feature selection or dimensionality reduction was applied, so that the models determined feature importance internally. Both classifiers were trained on stratified splits and evaluated using accuracy, precision, recall, and F1-score. Misclassified accounts were then analysed qualitatively to identify the patterns that cause ambiguity.

Results: Both models achieve near-chance performance. Random Forest slightly outperforms XGBoost, showing higher accuracy and balanced recall across classes. Many misclassified accounts exhibited intermediate activity, irregular posting, and follower-to-following ratios, highlighting an intrinsic ambiguity that metadata and activity-based features alone cannot resolve.

Conclusions: These findings indicate that algorithmic sophistication alone is insufficient to overcome weak or noisy signals in real-world social network data. The study has three limitations that may affect generalizability: the dataset does not cover all possible bot behaviors, textual content is excluded, and hyperparameter tuning is minimal. In practical terms, the study underscores the importance of enhanced feature design, hybrid modeling approaches, and adaptive learning strategies for improving bot detection. By providing a transparent comparison under realistic conditions, this work exposes the challenges of automated account detection and offers insights for both research and practical applications in social media analysis.
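The pre-processing and evaluation steps named in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the study's implementation. The field names (`followers`, `verified`, `label`), the use of the median for numeric fields and the mode for categorical fields, min-max scaling as the normalization, and the 25% test fraction are all assumptions made for illustration, and the Random Forest and XGBoost classifiers themselves are omitted.

```python
import random
from collections import Counter, defaultdict
from statistics import median


def impute(rows, numeric_keys, categorical_keys):
    """Replace None with the column median (numeric) or mode (categorical)."""
    fills = {k: median(r[k] for r in rows if r[k] is not None) for k in numeric_keys}
    for k in categorical_keys:
        observed = Counter(r[k] for r in rows if r[k] is not None)
        fills[k] = observed.most_common(1)[0][0]
    return [{k: (fills[k] if (v is None and k in fills) else v) for k, v in r.items()}
            for r in rows]


def min_max_normalize(rows, numeric_keys):
    """Scale each numeric column to [0, 1] (assumed normalization scheme)."""
    out = [dict(r) for r in rows]
    for k in numeric_keys:
        lo = min(r[k] for r in rows)
        hi = max(r[k] for r in rows)
        span = (hi - lo) or 1.0  # avoid division by zero for constant columns
        for r in out:
            r[k] = (r[k] - lo) / span
    return out


def stratified_split(rows, label_key, test_fraction=0.25, seed=0):
    """Split so that each class keeps the same share in train and test."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for r in rows:
        by_class[r[label_key]].append(r)
    train, test = [], []
    for members in by_class.values():
        members = members[:]
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test


def binary_metrics(y_true, y_pred, positive="bot"):
    """Accuracy, precision, recall and F1 with `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

In a real experiment the imputation and scaling statistics would normally be computed on the training split only and then applied to the test split, so that no information from held-out accounts leaks into training. The sketch applies them to whatever rows it is given and leaves that ordering to the caller.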

Keywords: social bots, Random Forest, XGBoost, feature analysis, classification challenges
