Automatic social media news classification: a topic modeling approach

Main Article Content

Abstract

Social media has modified the way that people access news and debate about public issues. Although access to a myriad of data sources can be considered an advantage, some new challenges have emerged, as issues about content legitimacy and veracity start to prevail among users. That transformation of the public sphere propels problematic situations, such as misinformation and fake news. To understand what type of information is being published, it is possible to categorize news automatically using computational tools. Thereby, this short paper presents a platform to retrieve and analyze news, along with promising results towards automatic news classification using a topic modeling approach, which should help audiences to identify news content easier and discusses possible routes to improve the situation in the near future.

Article Details

How to Cite
Daniel, Carlos, Ernesto, & Andrés. (2022). Automatic social media news classification: a topic modeling approach. Tecnología En Marcha Journal, 35(9), Pág. 4–13. https://doi.org/10.18845/tm.v35i9.6477
Section
Artículo científico

References

T. Highfield, “Social media and everyday politics”, Cambridge: Polity Press, 2016.

H. Margetts, P. John, S. Hale, & T. Yasseri, “Political turbulence: How social media shape collective action”, Princeton: Princeton University Press, 2016.

N. Newman, et al, “Reuters Institute Digital News Report 2019”, Oxford: Reuters Institute, 2019.

A. Marwick, “Why Do People Share Fake News? A Sociotechnical Model of Media Effects”, Georgetown Law Technology Review, 2(2), pp:474-512, 2018

S. Livingstone, “Tackling the Information Crisis: A Policy Framework for Media System Resilience”, Foreword. In LSE 2018. p: 2. London: LSE, 2018.

S. Waisbord, “Truth is what happens to news: On journalism, fake news, and post-truth. Journalism Studies”, 19(13), pp:1866–1878, 2018.

D. Blei, “Topic Modeling and Digital Humanities”. Journal of Digital Humanities, 2(1), pp: 8-11. 2021.

D. M. Blei, A. Y. Ng & M. I. Jordan, “Latent dirichlet allocation”. Journal of machine Learning research, 3(Jan), pp: 993-1022, 2003.

R. Řehůřek, & P. Sojka, “Gensim—statistical semantics in python” Retrieved from genism.org, [Accessed May. 2, 2011]

C. Soto-Rojas, C. Gamboa-Venegas, A. Céspedes-Vindas, “MediaTIC: A Social Media Analytics Framework For the Costa Rican News Media”. Tecnología en Marcha. Edición especial 2020. 6th Latin America High Performance Computing Conference (CARLA). pp: 18-24. 2020.

CrowdTangle Team. CrowdTangle. Facebook, Menlo Park, California, United States. [List ID: 1510711], 2020.

C. Sievert and K.Shirley. “LDAvis: A method for visualizing and interpreting topics”. In Proceedings of the Workshop on Interactive Language Learning, Visualization, and Interfaces, pp: 63–70, Baltimore, Maryland, USA. Association for Computational Linguistics, 2014.

M. Röder, A. Both, and A. Hinneburg. 2015. “Exploring the Space of Topic Coherence Measures”. In Proceedings of the Eighth ACM International Conference on Web Search and Data Mining (WSDM ‘15). Association for Computing Machinery, New York, NY, USA, pp: 399–408. 2015.

A. Gliozzo, “Semantic domains and linguistic theory”. In Proceedings of the LREC 2006 workshop Toward Computational Models of Literary Analysis, Genova, Italy. 2006.