TY - GEN
T1 - Hate Speech Detection During the 2023 Chilean Plebiscite Constitutional Reform
AU - Paredes, Jimmy
AU - Cuenca, Erick
AU - Coloma, Claudio
AU - Grimaldi, Daniel
N1 - Publisher Copyright: © The Author(s) 2025.
PY - 2025
Y1 - 2025
N2 - This work presents a Natural Language Processing approach to hate speech classification, covering data collection and labeling, data preprocessing, and text augmentation by back-translation to address class imbalance. The result is a model that classifies hate in Spanish-language tweets from the platform X in the context of the 2023 Chilean Constitutional Plebiscite. Approaches based on one-dimensional Convolutional Neural Networks (CNNs) outperformed approaches based on Machine Learning, because CNNs can identify patterns and relations between consecutive words and therefore capture the context of a tweet more accurately. The CNN model achieved an overall accuracy of 86% on the test set, while the Machine Learning approaches achieved between 79% and 81%. Because the dataset is imbalanced, precision, recall, and F1-score are also reported; the CNN again obtained the best results on these metrics.
KW - Chilean Constitutional Plebiscite
KW - CNNs
KW - Hate speech
KW - imbalance
KW - tweets
UR - http://www.scopus.com/inward/record.url?scp=105004638330&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-87065-1_5
DO - 10.1007/978-3-031-87065-1_5
M3 - Conference contribution
AN - SCOPUS:105004638330
SN - 9783031870644
T3 - Lecture Notes in Networks and Systems
SP - 46
EP - 55
BT - Systems, Smart Technologies, and Innovation for Society - Proceedings of CITIS 2024
A2 - Inga Ortega, Esteban Mauricio
A2 - Robles-Bykbaev, Vladimir Espartaco
A2 - García Herranz, Nuria
A2 - Gallego Diaz, Eduardo
PB - Springer Science and Business Media Deutschland GmbH
T2 - 10th International Conference on Science, Technology and Innovation for Society, CITIS 2024
Y2 - 18 July 2024 through 19 July 2024
ER -