Text Mining for Classifying Potentially Depressive Tweets on X Using IndoBERT
DOI:
https://doi.org/10.36456/jstat.vol18.no2.a10873
Keywords:
Depression, IndoBERT, Classification, X Social Media, Text Mining
Abstract
Depression is a serious issue in Indonesia, where sufferers often do not seek professional help and instead express themselves on social media platforms such as X. This study uses a text mining approach to classify potentially depressive posts, using a dataset of 5,000 Indonesian-language tweets posted between October 2024 and January 2025. The preprocessing steps involve case folding, cleaning, normalization, and stopword removal. The dataset was labeled into two classes, potentially depressive and normal, and then divided into 80% training data and 20% test data. The labels were assigned manually by the researchers in collaboration with psychiatrists to ensure clinical relevance. A pre-trained IndoBERT model, initialized from an emotion classification model, was fine-tuned for this classification task with a learning rate of 2e-05, a batch size of 8, and 2 epochs. The evaluation results showed that the IndoBERT model performed well, with an accuracy of 87%, precision of 87%, recall of 87%, and F1 score of 87%. However, the model's performance was affected by class imbalance: it tended to predict the majority label (normal) better than the minority label (potentially depressive). Rebalancing the classes is therefore recommended to reduce this bias. Finally, the trained model was deployed as a web-based application using Streamlit. The application is intended as a preliminary screening tool to assist psychiatrists, not as a diagnostic system.
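The four preprocessing steps named in the abstract (case folding, cleaning, normalization, stopword removal) can be sketched roughly as follows. This is a minimal illustration only, not the study's implementation: the function name `preprocess`, the slang dictionary `SLANG_MAP`, and the stopword set `STOPWORDS` are hypothetical, and the abstract does not specify which cleaning rules or word lists were actually used.

```python
import re

# Hypothetical lookup tables for illustration only; the study's actual
# normalization dictionary and stopword list are not given in the abstract.
SLANG_MAP = {"gk": "tidak", "ga": "tidak", "bgt": "banget"}
STOPWORDS = {"yang", "dan", "di", "ke", "ini", "itu"}


def preprocess(tweet: str) -> str:
    """Apply case folding, cleaning, normalization, and stopword removal."""
    # 1. Case folding: lowercase everything.
    text = tweet.lower()

    # 2. Cleaning (assumed rules): drop URLs, mentions, and hashtags,
    #    then replace any remaining non-letter character with a space.
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)
    text = re.sub(r"[@#]\w+", " ", text)
    text = re.sub(r"[^a-z\s]", " ", text)
    tokens = text.split()

    # 3. Normalization: map informal spellings to standard forms.
    tokens = [SLANG_MAP.get(token, token) for token in tokens]

    # 4. Stopword removal: drop tokens found in the stopword set.
    tokens = [token for token in tokens if token not in STOPWORDS]

    return " ".join(tokens)


if __name__ == "__main__":
    print(preprocess("Aku CAPEK bgt dan gk kuat lagi @teman https://t.co/abc #sedih 123!!"))
```

In this sketch the stopword set deliberately leaves out negations such as "tidak", on the assumption that removing them could change the meaning of a tweet; whether the study made the same choice is not stated.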







