Handling Imbalanced Fraudulent Transaction Data Using SMOTE-Tomek and Random Forest: A Classification Approach

Authors

  • Mohamad Ilham, University of PGRI Adi Buana Surabaya
  • Adi Winarno, University of PGRI Adi Buana Surabaya
  • Moch Lutfi, Yudharta University
  • Artanti Indrasetianingsih, University of PGRI Adi Buana Surabaya

DOI:

https://doi.org/10.36456/best.vol7.no1.10335

Keywords:

Fraud Detection, Imbalanced Data, SMOTE-Tomek, Random Forest, Classification

Abstract

This study addresses the class imbalance problem in fraud detection by combining a hybrid resampling technique, SMOTE-Tomek, with a Random Forest classifier. Because fraudulent transactions make up only a small minority of the data, a model trained on the raw data tends to favor the majority class and detects fraud cases poorly. SMOTE-Tomek counters this in two steps: SMOTE oversamples the minority class with synthetic instances, and Tomek links are then used to remove borderline majority-class samples. Using a benchmark credit card fraud dataset, we compare Random Forest models trained with and without the hybrid sampling step. The experimental results show that SMOTE-Tomek significantly improves recall and F1-score without sacrificing accuracy, which underscores the importance of an appropriate resampling strategy for robust fraud detection.
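
The two resampling steps named in the abstract can be illustrated with a minimal, standard-library-only Python sketch. It is not the authors' implementation: the toy two-dimensional data, the function names (nearest, smote, remove_tomek_majority), and the parameters k and seed are all assumptions made for illustration. Following the abstract's wording, the sketch removes only the majority member of each Tomek link; some formulations of SMOTE-Tomek remove both members instead. The Random Forest training step is omitted.

```python
import math
import random


def nearest(idx, points, candidates, k=1):
    """Return the k candidate indices closest to points[idx], excluding idx."""
    others = [j for j in candidates if j != idx]
    others.sort(key=lambda j: math.dist(points[idx], points[j]))
    return others[:k]


def smote(X, y, minority=1, k=5, seed=0):
    """Add synthetic minority points until both classes have equal size.

    Assumes binary labels and at least two minority samples.
    """
    rng = random.Random(seed)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    n_new = (len(y) - len(min_idx)) - len(min_idx)
    X_new, y_new = list(X), list(y)
    for _ in range(max(n_new, 0)):
        i = rng.choice(min_idx)                   # a real minority sample
        j = rng.choice(nearest(i, X, min_idx, k)) # one of its minority neighbors
        gap = rng.random()                        # position on the segment i -> j
        X_new.append(tuple(a + gap * (b - a) for a, b in zip(X[i], X[j])))
        y_new.append(minority)
    return X_new, y_new


def remove_tomek_majority(X, y, minority=1):
    """Drop the majority member of every Tomek link.

    A Tomek link is a pair of points from different classes that are
    each other's nearest neighbor.
    """
    everyone = range(len(X))
    nn = [nearest(i, X, everyone)[0] for i in everyone]
    drop = set()
    for i in everyone:
        j = nn[i]
        if nn[j] == i and y[i] != y[j]:
            drop.add(i if y[i] != minority else j)
    keep = [i for i in everyone if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]


# Hypothetical toy data: 190 legitimate (0) and 10 fraudulent (1) points.
rng = random.Random(42)
X = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(190)]
X += [(rng.gauss(1.5, 1.0), rng.gauss(1.5, 1.0)) for _ in range(10)]
y = [0] * 190 + [1] * 10

X_sm, y_sm = smote(X, y)
X_st, y_st = remove_tomek_majority(X_sm, y_sm)

print("original   :", y.count(0), "legitimate,", y.count(1), "fraud")
print("after SMOTE:", y_sm.count(0), "legitimate,", y_sm.count(1), "fraud")
print("after Tomek:", y_st.count(0), "legitimate,", y_st.count(1), "fraud")
```

In practice a library would normally replace this hand-written sketch, for example SMOTETomek from imbalanced-learn together with RandomForestClassifier from scikit-learn. Resampling is usually applied to the training split only, so that recall and F1-score are measured on data that keeps the original class ratio.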

Author Biographies

  • Mohamad Ilham, University of PGRI Adi Buana Surabaya

    Electrical Engineering

  • Adi Winarno, University of PGRI Adi Buana Surabaya

    Electrical Engineering

  • Moch Lutfi, Yudharta University

    Informatics Engineering

  • Artanti Indrasetianingsih, University of PGRI Adi Buana Surabaya

    Statistics

Published

18-03-2025

Section

Contents of the Journal

How to Cite

Ilham, Mohamad, et al. “Handling Imbalanced Fraudulent Transaction Data Using SMOTE-Tomek and Random Forest: A Classification Approach”. Best : Journal of Applied Electrical, Science and Technology, vol. 7, no. 1, Mar. 2025, pp. 35-38, https://doi.org/10.36456/best.vol7.no1.10335.