Improving Balanced Accuracy for Minority Plant Species under Data Imbalance

Ruben Gonzalez-Villanueva
Jose Carranza-Rojas

Abstract

Despite the widely known success of deep learning in classification, such models are commonly
evaluated with metrics that do not account for data imbalance, in particular for how correct
predictions are distributed across classes, so minority classes are effectively ignored. This can be
a problem, as minority classes are often the most difficult to predict and to collect data for. In
the plant domain, for example, species with fewer samples are often the hardest to collect and to
identify in the field. As more and more plant species are identified, more of them become minority
species, making it increasingly difficult to classify them accurately with traditional machine
learning methods. To address this issue, we explore combining traditional data-level and machine
learning approaches with deep learning techniques, such as self-supervision in a preprocessing
stage. By using self-supervised training together with different sampling algorithms and class
weights, we improved the balanced accuracy for minority plant species by between 7.9% and 13%
without affecting overall accuracy. This shows that combining deep learning techniques with
traditional machine learning methods can improve the accuracy of predictions for minority classes,
even in domains where data is limited.
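To illustrate why plain accuracy can hide poor performance on minority classes, here is a minimal sketch comparing it with balanced accuracy, which is the mean of the per-class recalls, so every class counts equally regardless of how many samples it has. The species labels and the 90/10 split are hypothetical toy data, not results from this study.

```python
from collections import defaultdict

def accuracy(y_true, y_pred):
    """Fraction of all predictions that are correct (ignores class frequencies)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall: every class counts equally, however rare."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)

# Hypothetical toy data: 90 samples of a common species, 10 of a rare one.
y_true = ["common"] * 90 + ["rare"] * 10
# A degenerate classifier that always predicts the majority species.
y_pred = ["common"] * 100

print(accuracy(y_true, y_pred))           # 0.9
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

The classifier never recognizes the rare species, yet plain accuracy still reports 90%; balanced accuracy drops to 50%, exposing the failure on the minority class.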
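The abstract mentions class weights without giving the weighting scheme. A common choice, and an assumption here, is inverse-frequency weighting, w_c = N / (K · n_c) for N samples, K classes and n_c samples of class c (the formula scikit-learn uses for `class_weight="balanced"`), plugged into a weighted cross-entropy. The sketch below is plain Python on hypothetical probabilities; the normalization by the sum of the weights mirrors how PyTorch's `CrossEntropyLoss` averages when a `weight` tensor is given.

```python
import math
from collections import Counter

def inverse_frequency_weights(labels):
    """Assumed scheme: w_c = N / (K * n_c), so rare classes get larger weights."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}

def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean of -log p(true class).

    probs[i] maps each class to the predicted probability for sample i.
    """
    num = sum(weights[y] * -math.log(p[y]) for p, y in zip(probs, labels))
    den = sum(weights[y] for y in labels)
    return num / den

# Hypothetical training labels: 90 common, 10 rare.
train_labels = ["common"] * 90 + ["rare"] * 10
weights = inverse_frequency_weights(train_labels)
print(weights["rare"])  # 5.0

# Two predictions with the same probabilities: right on the common sample,
# badly wrong on the rare one.
probs = [{"common": 0.9, "rare": 0.1}, {"common": 0.9, "rare": 0.1}]
batch_labels = ["common", "rare"]
weighted = weighted_cross_entropy(probs, batch_labels, weights)
unweighted = weighted_cross_entropy(probs, batch_labels, {"common": 1.0, "rare": 1.0})
```

With these weights the mistake on the rare sample dominates the loss (`weighted` is larger than `unweighted`), so gradient updates push harder toward getting minority classes right.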
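The abstract does not name the sampling algorithms that were evaluated. One of the simplest data-level options is random oversampling: duplicate randomly chosen samples of each minority class until every class matches the largest one. The sketch below assumes that technique; the function name `random_oversample` and the toy data are hypothetical.

```python
import random
from collections import defaultdict

def random_oversample(samples, labels, seed=0):
    """Hypothetical helper: duplicate random samples of smaller classes until
    every class has as many samples as the largest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        # Draw (with replacement) only from the class's own original samples.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y

# Hypothetical data: sample ids 0-89 are common, 90-99 are rare.
xs, ys = random_oversample(list(range(100)), ["common"] * 90 + ["rare"] * 10)
print(ys.count("common"), ys.count("rare"))  # 90 90
```

Oversampling should be applied only to the training split; duplicating samples before splitting would leak copies of the same image into the validation or test set and inflate the reported metrics.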
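The abstract does not state which self-supervised objective was used in the preprocessing stage. One widely used option, assumed here purely for illustration, is contrastive learning as in SimCLR, whose NT-Xent loss pulls together the embeddings of two augmentations of the same image and pushes away the other images in the batch. The sketch computes that loss in plain Python on precomputed embeddings; the encoder, the augmentations and the temperature `tau=0.5` are assumptions and are left out or chosen arbitrarily.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def nt_xent(views, tau=0.5):
    """NT-Xent loss over 2N views, where views[2k] and views[2k + 1] are
    embeddings of two augmentations of image k.

    For each view i with positive partner j:
        l(i, j) = -log( exp(sim(i, j) / tau) / sum_{k != i} exp(sim(i, k) / tau) )
    and the loss is the mean of l over all 2N views.
    """
    n2 = len(views)
    assert n2 % 2 == 0, "views must come in augmentation pairs"
    total = 0.0
    for i in range(n2):
        j = i ^ 1  # index of the positive partner (0<->1, 2<->3, ...)
        pos = math.exp(cosine(views[i], views[j]) / tau)
        # Denominator: every other view in the batch, including the positive.
        denom = sum(math.exp(cosine(views[i], views[k]) / tau)
                    for k in range(n2) if k != i)
        total += -math.log(pos / denom)
    return total / n2

# Hypothetical 2-D embeddings for a batch of two images.
aligned = [[1, 0], [1, 0], [0, 1], [0, 1]]     # each pair agrees
misaligned = [[1, 0], [0, 1], [1, 0], [0, 1]]  # each pair disagrees
```

Here `nt_xent(aligned)` is much lower than `nt_xent(misaligned)`: the loss rewards an encoder that maps different views of the same plant image close together, which is the representation that is then reused for the downstream species classifier.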

Article Details

How to Cite
Gonzalez-Villanueva, R., & Carranza-Rojas, J. (2024). Improving Balanced Accuracy for Minority Plant Species under Data Imbalance. Tecnología En Marcha Journal, 37(7), 22–27. https://doi.org/10.18845/tm.v37i7.7293
Section
Scientific article
