Improving Balanced Accuracy for Minority Plant Species under Data Imbalance

Ruben Gonzalez-Villanueva; Jose Carranza-Rojas

doi:10.18845/tm.v37i7.7293

Published: Sep 9, 2024

Keywords:

Imbalanced datasets, long-tail distribution, automatic plant identification, balanced metrics, deep learning, minority classes, classification.

Ruben Gonzalez-Villanueva

Costa Rica Institute of Technology

https://orcid.org/0000-0001-8044-3474

Jose Carranza-Rojas

Costa Rica Institute of Technology

https://orcid.org/0000-0002-9177-9173

Abstract

Despite the well-known success of deep learning in classification, such models are
commonly evaluated with metrics that do not account for data imbalance. These metrics
aggregate over all predictions rather than per class, and so overlook minority classes.
This is a problem because minority classes are often the hardest to collect data for and
to predict. In the plant domain, for example, the species with the fewest samples are
often the hardest to collect and identify in the field. As more and more plant species
are identified, more of them become minority species, which makes accurate classification
with traditional machine learning methods increasingly difficult. To address this issue,
we explore the combination of traditional data and machine learning approaches with deep
learning techniques such as self-supervision in a preprocessing stage. By using
self-supervised training together with different sampling algorithms and class weights,
we improved balanced accuracy for minority plant species by between 7.9% and 13% without
affecting overall accuracy. This shows that combining deep learning techniques with
traditional machine learning methods can improve prediction accuracy for minority classes,
even in domains where data is limited.
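
To make the metric concrete, the following minimal sketch contrasts plain accuracy with balanced accuracy (the mean of per-class recall) and derives inverse-frequency class weights. It is an illustration only, not the authors' implementation: the toy labels, the function names, and the specific weighting formula (n_samples / (n_classes * class_count), the heuristic scikit-learn calls "balanced") are assumptions, since the abstract does not state which weighting scheme was used.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of all predictions that are correct."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(hits[c] / n for c, n in totals.items()) / len(totals)


def inverse_frequency_weights(labels):
    """Assumed scheme: weight = n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * m) for c, m in counts.items()}


# Hypothetical toy data: 90 samples of a common species, 10 of a rare one.
y_true = ["common"] * 90 + ["rare"] * 10
# A degenerate classifier that always predicts the majority species.
y_pred = ["common"] * 100

acc = accuracy(y_true, y_pred)
bal = balanced_accuracy(y_true, y_pred)
weights = inverse_frequency_weights(y_true)

print(f"accuracy:          {acc:.2f}")
print(f"balanced accuracy: {bal:.2f}")
print(f"class weights:     {weights}")
```

On this toy data the majority-only classifier reaches 0.90 accuracy but only 0.50 balanced accuracy, because recall on the rare species is zero. The rare class also receives a weight nine times larger than the common class, which is how class weighting pushes a loss function toward minority species.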

How to Cite

Gonzalez-Villanueva, R., & Carranza-Rojas, J. (2024). Improving Balanced Accuracy for Minority Plant Species under Data Imbalance. Tecnología En Marcha Journal, 37(7), pp. 22–27. https://doi.org/10.18845/tm.v37i7.7293

Issue

2024: Vol. 37, special issue. IEEE International Conference on BioInspired Processing

Section

Scientific article

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

The authors retain copyright and grant the journal the right of first publication, together with the right to edit, reproduce, distribute, exhibit, and communicate the work in the country and abroad through print and electronic media. They also assume responsibility for any litigation or claim related to intellectual property rights, releasing Editorial Tecnológica de Costa Rica from liability. In addition, authors may enter into separate, additional contractual arrangements for the non-exclusive distribution of the version of the article published in this journal (e.g., including it in an institutional repository or publishing it in a book), provided they clearly indicate that the work was first published in this journal.
