Proposal of self and semi-supervised learning for imbalanced classification of coronary heart disease tabular data

Danny Xie-Li; Manfred González-Hernández

doi:10.18845/tm.v37i7.7295

Published: Sep 9, 2024

Keywords:

Self-supervised learning, semi-supervised learning, data augmentation, contrastive learning, imbalanced, medical datasets

Danny Xie-Li

Instituto Tecnológico de Costa Rica

https://orcid.org/0000-0003-1878-9460

Manfred González-Hernández

Universidad de Costa Rica. Costa Rica

https://orcid.org/0000-0002-5408-7901

Abstract

Triple Mixup is an augmentation policy in the hidden latent space that we introduce into the
Contrastive Mixup self- and semi-supervised learning framework to address class imbalance in a
cardiovascular heart disease tabular dataset. Medical tabular datasets are known to present
challenges such as highly imbalanced classes and a limited number of high-quality annotated
samples, owing to the nature of the domain. Recent literature on self- and semi-supervised
learning has shown tremendous progress in learning useful representations and in leveraging
unlabeled and labeled data together to train a model. Most existing methods, however, are not
feasible for tabular data because of the data augmentation schemes they rely on. In addition,
severe class imbalance can lower the performance of machine learning algorithms. In this work,
we propose a triple data augmentation method in the hidden space to address the imbalance
challenge in self-supervised and semi-supervised learning, as one possible application of
Contrastive Mixup, and we will study its influence.
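
The abstract does not spell out how the three-way mix is formed, so the following is only a minimal sketch under stated assumptions: it treats Triple Mixup as a convex combination of three hidden vectors drawn from the same (minority) class, with weights sampled from a symmetric Dirichlet distribution. The function names `triple_mixup` and `augment_minority`, the `alpha` parameter, and the same-class sampling rule are all hypothetical and are not taken from the paper.

```python
import random


def triple_mixup(h_a, h_b, h_c, alpha=1.0, rng=None):
    """Mix three hidden vectors with convex weights (illustrative assumption)."""
    rng = rng or random.Random()
    # Sample Dirichlet(alpha, alpha, alpha) weights via normalized Gamma draws.
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(3)]
    total = sum(draws)
    w = [d / total for d in draws]
    # Interpolate coordinate by coordinate in the hidden space.
    mixed = [w[0] * a + w[1] * b + w[2] * c for a, b, c in zip(h_a, h_b, h_c)]
    return mixed, w


def augment_minority(hidden, labels, minority_label, n_new, alpha=1.0, seed=0):
    """Create n_new synthetic hidden vectors for the minority class."""
    rng = random.Random(seed)
    pool = [h for h, y in zip(hidden, labels) if y == minority_label]
    if len(pool) < 3:
        raise ValueError("need at least three minority samples to mix")
    synthetic = []
    for _ in range(n_new):
        # Assumption: all three mixing partners share the minority label.
        a, b, c = rng.sample(pool, 3)
        mixed, _ = triple_mixup(a, b, c, alpha, rng)
        synthetic.append(mixed)
    return synthetic
```

Under these assumptions, each synthetic vector lies inside the triangle spanned by its three source vectors, so the augmentation stays within the region of hidden space already occupied by the minority class.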

How to Cite

Xie-Li, D., & González-Hernández, M. (2024). Proposal of self and semi-supervised learning for imbalanced classification of coronary heart disease tabular data. Tecnología En Marcha Journal, 37(7), pp. 38–43. https://doi.org/10.18845/tm.v37i7.7295

Issue

2024: Vol. 37, special issue. IEEE International Conference on BioInspired Processing

Section

Scientific article

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

The authors retain copyright and grant the journal the right of first publication, as well as the right to edit, reproduce, distribute, exhibit, and communicate the work in the country and abroad through print and electronic media. They also assume responsibility for any litigation or claim related to intellectual property rights, releasing the Editorial Tecnológica de Costa Rica from liability. In addition, authors may enter into other separate, additional contractual arrangements for the non-exclusive distribution of the version of the article published in this journal (e.g., including it in an institutional repository or publishing it in a book), provided they clearly indicate that the work was first published in this journal.
