Proposal of self and semi-supervised learning for imbalanced classification of coronary heart disease tabular data


Danny Xie-Li
Manfred González-Hernández

Abstract

Triple Mixup is an augmentation policy in the hidden (latent) space that we introduce into the Contrastive
Mixup self- and semi-supervised learning framework to address class imbalance in a cardiovascular
heart disease tabular dataset. Medical tabular datasets are known to present challenges such as
high class imbalance and a limited number of high-quality annotated samples, owing to the nature
of the domain. Recent literature in self- and semi-supervised learning has shown tremendous progress
in learning useful representations and in leveraging both unlabeled and labeled data to train a
model. However, most existing methods are not feasible for tabular data because of the data
augmentation schemes they rely on. In addition, high class imbalance can degrade the performance
of machine learning algorithms. In this work, we propose a triple data augmentation method in the
hidden space to tackle class imbalance in self-supervised and semi-supervised learning, as one of the
possible applications of Contrastive Mixup, and we study its influence.
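The abstract does not spell out how Triple Mixup combines samples. As a rough illustration only, the sketch below shows a generic latent-space mixup oversampler in Python with NumPy. It synthesizes minority-class embeddings as convex combinations of k embeddings of that class, with weights drawn from a symmetric Dirichlet distribution; for k = 2 this reduces to standard mixup with a Beta-distributed coefficient. Reading "triple" as k = 3, restricting mixing to the minority class, and the function name `latent_mixup` are all assumptions of this sketch, not the authors' confirmed method. The embeddings are random toy vectors, not real patient data.

```python
import numpy as np


def latent_mixup(z, y, minority_label, n_new, k=3, alpha=1.0, rng=None):
    """Create n_new synthetic latent vectors for `minority_label`.

    Each new vector is a convex combination of k distinct minority-class
    embeddings. k=2 gives ordinary mixup; k=3 is a HYPOTHETICAL reading of
    "Triple Mixup", not the paper's confirmed policy.
    """
    rng = np.random.default_rng() if rng is None else rng
    z = np.asarray(z, dtype=float)
    y = np.asarray(y)
    pool = z[y == minority_label]  # embeddings of the minority class only
    if len(pool) < k:
        raise ValueError("need at least k minority-class samples")

    # Indices of k distinct minority samples per synthetic vector: (n_new, k)
    idx = np.array(
        [rng.choice(len(pool), size=k, replace=False) for _ in range(n_new)],
        dtype=int,
    ).reshape(n_new, k)
    # Mixing weights on the simplex (non-negative, summing to 1): (n_new, k)
    lam = rng.dirichlet([alpha] * k, size=n_new)
    # Weighted sum of the selected embeddings: (n_new, dim)
    z_new = np.einsum("nk,nkd->nd", lam, pool[idx])
    y_new = np.full(n_new, minority_label, dtype=y.dtype)
    return z_new, y_new


# Toy usage: random 8-dim "embeddings" with a 90:10 class imbalance.
rng = np.random.default_rng(0)
z = rng.normal(size=(100, 8))
y = np.r_[np.zeros(90, dtype=int), np.ones(10, dtype=int)]
z_new, y_new = latent_mixup(z, y, minority_label=1, n_new=80, rng=rng)
y_bal = np.concatenate([y, y_new])
print(np.bincount(y_bal))  # [90 90]
```

Mixing only within one class keeps the synthetic labels unambiguous. A cross-class variant would instead need soft (interpolated) labels, as in the original mixup formulation.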

Article Details

How to Cite
Xie-Li, D., & González-Hernández, M. (2024). Proposal of self and semi-supervised learning for imbalanced classification of coronary heart disease tabular data. Tecnología En Marcha Journal, 37(7), pp. 38–43. https://doi.org/10.18845/tm.v37i7.7295
Section
Scientific article
