Proposal of self and semi-supervised learning for imbalanced classification of coronary heart disease tabular data
Abstract
Triple Mixup is an augmentation policy in the hidden latent space that we introduce into the
Contrastive Mixup self- and semi-supervised learning framework to address the class imbalance
problem in a cardiovascular heart disease tabular dataset. Medical tabular datasets are known to
present challenges such as highly imbalanced classes and a limited number of high-quality annotated
samples, owing to the nature of the domain. Recent literature on self- and semi-supervised learning
has shown tremendous progress in learning useful representations and in leveraging unlabeled and
labeled data together to train a model. Most existing methods, however, are not feasible for tabular
data because of the data augmentation schemes they rely on. In addition, severe class imbalance can
degrade the performance of machine learning algorithms. In this work, we propose a triple data
augmentation method in hidden space to tackle class imbalance in self-supervised and semi-supervised
learning, as one possible application of Contrastive Mixup, and we study its influence.
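The abstract does not spell out how Triple Mixup combines samples, so the following is only a minimal sketch of one plausible reading: each hidden vector is replaced by a convex combination of itself and two other hidden vectors of the same class. The function name `triple_mixup`, the Dirichlet-distributed weights, the `alpha` parameter, and the same-class partner sampling are all assumptions made for illustration, not the authors' stated method.

```python
import numpy as np

def triple_mixup(hidden, labels, rng, alpha=1.0):
    """Hypothetical three-way mixup in latent space (illustrative only).

    hidden: (n, d) array of hidden representations from an encoder.
    labels: (n,) array of class labels used to pick same-class partners.
    rng:    a numpy.random.Generator.
    alpha:  assumed Dirichlet concentration for the three mixing weights.
    """
    hidden = np.asarray(hidden, dtype=float)
    labels = np.asarray(labels)
    mixed = np.empty_like(hidden)
    for i in range(len(hidden)):
        # Candidate partners: every sample sharing the anchor's class.
        same_class = np.flatnonzero(labels == labels[i])
        # Draw two partners with replacement, so tiny classes still work.
        j, k = rng.choice(same_class, size=2, replace=True)
        # Three non-negative weights that sum to one.
        w = rng.dirichlet([alpha, alpha, alpha])
        mixed[i] = w[0] * hidden[i] + w[1] * hidden[j] + w[2] * hidden[k]
    return mixed

# Usage on toy data: six 4-dimensional hidden vectors, imbalanced 4:2.
rng = np.random.default_rng(0)
hidden = rng.normal(size=(6, 4))
labels = np.array([0, 0, 0, 0, 1, 1])
augmented = triple_mixup(hidden, labels, rng)
print(augmented.shape)
```

Because the weights form a convex combination, every augmented vector stays inside the coordinate-wise range of its own class, which is one reason interpolation in hidden space is attractive for tabular data where input-space augmentations are hard to define.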
Article Details
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
The authors retain copyright and grant the journal the right of first publication, along with the right to edit, reproduce, distribute, exhibit, and communicate the work in the country and abroad through print and electronic media. They also assume responsibility for any litigation or claim related to intellectual property rights, releasing the Editorial Tecnológica de Costa Rica from liability. In addition, authors may enter into separate, additional contractual arrangements for the non-exclusive distribution of the version of the article published in this journal (e.g., depositing it in an institutional repository or publishing it in a book), provided they clearly state that the work was first published in this journal.