Adding a teaching “assistant”: improving the quality of pseudo-labels for semi-supervised object detection
Abstract
This paper focuses on semi-supervised object detection (SS-OD) because of its tolerance of small training sets, which are common in real-world applications. Pseudo-label-based approaches are the mainstream for SS-OD. We first show the impact of accurate pseudo-labels and the challenge of producing them. In contrast to prior research, which predominantly focused on refining the main model to enhance localization, this paper introduces a novel strategy in which a standalone “Teaching Assistant”, or simply “Assistant”, is added to the popular Teacher/Student paradigm to improve the quality of pseudo-labels. This “Assistant” can be plugged into any existing Teacher/Student-based framework without fine-tuning the original Teacher/Student model. We explore two “Assistant” models, both centered on non-maximum suppression (NMS), a popular technique for selecting only the promising bounding boxes. The first, the “pre-NMS” assistant, refines the candidate bounding-box scores to provide a better set of inputs to the NMS process. The second, the “post-NMS” assistant, takes advantage of state-of-the-art (SOTA) segmentation models to improve the output of the NMS process. We thoroughly evaluate the performance of pre-NMS versus post-NMS and the impact of improved pseudo-labels on object detection (OD) performance. Experimental results on the COCO dataset show that the post-NMS assistant outperforms SOTA methods.
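To make the two insertion points concrete, the following is a minimal sketch of greedy NMS with an optional hook before it and an optional hook after it. It is an illustration only and not the paper's implementation: the names `pseudo_labels`, `pre_nms`, and `post_nms` are hypothetical, the hooks are plain callables standing in for the learned assistants, and the IoU threshold of 0.5 is an assumed default.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float],
        iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: visit boxes by descending score and keep a box only if
    it does not overlap an already kept box by more than the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


def pseudo_labels(
    boxes: Sequence[Box],
    scores: Sequence[float],
    pre_nms: Optional[Callable[[Sequence[Box], Sequence[float]], Sequence[float]]] = None,
    post_nms: Optional[Callable[[Box], Box]] = None,
    iou_threshold: float = 0.5,
) -> List[Box]:
    """Hypothetical pipeline showing where each assistant would act."""
    if pre_nms is not None:
        # Pre-NMS hook (assumption): re-score candidates before suppression.
        scores = pre_nms(boxes, scores)
    kept = [boxes[i] for i in nms(boxes, scores, iou_threshold)]
    if post_nms is not None:
        # Post-NMS hook (assumption): refine each surviving box, for example
        # with a box derived from a segmentation mask.
        kept = [post_nms(b) for b in kept]
    return kept


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
    scores = [0.9, 0.8, 0.7]
    print(pseudo_labels(boxes, scores))
    # Toy re-scoring that prefers the second box over the first.
    print(pseudo_labels(boxes, scores, pre_nms=lambda b, s: [0.8, 0.9, 0.7]))
```

In this toy example the first two boxes overlap heavily, so plain NMS keeps only the higher-scoring one; re-scoring before NMS changes which of the two survives, while a post-NMS hook can only adjust boxes that have already survived.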
Article Details

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
The authors retain copyright and grant the journal the right of first publication, together with the right to edit, reproduce, distribute, exhibit, and communicate the work in the country and abroad through print and electronic media. They also assume responsibility for any litigation or claim related to intellectual property rights, releasing the Editorial Tecnológica de Costa Rica from liability. In addition, authors may enter into separate, additional contractual arrangements for the non-exclusive distribution of the version of the article published in this journal (e.g., including it in an institutional repository or publishing it in a book), provided they clearly indicate that the work was first published in this journal.