Perception process of the surroundings of UAV systems in urban environments
Abstract
Unmanned Aerial Vehicle (UAV) systems have gained importance in many applications because they can operate in hazardous or hard-to-reach environments. However, safe and efficient navigation in complex settings, such as urban areas or indoor spaces, poses significant perception challenges. Accurate perception of the surroundings is essential for a UAV to build a 3D map, detect and avoid obstacles, locate points of interest, and plan safe flight paths. To address these challenges, UAV systems integrate a variety of sensors, such as RGB and depth cameras, radars, LiDARs, and inertial sensors, which capture visual, geometric, and motion information from the environment. In addition, advanced techniques in signal processing, computer vision, machine learning, and sensor fusion are used to interpret the sensor data and construct an accurate 3D representation of the environment. In this work, a method for indoor path exploration was proposed that detects doors and hallways with YOLO in combination with monocular depth estimation. Candidate paths were validated, and this information was used to generate maps for exploring a given area. A detection accuracy of 86.9% was achieved, although some false positives occurred where the model did not estimate depth correctly. The generated map keeps a history of the territory the agent has explored, making it a candidate for integration with swarm algorithms.
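The pipeline summarized in the abstract (detect doors/hallways, estimate depth, validate candidate paths, update an exploration map) can be sketched as below. This is a minimal illustration, not the authors' implementation: the detection boxes would in practice come from YOLO and the depth map from a monocular depth network, and the function names, the 1.5 m depth threshold, and the grid format are assumptions made here for the example.

```python
import numpy as np

def validate_paths(detections, depth_map, min_depth=1.5):
    """Keep only detections whose region is deep enough to traverse.

    detections: list of (label, (x1, y1, x2, y2)) boxes, e.g. from YOLO.
    depth_map: HxW array of estimated metric depth, e.g. from a
    monocular depth network.
    """
    valid = []
    for label, (x1, y1, x2, y2) in detections:
        region = depth_map[y1:y2, x1:x2]
        # A detected door/hallway is a candidate path only if the median
        # depth inside its box indicates free space beyond the opening.
        if region.size and np.median(region) >= min_depth:
            valid.append((label, (x1, y1, x2, y2)))
    return valid

def update_exploration_map(grid, cell):
    """Mark the agent's current cell as explored, keeping the history
    that a swarm coordinator could later consume."""
    grid[cell] = 1
    return grid

# Toy example: one deep doorway and one shallow (blocked) one.
depth = np.full((100, 100), 0.5)
depth[20:80, 60:90] = 3.0  # free space visible through the first door
dets = [("door", (60, 20, 90, 80)), ("door", (5, 20, 30, 80))]
paths = validate_paths(dets, depth)  # only the deep doorway survives
grid = update_exploration_map(np.zeros((10, 10), dtype=int), (4, 4))
```

Thresholding the median depth inside each box is one simple way to reject false positives from unreliable depth estimates, the failure mode the abstract reports; a real system would also fuse odometry to place validated openings in the map frame.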
Article Details

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
The authors retain copyright and grant the journal the right of first publication, together with the right to edit, reproduce, distribute, exhibit, and communicate the article, in print and electronic media, both domestically and abroad. They also assume responsibility for any litigation or claim related to intellectual property rights, releasing the Editorial Tecnológica de Costa Rica from liability. Furthermore, authors may enter into separate, additional contractual arrangements for the non-exclusive distribution of the version of the article published in this journal (e.g., depositing it in an institutional repository or publishing it in a book), provided they clearly indicate that the work was first published in this journal.