Feature importance analysis for enhanced interpretability of spectrophotometric Machine Learning (ML) models in water quality monitoring
Abstract
Ultraviolet-visible (UV-Vis) spectrophotometry for real-time nitrate (NO₃⁻) quantification in water is commonly affected by spectral interference from dissolved organic matter (DOM). This study evaluates machine learning (ML) models for this task and uses feature importance analysis to improve chemical interpretability and detect spectral interferences. Four models were compared on a dataset of 29 surface water samples: PCA-Random Forest (PCA-RF), PCA-XGBoost, full-spectrum RF (All-RF), and full-spectrum XGBoost (All-XGB). Leave-one-out cross-validation (LOOCV) showed no significant performance differences among the models (p = 0.182), with mean RMSE values between 0.6 and 0.8 mg/L. Nonetheless, feature importance analysis revealed that the PCA-based models depend on spectral variance rather than chemical relevance, which limits their reliability. The full-spectrum XGBoost model was the most interpretable spectrally: it identified both the NO₃⁻ absorption peak (≈ 220 nm) and the DOM interference correction peak (≈ 260 nm). This suggests that XGBoost could be advantageous for continuous water monitoring systems because it can identify spectral interferences.
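The evaluation described in the abstract (LOOCV on 29 samples, followed by inspection of per-wavelength feature importances) can be illustrated with a minimal sketch. It is not the study's code or data: the spectra are synthetic, the band shapes, concentration ranges, wavelength grid, and helper names (`make_synthetic_spectra`, `loocv_rmse_and_importance`) are assumptions made for illustration, and a scikit-learn random forest stands in for the full-spectrum models. The RMSE here is pooled over all held-out predictions, which may differ from how the study averaged its errors.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import LeaveOneOut, cross_val_predict


def make_synthetic_spectra(n_samples=29, seed=0):
    """Build hypothetical UV absorbance spectra (not the study's dataset)."""
    rng = np.random.default_rng(seed)
    wavelengths = np.arange(200, 401, 5)          # nm, assumed grid
    nitrate = rng.uniform(0.5, 10.0, n_samples)   # mg/L, assumed range
    dom = rng.uniform(0.0, 5.0, n_samples)        # arbitrary units, assumed
    # Assumed Gaussian bands: nitrate near 220 nm, broad DOM band near 260 nm.
    nitrate_band = np.exp(-0.5 * ((wavelengths - 220) / 8.0) ** 2)
    dom_band = np.exp(-0.5 * ((wavelengths - 260) / 30.0) ** 2)
    X = 0.1 * np.outer(nitrate, nitrate_band) + 0.05 * np.outer(dom, dom_band)
    X += rng.normal(0.0, 0.002, X.shape)          # small measurement noise
    return wavelengths, X, nitrate


def loocv_rmse_and_importance(X, y, seed=0):
    """Return the pooled LOOCV RMSE and impurity-based feature importances."""
    model = RandomForestRegressor(n_estimators=100, random_state=seed)
    # Each sample is predicted by a model trained on the remaining samples.
    preds = cross_val_predict(model, X, y, cv=LeaveOneOut())
    rmse = float(np.sqrt(np.mean((preds - y) ** 2)))
    # Importances come from a final fit on all samples.
    model.fit(X, y)
    return rmse, model.feature_importances_


wavelengths, X, y = make_synthetic_spectra()
rmse, importances = loocv_rmse_and_importance(X, y)

# Rank wavelengths by importance to see which spectral regions the model uses.
top_idx = np.argsort(importances)[::-1][:5]
print(f"LOOCV RMSE: {rmse:.3f} mg/L")
for i in top_idx:
    print(f"{wavelengths[i]} nm  importance={importances[i]:.3f}")
```

Checking whether the top-ranked wavelengths fall near known absorption bands is the kind of chemical plausibility test the abstract describes. An XGBoost regressor could be substituted for the random forest in the same workflow.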
Article Details

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
The authors retain copyright and grant the journal the right of first publication, as well as the right to edit, reproduce, distribute, exhibit, and communicate the work in the country and abroad through print and electronic media. They also assume responsibility for any litigation or claim related to intellectual property rights, releasing Editorial Tecnológica de Costa Rica from liability. In addition, authors may enter into separate, additional contractual arrangements for the non-exclusive distribution of the version of the article published in this journal (e.g., depositing it in an institutional repository or publishing it in a book), provided they clearly indicate that the work was first published in this journal.