Feature importance analysis for enhanced interpretability of spectrophotometric Machine Learning (ML) models in water quality monitoring


Laura Hernández-Alpízar
José Andrés Gómez-Mejía

Abstract

Ultraviolet-visible (UV-Vis) spectrophotometry for real-time nitrate (NO₃⁻) quantification in water is commonly affected by spectral interference from dissolved organic matter (DOM). This study evaluates machine learning (ML) models for this task and uses feature importance analysis to improve chemical interpretability and detect spectral interferences. Four models were compared on a dataset of 29 surface water samples: PCA-Random Forest (PCA-RF), PCA-XGBoost, full-spectrum RF (All-RF), and full-spectrum XGBoost (All-XGB). Leave-one-out cross-validation (LOOCV) showed no significant performance differences among the models (p = 0.182), with mean RMSE values between 0.6 and 0.8 mg/L. Nonetheless, feature importance analysis revealed that the PCA-based models rely on components chosen for their variance rather than their chemical relevance, which limits their reliability. The full-spectrum XGBoost model demonstrated superior spectral interpretability, identifying both the NO₃⁻ absorption peak (≈ 220 nm) and the DOM interference correction peak (≈ 260 nm). This suggests that XGBoost could be advantageous for continuous water monitoring systems because it can identify spectral interferences.
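The workflow summarized above (LOOCV on full-spectrum and PCA-based models, followed by a per-wavelength importance ranking) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the spectra are synthetic, the band positions, band widths, concentration ranges, and hyperparameters are hypothetical, and only scikit-learn's Random Forest is used, so the XGBoost variants are not reproduced. The RMSE here is pooled over all left-out samples, which may differ from how the study averaged its errors.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)

# Synthetic stand-in for 29 UV spectra (NOT the study's dataset).
wavelengths = np.arange(200, 301, 2)           # nm, hypothetical grid
n_samples = 29
nitrate = rng.uniform(0.5, 10.0, n_samples)    # mg/L, hypothetical range
dom = rng.uniform(0.0, 5.0, n_samples)         # arbitrary units


def band(center, width):
    """Gaussian-shaped absorption band on the wavelength grid."""
    return np.exp(-0.5 * ((wavelengths - center) / width) ** 2)


# Assumed spectral model: nitrate band near 220 nm, broad DOM band
# near 260 nm, plus a small amount of measurement noise.
spectra = (
    np.outer(nitrate, 0.10 * band(220, 8))
    + np.outer(dom, 0.05 * band(260, 25))
    + rng.normal(0.0, 0.005, (n_samples, wavelengths.size))
)


def loocv_rmse(model, X, y):
    """RMSE pooled over the leave-one-out predictions."""
    pred = cross_val_predict(model, X, y, cv=LeaveOneOut())
    return float(np.sqrt(np.mean((y - pred) ** 2)))


# Full-spectrum model versus a PCA-compressed model.
all_rf = RandomForestRegressor(n_estimators=50, random_state=0)
pca_rf = make_pipeline(
    PCA(n_components=3),
    RandomForestRegressor(n_estimators=50, random_state=0),
)

rmse_all = loocv_rmse(all_rf, spectra, nitrate)
rmse_pca = loocv_rmse(pca_rf, spectra, nitrate)

# Impurity-based importances of the full-spectrum model map directly
# onto wavelengths; the PCA model's importances map onto components.
all_rf.fit(spectra, nitrate)
importances = all_rf.feature_importances_
top_wavelengths = wavelengths[np.argsort(importances)[::-1][:5]]

print(f"All-RF LOOCV RMSE: {rmse_all:.3f} mg/L")
print(f"PCA-RF LOOCV RMSE: {rmse_pca:.3f} mg/L")
print("Most important wavelengths (nm):", top_wavelengths)
```

In the full-spectrum model each importance value belongs to a single wavelength, so it can be compared against known absorption bands. In the PCA pipeline each importance value belongs to a component that mixes all wavelengths, which is one way to see why the abstract reports the PCA-based models as harder to interpret chemically.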

Article Details

How to Cite
Hernández-Alpízar, L., & Gómez-Mejía, J. A. (2026). Feature importance analysis for enhanced interpretability of spectrophotometric Machine Learning (ML) models in water quality monitoring. Tecnología En Marcha Journal, 39(5), pp. 276–284. https://doi.org/10.18845/tm.v39i5.8521
Section
Scientific and environmental applications of AI
