Comparative analysis of traditional methods and a deep learning approach for multivariate imputation of missing values in the meteorological field

Main Article Content

Ana Cristina Arias-Muñoz
Susana Cob-García
Luis Alexander Calvo-Valverde

Abstract

Climate observations are the groundwork for several real-world applications such as weather forecasting, climate change monitoring and environmental impact assessments. However, the data is mostly measured and recorded by external devices exposed to numerous variables, causatives of malfunctions and, therefore, missing values. Nowadays, data imputation in the time series field has been researched in depth and a wide variety of methods have been proposed, where traditional classification and regression algorithms predominate, even though there are also deep learning approaches that manage to capture temporal relationships between observations. In this article, a comparative analysis between a classification imputation algorithm, a regression imputation algorithm, and a deep learning imputation model is made: MissForest algorithm, based on random trees; Expectation Maximization with Bootstrap (EMB), the maximum likelihood estimation algorithm; and a proposed deep learning model, based on the Long-Short Term Memory (LSTM) architecture. Data from the Costa Rica meteorological field were used, which consist of multivariate data coming from several weather stations in the same geographical area.

Article Details

How to Cite
Arias-Muñoz, A. C., Cob-García, S., & Calvo-Valverde, L. A. (2024). Comparative analysis of traditional methods and a deep learning approach for multivariate imputation of missing values in the meteorological field. Tecnología En Marcha Journal, 37(3). https://doi.org/10.18845/tm.v37i3.6746
Section
Artículo científico

References

Y. Zhang, P. J. Thorburn, W. Xiang and P. Fitch, “SSIM—A Deep Learning Approach for Recovering Missing Time Series Sensor Data” in IEEE Internet of Things Journal, vol. 6, no. 4, pp. 6618-6628, 2019, doi: 10.1109/JIOT.2019.2909038.

N. Bokde, M. W. Beck, F. Martínez-Alvarez and K. Kulat, “A novel imputation methodology for time series based on pattern sequence forecasting” in Pattern Recognition Letters, vol. 116, no. 7, pp. 88-96, 2018, doi: 10.1016/j.patrec.2018.09.020.

N. Donges. “A Guide to Recurrent Neural Networks: Understanding RNN and LSTM Networks” Built In, 2021, builtin.com/data-science/recurrent-neural-networks-and-lstm. Accessed 18 Apr. 2022.

J. M. Jerez, I. Molina, P. J. García-Laencina, E. Alba, N. Ribelles, M. Martín and L. Franco, “Missing data imputation using statistical and machine learning methods in a real breast cancer problem” in Artificial Intelligence in Medicine, vol. 50, no. 2, pp. 105-115, 2010, doi: 10.1016/j.artmed.2010.05.002.

T. Liu, H. Wei, and K. Zhang, “Wind power prediction with missing data using Gaussian process regression and multiple imputation” in Applied Soft Computing, vol. 71, pp. 905-916, 2018, doi: 10.1016/j.asoc.2018.07.027.

M. E. Quinteros, S. Lu, C. Blazquez, J. P. Cárdenas-R, X. Ossa, J.-M. Delgado-Saborit, R. M. Harrison, and P. Ruiz-Rudolph, “Use of data imputation tools to reconstruct incomplete air quality datasets: A case-study in Temuco, Chile” in Atmospheric Environment, vol. 200, pp. 40-49, 2019, doi: 10.1016/j.atmosenv.2018.11.053.

L. Chen, J. Xu, G. Wang, and Z. Shen, “Comparison of the multiple imputation approaches for imputing rainfall data series and their applications to watershed models” in Journal of Hydrology, vol. 572, pp. 449-460, 2019, doi: 10.1016/j.jhydrol.2019.03.025.

S. Moritz, A. Sardá, T. Bartz-Beielstein, M. Zaefferer, and J. Stork, “Comparison of different Methods for Univariate Time Series Imputation in R”, 2015, doi: 10.48550/arXiv.1510.03924.

W. Cao, D. Wang, J. Li, H. Zhou, L. Li, and Y. Li, “BRITS: Bidirectional Recurrent Imputation for Time Series”, in Advances in Neural Information Processing Systems 31 (NeurIPS 2018), 2018, doi: 10.48550/arXiv.1805.10572.

F. Oppong and S. Yao, “Assessing Univariate and Multivariate Normality, A Guide For Non-Statisticians”, in Mathematical Theory and Modeling, vol. 6, no. 2, pp. 26-33, 2016.

Y. Kim, H. Kim, G. Lee, and K.-H. Min, “A Modified Hybrid Gamma and Generalized Pareto Distribution for Precipitation Data”, in Asia-Pacific Journal of Atmospheric Sciences, vol. 55, no. 4, pp. 609-616, 2019, doi: 10.1007/s13143-019-00114-z.

A. Mohammed, “LSTM and Bidirectional LSTM for Regression - Towards Data Science”, Medium, 2022, towardsdatascience.com/lstm-and-bidirectional-lstm-for-regression-4fddf910c655. Accessed 10 Feb. 2022.

I. Sucholutsky, A. Narayan, M. Schonlau, and S. Fischmeister, “Deep Learning for System Trace Restoration”, 2019 International Joint Conference on Neural Networks (IJCNN) (2019): 1-8, doi: 10.48550/arXiv.1904.05411.

J. Honaker, G. King, and M. Blackwell, “Amelia II: A Program for Missing Data”, in Journal of Statistical Software, vol. 45, no. 7, pp. 1-47, 2011, doi: 10.18637/jss.v045.i07.

J. J. Miró, V. Caselles, and M. J. Estrela, “Multiple imputation of rainfall missing data in the Iberian Mediterranean context”, in Atmospheric Research, vol. 197, pp. 313-330, 2017, doi: 10.1016/j.atmosres.2017.07.016.

A. V. Desherevskii, I. Zhuravlev, N. Nikolsky, and Y. Sidorin, “Problems in Analyzing Time Series with Gaps and Their Solution with the WinABD Software Package” in Izvestiya, Atmospheric and Oceanic Physics, vol. 53, no. 7, pp. 659-678, 2018, doi: 10.1134/S0001433817070027.

A. Andiojaya and H. Demirhan, “A bagging algorithm for the imputation of missing values in time series”, in Expert Systems With Applications, vol. 129, no. 3, pp. 10-26, 2019, doi: 10.1016/j.eswa.2019.03.044.

L. Campozano, E. Sanchez, A. Avilés, and E. Samaniego, “Evaluation of infilling methods for time series of daily precipitation and temperature: The case of the Ecuadorian Andes”, in Maskana¸ vol. 5, no. 1, pp. 99-115, 2014, doi: 10.18537/mskn.05.01.07.

M. B. Richman, T. B. Trafalis, and I. Adrianto, “Multiple imputation through machine learning algorithms”, 87th AMS Annual Meeting, 2007.

C. Zhai, “A Note on the Expectation-Maximization (EM) Algorithm”, 2004.

J. Honaker, and G. King, “What to do About Missing Values in Time Series Cross-Section Data”, in American Journal of Political Science, vol. 54, no. 2, pp. 561-581, 2010, doi: 10.1111/j.1540-5907.2010.00447.x.

T. Khampuengson and W. Wang, “Novel Methods for Imputing Missing Values in Water Level Monitoring Data”, in Water Resources Management, vol. 37, no. 2, pp. 851-878, 2023, doi: 10.1007/s11269-022-03408-6