Document Type: Original Article

Authors

1 Department of Applied Mathematics, University of Guilan, Rasht, Iran

2 Department of Statistics, University of Guilan

3 Department of Statistics, University of Mazandaran, Babolsar, Iran

Abstract

Missing data are a pervasive problem in research and a significant threat to the reliability and validity of study findings. Researchers have developed numerous approaches for replacing missing values, and this study focuses on one such imputation method. Specifically, we introduce a novel technique that treats missing values as latent variables: the dataset containing the missing values is first partitioned, and the Expectation-Maximization (EM) algorithm is then applied within each resulting partition to impute the missing values. Our findings indicate that segmenting data that contain missing values is effective, and that a higher degree of segmentation leads to improved estimation accuracy. To evaluate the approach, we compared results on two indices, Mean Squared Error (MSE) and Standard Deviation (SD), across complete-data, missing-data, and partitioned-data scenarios. The analysis focused on real-world datasets in which values are missing completely at random (MCAR). In summary, this research contributes a new method for handling missing data through data segmentation combined with EM, and the results highlight its potential to improve the accuracy and reliability of data analysis in the presence of missing values.
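The partition-then-impute idea described in the abstract can be sketched as follows. This is an illustrative sketch only, not the authors' implementation: the abstract does not specify how the data are partitioned or which model underlies the EM step, so the code assumes contiguous row blocks and a multivariate normal model. The function names (`em_impute`, `partitioned_em_impute`, `imputation_mse`), the `ridge` stabilizer, and the simulated data are all hypothetical choices made for the example.

```python
import numpy as np


def em_impute(X, n_iter=50, ridge=1e-6):
    """EM imputation under an ASSUMED multivariate normal model.

    Missing entries are marked with NaN. Every column is assumed to have
    at least one observed value.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    miss = np.isnan(X)
    mu = np.nanmean(X, axis=0)              # start from observed column means
    filled = np.where(miss, mu, X)
    sigma = np.cov(filled, rowvar=False) + ridge * np.eye(p)
    for _ in range(n_iter):
        extra = np.zeros((p, p))            # summed conditional covariances
        for i in range(n):
            m = miss[i]
            if not m.any():
                continue
            o = ~m
            if not o.any():                 # fully missing row: use the mean
                filled[i] = mu
                extra += sigma
                continue
            # E-step: regress the missing coordinates on the observed ones.
            B = np.linalg.solve(sigma[np.ix_(o, o)], sigma[np.ix_(o, m)]).T
            filled[i, m] = mu[m] + B @ (X[i, o] - mu[o])
            extra[np.ix_(m, m)] += sigma[np.ix_(m, m)] - B @ sigma[np.ix_(o, m)]
        # M-step: update the mean and covariance from the completed data.
        mu = filled.mean(axis=0)
        centered = filled - mu
        sigma = (centered.T @ centered + extra) / n + ridge * np.eye(p)
    return filled


def partitioned_em_impute(X, n_parts, **kwargs):
    """Split rows into contiguous blocks (an assumption) and run EM per block."""
    X = np.asarray(X, dtype=float)
    out = np.empty_like(X)
    for idx in np.array_split(np.arange(X.shape[0]), n_parts):
        out[idx] = em_impute(X[idx], **kwargs)
    return out


def imputation_mse(imputed, truth, mask):
    """Mean squared error over the entries that were masked out."""
    return float(np.mean((imputed[mask] - truth[mask]) ** 2))


if __name__ == "__main__":
    # Simulated data with MCAR missingness; values are purely illustrative.
    rng = np.random.default_rng(0)
    cov = [[1.0, 0.8, 0.5], [0.8, 1.0, 0.3], [0.5, 0.3, 1.0]]
    truth = rng.multivariate_normal([0.0, 0.0, 0.0], cov, size=200)
    mask = rng.random(truth.shape) < 0.1    # each entry dropped independently
    observed = truth.copy()
    observed[mask] = np.nan
    for k in (1, 2, 4):
        est = partitioned_em_impute(observed, k)
        print(k, "partition(s): MSE =", imputation_mse(est, truth, mask))
```

The demo only shows how MSE could be compared across different numbers of partitions on simulated data. It does not reproduce the paper's results, and whether finer partitioning helps will depend on the data and on how the partitions are formed.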

Keywords
