Comparative Study of Missing Data Imputation Methods in Functional Data Analysis

Sozen, Caglar; Sağlam, Fatih; Sozen, Mervenur

doi:10.1007/s40995-025-01858-2

Sozen C., Sağlam F., Sozen M.

IRANIAN JOURNAL OF SCIENCE, vol.50, no.1, pp.149-162, 2026 (SCI-Expanded, Scopus)

Publication Type: Article / Full Article
Volume: 50 Issue: 1
Publication Date: 2026
DOI: 10.1007/s40995-025-01858-2
Journal Name: IRANIAN JOURNAL OF SCIENCE
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus
Page Numbers: pp.149-162
Keywords: Basis function, Data imputation, Functional data, Functional data analysis, Missing data
Ondokuz Mayıs University Affiliated: Yes

Abstract

Recent technological advancements have enabled the analysis of high-dimensional data in which each observation is assumed to be a sample from an underlying continuous function. Functional data analysis (FDA) is a framework developed to study these underlying functional forms. Missing data are commonly encountered in FDA, yet imputation methods tailored to functional data remain underexplored. This study investigates the impact of various missing data imputation methods on functional data by introducing missing values into two datasets: the daily average temperatures of 18 cities in Turkey's Black Sea region and the values of stocks traded on Borsa Istanbul. A Fourier basis was used for the periodic temperature data, while a B-spline basis was applied to the non-periodic stock data. The missing values were estimated with several imputation methods, including MI Amelia, MICE Random Forest, and Kalman filtering, and the performance of each method was evaluated through multiple comparison tests. The findings reveal significant differences in performance across imputation methods depending on the missing data rate, with certain methods consistently outperforming others. The comparison offers guidance for selecting an appropriate imputation method in FDA based on data structure and missing rate.
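
The workflow summarized in the abstract (remove values, impute them, smooth with a basis expansion, compare against the complete-data fit) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the curve is synthetic rather than the Black Sea temperature data, values are removed completely at random at a 20% rate, three Fourier harmonics are used, and plain linear interpolation stands in for the imputation step (it is not one of the methods compared in the study).

```python
import numpy as np

def fourier_design(t, period, n_harmonics):
    """Design matrix: a constant column plus sine/cosine pairs per harmonic."""
    cols = [np.ones_like(t)]
    for k in range(1, n_harmonics + 1):
        w = 2.0 * np.pi * k / period
        cols.append(np.sin(w * t))
        cols.append(np.cos(w * t))
    return np.column_stack(cols)

def fit_fourier(t, y, period, n_harmonics):
    """Least-squares Fourier coefficients for a single curve."""
    X = fourier_design(t, period, n_harmonics)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(0)
period = 365.0
n_harmonics = 3  # assumed value, chosen only for the illustration
t = np.arange(365, dtype=float)

# Synthetic stand-in for one city's daily temperature curve (not the study's data).
truth = 12.0 + 9.0 * np.sin(2.0 * np.pi * (t - 110.0) / period)
observed = truth + rng.normal(0.0, 1.5, size=t.size)

# Remove roughly 20% of the observations completely at random (assumed mechanism).
missing_rate = 0.20
mask = rng.random(t.size) < missing_rate
y_missing = observed.copy()
y_missing[mask] = np.nan

# Baseline imputation: linear interpolation between observed neighbours.
# np.interp holds the nearest observed value flat beyond the first/last observed point.
imputed = y_missing.copy()
imputed[mask] = np.interp(t[mask], t[~mask], y_missing[~mask])

# Smooth the complete and the imputed series with the same Fourier basis.
X = fourier_design(t, period, n_harmonics)
coef_full = fit_fourier(t, observed, period, n_harmonics)
coef_imp = fit_fourier(t, imputed, period, n_harmonics)

# Two error summaries: at the removed points, and between the two smoothed curves.
rmse_points = np.sqrt(np.mean((observed[mask] - imputed[mask]) ** 2))
rmse_curve = np.sqrt(np.mean((X @ coef_full - X @ coef_imp) ** 2))

print(f"removed points: {mask.sum()} of {t.size}")
print(f"RMSE at removed points: {rmse_points:.3f}")
print(f"RMSE between smoothed curves: {rmse_curve:.3f}")
```

Repeating the loop over several missing rates and swapping the interpolation line for another imputer gives the kind of method-by-rate comparison the abstract describes. For non-periodic series such as stock values, the Fourier design matrix would be replaced by a B-spline basis.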