COMMUNICATIONS IN STATISTICS - SIMULATION AND COMPUTATION, 2021 (SCI-Expanded)
Missing data are common in survival analysis. They are typically either removed or imputed using one of various methods. Expectation-maximization (EM) imputation is a popular method in Cox regression studies. This paper investigates how the choice of regression method within the EM framework affects Cox regression modeling. A stratified Cox regression model was derived from a dataset containing both categorical and numerical variables. Missing data were imputed within the EM framework using five machine learning algorithms, and the resulting models were compared with the full model. The results show that the recursive partitioning and regression trees (RPART) method performed better than the other methods. However, all regression methods performed poorly when imputing categorical covariates. R code is available online.