Sample Quantile Estimation for Ignorable Nonresponse
Quantile estimation is widely used in biomedical research such as clinical trials, in social statistics, and in economics. In practice, however, data collection is often incomplete because of human or otherwise uncontrollable factors, so complete data are not available for every subject. In this article, we study the estimation of sample quantiles of a response variable that is subject to missingness, under the missing at random (MAR) assumption. We construct the estimators using a nonparametric kernel regression imputation method and a local multiple imputation method, and we establish their large-sample properties using empirical process theory. A revised bootstrap (resampling) method is proposed to estimate the asymptotic variance of the two estimators. Simulation studies assess the finite-sample properties of the proposed estimators and confirm that both methods are effective. The proposed methods have three merits: first, they require no assumptions on the model for the missingness probability; second, they also apply to other estimating functions that are not differentiable in the parameter; finally, they extend readily to general M-estimation and can estimate several quantiles simultaneously.
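The two imputation ideas named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the scalar covariate `x`, the Gaussian kernel, the fixed bandwidth `h`, the number of imputations `m`, and all function names are assumptions made for illustration. Kernel imputation replaces the indicator I(Y_i ≤ q) of each nonrespondent with a kernel-weighted average over respondents, and local multiple imputation draws `m` donor responses for each nonrespondent with probabilities given by the same kernel weights. In both cases the quantile is the smallest q at which the imputed distribution function reaches the level `tau`.

```python
import numpy as np

def kernel_weights(x_target, x_obs, h):
    """Row-normalised Gaussian kernel weights (one row per target point).
    Assumes every target point has at least one respondent within a few
    bandwidths; otherwise a row can underflow to zero."""
    d = (x_target[:, None] - x_obs[None, :]) / h
    w = np.exp(-0.5 * d ** 2)
    return w / w.sum(axis=1, keepdims=True)

def weighted_quantile(values, weights, tau):
    """Smallest value whose weighted empirical CDF is at least tau."""
    order = np.argsort(values)
    v = values[order]
    cdf = np.cumsum(weights[order])
    cdf = cdf / cdf[-1]                      # last entry is exactly 1
    return v[np.searchsorted(cdf, tau, side="left")]

def kernel_imputed_quantile(x, y, delta, tau, h):
    """Kernel regression imputation (illustrative form).
    delta[i] == 1 means y[i] is observed; other entries of y are ignored."""
    obs = delta == 1
    x_obs, y_obs = x[obs], y[obs]
    mass = np.ones(y_obs.size)               # each respondent counts once
    if (~obs).any():
        # each nonrespondent spreads unit mass over nearby respondents
        mass = mass + kernel_weights(x[~obs], x_obs, h).sum(axis=0)
    return weighted_quantile(y_obs, mass, tau)

def local_mi_quantile(x, y, delta, tau, h, m=20, rng=None):
    """Local multiple imputation (illustrative form): m donor draws per
    nonrespondent, each draw carrying weight 1/m."""
    rng = np.random.default_rng(rng)
    obs = delta == 1
    x_obs, y_obs = x[obs], y[obs]
    w = kernel_weights(x[~obs], x_obs, h)
    draws = np.array([rng.choice(y_obs, size=m, p=row) for row in w]).ravel()
    values = np.concatenate([y_obs, draws])
    weights = np.concatenate([np.ones(y_obs.size), np.full(draws.size, 1.0 / m)])
    return weighted_quantile(values, weights, tau)
```

Calling `kernel_imputed_quantile(x, y, delta, 0.5, h=0.3)` returns an estimate of the median of the response. Neither function fits a model for the missingness probability, which mirrors the first merit stated in the abstract. The bandwidth value here is arbitrary, and in practice it would need to be chosen with care.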
missing at random / sample quantile / estimating equation / empirical process / nonparametric kernel regression imputation / local multiple imputation
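Supported by the Key Program of the National Natural Science Foundation of China (71331006) and the Key Project of the Major Research Plan of the National Natural Science Foundation of China (91546202); the Key Laboratory of the Chinese Academy of Sciences (2008DP173182); the National Center for Mathematics and Interdisciplinary Sciences (2008DP173182); and the Innovative Research Team Support Program of Shanghai University of Finance and Economics (IRTSHUFE13122402). Zhang Li was supported by the Young Scientists Fund of the National Natural Science Foundation of China (11601424); the Youth Fund Project of the Ministry of Education (15YJC910009); the General Program of the China Postdoctoral Science Foundation (2015M580867); the Special Funding of the China Postdoctoral Science Foundation (2016T90940); and the Natural Science Foundation of Northwest University (14NW31).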