Research Article
Year 2021, Volume: 50 Issue: 1, 289 - 303, 04.02.2021
https://doi.org/10.15672/hujms.734212

Robust regression estimation and variable selection when cellwise and casewise outliers are present

Abstract

Two main issues in regression analysis are estimation and variable selection in the presence of outliers. Popular robust regression estimation methods are combined with variable selection methods to achieve robust estimation and variable selection simultaneously. However, recent work has shown that the robust estimation methods used in these procedures are resistant only to casewise (rowwise) outliers in the data. Because these robust variable selection methods may not be able to cope with cellwise outliers, extra care should be taken when cellwise outliers are present along with casewise outliers. In this study, we propose a robust estimation and variable selection method that deals with both cellwise and casewise outliers in the data. The proposed method has three steps. In the first step, cellwise outliers are identified, deleted, and marked as NA in each explanatory variable. In the second step, the cells marked NA are imputed using a robust imputation method. In the last step, robust regression estimation methods are combined with the variable selection method LASSO (Least Absolute Shrinkage and Selection Operator) to estimate the regression parameters and to select the relevant explanatory variables. The simulation results and a real data example show that the proposed estimation and variable selection procedure performs well in the presence of cellwise and casewise outliers.
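The three steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the detection, imputation, or robust regression methods, so simple stand-ins are assumed here (a median/MAD z-score rule for flagging cells, column-median imputation, and a Huber-type iteratively reweighted LASSO). All function names, the cutoff, and the tuning constants are hypothetical.

```python
import numpy as np

def flag_cells(X, cutoff=3.0):
    """Step 1 (stand-in): mark cells with a large robust z-score as NaN."""
    med = np.median(X, axis=0)
    mad = 1.4826 * np.median(np.abs(X - med), axis=0)
    mad = np.where(mad > 0, mad, 1.0)
    X_na = X.astype(float).copy()
    X_na[np.abs(X - med) / mad > cutoff] = np.nan
    return X_na

def impute_median(X_na):
    """Step 2 (stand-in): fill NaN cells with the median of the kept cells."""
    med = np.nanmedian(X_na, axis=0)
    return np.where(np.isnan(X_na), med, X_na)

def soft_threshold(z, t):
    return np.sign(z) * max(abs(z) - t, 0.0)

def weighted_lasso(X, y, w, lam, n_iter=200):
    """Coordinate descent for (1/2n) * sum_i w_i * r_i^2 + lam * ||b||_1."""
    n, p = X.shape
    b0, b = 0.0, np.zeros(p)
    for _ in range(n_iter):
        b0 = np.sum(w * (y - X @ b)) / np.sum(w)
        for j in range(p):
            r_j = y - b0 - X @ b + X[:, j] * b[j]  # partial residual without x_j
            rho = np.sum(w * X[:, j] * r_j) / n
            denom = np.sum(w * X[:, j] ** 2) / n
            b[j] = soft_threshold(rho, lam) / denom if denom > 0 else 0.0
    return b0, b

def huber_lasso(X, y, lam=0.1, c=1.345, n_outer=20):
    """Step 3 (stand-in): Huber-type reweighting around the weighted LASSO."""
    w = np.ones(len(y))
    for _ in range(n_outer):
        b0, b = weighted_lasso(X, y, w, lam)
        r = y - b0 - X @ b
        s = 1.4826 * np.median(np.abs(r - np.median(r)))
        s = s if s > 0 else 1.0
        u = np.abs(r) / s
        w = c / np.maximum(u, c)  # Huber weights: 1 if u <= c, else c / u
    return b0, b

# Simulated data (assumed setup): 5% cellwise and 5% casewise contamination.
rng = np.random.default_rng(0)
n, p = 200, 6
X = rng.normal(size=(n, p))
beta = np.array([3.0, -2.0, 0.0, 0.0, 1.5, 0.0])
y = 1.0 + X @ beta + 0.5 * rng.normal(size=n)
X_cont = X.copy()
X_cont[rng.random((n, p)) < 0.05] = 20.0   # cellwise outliers
y_cont = y.copy()
y_cont[:10] += 30.0                         # casewise outliers

X_clean = impute_median(flag_cells(X_cont))
b0_hat, b_hat = huber_lasso(X_clean, y_cont, lam=0.1)
print(np.round(b_hat, 2))
```

The point of the ordering is that the regression step only ever sees a design matrix whose contaminated cells have already been replaced, so the robust loss has to handle only the remaining casewise outliers. A Huber loss started from an ordinary LASSO fit is not a high-breakdown estimator, so this sketch would likely be weaker against casewise outliers than a high-breakdown robust regression method.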

References

  • [1] C. Agostinelli, A. Leung, V.J. Yohai and R.H. Zamar, Robust estimation of multivariate location and scatter in the presence of cellwise and casewise contamination, Test, 24 (3), 441-461, 2015.
  • [2] F. Alqallaf, S. Van Aelst, V.J. Yohai and R.H. Zamar, Propagation of Outliers in Multivariate Data, Ann. Statist. 37 (1), 311-331, 2009.
  • [3] O. Arslan, Weighted LAD-LASSO method for robust parameter estimation and variable selection in regression, Comput. Statist. Data Anal. 56 (6), 1952-1965, 2012.
  • [4] O. Arslan, Penalized MM regression estimation with L γ penalty: a robust version of bridge regression, Statistics 50 (6), 1236-1260, 2016.
  • [5] K.V. Branden and S. Verboven, Robust data imputation, Comput. Biol. Chem. 33 (1), 7-13, 2009.
  • [6] M. Danilov, Robust estimation of multivariate scatter in non-affine equivariant scenarios, University of British Columbia, 2010.
  • [7] M. Debruyne, S. Höppner, S. Serneels and T. Verdonck, Outlyingness: Which variables contribute most?, Stat. Comput. 29 (4), 707-723, 2019.
  • [8] J. Fan, Y. Fan and E. Barut, Adaptive robust variable selection, Ann. Statist. 42 (1), 324-351, 2014.
  • [9] A. Farcomeni, Snipping for robust k-means clustering under component-wise contamination, Stat. Comput. 24 (6), 907-919, 2014.
  • [10] P.A. Ferrari, P. Annoni, A. Barbiero and G. Manzi, An imputation method for categorical variables with application to nonlinear principal component analysis, Comput. Statist. Data Anal. 55 (7), 2410-2420, 2011.
  • [11] A.E. Hoerl and R.W. Kennard, Ridge regression: Biased estimation for nonorthogonal problems, Technometrics 12 (1), 55-67, 1970.
  • [12] A. Leung, H. Zhang and R. Zamar, Robust regression estimation and inference in the presence of cellwise and casewise contamination, Comput. Statist. Data Anal. 99, 1-11, 2016.
  • [13] A. Leung, V. Yohai and R. Zamar, Multivariate location and scatter matrix estimation under cellwise and casewise contamination, Comput. Statist. Data Anal. 111, 59-76, 2017.
  • [14] J. Machkour, B. Alt, M. Muma and A.M. Zoubir, The outlier-corrected-data-adaptive Lasso: A new robust estimator for the independent contamination model, 25th European Signal Processing Conference (EUSIPCO), IEEE, 1649-1653, 2017.
  • [15] R.A. Maronna, Robust ridge regression for high-dimensional data, Technometrics 53 (1), 44-53, 2011.
  • [16] R.A. Maronna, R.D. Martin, V.J. Yohai and M. Salibián-Barrera, Robust statistics: theory and methods (with R), John Wiley & Sons, 2019.
  • [17] V. Öllerer, A. Alfons and C. Croux, The shooting S-estimator for robust regression, Comput. Statist. 31 (3), 829-844, 2016.
  • [18] J. Raymaekers and P.J. Rousseeuw, Flagging and handling cellwise outliers by robust estimation of a covariance matrix, arXiv preprint arXiv:1912.12446, 2019.
  • [19] J. Raymaekers, P.J. Rousseeuw, W. Van den Bossche and M. Hubert, cellWise: Analyzing Data with Cellwise Outliers, CRAN, R package version: 2.0.9, 2019.
  • [20] P.J. Rousseeuw and W. Van den Bossche, Detecting deviating data cells, Technometrics 60 (2), 135-145, 2018.
  • [21] P.J. Rousseeuw and A. M. Leroy, Robust regression and outlier detection, John Wiley & Sons, 2005.
  • [22] N. Simon, J. Friedman, T. Hastie and R. Tibshirani, Regularization paths for Cox's proportional hazards model via coordinate descent, J. Stat. Softw. 39 (5), 1-13, 2011.
  • [23] T.A. Stamey, J.N. Kabalin, J.E. McNeal, I.M. Johnstone, F. Freiha, E.A. Redwine and N. Yang, Prostate specific antigen in the diagnosis and treatment of adenocarcinoma of the prostate. II. Radical prostatectomy treated patients, J. Urol. 141 (5), 1076-1083, 1989.
  • [24] R. Tibshirani, Regression shrinkage and selection via the lasso, J. R. Stat. Soc. Ser. B. Stat. Methodol. 58 (1), 267-288, 1996.
  • [25] A. Unwin, Multivariate outliers and the O3 Plot, J. Comput. Graph. Statist. 28 (3), 635-643, 2019.
  • [26] S. Verboven, K.V. Branden and P. Goos, Sequential imputation for missing values, Comput. Biol. Chem. 33 (5-6), 320-327, 2007.
  • [27] H. Xu, C. Caramanis and S. Mannor, Robust regression and LASSO, Adv Neural Inf Process Syst, 1801-1808, 2009.
  • [28] C. Yi and J. Huang, Semismooth Newton coordinate descent algorithm for elastic-net penalized Huber loss regression and quantile regression, J. Comput. Graph. Statist. 26 (3), 547-557, 2017.
  • [29] V.J. Yohai, High breakdown-point and high efficiency robust estimates for regression, Ann. Statist. 15 (2), 642-656, 1987.
  • [30] L. Zeng and J. Xie, Regularization and variable selection for data with interdependent structures, 2008.
  • [31] H. Zou and T. Hastie, Regularization and variable selection via the elastic net, J. R. Stat. Soc. Ser. B. Stat. Methodol. 67 (2), 301-320, 2005.
There are 31 references in total.

Details

Primary Language English
Subjects Statistics
Section Statistics
Authors

Onur Toka 0000-0002-4025-4537

Meral Çetin 0000-0003-0247-7120

Olcay Arslan 0000-0002-7067-4997

Publication Date February 4, 2021
Published in Issue Year 2021 Volume: 50 Issue: 1

Cite

APA Toka, O., Çetin, M., & Arslan, O. (2021). Robust regression estimation and variable selection when cellwise and casewise outliers are present. Hacettepe Journal of Mathematics and Statistics, 50(1), 289-303. https://doi.org/10.15672/hujms.734212
AMA Toka O, Çetin M, Arslan O. Robust regression estimation and variable selection when cellwise and casewise outliers are present. Hacettepe Journal of Mathematics and Statistics. February 2021;50(1):289-303. doi:10.15672/hujms.734212
Chicago Toka, Onur, Meral Çetin, and Olcay Arslan. “Robust Regression Estimation and Variable Selection When Cellwise and Casewise Outliers Are Present”. Hacettepe Journal of Mathematics and Statistics 50, no. 1 (February 2021): 289-303. https://doi.org/10.15672/hujms.734212.
EndNote Toka O, Çetin M, Arslan O (01 February 2021) Robust regression estimation and variable selection when cellwise and casewise outliers are present. Hacettepe Journal of Mathematics and Statistics 50 1 289–303.
IEEE O. Toka, M. Çetin, and O. Arslan, “Robust regression estimation and variable selection when cellwise and casewise outliers are present”, Hacettepe Journal of Mathematics and Statistics, vol. 50, no. 1, pp. 289–303, 2021, doi: 10.15672/hujms.734212.
ISNAD Toka, Onur et al. “Robust Regression Estimation and Variable Selection When Cellwise and Casewise Outliers Are Present”. Hacettepe Journal of Mathematics and Statistics 50/1 (February 2021), 289-303. https://doi.org/10.15672/hujms.734212.
JAMA Toka O, Çetin M, Arslan O. Robust regression estimation and variable selection when cellwise and casewise outliers are present. Hacettepe Journal of Mathematics and Statistics. 2021;50:289–303.
MLA Toka, Onur et al. “Robust Regression Estimation and Variable Selection When Cellwise and Casewise Outliers Are Present”. Hacettepe Journal of Mathematics and Statistics, vol. 50, no. 1, 2021, pp. 289-303, doi:10.15672/hujms.734212.
Vancouver Toka O, Çetin M, Arslan O. Robust regression estimation and variable selection when cellwise and casewise outliers are present. Hacettepe Journal of Mathematics and Statistics. 2021;50(1):289-303.