Abrahamowicz M, du Berger R, Grover SA. Flexible modelling of the effects of serum cholesterol on coronary heart disease mortality. Am J Epidemiol. 1997;145:714–29.
Altman DG, Andersen PK. Bootstrap investigation of the stability of a Cox regression model. Stat Med. 1989;8:771–83.
Altman DG, Lausen B, Sauerbrei W, Schumacher M. The dangers of using ‘optimal’ cutpoints in the evaluation of prognostic factors. J Natl Cancer Inst. 1994;86:829–35.
Altman DG, McShane LM, Sauerbrei W, Taube SE. Reporting recommendations for tumor marker prognostic studies (REMARK): explanation and elaboration. PLoS Med. 2012;9:e1001216.
Antoniadis A, Gijbels I, Verhasselt A. Variable selection in additive models using P-splines. Technometrics. 2012;54:425–38.
Arem H, Moore SC, Patel A, Hartge P, Berrington de Gonzalez A, Visvanathan K, Campbell PT, Freedman M, Weiderpass E, Adami HO, Linet MS, Lee IM, Matthews CE. Leisure time physical activity and mortality: a detailed pooled analysis of the dose-response relationship. JAMA Intern Med. 2015;175:959–67.
Augustin N, Sauerbrei W, Schumacher M. The practical utility of incorporating model selection uncertainty into prognostic models for survival data. Stat Model. 2005;5:95–118.
Becher H. Analysis of continuous covariates and dose-effect analysis. In: Ahrens W, Pigeot I (Eds) Handbook of epidemiology. 2nd edition. Heidelberg: Springer Verlag; 2014.
Becher H, Lorenz E, Royston P, Sauerbrei W. Analysing covariates with spike at zero: a modified FP procedure and conceptual issues. Biometrical J. 2012;54:686–700.
Benedetti A, Abrahamowicz M. Using generalized additive models to reduce residual confounding. Stat Med. 2004;23:3781–801.
Binder H, Schumacher M. Allowing for mandatory covariates in boosting estimation of sparse high-dimensional survival models. BMC Bioinform. 2008;9:14.
Binder H, Sauerbrei W, Royston P. Comparison between splines and fractional polynomials for multivariable model building with continuous covariates: a simulation study with continuous response. Stat Med. 2013;32:2262–77.
Boulesteix AL, Binder H, Abrahamowicz M, Sauerbrei W. On the necessity and design of studies comparing statistical methods. Biometrical J. 2018;60:216–8.
Breiman L. The little bootstrap and other methods for dimensionality selection in regression: X-fixed prediction error. J Am Stat Assoc. 1992;87:738–54.
Breiman L. Better subset regression using the nonnegative garrote. Technometrics. 1995;37:373–84.
Breiman L. Statistical Modeling: The two cultures. Stat Sci. 2001;16:199–231.
Buckland ST, Burnham KP, Augustin NH. Model selection: an integral part of inference. Biometrics. 1997;53:603–18.
Bühlmann P, Hothorn T. Boosting algorithms: regularization, prediction and model fitting. Stat Sci. 2007;22:477–505.
Burnham KP, Anderson DR. Model selection and multimodel inference: a practical information-theoretic approach. New York: Springer; 2002.
Bursac Z, Gauss CH, Williams DK, Hosmer DW. Purposeful selection of variables in logistic regression. Source Code Biol Med. 2008;3:17.
Chatfield C. Model uncertainty, data mining and statistical inference (with discussion). J Royal Stat Soc Series A. 1995;158:419–66.
Chatfield C. Confessions of a pragmatic statistician. Statistician. 2002;51:1–20.
Chen C, George SL. The bootstrap and identification of prognostic factors via Cox’s proportional hazards regression model. Stat Med. 1985;4:39–46.
Chouldechova A, Hastie T. Generalized additive model selection. arXiv preprint 2015;arXiv:1506.03850.
Copas JB, Long T. Estimating the residual variance in orthogonal regression with variable selection. Journal of the Royal Statistical Society. Series D (The Statistician). 1991;40:51–9.
Cox DR. Comment on Breiman, L. (2001). Statistical modeling: the two cultures. Stat Sci. 2001;16:216–8.
Dakna M, Harris K, Kalousi A, Carpentier S, Kolch W, Schanstra JP, Haubitz M, Vlahou A, Mischak H, Girolami M. Addressing the challenge of defining valid proteomic biomarkers and classifiers. BMC Bioinform. 2010;11:594.
de Bin R, Sauerbrei W. Handling co-dependence issues in resampling-based variable selection procedures: a simulation study. J Stat Comput Simul. 2018;88:28–55.
de Bin R, Janitza S, Sauerbrei W, Boulesteix AL. Subsampling versus bootstrapping in resampling-based model selection for multivariable regression. Biometrics. 2016;72:272–80.
de Boor C. A practical guide to splines. Revised ed. New York: Springer; 2001.
Dorie V, Hill J, Shalit U, Scott M, Cervone D. Automated versus do-it-yourself methods for causal inference: lessons learned from a data analysis competition. Stat Sci. 2019;34:43–68.
Draper D. Assessment and propagation of model selection uncertainty (with discussion). J Royal Stat Soc Series B. 1995;57:45–97.
Dunkler D, Plischke M, Leffondré K, Heinze G. Augmented backward elimination: a pragmatic and purposeful way to develop statistical models. PLoS ONE. 2014;9:e113677.
Dunkler D, Sauerbrei W, Heinze G. Global, parameterwise and joint shrinkage factor estimation. J Stat Softw. 2016;69:1–19.
Efron B. Comment on Breiman, L. (2001). Statistical modeling: the two cultures. Stat Sci. 2001;16:218–9.
Efron B, Hastie T, Johnstone I, Tibshirani R. Least angle regression. Ann Statist. 2004;32:407–99.
Efroymson MA. Multiple regression analysis. In: Ralston A, Wilf HS, editors. Mathematical methods for digital computers. New York: John Wiley; 1960.
Eilers PHC, Marx BD. Flexible smoothing with B-splines and penalties (with comments and rejoinder). Stat Sci. 1996;11:89–121.
Fan J, Li R. Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc. 2001;96:1348–60.
Freund Y, Schapire R. Experiments with a new boosting algorithm. In: Proceedings of the Thirteenth International Conference on Machine Learning Theory. San Francisco, CA: Morgan Kaufmann Publishers Inc; 1996.
Friedman JH. Greedy function approximation: a gradient boosting machine. Ann Stat. 2001;29:1189–232.
Friedman JH, Hastie T, Tibshirani R. Additive logistic regression: a statistical view of boosting (with discussion). Ann Stat. 2000;28:337–407.
Friedman J, Hastie T, Tibshirani R. Regularization paths for generalized linear models via coordinate descent. J Stat Softw. 2010;33:1–22.
Fröhlich H. Including network knowledge into Cox regression models for biomarker signature discovery. Biometrical J. 2014;56:287–306.
Gong G. Some ideas on using the bootstrap in assessing model variability. In: Heiner KW, Sacher RS, Wilkinson JW, editors. Computer Science and Statistics: Proceedings of the 14th Symposium on the Interface. New York: Springer; 1982.
Good DM, Zürbig P, Argilés A, Bauer HW, Behrens G, Coon JJ, Dakna M, Decramer S, Delles C, Dominiczak AF, Ehrich JHH. Naturally occurring human urinary peptides for use in diagnosis of chronic kidney disease. Mol Cell Proteomic. 2010;9:2424–37.
Greenland S. Avoiding power loss associated with categorization and ordinal scores in dose-response and trend analysis. Epidemiology. 1995;6:450–4.
Groenwold RHH, Klungel OH, van der Graaf Y, Hoes AW, Moons KGM. Adjustment for continuous confounders: an example of how to prevent residual confounding. Can Med Assoc J. 2013;185:401–6.
Harrell FE. Regression modeling strategies: with applications to linear models, logistic and ordinal regression, and survival analysis. New York: Springer; 2001.
Harrell FE. Regression modeling strategies: with applications to linear models, logistic and ordinal regression, and survival analysis. 2nd ed. New York: Springer; 2015.
Harrell FE, Lee KL, Mark DB. Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat Med. 1996;15:361–87.
Hastie T, Tibshirani R. Generalized additive models. New York: Chapman & Hall/CRC; 1990.
Hastie T, Tibshirani R, Friedman J. The elements of statistical learning. 2nd ed. New York: Springer; 2009.
Hastie T, Tibshirani R, Wainwright M. Statistical learning with sparsity: the lasso and generalizations. Monographs on statistics and applied probability. Boca Raton: CRC Press; 2015.
Heinze G, Dunkler D. Five myths about variable selection. Transplant Int. 2017;30:6–10.
Heinze G, Wallisch C, Dunkler D. Variable selection – a review and recommendations for the practicing statistician. Biometrical J. 2018;60:431–49.
Hilsenbeck SG, Clark GM, McGuire W. Why do so many prognostic factors fail to pan out? Breast Cancer Res Treat. 1992;22:197–206.
Hoerl AE, Kennard RW. Ridge Regression: biased estimation for nonorthogonal problems. Technometrics. 1970;12:55–67.
Hoeting JA, Madigan D, Raftery AE, Volinsky CT. Bayesian model averaging: a tutorial. Stat Sci. 1999;14:382–417.
Hofner B, Hothorn T, Kneib T, Schmid M. A framework for unbiased model selection based on boosting. J Comput Graphical Stat. 2011;20:956–71.
Hosmer D, Lemeshow S, May S. Applied survival analysis. 2nd ed. Hoboken: Wiley; 2008.
Hosmer D, Lemeshow S, Sturdivant RX. Applied logistic regression. 3rd ed. Hoboken: Wiley; 2013.
Huebner M, le Cessie S, Schmidt C, Vach W, On behalf of the Topic Group “Initial Data Analysis” of the STRATOS Initiative. A Contemporary Conceptual Framework for Initial Data Analysis. Observational Studies. 2018;4:171–92.
Janitza S, Binder H, Boulesteix AL. Pitfalls of hypothesis tests and model selection on bootstrap samples: causes and consequences in biometrical applications. Biometrical J. 2016;58:447–73.
Jenkner C, Lorenz E, Becher H, Sauerbrei W. Modeling continuous covariates with a ‘spike’ at zero: bivariate approaches. Biometrical J. 2016;58:783–96.
Lee PH. Is a cutoff of 10% appropriate for the change-in-estimate criterion of confounder identification? J Epidemiol. 2014;24:161–7.
Leeb H, Pötscher BM. Model selection and inference: facts and fiction. Econometric Theory. 2005;21:21–59.
Leffondre K, Abrahamowicz M, Siemiatycki J, Rachet B. Modeling smoking history: a comparison of different approaches. Am J Epidemiol. 2002;156:813–23.
Lin Y, Zhang HH. Component selection and smoothing in multivariate nonparametric regression. Ann Stat. 2006;34:2272–97.
Lorenz E, Jenkner C, Sauerbrei W, Becher H. Modeling variables with a spike at zero. Examples and practical recommendations. Am J Epidemiol. 2017;185:1–39.
Maldonado G, Greenland S. Simulation of confounder-selection strategies. Am J Epidemiol. 1993;138:923–36.
Mallows CL. The zeroth problem. Am Stat. 1998;52:1–9.
Mantel N. Why stepdown procedures in variable selection? Technometrics. 1970;12:621–5.
Marcus R, Peritz E, Gabriel KR. On closed test procedures with special reference to ordered analysis of variance. Biometrika. 1976;63:655–60.
Marra G, Wood SN. Practical variable selection for generalized additive models. Comput Stat Data Anal. 2011;55:2372–87.
Mayr A, Binder H, Gefeller O, Schmid M. The Evolution of boosting algorithms – from machine learning to statistical modelling. Methods Inf Med. 2014;53:419–27.
Meier L, van de Geer S, Bühlmann P. High-dimensional additive modeling. Ann Stat. 2009;37:3779–821.
Meinshausen N, Bühlmann P. Stability selection. J Royal Stat Soc Series B Stat Methodol. 2010;72:417–73.
Miller A. Selection of subsets of regression variables. Journal of the Royal Statistical Society. Series A (General). 1984;147:389–425.
Miller R, Siegmund D. Maximally selected chi-square statistics. Biometrics. 1982;38:1011–6.
Moons KG, Altman DG, Reitsma JB, Ioannidis JP, Macaskill P, Steyerberg EW, Vickers AJ, Ransohoff DF, Collins GS. Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD): Explanation and Elaboration. Ann Intern Med. 2015;162:W1–W73.
Morris T, White IR, Crowther MJ. Using simulation studies to evaluate statistical methods. Stat Med. 2019;38:2074–102.
Nkuipou-Kenfack E, Zürbig P, Mischak H. The long path towards implementation of clinical proteomics: exemplified based on CKD273. Proteomics Clin Appl. 2017;11:5–6.
Perperoglou A, Sauerbrei W, Abrahamowicz M, Schmid M, on behalf of TG2 of the STRATOS initiative. A review of spline function procedures in R. BMC Med Res Methodol. 2019;19:46.
Picard RP, Cook RD. Cross-validation of regression models. J Am Stat Assoc. 1984;79:575–83.
Pullenayegum EM, Platt RW, Barwick M, Feldman BM, Offringa M, Thabane L. Knowledge translation in biostatistics: a survey of current practices, preferences, and barriers to the dissemination and uptake of new statistical methods. Stat Med. 2015;35:805–18.
Raftery AE. Bayesian model selection in social research. Sociol Methodol. 1995;25:111–63.
Ramaiola I, Padró T, Peña E, Juan-Babot O, Cubedo J, Martin-Yuste V, Sabate M, Badimon L. Changes in thrombus composition and profilin-1 release in acute myocardial infarction. Eur Heart J. 2015;36:965–75.
Ramsay JO. Monotone regression splines in action. Stat Sci. 1988;3:425–41.
Ravikumar P, Liu H, Lafferty J, Wasserman L. SpAM: sparse additive models. In: Platt J, Koller D, Singer Y, Roweis S, editors. Advances in Neural Information Processing Systems. Vol. 20. Cambridge: MIT Press; 2008.
Rosenberg PS, Katki H, Swanson CA, Brown LM, Wacholder S, Hoover RN. Quantifying epidemiologic risk factors using non-parametric regression: model selection remains the greatest challenge. Stat Med. 2003;22:3369–81.
Rospleszcz S, Janitza S, Boulesteix AL. Categorical variables with many categories are preferentially selected in bootstrap-based model selection procedures for multivariable regression models. Biometrical J. 2016;58:652–73.
Royston P, Altman DG. Regression using fractional polynomials of continuous covariates: parsimonious parametric modelling. Appl Stat. 1994;43:429–67.
Royston P, Altman DG, Sauerbrei W. Dichotomizing continuous predictors in multiple regression: a bad idea. Stat Med. 2006;25:127–41.
Royston P, Sauerbrei W. Multivariable modelling with cubic regression splines: a principled approach. Stata J. 2007;7:45–70.
Royston P, Sauerbrei W. Multivariable model-building: a pragmatic approach to regression analysis based on fractional polynomials for continuous variables. Chichester: Wiley; 2008.
Sauerbrei W. The use of resampling methods to simplify regression models in medical statistics. Appl Stat. 1999;48:313–29.
Sauerbrei W, Abrahamowicz M, Altman DG, le Cessie S, Carpenter J, on behalf of the STRATOS initiative. STRengthening Analytical Thinking for Observational Studies: The STRATOS initiative. Stat Med. 2014;33:5413–32.
Sauerbrei W, Buchholz A, Boulesteix AL, Binder H. On stability issues in deriving multivariable regression models. Biometrical J. 2015;57:531–55.
Sauerbrei W, Meier-Hirmer C, Benner A, Royston P. Multivariable regression model building by using fractional polynomials: description of SAS, STATA and R programs. Comput Stat Data Anal. 2006;50:3464–85.
Sauerbrei W, Royston P. Building multivariable prognostic and diagnostic models: transformation of the predictors by using fractional polynomials. J Royal Stat Soc A. 1999;162:71–94.
Sauerbrei W, Royston P, Binder H. Selection of important variables and determination of functional form for continuous predictors in multivariable model building. Stat Med. 2007;26:5512–28.
Sauerbrei W, Schumacher M. A bootstrap resampling procedure for model building: application to the Cox regression model. Stat Med. 1992;11:2093–109.
Schmid M, Hothorn T. Boosting additive models using componentwise P-splines. Comput Stat Data Anal. 2008;53:298–311.
Shaw PA, Deffner V, Dodd KW, Freedman LS, Keogh R, Kipnis V, Küchenhoff H, Tooze JA, on behalf of Measurement Error Working group (TG4) of the STRATOS initiative. Epidemiological analyses with error prone exposures: review of current practise and recommendations. Ann Epidemiol. 2018;28:821–8.
Shmueli G. To explain or to predict? Stat Sci. 2010;25:289–310.
Smith GCS, Seaman SR, Wood AM, Royston P, White IR. Correcting for Optimistic Prediction in Small Data Sets. Am J Epidemiol. 2014;180:318–24.
Steiner M, Kim Y. The Mechanics of omitted variable bias: bias amplification and cancellation of offsetting biases. J Causal Inference. 2016;4:20160009.
Sun GW, Shook TL, Kay GL. Inappropriate use of bivariable analysis to screen risk factors for use in multivariable analysis. J Clin Epidemiol. 1996;49:907–16.
Taylor J, Tibshirani RJ. Statistical learning and selective inference. Proc Natl Acad Sci USA. 2015;112:7629–34.
Teräsvirta T, Mellin I. Model selection criteria and model selection tests in regression models. Scand J Stat. 1986;13:159–71.
Tibshirani R. Regression shrinkage and selection via the Lasso. J Royal Stat Soc Series B Methodol. 1996;58:267–88.
Tibshirani R. Regression shrinkage and selection via the lasso: a retrospective. J Royal Stat Soc Series B. 2011;73:273–82.
Tibshirani R, Taylor J, Loftus J, Reid S. Selective inference: tools for selective inference. Proc Natl Acad Sci USA. 2017;112:7629–34.
Tutz G, Binder H. Generalized additive modelling with implicit variable selection by likelihood based boosting. Biometrics. 2006;62:961–71.
van Houwelingen HC. From model building to validation and back: a plea for robustness. Stat Med. 2014;33:5223–38.
van Houwelingen HC, Sauerbrei W. Cross-validation, shrinkage and variable selection in linear regression revisited. Open J Stat. 2013;3:79–102.
van Houwelingen JC, le Cessie S. Predictive value of statistical models. Stat Med. 1990;9:1303–25.
van Walraven C, Hart RG. Leave ‘em alone - why continuous variables should be analyzed as such. Neuroepidemiology. 2008;30:138–9.
Vandenbroucke JP, von Elm E, Altman DG, Gotzsche PC, Mulrow CD, Pocock SJ, Poole C, Schlesselman JJ, Egger M. Strengthening the Reporting of Observational Studies in Epidemiology (STROBE): Explanation and Elaboration. Epidemiology. 2007;18:805–35.
Vickers AJ, Lilja H. Cutpoints in clinical chemistry: time for fundamental reassessment. Clin Chem. 2009;55:15–7.
White H. Using least squares to approximate unknown regression functions. Int Econ Rev. 1980a;21:149–70.
White H. A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica. 1980b;48:817–38.
Wikimedia Foundation Inc; 2019. Statistical model. URL https://en.wikipedia.org/wiki/State_of_the_art. Accessed 1 July 2019.
Winter C, Kristiansen G, Kersting S, Roy J, Aust D, Knösel T, Rümmele P, Jahnke B, Hentrich V, Rückert F, Niedergethmann M, Weichert W, Bahra M, Schlitt HJ, Settmacher U, Friess H, Büchler M, Saeger H-D, Schroeder M, Pilarsky C, Grützmann R. Google goes cancer: improving outcome prediction for cancer patients by network-based ranking of marker genes. PLOS Comput Biol. 2012;8:e1002511.
Wood S. Thin plate regression splines. J Royal Stat Soc Series B. 2003;65:95–114.
Wood S. Generalized additive models. New York: Chapman & Hall/CRC; 2006.
Wood S. Generalized additive models: an introduction with R. 2nd ed. CRC Press; 2017.
Zou H, Hastie T. Regularization and variable selection via the elastic net. J Royal Stat Soc Series B (Methodological). 2005;67:301–20.
Zou H. The adaptive LASSO and its oracle properties. J Am Stat Assoc. 2006;101:1418–29.