Fellows Research Meetings

Maximum Likelihood Estimation for Complex Outcome-Dependent Sample Designs

Robert Clark
University of Wollongong

Sometimes the dependent variable in a regression of interest is known in advance for an entire population, while the covariates are measured only for a sample. The sample may then be selected using the population values of the dependent variable. Regression models for such outcome-dependent samples are fitted by maximum likelihood (for specific designs), conditional likelihood, or probability-weighted likelihood. For the special case of logistic regression under case-control sampling, unweighted logistic regression gives consistent estimators of the slope coefficients but not of the intercept. In this talk I define the maximum likelihood estimator, and an algorithm for computing it, for general sample designs; previous research has assumed specific stratified designs. Along the way, I show that the sample likelihood can be regarded as an approximation to maximum likelihood under Poisson sampling with low selection probabilities. I also show that the case-control property applies to generalized linear models, not just to logistic regression, provided the selection probabilities are of a particular form. The different estimation methods are compared in a simulation study under a variety of sampling schemes.
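The case-control property for logistic regression can be illustrated with a small simulation. This is a minimal sketch, not the estimator or algorithm from the talk. It assumes a single standard-normal covariate, Poisson (independent Bernoulli) selection with probability `p_case` for units with y = 1 and `p_control` for units with y = 0, and a hand-written Newton-Raphson fit of unweighted logistic regression. All function names and parameter values are hypothetical choices for illustration. Under these assumptions, logit P(y = 1 | x, selected) = beta0 + log(p_case / p_control) + beta1 * x, so the fitted slope should be close to the population beta1 while the fitted intercept is shifted by log(p_case / p_control).

```python
import math
import random


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def simulate_case_control(n_pop=200_000, beta0=-3.0, beta1=1.0,
                          p_case=1.0, p_control=0.05, seed=1):
    """Hypothetical setup: generate a population from a logistic model, then
    keep each unit independently with a probability that depends only on y
    (Poisson sampling with outcome-dependent inclusion probabilities)."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n_pop):
        x = rng.gauss(0.0, 1.0)
        y = 1 if rng.random() < sigmoid(beta0 + beta1 * x) else 0
        keep_prob = p_case if y == 1 else p_control
        if rng.random() < keep_prob:
            xs.append(x)
            ys.append(y)
    return xs, ys


def fit_logistic(xs, ys, iters=25, tol=1e-10):
    """Unweighted logistic regression (intercept + one slope) by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = 0.0            # score vector
        h00 = h01 = h11 = 0.0    # observed information (2x2, symmetric)
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            r = y - p
            w = p * (1.0 - p)
            g0 += r
            g1 += r * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve H d = g for the 2x2 case
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


if __name__ == "__main__":
    xs, ys = simulate_case_control()
    b0_hat, b1_hat = fit_logistic(xs, ys)
    print("sample size:", len(ys))
    print("fitted slope:", b1_hat, "(population slope 1.0)")
    print("fitted intercept:", b0_hat,
          "(population intercept -3.0, predicted shift",
          math.log(1.0 / 0.05), ")")
```

The design keeps everything in the standard library so the selection step and the Newton updates are visible; in practice a GLM routine would be used for the fit, and the intercept could be corrected by subtracting log(p_case / p_control) when the selection probabilities are known.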

Last reviewed: 7 September, 2017