International Conference on Robust Statistics 2017

Workshop on Robust Inference for Sample Surveys

After the publication of Neyman's influential 1934 paper on design-based survey inference, twentieth-century sampling theory sought to achieve robustness via nonparametric methods of estimation and inference. Much of the early development of survey statistical methods came from official statistics, where large samples were the norm and strong modelling assumptions were avoided. However, the advent of Big Data in the twenty-first century, smaller survey budgets and increasing non-response have resulted in a push towards more data integration and small-sample inference. This has generated great interest in model-based methods and highlighted the importance of robust inference in model-based survey sampling.

The National Institute for Applied Statistics Research Australia (NIASRA) and the University of Wollongong will host the 2017 International Conference on Robust Statistics (ICORS) from 3-7 July. The aim of the ICORS conference is to bring together researchers and practitioners interested in robust statistics, data analysis and related areas. Given the interest in robust methods for survey design, estimation and inference, a Workshop on Robust Inference for Sample Surveys will take place on Friday 7 July as part of ICORS 2017.

Professor Ray Chambers of NIASRA will present the opening address for the Workshop. This will also be an ICORS 2017 keynote address, reviewing developments in outlier-robust survey estimation since the early 1980s, as well as current research in outlier-robust small area estimation. At this stage it is planned that the workshop will consist of thematic sessions of invited and contributed papers, covering some or all of

  1. The use of robust methods in official statistics in Australia and New Zealand;
  2. Sample design for robust statistical methods;
  3. New developments in robust small area estimation;
  4. Robust non-parametric sample survey inference;
  5. Robust inference for combined and linked data;
  6. Robust modelling of complex survey data in the social and environmental sciences. 

Attendance at the Workshop is included for full ICORS registrants. Registration for the Workshop only is available through the ICORS 2017 website: niasra.uow.edu.au/icors2017. The fee for this one-day registration is AU$100, and covers morning and afternoon refreshments. Lunch is not included.

Workshop Schedule

Time Presenter Title
09:00 - 09:45 Ray Chambers Robust Models For Small Area Estimation - Random Group Effects vs. Random Group Indexing
09:45 - 10:00   Break
10:00 - 10:30 Laura Dumitrescu Bias-Correction Under A Semi-Parametric Model For Small Area Estimation
10:30 - 11:00 Jarod Lee Exploring The Robustness Of Log-Gamma vs. Normal For Random Effect Distributions: The Example Of Small Area Estimation
11:00 - 11:30 Janice Scealy Small Area Estimation Of Expenditure Proportions
11:30 - 12:00 John Preston (James Chipperfield) Imputation Using Robust Regression
12:00 - 12:30 Alice Richardson Robust Population Health
12:30 - 13:30   Lunch
13:30 - 14:00 Suojin Wang Robustifying Inference For Probabilistically Linked Data With Population Auxiliary Information
14:00 - 14:30 James Chipperfield Estimating Population Totals Using Imperfect Administrative Data And A Survey Subject To Non-Ignorable Non-Response
14:30 - 15:00 Oksana Honchar Exploring The Use Of Time Series Modelling In State Space Form For Detection Of Outliers And Structural Changes
15:00 - 15:15   Break
15:15 - 15:45 Alan Welsh Robust Model-Based Sampling Designs
15:45 - 16:15 Sanjay Chaudhuri An Empirical Likelihood Based Estimator For Respondent Driven Sampled Data
16:15 - 16:45 Andrew Ward Improving Robustness Of Estimates From Non-Probability Online Samples
16:45 - 17:15 Robert Clark One-Sided Winsorization In Sample Surveys
17:15 - 17:45 Phil Kokic Setting Tuning Parameters In One- And Two-Sided Winsorization In Sample Surveys

Authors, Titles & Abstracts for Presentations (Presenter in italics)

Author(s): Jarod Lee and James Brown
Affiliation: University of Technology Sydney
Title: Exploring The Robustness Of Log-Gamma vs. Normal For Random Effect Distributions: The Example Of Small Area Estimation
Abstract: When making small area estimates, it is common to use a generalised linear model with Gaussian random effects. In this work, we contrast this approach with a Poisson model combined with log-gamma random effects. The latter has advantages with respect to the form of the likelihood function, and we also explore whether it offers greater robustness to outliers and skewness in the random effect distribution.

Author(s): Sanjay Chaudhuri and Mark Handcock
Affiliation: National University of Singapore
Title: An Empirical Likelihood Based Estimator For Respondent Driven Sampled Data
Abstract: We discuss an empirical likelihood based estimator of population means applicable to data obtained from a respondent driven sampling procedure. Our estimator directly uses the second order selection weights and constructs a composite empirical likelihood to estimate the parameter of interest. The estimate is asymptotically unbiased and normally distributed. Analytic expressions for the asymptotic standard errors can be obtained; these can also be estimated from the data using a sandwich estimator. Using real-life social network data, we show that our estimator produces confidence intervals with far better coverage than existing estimators.

Author(s): James Chipperfield
Affiliation: Australian Bureau of Statistics
Title: Estimating Population Totals Using Imperfect Administrative Data And A Survey Subject To Non-Ignorable Non-Response
Abstract: Methods for estimating population totals often assume that survey non-response can be ignored, and that imperfections in register data used as benchmarks for survey weights (e.g. incorrect information, or missed or double-counted population units) can also be ignored. This paper explores a way to relax both of these assumptions at the same time. One application is estimating the Australian Resident Population using the Post Enumeration Survey and the Australian Census of Population and Housing.

Author(s): Robert Clark and Phil Kokic
Affiliation: University of Wollongong
Title: One-Sided Winsorization In Sample Surveys
Abstract: Sample surveys have the distinctive feature of “representative outliers”. These are extreme sample values which should not be downweighted too far when estimating population means or totals, because they are also influential in the population. Some theory on one-sided winsorization cutoffs is presented. A simulation study based on business survey data finds that one-sided winsorization with estimated optimal cutoffs performs well compared to other alternatives. We argue that the concept of representative outliers deserves greater attention outside sample surveys, because the distribution including outliers and non-outliers is sometimes the correct target of inference, albeit a challenging one.
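As a concrete (and purely illustrative) sketch of the idea, the following uses the familiar one-sided winsorized value, in which an observation above a cutoff K is replaced by K plus its excess divided by its survey weight, so a representative outlier still represents itself but is no longer expanded by its full weight. The cutoff here is an arbitrary assumption, not one of the estimated optimal cutoffs studied in the paper:

```python
import numpy as np

def winsorize_one_sided(y, w, cutoff):
    """One-sided winsorization: a value above the cutoff K is replaced
    by K + (y - K) / w, so the outlying unit still contributes its own
    value once but is not multiplied by its full survey weight."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    return np.where(y > cutoff, cutoff + (y - cutoff) / w, y)

def winsorized_total(y, w, cutoff):
    """Winsorized expansion estimator of the population total."""
    return float(np.sum(np.asarray(w, dtype=float) *
                        winsorize_one_sided(y, w, cutoff)))

# A skewed, business-survey-like sample with one representative outlier.
y = [12.0, 15.0, 9.0, 14.0, 500.0]
w = [10.0, 10.0, 10.0, 10.0, 10.0]
print(winsorized_total(y, w, cutoff=100.0))   # 1900.0
print(float(np.dot(w, y)))                    # unwinsorized: 5500.0
```

The choice of cutoff is the crux in practice; setting it is the subject of the companion presentation on tuning parameters.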

Author(s): Phil Kokic and Robert Clark
Affiliation: Australian National University
Title: Setting Tuning Parameters In One- And Two-Sided Winsorization In Sample Surveys
Abstract: The choice of tuning parameters is particularly important in outlier treatment in sample surveys, because the aim is to predict the non-sample total of both outliers and non-outliers. A simplifying theoretical result is available for setting tuning parameters in one-sided winsorization, but we show that no such result is possible in two-sided winsorization. A number of alternatives for the two-sided case are explored. A simulation study evaluates different methods of setting tuning parameters for both one- and two-sided winsorization for both levels and movements.

Author(s): James Dawber and Ray Chambers
Affiliation: University of Wollongong
Title: Robust Models For Small Area Estimation - Random Group Effects vs. Random Group Indexing
Abstract: The standard approach to small area estimation based on unit level data is to assume that between area heterogeneity is a consequence of the values of random area effects. There is a very well-developed body of theory that addresses estimation of a regression function in this case and its use in prediction of small area characteristics of interest. However, there is another way of characterising between area heterogeneity that does not depend on recourse to a latent variable to distinguish differences between areas. Instead, a suitable ensemble regression function that covers the full spectrum of variability for the characteristic of interest is first used to index the population. Area heterogeneity is present if these index values cluster within areas, and small area estimation is based on that particular regression function within the ensemble that corresponds to an area-specific 'average' index. There is no random effect, with its consequent distributional assumptions, to complicate matters. In this context robust M-quantile ensemble models have seen considerable development in recent years, with a population unit's index defined by the index of that component M-quantile regression function with value equal to the unit's value for the characteristic of interest. In this presentation we will briefly discuss these two paradigms and then describe extensions of M-quantile ensemble models to binary and categorical responses. The extension of random indexing of population values based on the categorical M-quantile model will also be described and applied to small area estimation.

Author(s): Laura Dumitrescu
Affiliation: Victoria University Wellington
Title: Bias-Correction Under A Semi-Parametric Model For Small Area Estimation
Abstract: In recent years, several robust techniques for estimating a unit-level model have been developed in the context of small area estimation. We consider a semi-parametric framework and use a bias correction technique to obtain efficient robust predictors of the area means. The proposed predictor can be used when outliers occur in the random effects and/or in the individual residuals, and also when the data are derived from a mixture in which the mean of the outliers and the mean of the non-outliers differ.

Author(s): Oksana Honchar
Affiliation: Australian Bureau of Statistics
Title: Exploring The Use Of Time Series Modelling In State Space Form For Detection Of Outliers And Structural Changes
Abstract: In this paper we first explain a strategy for removing the sample error component from the structural time series model. We then extend the model to incorporate multiple series, improving estimation and prediction of specific months in the series. Finally, we demonstrate how this approach can detect the presence of unusual movements in the series estimates, and that such detection is more effective under this approach.
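A toy sketch of the state-space machinery (assumed variances and a simple local level model, not the models discussed in this talk): the Kalman filter's standardised one-step-ahead innovations give a natural statistic for flagging outliers and structural changes:

```python
import numpy as np

def local_level_outliers(y, var_eps=1.0, var_eta=0.1, threshold=3.0):
    """Kalman filter for the local level model
        y_t = mu_t + eps_t,   mu_t = mu_{t-1} + eta_t,
    flagging points whose standardised one-step-ahead innovation
    exceeds the threshold as potential outliers or structural changes."""
    y = np.asarray(y, dtype=float)
    a, P = y[0], 1e6                  # near-diffuse initialisation
    flags = np.zeros(y.size, dtype=bool)
    for t in range(1, y.size):
        P = P + var_eta               # predict the level variance
        F = P + var_eps               # one-step-ahead innovation variance
        v = y[t] - a                  # innovation
        flags[t] = abs(v) / np.sqrt(F) > threshold
        K = P / F                     # Kalman gain, then update
        a = a + K * v
        P = (1.0 - K) * P
    return flags

# A flat series with a single additive spike at index 10.
series = np.r_[np.full(10, 5.0), 50.0, np.full(10, 5.0)]
flags = local_level_outliers(series)
print(np.where(flags)[0])  # the spike is flagged (plus a few recovery steps)
```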

Author(s): John Preston (presentation by James Chipperfield)
Affiliation: Australian Bureau of Statistics
Title: Imputation Using Robust Regression
Abstract: Many national statistical offices embody the ideal of an “industrialisation” of the production of official statistics, and hence need to implement standardised and automated imputation methods in their generalised systems. Continuous variables in business surveys are usually imputed using deterministic regression imputation methods, such as mean imputation, historical ratio imputation and auxiliary ratio imputation. Deterministic regression imputation is predicated on the well-known ordinary least squares linear regression model, which can be severely affected by outliers, leading to estimates of regression parameters that do not accurately reflect the true underlying relationship between the variable of interest and the auxiliary variables. While these generalised systems generally allow the functionality to remove outliers from the calculation of the imputed values, these influential units are often ignored in practice because the manual identification of outliers can be an inconsistent and inefficient process. Robust deterministic regression imputation methods which can be implemented as a standardised and automated process have been proposed. A simulation study using continuous variables from a typical business survey found that these robust imputation methods perform well compared with other alternatives.
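To make the robust-imputation idea concrete, here is a hedged sketch (not the generalised-system implementation) of deterministic regression imputation with Huber weights fitted by iteratively reweighted least squares; the tuning constant k = 1.345 and the helper names are illustrative assumptions:

```python
import numpy as np

def huber_irls(X, y, k=1.345, n_iter=50, tol=1e-8):
    """Robust linear regression via iteratively reweighted least squares
    with Huber weights: observations whose standardised residual exceeds
    k get weight k/|r/s| instead of 1, capping their influence."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]          # OLS start
    for _ in range(n_iter):
        r = y - X @ beta
        s = max(np.median(np.abs(r - np.median(r))) / 0.6745, 1e-12)  # MAD scale
        u = np.abs(r) / s
        wts = np.where(u <= k, 1.0, k / np.maximum(u, 1e-12))
        Xw = X * wts[:, None]
        beta_new = np.linalg.solve(X.T @ Xw, Xw.T @ y)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta

def impute_missing(X_obs, y_obs, X_mis):
    """Deterministic regression imputation: fit robustly on respondents,
    then predict deterministically for non-respondents."""
    return np.asarray(X_mis, dtype=float) @ huber_irls(X_obs, y_obs)

# Illustrative data: a clean linear relationship plus one gross outlier.
rng = np.random.default_rng(0)
x = np.linspace(1.0, 10.0, 40)
X = np.column_stack([np.ones_like(x), x])
y = 2.0 + 3.0 * x + rng.normal(0.0, 0.1, size=x.size)
y[5] = 200.0                                   # a recording error, say
beta = huber_irls(X, y)                        # close to the true (2, 3)
y_imp = impute_missing(X, y, [[1.0, 4.0]])     # imputed value near 14
```

Because the outlier's weight shrinks towards zero as its residual grows, the fitted line, and hence the imputed values, stay close to the bulk of the data, whereas an OLS fit would be dragged towards the outlier.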

Author(s): Alice Richardson
Affiliation: National Centre for Epidemiology and Population Health
Title: Robust Population Health
Abstract: Robust methods in population health research are very popular, but the types of robustness catered for form only a small subset of possible departures from a model. In this talk I will discuss what robustness means in a population health research context, describe the models considered, and compare the methods implemented. Finally I will offer some thoughts on how to embed a wider range of robust modelling into biostatistics as it is applied in population health research.

Author(s): Janice Scealy
Affiliation: Australian National University
Title: Small Area Estimation Of Expenditure Proportions
Abstract: Compositional data are vectors of proportions defined on the unit simplex and this type of constrained data occur frequently in Government surveys. It is also possible for the compositional data to be correlated due to the clustering or grouping of the observations within small domains or areas. We propose a new class of mixed model for compositional data based on the Kent distribution for directional data, where the random effects also have Kent distributions. One useful property of the new directional mixed model is that the marginal mean direction has a closed form and is interpretable. The random effects enter the model in a multiplicative way via the product of a set of rotation matrices and the conditional mean direction is a random rotation of the marginal mean direction. In small area estimation settings the mean proportions are usually of primary interest and these are shown to be simple functions of the marginal mean direction. For estimation we apply a quasi-likelihood method which results in solving a new set of generalised estimating equations and these are shown to have low bias in typical situations. For inference we use a nonparametric bootstrap method for clustered data which does not rely on estimates of the shape parameters (shape parameters are difficult to estimate in Kent models). We analyse data from the 2009-10 Australian Household Expenditure Survey CURF (confidentialised unit record file). We predict the proportions of total weekly expenditure on food and housing costs for households in a chosen set of domains. The new approach is shown to be more tractable than the traditional approach based on the logratio transformation.

Author(s): Suojin Wang, Nicola Salvati, Enrico Fabrizi and Ray Chambers
Affiliation: Texas A&M University
Title: Robustifying Inference For Probabilistically Linked Data With Population Auxiliary Information
Abstract: Linkage errors occur when probability-based methods are used to link or match records from two or more distinct data sets corresponding to the same target population. These errors can lead to biased analytical decisions when they are ignored. We investigate an estimating equations approach to develop a bias correction method for secondary analysis of probabilistically linked data, using the missing information principle to accommodate the more realistic scenario of dependent linkage errors in both linear and logistic regression settings. We also develop the maximum likelihood solution when population auxiliary information in the form of population summary statistics is available. We examine how incorporation of population auxiliary information can robustify inference for linked sample data by removing measurement or linkage error bias. Our simulation results show that an incorrect assumption of independent linkage errors can lead to insufficient linkage error bias correction, while an approach that allows for correlated linkage errors appears to fully correct this bias.

Author(s): Dina Neiger, Andrew C. Ward and Darren W. Pennay
Affiliation: The Social Research Centre, Melbourne
Title: Improving Robustness Of Estimates From Non-Probability Online Samples
Abstract: Weighting is a common method of reducing total survey error for probability samples by adjusting for different chances of selection and by enforcing the population distribution across key demographic characteristics. However, there is no agreement on the efficacy of similar weighting adjustments for correcting the bias of non-probability samples, given their non-probability selection methods, the enforcement of quotas and the proprietary mechanisms used by sample providers to ensure that their samples resemble the population. Alternative methods, such as blending and calibration (e.g. DiSogra et al., 2011) and propensity-based weighting (e.g. Schonlau et al., 2003), have shown benefit, but there is limited research comparing the impact of different methods on total survey error. The recent establishment of Australia’s first probability-based online panel, Life in Australia, presents the opportunity to assess a range of weighting adjustments for reducing the bias of survey estimates from non-probability samples. With non-probability panels still representing the most common online collection methodology in Australia, the Life in Australia panel, in conjunction with data from the Australian Online Panels Benchmarking Study (Pennay et al., 2016), enables the evaluation of a number of different approaches to incorporating and improving the results of non-probability panels. By comparing estimates of key outcome variables with independent benchmarks, we are able to develop some general guidance for reducing bias and improving the robustness of survey estimates from non-probability online surveys.
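A minimal sketch of propensity-based weighting in the spirit of Schonlau et al. (an illustrative toy, not the procedure evaluated in this study or the Life in Australia weighting): pool a reference probability sample with the non-probability sample, model sample membership by logistic regression on shared covariates, and give each non-probability respondent an inverse-odds weight:

```python
import numpy as np

def fit_logistic(X, z, n_iter=25):
    """Plain Newton-Raphson logistic regression (a tiny ridge is added to
    the Hessian for numerical safety). z indicates membership of the
    non-probability sample within the pooled data."""
    X = np.asarray(X, dtype=float)
    z = np.asarray(z, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        H = X.T @ (X * (p * (1.0 - p))[:, None]) + 1e-8 * np.eye(X.shape[1])
        beta = beta + np.linalg.solve(H, X.T @ (z - p))
    return beta

def propensity_weights(X_ref, X_np):
    """Inverse-odds propensity weights for a non-probability sample,
    estimated against a reference probability sample: units that look
    over-represented (high propensity) are down-weighted."""
    X = np.vstack([X_ref, X_np])
    z = np.r_[np.zeros(len(X_ref)), np.ones(len(X_np))]
    beta = fit_logistic(X, z)
    p = 1.0 / (1.0 + np.exp(-np.asarray(X_np, dtype=float) @ beta))
    w = (1.0 - p) / p
    return w * len(w) / w.sum()        # normalise to the sample size

# Toy check: the non-probability sample over-represents high covariate values.
rng = np.random.default_rng(1)
x_ref = rng.normal(0.0, 1.0, 500)      # reference sample, mean 0
x_np = rng.normal(1.0, 1.0, 500)       # non-probability sample, mean 1
X_ref = np.column_stack([np.ones(500), x_ref])
X_np = np.column_stack([np.ones(500), x_np])
w = propensity_weights(X_ref, X_np)
print(x_np.mean(), np.average(x_np, weights=w))  # weighting pulls the mean towards 0
```

The sketch deliberately omits the practical complications the abstract highlights (quotas, proprietary panel recruitment, benchmark choice), which is precisely where the methods being compared differ.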

Author(s): Alan Welsh
Affiliation: Australian National University
Title: Robust Model-Based Sampling Designs
Abstract: A general issue in statistics, including survey sampling, is that the optimal design under a model often represents over-commitment to the model in the sense that using it can produce very good estimates when the model holds and very poor estimates when it does not. Moreover, optimal designs may not allow the possibility of either checking the model or fitting more general models. One way to approach at least the first problem is to consider robust designs which produce estimates that perform well when the designer’s assumed model holds and also remain reasonably accurate in a neighbourhood of this central model. We will discuss this approach to sample design in the context of predicting the finite population total of a survey variable that is related to an auxiliary variable that is available for all units in the population. The design problem is to specify a selection rule to select the units for the sample, using only the values of the auxiliary variable, so that the predictor has optimal robustness properties. We will discuss the general issues in approaching this problem and describe the optimally robust (‘minimax’) design approach of Welsh and Wiens (2012, Stat. Comput.).
 

Last reviewed: 29 June 2017