Can we use the approaches of ecological inference to learn about the potential for dependence bias in dual-system estimation? An application to cancer registration data
James Brown (University of Technology Sydney)
The dual-system estimator, or estimators with a similar underlying set of assumptions and structure, is a widely used approach to estimating the unknown size of a population. Within official statistics its use is linked with the population census, while in health applications it is often used to estimate true levels of incidence from imperfect reporting systems, the classic example being the work by Sekar and Deming on estimating births in India in the 1940s. Critical to the implementation of dual-system estimation are the assumptions that the probability of being counted in a source is homogeneous and that the events of being counted in each source are independent. When either of these assumptions fails, the two by two table will have an odds ratio different from one and the dual-system estimator will be biased.
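To make the link between the odds ratio and the bias concrete, the following is a minimal Python sketch rather than anything taken from the paper. It assumes the usual two-list layout, with n11 counted in both sources, n10 in the first only, n01 in the second only, and n00 the unobserved cell. The function names and the example counts are hypothetical.

```python
def dual_system_estimate(n11, n10, n01):
    """Dual-system (Lincoln-Petersen) estimate of the total population size.

    Assumes independence between the two sources, i.e. an odds ratio of one.
    """
    if n11 <= 0:
        raise ValueError("n11 must be positive for the estimator to exist")
    return (n11 + n10) * (n11 + n01) / n11


def implied_missing_cell(n11, n10, n01, odds_ratio=1.0):
    """Missing cell n00 implied by an assumed odds ratio.

    Uses odds_ratio = (n11 * n00) / (n10 * n01), solved for n00.
    """
    if n11 <= 0:
        raise ValueError("n11 must be positive")
    return odds_ratio * n10 * n01 / n11


def population_given_odds_ratio(n11, n10, n01, odds_ratio):
    """Total population size if the true odds ratio were `odds_ratio`."""
    return n11 + n10 + n01 + implied_missing_cell(n11, n10, n01, odds_ratio)


# Hypothetical counts, chosen only for illustration.
n11, n10, n01 = 400, 100, 200
estimate = dual_system_estimate(n11, n10, n01)                  # 750.0
positive_dep = population_given_odds_ratio(n11, n10, n01, 2.0)  # 800.0
negative_dep = population_given_odds_ratio(n11, n10, n01, 0.5)  # 725.0
```

With these illustrative counts, an odds ratio above one implies a true population larger than the dual-system estimate (the estimator is biased downwards), while an odds ratio below one implies the opposite.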
Inferential frameworks such as the aggregate association index (AAI) have been developed to allow the researcher to assess the plausibility of independence between two variables in a two by two table when only the margins are observed. Given any appropriate measure of relationship, this strategy relies on determining the AAI, which provides an indication of the likely association structure between the variables given only the marginal information. Further developments of the AAI have also been established, including its link with the odds ratio and its relationship with the size of the study being undertaken. Determining the population size from a two by two table given limited information is a variation of the framework on which the AAI is built. The underlying theoretical properties of the two by two table are therefore identical in both scenarios; only the nature of the unknown information differs.
In this paper we take the first steps towards exploring the use of an AAI-type framework (and its relatives) to assess the plausibility of an independence assumption in applications of population size estimation. We use alternative data setups based on real data relating to historical cancer registration (with three sources of registration) to demonstrate that, depending on the true relationship between the two variables, the chi-square statistic behaves differently over a range of values for the missing data. We then apply the approach to the cancer registration data from two of the registration systems to show that we can see evidence of potential dependence from the observed but incomplete data.
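The idea of tracing the chi-square statistic over a range of values for the missing data can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure used in the paper: it takes the three observed cells of a two-list table as fixed, treats the unobserved cell n00 as the unknown, and evaluates the standard Pearson chi-square statistic for a two by two table at each candidate value. The function names and counts are hypothetical.

```python
def pearson_chi_square(n11, n10, n01, n00):
    """Pearson chi-square statistic for a two by two table.

    Uses the closed form N * (n11*n00 - n10*n01)^2 divided by the
    product of the four marginal totals.
    """
    n = n11 + n10 + n01 + n00
    row1, row2 = n11 + n10, n01 + n00
    col1, col2 = n11 + n01, n10 + n00
    denom = row1 * row2 * col1 * col2
    if denom == 0:
        # A zero margin leaves the statistic undefined; return 0.0 here
        # as a convention for this sketch.
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def chi_square_profile(n11, n10, n01, n00_values):
    """Chi-square statistic at each candidate value of the missing cell."""
    return [(n00, pearson_chi_square(n11, n10, n01, n00)) for n00 in n00_values]


# Hypothetical observed counts; n00 is unobserved and varied over a grid.
profile = chi_square_profile(400, 100, 200, range(0, 201, 10))
n00_at_minimum = min(profile, key=lambda pair: pair[1])[0]
```

For these illustrative counts the statistic is zero at n00 = 50, the value implied by independence, and grows as the candidate value moves away from it in either direction. How quickly it grows on each side is the kind of behaviour the framework examines.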
The first results in this paper demonstrate the possibility of exploring the independence assumption when estimating the unknown population size from two lists. As with the AAI framework, the aim is not to estimate the level of association directly but rather to alert the analyst to the potential for an association and its direction, allowing them to assess the likelihood of a biased estimate for the population size. This has important implications within a health setting, where it is potentially useful to understand whether the true population size, say of cancer patients, is likely to be higher or lower than the estimate constructed assuming independence. Within the official statistics setting, it can alert us to situations where it is advantageous to explore whether external data exist that would allow an adjustment for dependence between our two lists.