Statistical methods for analyzing marginal causal effects in matched and unmatched case-control study designs
exposure on a rare disease. The aim of this project is to develop new statistical
methods for estimating causal effects in matched and unmatched case-control
designs. To estimate the causal effect in a case-control study we have to i) adjust
for confounding variables affecting both the exposure and the disease under study
and ii) take into account the biased sampling design, i.e., the fact that the number
of cases is disproportionately large compared to the number of controls. If the
controls have been matched to cases on the basis of background variables, e.g.,
age and sex, then the matching has to be taken into account in the subsequent
analysis.
Standard methods focus on estimation of the conditional odds ratio, a measure of
the causal effect for an individual with certain characteristics. In this project we
wish to propose new methods for the estimation of the marginal causal odds
ratio, i.e., the effect of the exposure in the population. The marginal causal odds
ratio is a parameter that is important for decision makers, although it is overlooked in the
statistical literature. We want to propose theory for the selection of sufficient
covariate sets to condition on in order to estimate the causal effect, as well as new methods to control for
confounding when estimating long-term effects of the disease on future outcomes
such as income and unemployment.
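The distinction between the conditional and the marginal odds ratio can be illustrated with a small numerical sketch. It is not one of the estimators developed in the project. The logistic outcome model, its coefficients (`B0`, `B_X`, `B_Z`) and the covariate distribution `P_Z` are hypothetical values chosen for illustration, and the covariate Z is assumed to be sufficient for confounding adjustment.

```python
import math


def expit(x):
    """Inverse logit function."""
    return 1.0 / (1.0 + math.exp(-x))


def odds(p):
    """Odds corresponding to a probability p."""
    return p / (1.0 - p)


# Hypothetical outcome model (all coefficients are illustrative assumptions):
# logit P(Y=1 | X=x, Z=z) = B0 + B_X * x + B_Z * z
B0, B_X, B_Z = -2.0, 1.0, 2.0
P_Z = {0: 0.5, 1: 0.5}  # assumed population distribution of the covariate Z


def risk(x, z):
    """Disease risk for exposure x and covariate value z."""
    return expit(B0 + B_X * x + B_Z * z)


def conditional_odds_ratio(z):
    """Odds ratio for exposed vs. unexposed among individuals with Z = z."""
    return odds(risk(1, z)) / odds(risk(0, z))


def marginal_causal_odds_ratio():
    """Odds ratio comparing the population risk if everyone were exposed
    with the population risk if no one were exposed (standardization over Z)."""
    p1 = sum(w * risk(1, z) for z, w in P_Z.items())
    p0 = sum(w * risk(0, z) for z, w in P_Z.items())
    return odds(p1) / odds(p0)


if __name__ == "__main__":
    for z in P_Z:
        print("conditional OR, Z =", z, ":", conditional_odds_ratio(z))
    print("marginal causal OR:", marginal_causal_odds_ratio())
```

In this sketch the conditional odds ratio is the same in both covariate strata, while the marginal causal odds ratio is closer to one. The two parameters therefore answer different questions even when the model for the individual effect is correct.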
2011-2016
The aim of the proposal was to contribute to the development of statistical methods that can be used for causal inference in matched and unmatched case-control designs. The project focuses especially on developing new methods for analyzing the effect of social and economic factors on the risk of experiencing a rare event, e.g., a disease.
The proposal concerns research within the two following areas:
1) Estimating causal parameters in case-control studies - marginal and conditional parameters
2) Causal inference in secondary analysis of case-control data - estimation of the effect of an event, e.g., a disease, on outcomes later in life. This implies switching the role of the disease from a response variable of interest to a treatment variable affecting new response variables, e.g., economic and social outcomes.
The project has been carried out in accordance with its original purpose, and the work has focused on the two areas described in the project plan.
The project manager, Ingeborg Waernbaum, has been a member of the steering committee for the Swedish Childhood Diabetes Register (SCDR) since 2009. The statistical methods studied in the project have been applied to data from SCDR, which is a database with a matched case-control design.
The three most important results
An important result is the proposal of an estimator for a marginal causal odds ratio that takes into account that the cases and controls were matched on some variables (Persson, E., and Waernbaum I., 2013, Statistics in Medicine). We summarize previous findings and propose a new estimator, which we compare with the previous ones. We apply the new estimator to data from the Swedish Childhood Diabetes Registry and analyze the effect of low birth weight on the risk of developing type 1 diabetes. The analysis emphasizes the importance of taking the matched sampling design into account. In our analysis, we found that low birth weight did not significantly contribute to increasing the risk of developing type 1 diabetes.
We propose a new weighted matching estimator of the causal effect when the study design is a secondary analysis of case-control data (Persson, E., Waernbaum, I. and Lind, T. 2016 revised for Statistics in Medicine). We also characterize the bias that results from ignoring that the primary study had a case-control design. We describe the size of the bias in relation to a number of measures that researchers can estimate in their studies. This means that our results give applied researchers an opportunity to approximate the size of the error in earlier published studies where the initial sampling design was not taken into account. We apply the proposed estimator in an analysis of the effect of type 1 diabetes on the risk of depression. We have also carried out a study that shows a slight increase in the risk of depression for individuals with type 1 diabetes, relative to the estimates from earlier studies that were based on fewer individuals and had access to fewer background variables. The result is unique because the Swedish Childhood Diabetes Registry and the drug registry provide a new opportunity to study the research question more comprehensively than in earlier studies. The manuscript with the results is under revision for Statistics in Medicine, and we hope that it will be published there.
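Why ignoring the primary case-control design can bias a marginal effect can be sketched with a few lines of arithmetic. The sketch below is not the weighted matching estimator proposed in the manuscript and it ignores matching. The disease prevalence, the sample counts and the covariate-specific effects are hypothetical values, and the function names are introduced only for this illustration. Cases are over-represented in the sample, so the covariate distribution in the sample differs from that in the population. Reweighting by the assumed known prevalence restores it.

```python
# Hypothetical quantities (all values are illustrative assumptions).
PREVALENCE = 0.01  # assumed known proportion of diseased in the population

# Counts by a binary covariate Z in a 1:1 case-control sample.
SAMPLE_COUNTS = {
    "case": {0: 400, 1: 600},
    "control": {0: 800, 1: 200},
}

# Assumed covariate-specific effects of the disease on a later outcome,
# expressed as risk differences for Z = 0 and Z = 1.
EFFECT_GIVEN_Z = {0: 0.05, 1: 0.20}


def sampling_weights(counts, prevalence):
    """Weights that rescale cases and controls to their population shares."""
    n_case = sum(counts["case"].values())
    n_control = sum(counts["control"].values())
    n = n_case + n_control
    return {
        "case": prevalence / (n_case / n),
        "control": (1.0 - prevalence) / (n_control / n),
    }


def covariate_distribution(counts, weights):
    """Weighted distribution of Z in the pooled sample."""
    totals = {}
    for group, by_z in counts.items():
        for z, n in by_z.items():
            totals[z] = totals.get(z, 0.0) + weights[group] * n
    norm = sum(totals.values())
    return {z: t / norm for z, t in totals.items()}


def marginal_effect(p_z):
    """Average the covariate-specific effects over a distribution of Z."""
    return sum(p_z[z] * EFFECT_GIVEN_Z[z] for z in p_z)


# Naive analysis: treat the case-control sample as a random sample.
naive = marginal_effect(
    covariate_distribution(SAMPLE_COUNTS, {"case": 1.0, "control": 1.0})
)
# Weighted analysis: account for the over-sampling of cases.
weighted = marginal_effect(
    covariate_distribution(SAMPLE_COUNTS, sampling_weights(SAMPLE_COUNTS, PREVALENCE))
)

if __name__ == "__main__":
    print("naive marginal effect:   ", naive)
    print("weighted marginal effect:", weighted)
```

With these assumed numbers the naive average gives too much weight to the covariate profile of the cases and overstates the marginal effect. The size of the discrepancy depends on the prevalence and on how strongly the covariate is associated with the disease.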
We have studied how to select variables for estimating causal effects without depending on parametric models for the treatment or the outcome (Persson, Häggström, Waernbaum and de Luna, 2016, revised Computational Statistics and Data Analysis). We describe methods for selecting sets of variables that are sufficient to control for when estimating a causal effect. We perform computer simulations to show how the methods work in practice. We have also developed CovSel, a package for the statistical software R that is freely available for download (https://cran.r-project.org/). We have also published an article that describes the software (Häggström, Persson, Waernbaum and de Luna, 2015, Journal of Statistical Software).
New research questions generated by the project
The project has generated several new questions, especially concerning secondary analysis of case-control data. We have seen that our results on secondary analysis of case-control data are more general than we initially thought, and that they can be used in all cases where the sample is stratified. The project has also generated interesting new questions about the differences between conditional and marginal parameters and about which of them can be estimated in different data situations, i.e., for various sampling designs. Especially interesting are the results that could be developed for the selection of the study population in registry studies. Here we believe that the new research questions generated by the project will be of interest to a wide range of register researchers in social science and medicine.
International connections
Our research is part of an important research area in statistics and an active international research field. Waernbaum has a network of contacts with researchers in the field of causal inference and regularly meets other international researchers in the field. In 2013 Waernbaum was elected to an expert group for Causal Inference of the International Society for Clinical Biostatistics, ISCB, along with four other statisticians: Professor Els Goetghebeur (Chairman), Ghent University, Belgium; Associate Professor Erica Moodie, McGill University, Canada; Professor Bianca de Stavola, London School of Hygiene and Tropical Medicine, UK; and Associate Professor Saskia le Cessie, Leiden University, the Netherlands. The project results have on several occasions been presented at international statistical conferences in, e.g., Japan, UK, Canada and Iceland. Waernbaum participates in yearly meetings with the United Kingdom Causal Inference Network (http://ukcim2016.lshtm.ac.uk/) and she has been an invited speaker at several conferences and seminars.
The two most important publications
Persson, E. and Waernbaum, I. (2013). Estimating a marginal causal odds ratio in a case-control design: analyzing the effect of low birth weight on the risk of type 1 diabetes mellitus. Statistics in Medicine, 32:2500-2512.
Persson, E., Waernbaum, I. and Lind, T. (2016). Estimating marginal causal effects in a secondary analysis of case-control data. Manuscript under revision in Statistics in Medicine.
What is the publication strategy of the project?
We have aimed at publishing our results in the best journals in our field. To make the results available, we have purchased open access extensions for the papers that we have published in journals that were not open access (Statistics in Medicine). The manuscript "Data-driven algorithms for dimension reduction in causal inference: Analyzing the effect of school achievements on acute complications of type 1 diabetes mellitus" is published on arXiv, an open access archive for preprints in mathematics and statistics. The statistical software that we have developed is freely available for download from https://cran.r-project.org/.
Publications
Under review
Persson, E., Waernbaum, I. and Lind, T. (2016). Estimating marginal causal effects in a secondary analysis of case-control data. Manuscript under revision in Statistics in Medicine.
Persson, E., Häggström, J., Waernbaum, I., and de Luna, X. Data-driven Algorithms for Dimension Reduction in Causal Inference. Manuscript under revision in Computational Statistics and Data Analysis.
Published Papers
Waernbaum, I. and Dahlquist, G. (2015). Low mean temperature rather than few sunshine hours are associated with an increased incidence of type 1 diabetes in children. European Journal of Epidemiology, 31, 61-65.
Häggström, J., Persson, E., Waernbaum, I., and de Luna, X. (2015). CovSel: An R Package for covariate selection when estimating average causal effects. Journal of Statistical Software, 68(1).
Persson, E. and Waernbaum, I. (2013). Estimating a marginal causal odds ratio in a case-control design: analyzing the effect of low birth weight on the risk of type 1 diabetes mellitus. Statistics in Medicine, 32:2500-2512.
Lind, T., Waernbaum, I., Berhan, Y. and Dahlquist, G. (2012). Socioeconomic factors rather than diabetes mellitus per se contribute to an excess use of antidepressants among young adults with childhood onset type-1 diabetes mellitus – a register-based study. Diabetologia, 55:617-624.