Dietrich von Rosen

Multivariate Reduced Rank Regression Modeling Applied to Small Area Estimation

In order to perform tests and to evaluate a model, parameters have to be estimated. The estimates are obtained by projecting the response on the space generated by the independent variables. In many cases this space itself is of no interest. The only issue which matters is its dimension, which is the same as the rank of the regression matrix. Even then, with relatively vague model assumptions, it is possible to construct relevant predictors.

In this project we are going to apply the rank restriction approach to bilinear models, among others the growth curve model of Potthoff & Roy (1964). In particular we want to discuss models which include background information. However, the amount of background information may be substantial, and it can be difficult to postulate a precise regression model which incorporates the information. Therefore reduced rank regression will be applied. All models which will be discussed are new.
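For orientation, the growth curve model can be written as below. The notation is a common textbook one and is an assumption of this note, not taken from the project text: X is the p x n response matrix, A (p x q) is the within-individuals design matrix, C (k x n) is the between-individuals design matrix, B (q x k) holds the unknown mean parameters and Sigma is an unknown dispersion matrix. The second line shows one possible way of placing a rank restriction on the mean parameters; the restrictions actually used in the project may differ.

```latex
\[
  X = ABC + E, \qquad \text{columns of } E \ \text{i.i.d.}\ N_p(0, \Sigma),
\]
\[
  \operatorname{rank}(B) \le r < \min(q, k).
\]
```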

Small area estimation (SAE) has become widely used. Surveys often collect data from large regions. Unfortunately, these data are often insufficient for studying small regions. On the other hand, if the data are combined with other types of information (for example, register-based information) and relevant statistical models, the survey data can also be used when considering small areas.

In the project the plan is to develop multivariate reduced rank regression analysis with the aim of supplying the SAE field with powerful tools.
Final report

Purpose:
(i) To extend the analysis of multivariate linear models to comprise extended growth curve models (sum of profiles models) with rank restrictions on the mean and dispersion parameters.
(ii) To apply reduced rank models in survey studies with emphasis on small area (domain) estimation problems.

The research aims included finding relevant estimators (if possible explicit maximum likelihood estimators), which in most cases should be restricted maximum likelihood estimators, and constructing appropriate tests, mostly connected to testing stability over time in small areas.

Implementation:

The project ran relatively smoothly. Due to other duties at Stockholm University, Tatjana von Rosen's official starting point was postponed to the fall semester of 2015, and the planned activity of 70% was reduced. Dietrich von Rosen's research activities followed the original plan. During the project period nine articles have been written, and we have communicated our results at conferences and workshops. There are still some unfinished manuscripts, which will be processed during the forthcoming year. Over the years Tatjana has helped in preparing manuscripts and giving talks at conferences, whereas Dietrich has conducted research, organized sessions at international conferences, given invited talks at conferences and workshops, and supervised research students.

The three most important results:

(i) We have shown how to combine latent processes affecting the mean with multivariate linear (bilinear) mixed effects models.

Modelling a latent effect which directly influences a response variable has a long history and is often referred to as reduced rank regression. This part of the work presents a likelihood-based approach which ends up in explicit estimators. In our model the latent variables act as covariates which we know exist, but whose impact is vague and will therefore not be considered in detail. One example is when we observe hundreds of weather variables but cannot say which of them affect, for example, plant growth, or how. Moreover, often due to the design used in a study, random effects are included in the model, reflecting that certain units are drawn from a larger population. However, if our modelling ideas are applied to "small areas", the use of survey estimates will naturally also lead to random effects models.
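To illustrate the general idea of reduced rank regression, the following is a minimal sketch of the classical least-squares version with identity weighting. It is not the likelihood-based estimator developed in the project, and it ignores random effects. The function name, the simulated data and all dimensions are hypothetical choices made for this illustration only.

```python
import numpy as np


def reduced_rank_regression(X, Y, rank):
    """Least-squares fit of Y ~ X @ B subject to rank(B) <= rank.

    Assumes X has full column rank and uses identity weighting.
    """
    # Unrestricted ordinary least squares coefficients.
    B_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)
    # Fitted values: projection of Y on the column space of X.
    Y_hat = X @ B_ols
    # The leading right singular vectors of the fitted values span the
    # best rank-restricted subspace for the responses.
    _, _, Vt = np.linalg.svd(Y_hat, full_matrices=False)
    V_r = Vt[:rank].T
    # Project the OLS coefficients onto that subspace.
    return B_ols @ V_r @ V_r.T


# Simulated example: many covariates act through a few latent directions.
rng = np.random.default_rng(0)
n, p, q, r = 200, 30, 5, 2
X = rng.standard_normal((n, p))
B_true = rng.standard_normal((p, r)) @ rng.standard_normal((r, q))
Y = X @ B_true + 0.1 * rng.standard_normal((n, q))

B_rr = reduced_rank_regression(X, Y, r)
rank_of_fit = np.linalg.matrix_rank(B_rr)
rel_error = np.linalg.norm(B_rr - B_true) / np.linalg.norm(B_true)
```

In this sketch the 30 covariates influence the 5 responses only through 2 latent directions, which mirrors the situation described above where many observed variables are governed by a few latent processes. The rank is taken as known here; choosing it from data is a separate problem.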

(ii) We have shown how to obtain explicit estimators in unbalanced mixed linear models.

A general unbalanced mixed linear model with two variance components was considered in detail. Through resampling it was demonstrated how the fixed effects could be estimated explicitly. It was shown that the obtained nonlinear estimator is unbiased, and its variance was also derived. A condition was given under which the proposed estimator is to be preferred to the ordinary least squares estimator.
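As background, a mixed linear model with two variance components can be written in the generic textbook form below. The symbols are assumptions of this note and not necessarily those used in the project's articles: y is the response vector, X and Z are known design matrices, beta holds the fixed effects and u the random effects. The generalized least squares estimator depends on the unknown variance components through V, which is why explicit estimators are in general not available for unbalanced data. The resampling-based estimator obtained in the project is not reproduced here.

```latex
\[
  y = X\beta + Zu + \varepsilon, \qquad
  u \sim N(0, \sigma_u^2 I), \quad \varepsilon \sim N(0, \sigma_e^2 I),
\]
\[
  \hat\beta_{\mathrm{OLS}} = (X'X)^{-1}X'y, \qquad
  \hat\beta_{\mathrm{GLS}} = (X'V^{-1}X)^{-1}X'V^{-1}y, \quad
  V = \sigma_u^2 ZZ' + \sigma_e^2 I.
\]
```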

(iii) We have considered small area models which include latent variables and mixed linear effects, with the ambition of handling survey data and covariates from "small areas".

A random effects growth curve model with a large number of covariates was formulated. The main idea was that the covariates are governed by a few latent processes. Predictors of the random effects and of the small area means were derived. The proposed techniques can be useful for small area estimation when longitudinal surveys are performed. We have also discussed the case when response values are missing, in particular due to dropouts.

New research questions which have been generated by the project:

In order to use the theoretical results together with real data, model validation techniques should be developed. In particular, residuals should be defined and studied when latent mean processes exist. Usually residuals are defined via projections (which appear when data are compared with the estimated model), but this is not possible in models with rank restrictions on the mean, so one has to understand what the difference between the observations and the estimated model means. Moreover, residuals should be defined and studied when both latent processes and random effects exist. Once residuals have been obtained, "influential observations" and "outliers" can be studied. A challenging problem is to estimate the dimension of the latent process.

International perspective (project contribution):

(i) Research students

A student from Rwanda, financed via SIDA, completed his thesis:
Innocent Ngaruye (2017) Contributions to Small Area Estimation: Using Random Effects Growth Curve Model, Linköping University;

Ongoing supervision of Felix Wemano from Makerere University, Uganda, who is also financed via SIDA. The thesis is about residuals in the models for small area estimation which were developed by Ngaruye. By June 2020 two papers will be ready for submission.

(ii) International coworkers

Feng Li, Central University of Finance and Economics, Beijing (School of Statistics and Mathematics);
Julia Volaufova, Louisiana State University (School of Medicine);
Innocent Ngaruye, University of Rwanda (College of Sciences and Technology);
Joseph Nzabanita, University of Rwanda (School of Agriculture and Food Sciences);
Chengcheng Hao, Shanghai University of International Business and Economics (School of Statistics and Information).

(iii) Organized invited sessions at international conferences

Session organizer "Mixed linear models with applications to small area estimation" at CFE-CMStatistics, London, 16-18 December, 2017.

Session organizer "Mixed linear models analysis: new estimation methods and diagnostic tools" at CFE-CMStatistics, Pisa, 14-16 December, 2018.

Invited Professor Timo Schmid to organize a session "Small Area Estimation" at LINSTAT 2018, Bedlewo, Poland, 20-24 August, 2018.

Communicating results to society and other researchers:

We have focused on communicating our results via international conferences and various seminars.

Grant administrator
Swedish University of Agricultural Sciences
Reference number
P14-0641:1
Amount
SEK 3,432,000
Funding
RJ Projects
Subject
Probability Theory and Statistics
Year
2014