12 House of Commons elections: modeling with the multivariate Student-\(t\) density
The data for this example consist of constituency vote proportions from the 1992 United Kingdom House of Commons election.
These data come from Katz and King (1999) and were re-analyzed by Tomz, Tucker, and Wittenberg (2002).
These data are included in the pscl package as UKHouseOfCommons:
library("dplyr")  # provides glimpse()
(data("UKHouseOfCommons", package = "pscl"))
#> [1] "UKHouseOfCommons"
glimpse(UKHouseOfCommons)
#> Observations: 521
#> Variables: 12
#> $ constituency <chr> "Barrow & Furness", "Berwick-upon-Tweed", "Bishop...
#> $ county <chr> "Cumbria", "Northumberland", "Durham", "Durham", ...
#> $ y1 <dbl> 1.3286, -0.3032, 0.5598, 0.0978, 1.7351, 0.4546, ...
#> $ y2 <dbl> 1.473, -0.663, 1.011, 0.909, 1.851, 1.925, 0.108,...
#> $ y1lag <dbl> 1.1820, -0.5689, 0.7052, -0.4139, 1.5507, 0.0408,...
#> $ y2lag <dbl> 1.0142, -1.0906, 1.0258, 0.3037, 1.6453, 1.4702, ...
#> $ coninc <int> 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0...
#> $ labinc <int> 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1...
#> $ libinc <int> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
#> $ v1 <dbl> 0.413, 0.328, 0.318, 0.240, 0.435, 0.167, 0.533, ...
#> $ v2 <dbl> 0.477, 0.229, 0.500, 0.541, 0.488, 0.727, 0.246, ...
#> $ v3 <dbl> 0.1094, 0.4437, 0.1818, 0.2181, 0.0767, 0.1061, 0...
The data consist of the vote proportions in 521 constituencies for the three major UK parties: the Conservative Party, the Labour Party, and the Liberal Democrats.
Instead of working with the vote proportions directly, we will work with the log ratios of the vote shares. This is common in the analysis of multinomial or “compositional” data (Aitchison 1982). The column y1 is the log ratio of the Conservative to the Liberal Democratic vote share, \(\log(v_1 / v_3)\), while y2 is the log ratio of the Labour to the Liberal Democratic vote share, \(\log(v_2 / v_3)\).
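The transformation and its inverse can be sketched in base R. This is a minimal sketch. The helper names alr() and alr_inv() are hypothetical. The inputs are the rounded shares of the first constituency printed by glimpse() above, with v1, v2, and v3 taken to be the Conservative, Labour, and Liberal Democratic shares; the printed y1 and y2 values are consistent with that reading.

```r
# Additive log-ratio transform for three-party vote shares, using the
# Liberal Democratic share (v3) as the reference category.
alr <- function(v1, v2, v3) {
  cbind(y1 = log(v1 / v3), y2 = log(v2 / v3))
}

# Inverse transform: map the two log ratios back to shares that sum to one.
alr_inv <- function(y1, y2) {
  denom <- 1 + exp(y1) + exp(y2)
  cbind(v1 = exp(y1) / denom, v2 = exp(y2) / denom, v3 = 1 / denom)
}

# First constituency (Barrow & Furness), using the rounded printed shares.
y <- alr(0.413, 0.477, 0.1094)
y
v <- alr_inv(y[, "y1"], y[, "y2"])
rowSums(v)
```

The inverse always returns shares that sum to one. The round trip therefore recovers the original proportions only up to the rounding in the printed values.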
Let \(y_i = (y_{i,1}, y_{i,2})'\), \(i \in 1, \dots, N\), be the vector of the \(K = 2\) log ratios in constituency \(i\), and let \(x_i\) be its vector of \(P\) covariates. Katz and King (1999) noted that the distribution of the log ratios appears to be heavy-tailed relative to the normal. Thus, like them, we will model the data with a multivariate Student’s \(t\) distribution with unknown degrees of freedom \(\nu\), \[ \begin{aligned}[t] y_i &\sim \mathsf{StudentT}(\nu, \alpha + \beta x_i, \Sigma) & i \in 1, \dots, N, \end{aligned} \] where \(\alpha\) is a \(K\)-vector of intercepts, \(\beta\) is a \(K \times P\) matrix of coefficients, and \(\Sigma\) is a \(K \times K\) scale matrix.
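A base-R sketch makes the heavy-tail point concrete. It writes out the multivariate Student-\(t\) log density and compares it with the multivariate normal at an outlying point. The function names dmvt_log() and dmvnorm_log() are hypothetical; packages such as mvtnorm provide equivalents. The values of \(\Sigma\), \(\nu\), and the evaluation point are illustrative, not estimates.

```r
# Log density of a K-variate Student-t with degrees of freedom nu,
# location mu, and scale matrix Sigma.
dmvt_log <- function(y, mu, Sigma, nu) {
  K <- length(y)
  R <- chol(Sigma)                              # Sigma = t(R) %*% R
  z <- backsolve(R, y - mu, transpose = TRUE)   # solves t(R) %*% z = y - mu
  quad <- sum(z^2)                              # (y - mu)' Sigma^-1 (y - mu)
  log_det <- 2 * sum(log(diag(R)))              # log |Sigma|
  lgamma((nu + K) / 2) - lgamma(nu / 2) - (K / 2) * log(nu * pi) -
    0.5 * log_det - ((nu + K) / 2) * log1p(quad / nu)
}

# Log density of a K-variate normal, for comparison.
dmvnorm_log <- function(y, mu, Sigma) {
  K <- length(y)
  R <- chol(Sigma)
  z <- backsolve(R, y - mu, transpose = TRUE)
  -(K / 2) * log(2 * pi) - sum(log(diag(R))) - 0.5 * sum(z^2)
}

# Illustrative values: an outlying point far from the center.
Sigma <- matrix(c(1, 0.5, 0.5, 1), nrow = 2)
far <- c(4, -4)
gap <- dmvt_log(far, c(0, 0), Sigma, nu = 4.6) - dmvnorm_log(far, c(0, 0), Sigma)
gap  # positive: the t assigns this point far more density than the normal
```

With \(K = 1\) and \(\Sigma = 1\), dmvt_log() reduces to R's univariate dt(..., log = TRUE). This gives a quick sanity check of the formula.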
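In a logit regression, the scale of the latent variable must be fixed for identification. Here, however, the log ratios are observed, so their scale is identified and \(\Sigma\) is a full scale matrix rather than a correlation matrix. Following the Stan program below, it is decomposed into a vector of scales \(\tau\) and a correlation matrix \(\Omega\), so that \(\Sigma = \operatorname{diag}(\tau)\, \Omega \operatorname{diag}(\tau)\).
Weakly informative priors are used for the regression parameters. The degrees of freedom \(\nu\) is also estimated. It is given a weakly informative gamma prior (shape 2, rate 0.1) that puts most of its mass between 3 and 40 (Juárez and Steel 2010). The correlation matrix has an LKJ prior and the scales have half-Cauchy priors, \[ \begin{aligned}[t] \alpha_k &\sim \mathsf{Normal}(0, 10) , & k \in 1, \dots, K , \\ \beta_{k,p} &\sim \mathsf{Normal}(0, 2.5) , & p \in 1, \dots, P , \\ \Omega &\sim \mathsf{LkjCorr}(2) , \\ \tau_k &\sim \mathsf{HalfCauchy}(0, 5) , & k \in 1, \dots, K , \\ \nu &\sim \mathsf{Gamma}(2, 0.1) . \end{aligned} \]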
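Stan's gamma(2, 0.1) is parameterized by shape and rate, as is R's pgamma(q, shape, rate). The claim about where the prior puts its mass can therefore be checked directly. This is a small sketch; the variable names are illustrative.

```r
# Prior on the degrees of freedom: nu ~ Gamma(shape = 2, rate = 0.1).
shape <- 2
rate <- 0.1
# Prior probability that nu lies between 3 and 40
mass_3_40 <- pgamma(40, shape, rate) - pgamma(3, shape, rate)
mass_3_40
# Central 95% prior interval
qgamma(c(0.025, 0.975), shape, rate)
# Prior mean (shape / rate)
prior_mean <- shape / rate
```

Under this prior, roughly 87% of the mass lies between 3 and 40, and the prior mean is 20.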
uk92_data <- within(list(), {
  # outcomes: the two log ratios, as an N x K matrix
  y <- as.matrix(dplyr::select(UKHouseOfCommons, y1, y2))
  # covariates without an intercept column (alpha is separate), standardized
  X <- scale(model.matrix(~ 0 + y1lag + y2lag + coninc + labinc + libinc,
                          data = UKHouseOfCommons))
  N <- nrow(y)
  K <- ncol(y)
  P <- ncol(X)
  # prior hyperparameters
  alpha_loc <- rep(0, K)
  alpha_scale <- rep(10, K)
  beta_loc <- matrix(0, K, P)
  beta_scale <- matrix(2.5, K, P)
  Sigma_corr_shape <- 2
  Sigma_scale_scale <- 5
})
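Because X is passed through scale(), each column is centered and divided by its standard deviation, including the 0/1 incumbency indicators. Each element of beta is therefore the change in a log ratio per one-standard-deviation change in a covariate, and alpha is the expected log ratio at the covariate means. The toy sketch below uses made-up data and a hypothetical coefficient. It shows the constants scale() stores and how a coefficient could be converted back to original units.

```r
# Toy covariates (made-up values, for illustration only)
x <- cbind(a = c(0, 1, 0, 1, 1), b = c(1.2, -0.5, 0.7, -0.4, 1.6))
z <- scale(x)
# scale() keeps the centering and scaling constants as attributes
attr(z, "scaled:center")   # column means
attr(z, "scaled:scale")    # column standard deviations
# A slope on the standardized column b is converted to the original units
# by dividing by that column's standard deviation.
beta_std <- 0.36  # hypothetical coefficient on the standardized scale
beta_orig <- beta_std / attr(z, "scaled:scale")[["b"]]
beta_orig
```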
data {
  // multivariate outcome
  int<lower=1> N;
  int<lower=1> K;
  array[N] vector[K] y;
  // covariates
  int<lower=0> P;
  array[N] vector[P] X;
  // prior hyperparameters
  vector[K] alpha_loc;
  vector<lower=0>[K] alpha_scale;
  array[K] vector[P] beta_loc;
  array[K] vector<lower=0>[P] beta_scale;
  real<lower=0> Sigma_corr_shape;
  real<lower=0> Sigma_scale_scale;
}
parameters {
  // regression intercepts
  vector[K] alpha;
  // regression coefficients, one vector per outcome
  array[K] vector[P] beta;
  // Cholesky factor of the correlation matrix
  cholesky_factor_corr[K] Sigma_corr_L;
  // scales of the outcomes (positive, for the half-Cauchy prior)
  vector<lower=0>[K] Sigma_scale;
  // Student-t degrees of freedom
  real<lower=0> nu;
}
transformed parameters {
  array[N] vector[K] mu;
  matrix[K, K] Sigma;
  // scale matrix: diag(Sigma_scale) * Omega * diag(Sigma_scale), where
  // Omega = Sigma_corr_L * Sigma_corr_L'
  Sigma = multiply_lower_tri_self_transpose(diag_pre_multiply(Sigma_scale, Sigma_corr_L));
  // linear predictor for each constituency and outcome
  for (i in 1:N) {
    for (k in 1:K) {
      mu[i, k] = alpha[k] + dot_product(X[i], beta[k]);
    }
  }
}
model {
  for (k in 1:K) {
    alpha[k] ~ normal(alpha_loc[k], alpha_scale[k]);
    beta[k] ~ normal(beta_loc[k], beta_scale[k]);
  }
  nu ~ gamma(2, 0.1);
  // half-Cauchy, since Sigma_scale is constrained to be positive
  Sigma_scale ~ cauchy(0, Sigma_scale_scale);
  Sigma_corr_L ~ lkj_corr_cholesky(Sigma_corr_shape);
  y ~ multi_student_t(nu, mu, Sigma);
}
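The Stan program assembles the scale matrix from the vector of scales and the Cholesky factor of the correlation matrix. The order of the product matters. With \(A = \operatorname{diag}(\tau) L\), the scale matrix \(\operatorname{diag}(\tau)\, \Omega \operatorname{diag}(\tau)\) equals \(A A^\top\) (Stan's multiply_lower_tri_self_transpose() or tcrossprod()), whereas \(A^\top A\) is a different matrix. A small base-R check with illustrative values (not estimates):

```r
tau <- c(0.18, 0.24)                 # illustrative scales
rho <- 0.87                          # illustrative correlation
Omega <- matrix(c(1, rho, rho, 1), nrow = 2)
L <- t(chol(Omega))                  # chol() returns the upper factor, so transpose
A <- diag(tau) %*% L                 # R analogue of Stan's diag_pre_multiply(tau, L)
target <- diag(tau) %*% Omega %*% diag(tau)
Sigma_tcross <- tcrossprod(A)        # A %*% t(A)
Sigma_cross <- crossprod(A)          # t(A) %*% A
all.equal(Sigma_tcross, target)          # TRUE
isTRUE(all.equal(Sigma_cross, target))   # FALSE
```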
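After fitting the model in Stan, summarize the posterior distributions of \(\nu\), \(\alpha\), \(\beta\), and \(\Sigma\) from the fitted object uk92_fit: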
summary(uk92_fit, pars = c("nu", "alpha", "beta", "Sigma"))$summary
#> mean se_mean sd 2.5% 25% 50% 75%
#> nu 4.6265 2.29e-02 0.67580 3.5457 4.1313 4.5598 5.04690
#> alpha[1] 0.9295 2.90e-04 0.00853 0.9136 0.9238 0.9292 0.93503
#> alpha[2] 0.6090 3.85e-04 0.01138 0.5868 0.6013 0.6086 0.61684
#> beta[1,1] 0.3612 4.86e-04 0.01340 0.3334 0.3525 0.3608 0.37011
#> beta[1,2] 0.1641 5.64e-04 0.01378 0.1385 0.1542 0.1636 0.17386
#> beta[1,3] 0.0466 5.65e-04 0.01414 0.0172 0.0378 0.0469 0.05629
#> beta[1,4] -0.0620 6.98e-04 0.01624 -0.0952 -0.0735 -0.0615 -0.05111
#> beta[1,5] -0.0135 5.80e-04 0.01480 -0.0433 -0.0229 -0.0128 -0.00383
#> beta[2,1] -0.0209 6.33e-04 0.01760 -0.0552 -0.0326 -0.0210 -0.00972
#> beta[2,2] 1.0396 7.07e-04 0.01873 1.0047 1.0267 1.0390 1.05279
#> beta[2,3] 0.0723 7.71e-04 0.01936 0.0323 0.0597 0.0732 0.08537
#> beta[2,4] -0.0539 9.70e-04 0.02162 -0.0969 -0.0691 -0.0532 -0.03851
#> beta[2,5] -0.0162 8.15e-04 0.02082 -0.0577 -0.0292 -0.0152 -0.00339
#> Sigma[1,1] 0.0320 9.75e-05 0.00287 0.0268 0.0300 0.0318 0.03383
#> Sigma[1,2] 0.0371 1.10e-04 0.00346 0.0311 0.0347 0.0369 0.03954
#> Sigma[2,1] 0.0371 1.10e-04 0.00346 0.0311 0.0347 0.0369 0.03954
#> Sigma[2,2] 0.0564 1.57e-04 0.00497 0.0476 0.0527 0.0563 0.05972
#> 97.5% n_eff Rhat
#> nu 6.0858 868 1.001
#> alpha[1] 0.9465 863 0.999
#> alpha[2] 0.6312 873 0.999
#> beta[1,1] 0.3872 760 1.002
#> beta[1,2] 0.1920 597 1.002
#> beta[1,3] 0.0725 627 1.000
#> beta[1,4] -0.0296 541 1.002
#> beta[1,5] 0.0165 652 1.000
#> beta[2,1] 0.0146 773 1.000
#> beta[2,2] 1.0771 702 1.003
#> beta[2,3] 0.1077 630 0.999
#> beta[2,4] -0.0124 497 1.001
#> beta[2,5] 0.0240 652 1.000
#> Sigma[1,1] 0.0379 865 0.999
#> Sigma[1,2] 0.0441 1000 0.999
#> Sigma[2,1] 0.0441 1000 0.999
#> Sigma[2,2] 0.0667 1000 0.999
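Two quantities help interpret the summary: the correlation between the two log ratios implied by \(\Sigma\), and the fact that \(\Sigma\) is a scale matrix. For \(\nu > 2\), the covariance of a multivariate Student-\(t\) is \(\nu / (\nu - 2)\) times \(\Sigma\). The sketch below is a plug-in calculation that uses the posterior means copied from the table above. A fuller analysis would compute these quantities for each posterior draw, for example in a generated quantities block.

```r
# Posterior means copied from the summary table (plug-in values only)
nu_hat <- 4.6265
Sigma_hat <- matrix(c(0.0320, 0.0371,
                      0.0371, 0.0564), nrow = 2, byrow = TRUE)
# Correlation implied by the scale matrix
rho_hat <- cov2cor(Sigma_hat)[1, 2]
rho_hat
# Covariance of the multivariate t (defined for nu > 2)
inflation <- nu_hat / (nu_hat - 2)
cov_hat <- inflation * Sigma_hat
cov_hat
```

The plug-in correlation is about 0.87. The factor \(\nu / (\nu - 2)\) is about 1.76, so the covariance of the log ratios is noticeably larger than \(\Sigma\) itself.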
12.1 Questions
- Given this model, replicate some of the results in Katz and King (1999).
- Model the data using a multivariate normal model instead. How do the results differ? Which fits the data better? What does the value of \(\nu\) from the multivariate Student t model tell you about the plausibility of the multivariate normal distribution?
- Tomz, Tucker, and Wittenberg (2002) suggest using seemingly unrelated regressions (SUR). Model the data with SUR. How does it compare in results and speed?
- Could you model this using a multinomial model with the data provided? What data would you need?
References
Aitchison, J. 1982. “The Statistical Analysis of Compositional Data.” Journal of the Royal Statistical Society. Series B (Methodological) 44 (2): 139–77. http://www.jstor.org/stable/2345821.
Juárez, Miguel A., and Mark F. J. Steel. 2010. “Model-Based Clustering of Non-Gaussian Panel Data Based on Skew-t Distributions.” Journal of Business & Economic Statistics 28 (1): 52–66. https://doi.org/10.1198/jbes.2009.07145.
Katz, Jonathan N., and Gary King. 1999. “A Statistical Model for Multiparty Electoral Data.” American Political Science Review 93 (1): 15–32. https://doi.org/10.2307/2585758.
Tomz, Michael, Joshua A. Tucker, and Jason Wittenberg. 2002. “An Easy and Accurate Regression Model for Multiparty Electoral Data.” Political Analysis 10 (1): 66–83. http://www.jstor.org/stable/25791665.