12 House of Commons elections: modeling with the multivariate Student-\(t\) density

The data for this example consist of constituency vote proportions from the 1992 United Kingdom House of Commons election. These data come from Katz and King (1999) and were re-analyzed by Tomz, Tucker, and Wittenberg (2002). They are included in the pscl package as UKHouseOfCommons and can be loaded as in the following sketch (which assumes the pscl package is installed):
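
# load the 1992 UK House of Commons constituency data from pscl
data("UKHouseOfCommons", package = "pscl")
head(UKHouseOfCommons)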

The data consist of the vote proportions for 522 constituencies for the three major UK parties: the Conservative Party, the Labour Party, and the Liberal Democrats. Instead of working with the vote proportions directly, we will work with log-odds ratios, as is common in the analysis of multinomial or “compositional” data (Aitchison 1982). The column y1 is the log-odds of the Conservative to the Liberal Democratic vote share, and y2 is the log-odds of the Labour to the Liberal Democratic vote share.
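
Concretely, if \(v_{i,\mathrm{Con}}\), \(v_{i,\mathrm{Lab}}\), and \(v_{i,\mathrm{LD}}\) denote the raw vote proportions in constituency \(i\), the two outcome variables are \[ y_{i,1} = \log \frac{v_{i,\mathrm{Con}}}{v_{i,\mathrm{LD}}} , \qquad y_{i,2} = \log \frac{v_{i,\mathrm{Lab}}}{v_{i,\mathrm{LD}}} . \]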

Let \(y_{i,k}\), \(k \in \{1, 2\}\), \(i \in 1, \dots, N\), be the log-odds ratios of the vote shares in constituency \(i\). Katz and King (1999) noted that the distribution of the log-odds ratios appears to be heavy-tailed relative to the normal. Thus, like them, we will model the data with a multivariate Student’s \(t\) distribution with unknown degrees of freedom (\(\nu\)), \[ \begin{aligned}[t] y_i &\sim \mathsf{StudentT}(\nu, \mu_i, \Sigma) , & i \in 1, \dots, N , \\ \mu_{i,k} &= \alpha_k + x_i' \beta_k , & k \in 1, \dots, K , \end{aligned} \] where \(x_i\) is the vector of \(P\) covariates for constituency \(i\).

For identification, as in a multinomial logit regression, one category must be treated as a baseline; here both log-odds ratios use the Liberal Democratic vote share as the denominator. The scale matrix \(\Sigma\) is then a full covariance matrix, which the Stan program below decomposes into a vector of scales and a Cholesky-factored correlation matrix.

Weakly informative priors are used for the regression parameters. The covariance matrix is decomposed into a vector of scales, which receive half-Cauchy priors, and a correlation matrix, which receives an LKJ prior. The degrees of freedom of the multivariate Student t distribution is itself a parameter and is given a weakly informative Gamma prior that puts most of the prior density between 3 and 40 (Juárez and Steel 2010), \[ \begin{aligned}[t] \alpha &\sim \mathsf{Normal}(0, 10) , \\ \beta_p &\sim \mathsf{Normal}(0, 2.5), & p \in 1, \dots, P , \\ \Sigma &= \operatorname{diag}(\sigma) \, \Omega \, \operatorname{diag}(\sigma) , \\ \sigma_k &\sim \mathsf{HalfCauchy}(0, s), & k \in 1, \dots, K , \\ \Omega &\sim \mathsf{LkjCorr}(\eta) , \\ \nu &\sim \mathsf{Gamma}(2, 0.1) . \end{aligned} \] The hyperparameters \(s\) and \(\eta\), like the prior locations and scales of \(\alpha\) and \(\beta\), are passed to the Stan program as data.
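
As a rough check of this claim, the following snippet uses base R to compute the mass that a Gamma distribution with shape 2 and rate 0.1 (the shape–rate parameterization used by Stan’s gamma) places on the interval from 3 to 40:

# prior mass that Gamma(shape = 2, rate = 0.1) places on (3, 40)
pgamma(40, shape = 2, rate = 0.1) - pgamma(3, shape = 2, rate = 0.1)  # roughly 0.87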

data {
  // multivariate outcome
  int<lower = 1> N;
  int<lower = 1> K;
  vector[K] y[N];
  // covariates
  int<lower = 1> P;
  vector[P] X[N];
  // prior hyperparameters
  vector[K] alpha_loc;
  vector<lower = 0.>[K] alpha_scale;
  vector[P] beta_loc[K];
  vector<lower = 0.>[P] beta_scale[K];
  real<lower = 0.> Sigma_corr_shape;
  real<lower = 0.> Sigma_scale_scale;
}
parameters {
  // regression intercepts
  vector[K] alpha;
  // regression coefficients
  vector[P] beta[K];
  // Cholesky factor of the correlation matrix
  cholesky_factor_corr[K] Sigma_corr_L;
  // scales of the covariance matrix
  vector<lower = 0.>[K] Sigma_scale;
  // Student-t degrees of freedom
  real<lower = 0.> nu;
}
transformed parameters {
  vector[K] mu[N];
  matrix[K, K] Sigma;
  // covariance matrix: diag(Sigma_scale) * Omega * diag(Sigma_scale),
  // built from the Cholesky factor of the correlation matrix Omega
  Sigma = multiply_lower_tri_self_transpose(diag_pre_multiply(Sigma_scale, Sigma_corr_L));
  for (i in 1:N) {
    for (k in 1:K) {
      mu[i, k] = alpha[k] + dot_product(X[i], beta[k]);
    }
  }
}
model {
  for (k in 1:K) {
    alpha[k] ~ normal(alpha_loc[k], alpha_scale[k]);
    beta[k] ~ normal(beta_loc[k], beta_scale[k]);
  }
  nu ~ gamma(2, 0.1);
  // half-Cauchy prior, since Sigma_scale is constrained to be positive
  Sigma_scale ~ cauchy(0., Sigma_scale_scale);
  Sigma_corr_L ~ lkj_corr_cholesky(Sigma_corr_shape);
  y ~ multi_student_t(nu, mu, Sigma);
}

Fit the model in Stan.
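
Below is a minimal sketch of the fitting step. It assumes the Stan program above is saved as uk92.stan and that the covariates are the lagged log-odds ratios and incumbency indicators (y1lag, y2lag, coninc, labinc, libinc); the LKJ shape and half-Cauchy scale hyperparameters, which are not specified above, are set to illustrative values.

library("rstan")
library("pscl")

data("UKHouseOfCommons", package = "pscl")
y <- as.matrix(UKHouseOfCommons[, c("y1", "y2")])
X <- as.matrix(UKHouseOfCommons[, c("y1lag", "y2lag", "coninc", "labinc", "libinc")])

uk92_data <- list(
  N = nrow(y),
  K = ncol(y),
  y = y,
  P = ncol(X),
  X = X,
  alpha_loc = rep(0, ncol(y)),
  alpha_scale = rep(10, ncol(y)),
  beta_loc = matrix(0, ncol(y), ncol(X)),
  beta_scale = matrix(2.5, ncol(y), ncol(X)),
  Sigma_corr_shape = 2,     # LKJ eta (assumed value)
  Sigma_scale_scale = 2.5   # half-Cauchy scale (assumed value)
)

uk92_fit <- stan("uk92.stan", data = uk92_data)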

summary(uk92_fit, pars = c("nu", "alpha", "beta", "Sigma"))$summary
#>               mean  se_mean      sd    2.5%     25%     50%      75%
#> nu          4.6265 2.29e-02 0.67580  3.5457  4.1313  4.5598  5.04690
#> alpha[1]    0.9295 2.90e-04 0.00853  0.9136  0.9238  0.9292  0.93503
#> alpha[2]    0.6090 3.85e-04 0.01138  0.5868  0.6013  0.6086  0.61684
#> beta[1,1]   0.3612 4.86e-04 0.01340  0.3334  0.3525  0.3608  0.37011
#> beta[1,2]   0.1641 5.64e-04 0.01378  0.1385  0.1542  0.1636  0.17386
#> beta[1,3]   0.0466 5.65e-04 0.01414  0.0172  0.0378  0.0469  0.05629
#> beta[1,4]  -0.0620 6.98e-04 0.01624 -0.0952 -0.0735 -0.0615 -0.05111
#> beta[1,5]  -0.0135 5.80e-04 0.01480 -0.0433 -0.0229 -0.0128 -0.00383
#> beta[2,1]  -0.0209 6.33e-04 0.01760 -0.0552 -0.0326 -0.0210 -0.00972
#> beta[2,2]   1.0396 7.07e-04 0.01873  1.0047  1.0267  1.0390  1.05279
#> beta[2,3]   0.0723 7.71e-04 0.01936  0.0323  0.0597  0.0732  0.08537
#> beta[2,4]  -0.0539 9.70e-04 0.02162 -0.0969 -0.0691 -0.0532 -0.03851
#> beta[2,5]  -0.0162 8.15e-04 0.02082 -0.0577 -0.0292 -0.0152 -0.00339
#> Sigma[1,1]  0.0320 9.75e-05 0.00287  0.0268  0.0300  0.0318  0.03383
#> Sigma[1,2]  0.0371 1.10e-04 0.00346  0.0311  0.0347  0.0369  0.03954
#> Sigma[2,1]  0.0371 1.10e-04 0.00346  0.0311  0.0347  0.0369  0.03954
#> Sigma[2,2]  0.0564 1.57e-04 0.00497  0.0476  0.0527  0.0563  0.05972
#>              97.5% n_eff  Rhat
#> nu          6.0858   868 1.001
#> alpha[1]    0.9465   863 0.999
#> alpha[2]    0.6312   873 0.999
#> beta[1,1]   0.3872   760 1.002
#> beta[1,2]   0.1920   597 1.002
#> beta[1,3]   0.0725   627 1.000
#> beta[1,4]  -0.0296   541 1.002
#> beta[1,5]   0.0165   652 1.000
#> beta[2,1]   0.0146   773 1.000
#> beta[2,2]   1.0771   702 1.003
#> beta[2,3]   0.1077   630 0.999
#> beta[2,4]  -0.0124   497 1.001
#> beta[2,5]   0.0240   652 1.000
#> Sigma[1,1]  0.0379   865 0.999
#> Sigma[1,2]  0.0441  1000 0.999
#> Sigma[2,1]  0.0441  1000 0.999
#> Sigma[2,2]  0.0667  1000 0.999
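
The elements of \(\Sigma\) are covariances of the log-odds ratios, so the off-diagonal term is easier to read as a correlation. The following sketch computes its posterior from the draws of Sigma (it assumes the uk92_fit object from above):

# posterior correlation between the two log-odds ratios, derived from Sigma
Sigma_draws <- rstan::extract(uk92_fit, pars = "Sigma")$Sigma
rho <- Sigma_draws[, 1, 2] / sqrt(Sigma_draws[, 1, 1] * Sigma_draws[, 2, 2])
quantile(rho, probs = c(0.025, 0.5, 0.975))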

12.1 Questions

  • Given this model, replicate some of the results in Katz and King (1999).
  • Model the data using a multivariate normal model instead. How do the results differ? Which fits the data better? What does the value of \(\nu\) from the multivariate Student t model tell you about the plausibility of the multivariate normal distribution?
  • Tomz, Tucker, and Wittenberg (2002) suggest using seemingly unrelated regressions (SUR). Model the data with SUR. How does it compare in results and speed?
  • Could you model this using a multinomial model with the data provided? If not, what additional data would you need?

References

Katz, Jonathan N., and Gary King. 1999. “A Statistical Model for Multiparty Electoral Data.” American Political Science Review 93 (1): 15–32. https://doi.org/10.2307/2585758.

Tomz, Michael, Joshua A. Tucker, and Jason Wittenberg. 2002. “An Easy and Accurate Regression Model for Multiparty Electoral Data.” Political Analysis 10 (1): 66–83. http://www.jstor.org/stable/25791665.

Aitchison, J. 1982. “The Statistical Analysis of Compositional Data.” Journal of the Royal Statistical Society. Series B (Methodological) 44 (2): 139–77. http://www.jstor.org/stable/2345821.

Juárez, Miguel A., and Mark F. J. Steel. 2010. “Model-Based Clustering of Non-Gaussian Panel Data Based on Skew-t Distributions.” Journal of Business & Economic Statistics 28 (1): 52–66. https://doi.org/10.1198/jbes.2009.07145.


  1. Example derived from Simon Jackman, “House of Commons elections: modeling with the multivariate t density.” BUGS Examples URL. Some language copied from the original.