15 Bimodal: Extreme missingness in bivariate normal data

library("rstan")
#> Loading required package: ggplot2
#> Loading required package: StanHeaders
#> rstan (Version 2.15.1, packaged: 2017-04-19 05:03:57 UTC, GitRev: 2e1f913d3ca3)
#> For execution on a local, multicore CPU with excess RAM we recommend calling
#> rstan_options(auto_write = TRUE)
#> options(mc.cores = parallel::detectCores())
library("tidyverse")
#> Loading tidyverse: tibble
#> Loading tidyverse: tidyr
#> Loading tidyverse: readr
#> Loading tidyverse: purrr
#> Loading tidyverse: dplyr
#> Conflicts with tidy packages ----------------------------------------------
#> extract(): tidyr, rstan
#> filter():  dplyr, stats
#> lag():     dplyr, stats
library("stringr")

Simple methods for dealing with missing data can run into trouble given pernicious patterns of missingness. A famous artificial data set designed to highlight this point was created by Gordon Murray, to show how an EM algorithm can run into problems (Murray 1977,Dempster, Laird, and Rubin (1977)).

x1: 1   1   -1  -1  2   2   -2 -2   *   *   *   *
x2: 1   -1  1   -1  *   *   *   *   2   2   -2  -2

Assume bivariate normality, and that the means of the two variables are both zero, but the variances and covariance are unknown. Inference about the correlation coefficient \(r\) between these two variables is not trivial in this instance. The marginal complete-data likelihood for \(r\) is not unimodal, and has a saddle-point at zero, and two local maxima close to -1 and 1. A Bayesian analysis (with uninformative priors) similarly recovers a bimodal posterior density for the correlation coefficient; e.g., (Tanner 1996, Congdon (2007)).

bimodal_mod <- stan_model("stan/bimodal.stan")
  data {
  int N;
  int N_obs;
  int N_miss;
  vector[N_obs] x_obs;
  int x_obs_idx[N_obs, 2];
  int x_miss_idx[N_miss, 2];
  vector[2] mu;
}
parameters {
  cov_matrix[2] Sigma;
  vector[N_miss] x_miss;
}
transformed parameters {
  // using an array of vectors is more convenient when sampling
  // multi_normal than using an matrix
  vector[2] X[N];
  for (i in 1:N_obs) {
    X[x_obs_idx[i, 1], x_obs_idx[i, 2]] = x_obs[i];
  }
  for (i in 1:N_miss) {
    X[x_miss_idx[i, 1], x_miss_idx[i, 2]] = x_miss[i];
  }
}
model{
  for (i in 1:N) {
    X[i] ~ multi_normal(mu, Sigma);
  }
}

You can ignore the rstan warning,

DIAGNOSTIC(S) FROM PARSER:
Warning (non-fatal):
Left-hand side of sampling statement (~) may contain a non-linear transform of a parameter or local variable.
If it does, you need to include a target += statement with the log absolute determinant of the Jacobian of the transform.
Left-hand-side of sampling statement:
    X[i] ~ multi_normal(...)

since the left hand side is a simple linear relationship and no Jacobian adjustment is needed. All we did was replace index values in the transformed parameter.

X_mat <- matrix(c(1, 1, -1, -1, 2, 2, -2, -2, NA, NA, NA, NA,
       1, -1, 1, -1, NA, NA, NA, NA, 2, 2, -2, -2), ncol = 2)
X_mat <- matrix(rnorm(12), ncol = 2)
X_mat[1, 1] <- NA
X_mat[3, 2] <- NA
#       1, -1, 1, -1, NA, NA, NA, NA, 2, 2, -2, -2), ncol = 2)
X <- X_mat %>%
  as_data_frame() %>%
  mutate(.row = row_number()) %>%
  gather(.col, value, -.row) %>%
  mutate(.col = as.integer(str_replace(.col, "V", "")))

X_obs <- filter(X, !is.na(value))
X_miss <- filter(X, is.na(value))
  
bimodal_data <- within(list(), {
  N <- nrow(X)
  x_obs <- X_obs$value
  x_obs_idx <- as.matrix(X_obs[ , c(".row", ".col")])
  N_obs <- nrow(X_obs)
  x_miss_idx <- as.matrix(X_miss[ , c(".row", ".col")])
  N_miss <- nrow(X_miss)
})
bimodal_fit <- sampling(bimodal_mod, data = bimodal_data,
                        chains = 1)
#> Warning in is.na(x): is.na() applied to non-(list or vector) of type 'NULL'
#> Warning in FUN(X[[i]], ...): data with name mu is not numeric and not used

Original BUGS code and text by Simon Jackman; translated to Stan by Jeffrey Arnold.

The following example illustrates the need for caution in diagnosing convergence, and is based on an example appearing in Carlin and Louis’ Bayes and Empirical Bayes Methods for Data Analysis, 2nd edition, p174.

Consider a model for the mean as an additive sum of two parameters: e.g., \(y ~ N(q_1 + q_2, 1)\). The data are not informative about \(q_1\) and \(q_2\) , but are informative about \(m = q_1 + q_2\) and the likelihood function for the two unidentified parameters has a ridge along the locus of points \[ \left\{ q_1, q_2 : \bar{y} = q_1 + q_2 \right\} , \] where \(\bar{y}\) is the mean of the observed data.

In the Bayesian approach, we are obliged to specify priors over the model parameters. Proper priors ensure unimodal posteriors for \(q_1\) and \(q_2\), and can be used to the sample from the posterior for this problem. But as Carlin and Louis show (see their Q25, p191), we need to be careful with models of this type. The posteriors for theta are not identical to the prior (the posterior standard deviations are 7.05, while the prior standard deviations used below are 10), suggesting that the data are somewhat informative about both theta parameters, when this is not the case. An inexperienced user of Markov chain Monte Carlo methods might fail to recognize that the q parameters are not identified, and naively report the posterior summaries for theta generated by the software. On the other hand, note that the identified parameter \(m = q_1 + q_2\) is well behaved.

The original BUGS model

model{

    ## loop over observations
    for (i in 1:1){
        y[i] ~ dnorm(mu,1.0);  ## known precision
    }
    mu <- theta[1] + theta[2]

    ## priors
    #theta[1] ~ dnorm(0.0, 0.01);     ## this form is not efficient
    #theta[2] ~ dnorm(0.0, 0.01);     ## vis-a-vis en bloc approach below

    theta[1:2] ~ dmnorm(b0[],B0[,])   ## en bloc updating for theta
    b0[1] <- 0 b0[2] <- 0
    B0[1,1] <- .01 B0[2,2] <- .01
    B0[1,2] <- 0 B0[2,1] <- 0
}
library("rstan")
#> Loading required package: ggplot2
#> Loading required package: StanHeaders
#> rstan (Version 2.15.1, packaged: 2017-04-19 05:03:57 UTC, GitRev: 2e1f913d3ca3)
#> For execution on a local, multicore CPU with excess RAM we recommend calling
#> rstan_options(auto_write = TRUE)
#> options(mc.cores = parallel::detectCores())
mod_unidentified <- stan_model("stan/unidentified.stan")
#> In file included from file452878d1498c.cpp:8:
#> In file included from /Users/jrnold/Library/R/3.4/library/StanHeaders/include/src/stan/model/model_header.hpp:4:
#> In file included from /Users/jrnold/Library/R/3.4/library/StanHeaders/include/stan/math.hpp:4:
#> In file included from /Users/jrnold/Library/R/3.4/library/StanHeaders/include/stan/math/rev/mat.hpp:4:
#> In file included from /Users/jrnold/Library/R/3.4/library/StanHeaders/include/stan/math/rev/core.hpp:12:
#> In file included from /Users/jrnold/Library/R/3.4/library/StanHeaders/include/stan/math/rev/core/gevv_vvv_vari.hpp:5:
#> In file included from /Users/jrnold/Library/R/3.4/library/StanHeaders/include/stan/math/rev/core/var.hpp:7:
#> In file included from /Users/jrnold/Library/R/3.4/library/BH/include/boost/math/tools/config.hpp:13:
#> In file included from /Users/jrnold/Library/R/3.4/library/BH/include/boost/config.hpp:39:
#> /Users/jrnold/Library/R/3.4/library/BH/include/boost/config/compiler/clang.hpp:196:11: warning: 'BOOST_NO_CXX11_RVALUE_REFERENCES' macro redefined [-Wmacro-redefined]
#> #  define BOOST_NO_CXX11_RVALUE_REFERENCES
#>           ^
#> <command line>:6:9: note: previous definition is here
#> #define BOOST_NO_CXX11_RVALUE_REFERENCES 1
#>         ^
#> In file included from file452878d1498c.cpp:8:
#> In file included from /Users/jrnold/Library/R/3.4/library/StanHeaders/include/src/stan/model/model_header.hpp:4:
#> In file included from /Users/jrnold/Library/R/3.4/library/StanHeaders/include/stan/math.hpp:4:
#> In file included from /Users/jrnold/Library/R/3.4/library/StanHeaders/include/stan/math/rev/mat.hpp:4:
#> In file included from /Users/jrnold/Library/R/3.4/library/StanHeaders/include/stan/math/rev/core.hpp:42:
#> /Users/jrnold/Library/R/3.4/library/StanHeaders/include/stan/math/rev/core/set_zero_all_adjoints.hpp:14:17: warning: unused function 'set_zero_all_adjoints' [-Wunused-function]
#>     static void set_zero_all_adjoints() {
#>                 ^
#> In file included from file452878d1498c.cpp:8:
#> In file included from /Users/jrnold/Library/R/3.4/library/StanHeaders/include/src/stan/model/model_header.hpp:4:
#> In file included from /Users/jrnold/Library/R/3.4/library/StanHeaders/include/stan/math.hpp:4:
#> In file included from /Users/jrnold/Library/R/3.4/library/StanHeaders/include/stan/math/rev/mat.hpp:4:
#> In file included from /Users/jrnold/Library/R/3.4/library/StanHeaders/include/stan/math/rev/core.hpp:43:
#> /Users/jrnold/Library/R/3.4/library/StanHeaders/include/stan/math/rev/core/set_zero_all_adjoints_nested.hpp:17:17: warning: 'static' function 'set_zero_all_adjoints_nested' declared in header file should be declared 'static inline' [-Wunneeded-internal-declaration]
#>     static void set_zero_all_adjoints_nested() {
#>                 ^
#> In file included from file452878d1498c.cpp:8:
#> In file included from /Users/jrnold/Library/R/3.4/library/StanHeaders/include/src/stan/model/model_header.hpp:4:
#> In file included from /Users/jrnold/Library/R/3.4/library/StanHeaders/include/stan/math.hpp:4:
#> In file included from /Users/jrnold/Library/R/3.4/library/StanHeaders/include/stan/math/rev/mat.hpp:11:
#> In file included from /Users/jrnold/Library/R/3.4/library/StanHeaders/include/stan/math/prim/mat.hpp:59:
#> /Users/jrnold/Library/R/3.4/library/StanHeaders/include/stan/math/prim/mat/fun/autocorrelation.hpp:17:14: warning: function 'fft_next_good_size' is not needed and will not be emitted [-Wunneeded-internal-declaration]
#>       size_t fft_next_good_size(size_t N) {
#>              ^
#> In file included from file452878d1498c.cpp:8:
#> In file included from /Users/jrnold/Library/R/3.4/library/StanHeaders/include/src/stan/model/model_header.hpp:4:
#> In file included from /Users/jrnold/Library/R/3.4/library/StanHeaders/include/stan/math.hpp:4:
#> In file included from /Users/jrnold/Library/R/3.4/library/StanHeaders/include/stan/math/rev/mat.hpp:11:
#> In file included from /Users/jrnold/Library/R/3.4/library/StanHeaders/include/stan/math/prim/mat.hpp:298:
#> In file included from /Users/jrnold/Library/R/3.4/library/StanHeaders/include/stan/math/prim/arr.hpp:39:
#> In file included from /Users/jrnold/Library/R/3.4/library/StanHeaders/include/stan/math/prim/arr/functor/integrate_ode_rk45.hpp:13:
#> In file included from /Users/jrnold/Library/R/3.4/library/BH/include/boost/numeric/odeint.hpp:61:
#> In file included from /Users/jrnold/Library/R/3.4/library/BH/include/boost/numeric/odeint/util/multi_array_adaption.hpp:29:
#> In file included from /Users/jrnold/Library/R/3.4/library/BH/include/boost/multi_array.hpp:21:
#> In file included from /Users/jrnold/Library/R/3.4/library/BH/include/boost/multi_array/base.hpp:28:
#> /Users/jrnold/Library/R/3.4/library/BH/include/boost/multi_array/concept_checks.hpp:42:43: warning: unused typedef 'index_range' [-Wunused-local-typedef]
#>       typedef typename Array::index_range index_range;
#>                                           ^
#> /Users/jrnold/Library/R/3.4/library/BH/include/boost/multi_array/concept_checks.hpp:43:37: warning: unused typedef 'index' [-Wunused-local-typedef]
#>       typedef typename Array::index index;
#>                                     ^
#> /Users/jrnold/Library/R/3.4/library/BH/include/boost/multi_array/concept_checks.hpp:53:43: warning: unused typedef 'index_range' [-Wunused-local-typedef]
#>       typedef typename Array::index_range index_range;
#>                                           ^
#> /Users/jrnold/Library/R/3.4/library/BH/include/boost/multi_array/concept_checks.hpp:54:37: warning: unused typedef 'index' [-Wunused-local-typedef]
#>       typedef typename Array::index index;
#>                                     ^
#> 8 warnings generated.

Use very large scales for this; though the behavior is still present with weakly informative scales.

data_unidentified <- list(
  y = 0,
  theta_mean = rep(0, 2),
  theta_scale = rep(100, 2)
)
fit_unidentified <- sampling(mod_unidentified, data = data_unidentified,
                             refresh = -1)
fit_unidentified
#> Inference for Stan model: unidentified.
#> 4 chains, each with iter=2000; warmup=1000; thin=1; 
#> post-warmup draws per chain=1000, total post-warmup draws=4000.
#> 
#>           mean se_mean    sd    2.5%    25%   50%   75%  97.5% n_eff Rhat
#> theta[1]  2.14    3.17 72.33 -143.32 -42.92  1.26 48.84 144.24   520 1.01
#> theta[2] -2.13    3.17 72.34 -144.19 -48.87 -1.48 43.01 142.92   520 1.01
#> mu        0.00    0.02  0.99   -1.98  -0.67  0.03  0.68   1.94  3548 1.00
#> lp__     -1.02    0.04  1.06   -4.02  -1.38 -0.68 -0.28  -0.02   579 1.01
#> 
#> Samples were drawn using NUTS(diag_e) at Wed May 31 08:42:20 2017.
#> For each parameter, n_eff is a crude measure of effective sample size,
#> and Rhat is the potential scale reduction factor on split chains (at 
#> convergence, Rhat=1).

References

Murray, Gordon. 1977. “Discussion on the Paper by Professor Dempster et Al.” Journal of the Royal Statistical Society. Series B (Methodological) 39 (1). [Royal Statistical Society, Wiley]: 27. http://www.jstor.org/stable/2984875.

Dempster, A. P., N. M. Laird, and D. B. Rubin. 1977. “Maximum Likelihood from Incomplete Data via the Em Algorithm.” Journal of the Royal Statistical Society. Series B (Methodological) 39 (1). [Royal Statistical Society, Wiley]: 1–38. http://www.jstor.org/stable/2984875.

Tanner, Martin A. 1996. Tools for Statistical Inference. Springer.

Congdon, Peter. 2007. Bayesian Statistical Modelling. Wiley.