17 Engines: right-censored failure times
17.1 Data
The data are 40 engines tested at various operating temperatures; for each engine, the failure time is recorded if the engine failed, or the last time of the observation period if it had not (Tanner 1996). Of the 40 engines, 23 did not fail during their observation periods.
data("engines", package = "bayesjackman")
glimpse(engines)
#> Observations: 40
#> Variables: 3
#> $ y <dbl> 3.25, 3.44, 3.54, 3.55, 3.58, 3.69, 3.72, 2.61, 2.61,...
#> $ x <dbl> 2.26, 2.26, 2.26, 2.26, 2.26, 2.26, 2.26, 2.16, 2.16,...
#> $ censored <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALS...
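The censored column flags the engines that had not yet failed; tabulating it recovers the 17 observed failures and 23 censored cases:
table(engines$censored)
# expect 17 FALSE (failed) and 23 TRUE (censored)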
17.2 Model
Let \(y^*_i\) be the failure time of engine \(i\). The failure times are modeled as a regression with normal errors, \[ \begin{aligned}[t] y^*_i &\sim \mathsf{Normal}(\mu_i, \sigma) , \\ \mu_i &= \alpha + \beta x_i . \end{aligned} \] However, the failure times are not always observed. For a censored engine, only the last observation time \(y_i\) is known, so all that is known is \(y^*_i > y_i\). Let \(L\) be the set of censored observations. Then \[ \begin{aligned}[t] y_i &\sim \mathsf{Normal}(\mu_i, \sigma) & i \notin L, \\ y^*_i &\sim \mathsf{Normal}(\mu_i, \sigma) U(y_i, \infty) & i \in L, \\ \mu_i &= \alpha + \beta x_i , \end{aligned} \] where \(U(y_i, \infty)\) denotes truncation of the distribution to the interval \((y_i, \infty)\).
Marginalizing over the unobserved failure times gives the log-likelihood \[ \begin{aligned}[t] \log L(y_1, \dots, y_N | \alpha, \beta, \sigma) &= \sum_{i \notin L} \log \mathsf{Normal}(y_i; \mu_i, \sigma) \\ &\quad + \sum_{i \in L} \log \int_{y_i}^{\infty} \mathsf{Normal}(y^*; \mu_i, \sigma) \,d y^* , \end{aligned} \] where \[ \mu_i = \alpha + \beta x_i . \]
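Each censored term is the log of the normal complementary CDF, so no latent failure times need to be sampled: \[ \log \int_{y_i}^{\infty} \mathsf{Normal}(y^*; \mu_i, \sigma) \,d y^* = \log \left( 1 - \Phi\left( \frac{y_i - \mu_i}{\sigma} \right) \right) , \] where \(\Phi\) is the standard normal CDF. Stan evaluates this directly with normal_lccdf, which the model block below uses for the censored observations.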
data {
  // number of observations
  int<lower = 0> N;
  // observed data
  int<lower = 0> N_obs;
  vector[N_obs] y_obs;
  // censored data
  int<lower = 0> N_cens;
  vector[N_cens] y_cens;
  // covariates
  int<lower = 0> K;
  matrix[N_obs, K] X_obs;
  matrix[N_cens, K] X_cens;
  // priors
  real alpha_loc;
  real<lower = 0> alpha_scale;
  vector[K] beta_loc;
  vector<lower = 0>[K] beta_scale;
  real<lower = 0> sigma_scale;
}
parameters {
  real alpha;
  vector[K] beta;
  real<lower = 0> sigma;
}
transformed parameters {
  vector[N_obs] mu_obs;
  vector[N_cens] mu_cens;
  mu_obs = alpha + X_obs * beta;
  mu_cens = alpha + X_cens * beta;
}
model {
  sigma ~ cauchy(0, sigma_scale);
  alpha ~ normal(alpha_loc, alpha_scale);
  beta ~ normal(beta_loc, beta_scale);
  y_obs ~ normal(mu_obs, sigma);
  // each censored observation contributes log Pr(y* > y_cens)
  target += normal_lccdf(y_cens | mu_cens, sigma);
}
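The model marginalizes over the censored failure times, so it compiles and samples like any other Stan program. A minimal sketch of compiling it with rstan, assuming the code above is saved as engines.stan (the file name is an assumption):
library("rstan")
# compile once; sampling() below reuses the compiled model
mod_engines <- stan_model("engines.stan")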
17.3 Estimation
For the input data to the Stan model, the uncensored and censored observations have to be provided as separate vectors.
X <- scale(engines$x)
engines_data <- within(list(), {
  N <- nrow(engines)
  # observed (uncensored) obs
  y_obs <- engines$y[!engines$censored]
  N_obs <- length(y_obs)
  X_obs <- X[!engines$censored, , drop = FALSE]
  K <- ncol(X_obs)
  # censored obs
  y_cens <- engines$y[engines$censored]
  N_cens <- length(y_cens)
  X_cens <- X[engines$censored, , drop = FALSE]
  # priors: use the mean and sd of y to roughly scale the weakly
  # informative priors -- these don't need to be exact
  alpha_loc <- mean(engines$y)
  alpha_scale <- 10 * sd(engines$y)
  beta_loc <- array(0)
  beta_scale <- array(2.5 * sd(engines$y))
  sigma_scale <- 5 * sd(y_obs)
})
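Before sampling, it is worth checking that the split pieces line up; a quick sanity check (not part of the original code):
stopifnot(
  engines_data$N_obs + engines_data$N_cens == engines_data$N,
  nrow(engines_data$X_obs) == engines_data$N_obs,
  nrow(engines_data$X_cens) == engines_data$N_cens
)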
sampling(mod_engines, data = engines_data,
chains = 1, init = list(list(alpha = mean(engines$y))))
#>
#> SAMPLING FOR MODEL 'engines' NOW (CHAIN 1).
#>
#> Gradient evaluation took 4.1e-05 seconds
#> 1000 transitions using 10 leapfrog steps per transition would take 0.41 seconds.
#> Adjust your expectations accordingly!
#>
#>
#> Iteration: 1 / 2000 [ 0%] (Warmup)
#> Iteration: 200 / 2000 [ 10%] (Warmup)
#> Iteration: 400 / 2000 [ 20%] (Warmup)
#> Iteration: 600 / 2000 [ 30%] (Warmup)
#> Iteration: 800 / 2000 [ 40%] (Warmup)
#> Iteration: 1000 / 2000 [ 50%] (Warmup)
#> Iteration: 1001 / 2000 [ 50%] (Sampling)
#> Iteration: 1200 / 2000 [ 60%] (Sampling)
#> Iteration: 1400 / 2000 [ 70%] (Sampling)
#> Iteration: 1600 / 2000 [ 80%] (Sampling)
#> Iteration: 1800 / 2000 [ 90%] (Sampling)
#> Iteration: 2000 / 2000 [100%] (Sampling)
#>
#> Elapsed Time: 0.066719 seconds (Warm-up)
#> 0.063184 seconds (Sampling)
#> 0.129903 seconds (Total)
#> Inference for Stan model: engines.
#> 1 chains, each with iter=2000; warmup=1000; thin=1;
#> post-warmup draws per chain=1000, total post-warmup draws=1000.
#>
#> mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
#> alpha 3.50 0.00 0.08 3.38 3.45 3.49 3.54 3.68 328 1.00
#> beta[1] 0.55 0.00 0.07 0.43 0.51 0.55 0.59 0.69 424 1.00
#> sigma 0.31 0.00 0.07 0.21 0.26 0.30 0.34 0.48 279 1.01
#> mu_obs[1] 3.74 0.01 0.09 3.60 3.69 3.74 3.79 3.94 283 1.00
#> mu_obs[2] 3.74 0.01 0.09 3.60 3.69 3.74 3.79 3.94 283 1.00
#> mu_obs[3] 3.74 0.01 0.09 3.60 3.69 3.74 3.79 3.94 283 1.00
#> mu_obs[4] 3.74 0.01 0.09 3.60 3.69 3.74 3.79 3.94 283 1.00
#> mu_obs[5] 3.74 0.01 0.09 3.60 3.69 3.74 3.79 3.94 283 1.00
#> mu_obs[6] 3.74 0.01 0.09 3.60 3.69 3.74 3.79 3.94 283 1.00
#> mu_obs[7] 3.74 0.01 0.09 3.60 3.69 3.74 3.79 3.94 283 1.00
#> mu_obs[8] 3.31 0.00 0.07 3.20 3.27 3.31 3.36 3.48 423 1.00
#> mu_obs[9] 3.31 0.00 0.07 3.20 3.27 3.31 3.36 3.48 423 1.00
#> mu_obs[10] 3.31 0.00 0.07 3.20 3.27 3.31 3.36 3.48 423 1.00
#> mu_obs[11] 3.31 0.00 0.07 3.20 3.27 3.31 3.36 3.48 423 1.00
#> mu_obs[12] 3.31 0.00 0.07 3.20 3.27 3.31 3.36 3.48 423 1.00
#> mu_obs[13] 2.74 0.00 0.10 2.55 2.67 2.73 2.80 2.95 876 1.00
#> mu_obs[14] 2.74 0.00 0.10 2.55 2.67 2.73 2.80 2.95 876 1.00
#> mu_obs[15] 2.74 0.00 0.10 2.55 2.67 2.73 2.80 2.95 876 1.00
#> mu_obs[16] 2.74 0.00 0.10 2.55 2.67 2.73 2.80 2.95 876 1.00
#> mu_obs[17] 2.74 0.00 0.10 2.55 2.67 2.73 2.80 2.95 876 1.00
#> mu_cens[1] 4.21 0.01 0.13 4.01 4.13 4.20 4.28 4.49 277 1.00
#> mu_cens[2] 4.21 0.01 0.13 4.01 4.13 4.20 4.28 4.49 277 1.00
#> mu_cens[3] 4.21 0.01 0.13 4.01 4.13 4.20 4.28 4.49 277 1.00
#> mu_cens[4] 4.21 0.01 0.13 4.01 4.13 4.20 4.28 4.49 277 1.00
#> mu_cens[5] 4.21 0.01 0.13 4.01 4.13 4.20 4.28 4.49 277 1.00
#> mu_cens[6] 4.21 0.01 0.13 4.01 4.13 4.20 4.28 4.49 277 1.00
#> mu_cens[7] 4.21 0.01 0.13 4.01 4.13 4.20 4.28 4.49 277 1.00
#> mu_cens[8] 4.21 0.01 0.13 4.01 4.13 4.20 4.28 4.49 277 1.00
#> mu_cens[9] 4.21 0.01 0.13 4.01 4.13 4.20 4.28 4.49 277 1.00
#> mu_cens[10] 4.21 0.01 0.13 4.01 4.13 4.20 4.28 4.49 277 1.00
#> mu_cens[11] 3.74 0.01 0.09 3.60 3.69 3.74 3.79 3.94 283 1.00
#> mu_cens[12] 3.74 0.01 0.09 3.60 3.69 3.74 3.79 3.94 283 1.00
#> mu_cens[13] 3.74 0.01 0.09 3.60 3.69 3.74 3.79 3.94 283 1.00
#> mu_cens[14] 3.31 0.00 0.07 3.20 3.27 3.31 3.36 3.48 423 1.00
#> mu_cens[15] 3.31 0.00 0.07 3.20 3.27 3.31 3.36 3.48 423 1.00
#> mu_cens[16] 3.31 0.00 0.07 3.20 3.27 3.31 3.36 3.48 423 1.00
#> mu_cens[17] 3.31 0.00 0.07 3.20 3.27 3.31 3.36 3.48 423 1.00
#> mu_cens[18] 3.31 0.00 0.07 3.20 3.27 3.31 3.36 3.48 423 1.00
#> mu_cens[19] 2.74 0.00 0.10 2.55 2.67 2.73 2.80 2.95 876 1.00
#> mu_cens[20] 2.74 0.00 0.10 2.55 2.67 2.73 2.80 2.95 876 1.00
#> mu_cens[21] 2.74 0.00 0.10 2.55 2.67 2.73 2.80 2.95 876 1.00
#> mu_cens[22] 2.74 0.00 0.10 2.55 2.67 2.73 2.80 2.95 876 1.00
#> mu_cens[23] 2.74 0.00 0.10 2.55 2.67 2.73 2.80 2.95 876 1.00
#> lp__ -0.42 0.08 1.38 -3.95 -0.98 -0.08 0.57 1.08 307 1.00
#>
#> Samples were drawn using NUTS(diag_e) at Fri Apr 20 22:05:47 2018.
#> For each parameter, n_eff is a crude measure of effective sample size,
#> and Rhat is the potential scale reduction factor on split chains (at
#> convergence, Rhat=1).
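Because x was standardized with scale(), beta[1] is the expected change in failure time per one standard deviation change in x. A hedged sketch of converting the slope back to the original scale of x; it assumes the result of sampling() above is assigned to fit, which the call as printed does not do:
fit <- sampling(mod_engines, data = engines_data,
                chains = 1, init = list(list(alpha = mean(engines$y))))
beta_draws <- rstan::extract(fit, pars = "beta")$beta
# posterior quantiles of the slope per one-unit change in unstandardized x
quantile(beta_draws[, 1] / sd(engines$x), c(0.025, 0.5, 0.975))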
References
Tanner, Martin A. 1996. Tools for Statistical Inference. Springer.