Bayesian A/B testing in Stan

A few examples of estimating A/B tests in Stan. These examples are translations of the examples in Chapter 7 of Bayesian Methods for Hackers by Cameron Davidson-Pilon.

Setup

library("rstan")
library("bayesplot")
library("tidyverse")
library("lubridate")

A/B Test

Consider the example of a website with two versions $A$ and $B$. We are interested in how the different versions of the website affect conversions.

The data comprises the number of visitors to site versions A and B, and the number of conversions to these sites.

visitors = c(A = 1300, B = 1275)
conversions = c(A = 120, B = 125)

The model of the experiment is an A/B test in which \[ \begin{aligned} n_A &\sim \mathsf{Binomial}(N_A, \pi_A) & n_B &\sim \mathsf{Binomial}(N_B, \pi_B) \\ \pi_A &= \mathsf{InvLogit}(\eta_A) & \pi_B &= \mathsf{InvLogit}(\eta_B) \\ \eta_A &= \mathsf{Normal}(0, 2.5) & \eta_B &= \mathsf{Normal}(0, 2.5) \end{aligned} \]

The following .stan model calculates the difference in probabilities between two independent samples.

data {
  int<lower=1> visitors_A;
  int<lower=0> conversions_A;
  int<lower=1> visitors_B;
  int<lower=0> conversions_B;
}
parameters {
  real eta_A;
  real eta_B;
}
transformed parameters {
  real<lower=0., upper=1.> pi_A = inv_logit(eta_A);
  real<lower=0., upper=1.> pi_B = inv_logit(eta_B);
}
model {
  eta_A ~ normal(0., 2.5);
  eta_B ~ normal(0., 2.5);
  conversions_A ~ binomial(visitors_A, pi_A);
  conversions_B ~ binomial(visitors_B, pi_B);
}
generated quantities {
  real<lower=-1., upper=1.> pi_diff;
  real eta_diff;
  real lift;

  pi_diff = pi_B - pi_A;
  eta_diff = eta_B - eta_A;
  lift = (pi_B - pi_A) / pi_B;
}

Run and fit the model.

abtest_data <- list(visitors_A = visitors["A"],
                    visitors_B = visitors["B"],
                    conversions_A = conversions["A"],
                    conversions_B = conversions["B"])

Calculate the probability that $\pi_A$

pi_A <- drop(rstan::extract(abtest_fit, "pi_A")[[1]])
pi_B <- drop(rstan::extract(abtest_fit, "pi_B")[[1]])
mean(pi_A > pi_B)

## [1] 0.344

mean(pi_A < pi_B)

## [1] 0.656

A/B Experiment With Revenue

Instead of measuring the effectiveness of the A/B test by which version gets more conversions, we can consider which type of conversion and the monetary value of each.

In this case there are four types of conversions, which result in $79, $49, $25, and $0, respectively. We thus compare the effectiveness of the experiment by which produces the most revenue.

Let $K = 4$ be the number of outcomes. Let $n_A, n_b$ be $K$ dimensional vectors of the number of visitors which fell into each of the four categories. Since $\sum_{k=1}^K n_{A,k} = N_A$, it implicitly defines the total number of visitors assigned to the $A$ version, and similarly for the $B$ version. \[ \begin{aligned} n_A &\sim \mathsf{Multinomial}(\theta_A) & n_B &\sim \mathsf{Multinomial}(\theta_B) \end{aligned} \] The parameters $\theta_A, \theta_B$ are $K$-simplexes defining the probabilities of visitors engaging in each action. These are given Dirichlet priors. \[ \begin{aligned}[t] \theta_A &\sim \mathsf{Dirichlet}(a_A) & \theta_B &\sim \mathsf{Dirichlet}(a_A) \end{aligned} \]

In Stan, this model is written as

data {
  int<lower=2> outcomes;
  int n_A[outcomes];
  int n_B[outcomes];
  vector[outcomes] payoffs;
  vector[outcomes] a_A;
  vector[outcomes] a_B;
}
parameters {
  simplex[outcomes] theta_A;
  simplex[outcomes] theta_B;
}
model {
  theta_A ~ dirichlet(a_A);
  theta_B ~ dirichlet(a_B);
  n_A ~ multinomial(theta_A);
  n_B ~ multinomial(theta_B);
}
generated quantities {
  vector[outcomes] theta_diff;
  real revenue_A;
  real revenue_B;
  real revenue_diff;
  theta_diff = theta_B - theta_A;
  revenue_A = dot_product(theta_B, payoffs) ;
  revenue_B = dot_product(theta_A, payoffs);
  revenue_diff = revenue_B - revenue_A;
}

abtest2_data <- list(
  outcomes = 4,
  payoffs = c(79, 49, 25, 0),
  n_A = c(10, 46, 80, 864),
  n_B = c(49, 84, 200, 1667),
  a_A = rep(1, 4),
  a_B = rep(1, 4)
)

posterior <- as.matrix(abtest2_fit)
mcmc_areas(posterior, pars = c("revenue_A", "revenue_B", "revenue_diff"))

mcmc_areas_ridges(posterior, pars = c("revenue_A", "revenue_B", "revenue_diff"))

mcmc_intervals(posterior, pars = c("revenue_A", "revenue_B", "revenue_diff"))

Bayesian A/B testing in Stan

Jeffrey Arnold

Setup

A/B Test

A/B Experiment With Revenue