
Estimation

Point Estimates

Bayesian point estimators use the following recipe:

  1. define a loss function that penalizes guesses,
  2. take the expected value of that loss function over the distribution of the parameter of interest, and
  3. choose the estimate that minimizes that expected loss.

Let $\theta$ be a parameter with a prior distribution $\pi$, and let $L(\theta, \hat{\theta})$ be a loss function that penalizes an estimate $\hat{\theta}$ when the true value is $\theta$. Common loss functions, and the point estimators they lead to, are listed in the table below.

Let $\hat{\theta}(x)$ be an estimator. The Bayes risk of $\hat{\theta}$ is the expected value of the loss function over the probability distribution of $\theta$,
$$
R(\hat{\theta}) = \mathrm{E}_{\theta}\left[ L(\theta, \hat{\theta}(x)) \right] .
$$

An estimator is a Bayes estimator if it minimizes the Bayes risk over all estimators.

| Estimator | Loss Function |
|-----------|---------------|
| Mean | $(\theta - \hat{\theta})^2$ |
| Median | $\lvert \theta - \hat{\theta} \rvert$ |
| $p$-Quantile | $\begin{cases} p \lvert \theta - \hat{\theta} \rvert & \text{for } \theta - \hat{\theta} \geq 0 \\ (1 - p) \lvert \theta - \hat{\theta} \rvert & \text{for } \theta - \hat{\theta} < 0 \end{cases}$ |
| Mode | $\begin{cases} 0 & \text{for } \lvert \theta - \hat{\theta} \rvert < \epsilon \\ 1 & \text{for } \lvert \theta - \hat{\theta} \rvert > \epsilon \end{cases}$ |
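
As an illustration (not from the original notes), the point estimates implied by these loss functions can be computed directly from a sample of posterior draws; the draws below are simulated rather than taken from a fitted model.

# simulated draws standing in for a posterior sample (illustration only)
theta <- rgamma(10000, shape = 2, rate = 1)
mean(theta)                  # squared error loss -> posterior mean
median(theta)                # absolute error loss -> posterior median
quantile(theta, probs = 0.9) # quantile loss with p = 0.9 -> posterior 0.9-quantile
# 0-1 loss -> posterior mode, approximated by the peak of a kernel density estimate
d <- density(theta)
d$x[which.max(d$y)]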

The posterior mode can often be estimated directly by maximizing the posterior density; this is called maximum a posteriori (MAP) estimation.
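
A minimal sketch of MAP estimation, assuming the (log) posterior density is available in closed form; the Beta(3, 5) posterior here is chosen purely for illustration.

# log posterior density, assumed known in closed form for this example
log_post <- function(theta) dbeta(theta, 3, 5, log = TRUE)
# MAP estimate: maximize the log posterior over the parameter space
optimize(log_post, interval = c(0, 1), maximum = TRUE)$maximum
# analytic mode of Beta(3, 5) is (3 - 1) / (3 + 5 - 2) = 1/3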

For an arbitrary loss function, the Bayes estimate can be found by minimizing the expected loss under that loss function. However, this still requires integrating over the distribution of $\theta$. In cases where the form of $p(\theta)$ is known, this integral may have a closed form; for most posterior distributions, it requires some sort of approximation, such as averaging the loss over a sample of draws from $p(\theta)$.
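
For instance, a sketch (with an invented asymmetric loss and simulated draws) of computing a point estimate by minimizing the Monte Carlo estimate of the expected loss:

# simulated draws standing in for a posterior sample
theta <- rnorm(10000, mean = 1, sd = 2)
# asymmetric loss: overestimates cost twice as much as underestimates
loss <- function(est, theta) ifelse(est > theta, 2 * (est - theta), theta - est)
expected_loss <- function(est) mean(loss(est, theta))
# Bayes estimate under this loss, found by numerical minimization
optimize(expected_loss, interval = range(theta))$minimum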

Credible Intervals

A $p \times 100\%$ credible interval for a parameter $\theta$ with distribution $f(\theta)$ is an interval $(a, b)$ such that
$$
\int_a^b f(\theta) \, d\theta = p .
$$

The credible interval is not uniquely defined; there may be multiple intervals that satisfy this definition. The most common choices are the equal-tailed interval, which places probability $(1 - p) / 2$ in each tail, and the highest posterior density (HPD) interval, which is the narrowest interval containing probability $p$.

Generally it is fine to use the equal-tailed interval. This is what Stan reports by default.

The HPD interval is never wider than the equal-tailed interval with the same coverage, and the two can differ noticeably when the distribution is skewed.

How are these intervals calculated?

For the equal-tailed credible interval, take the $(1 - p) / 2$ and $1 - (1 - p) / 2$ quantiles of the distribution (or of the sample of draws).

For example, the 95% credible interval for a standard normal distribution is,

p <- 0.95
qnorm(c( (1 - p) / 2, 1 - (1 - p) / 2))
#> [1] -1.96  1.96

The 95% credible interval for a sample drawn from a normal distribution is,

quantile(rnorm(100), probs = c( (1 - p) / 2, 1 - (1 - p) / 2))
#>  2.5% 97.5% 
#> -1.78  1.79

There are functions in multiple packages that calculate the HPD interval, for example coda::HPDinterval.
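
For example, a sketch using coda, assuming the posterior draws are first converted to an mcmc object (the draws here are simulated):

library(coda)
# simulated draws standing in for a posterior sample
draws <- as.mcmc(rgamma(10000, shape = 2, rate = 1))
HPDinterval(draws, prob = 0.95)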

Compared to confidence intervals

TODO

Bayesian Decision Theory

One aspect of Bayesian inference is that it separates inference from decision.

  1. Estimate a posterior distribution $p(\theta \mid y)$
  2. Define a loss function for an action ($a$) and parameter ($\theta$), $L(a, \theta)$.
  3. Choose the action that minimizes the expected loss

In this framework, inference is a subset of decisions and estimators are a subset of decision rules. Choosing an estimate is an action that aims to minimize the loss from guessing a parameter value.

Given a parameter $\theta$ with distribution $p(\theta)$ and a loss function $L(a, \theta)$, the optimal action $a^*$ from a set of actions $\mathcal{A}$ is
$$
a^* = \arg \min_{a \in \mathcal{A}} \mathrm{E}_{p(\theta)}\left[ L(a, \theta) \right] = \arg \min_{a \in \mathcal{A}} \int L(a, \theta) \, p(\theta) \, d\theta .
$$
If we only have a sample $\theta^{(1)}, \dots, \theta^{(S)}$ of size $S$ from $p(\theta)$ (as in a posterior distribution estimated by MCMC), the optimal decision is approximated by
$$
a^* \approx \arg \min_{a \in \mathcal{A}} \frac{1}{S} \sum_{s = 1}^{S} L\left(a, \theta^{(s)}\right) .
$$
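
As a sketch of this calculation (the action set, loss function, and draws below are invented for illustration), the expected loss of each candidate action is approximated by averaging the loss over the draws, and the action with the smallest value is chosen:

# simulated posterior draws of theta
theta <- rnorm(5000, mean = 0.4, sd = 0.1)
# a small discrete set of candidate actions
actions <- c(0.25, 0.5, 0.75)
# an arbitrary loss for illustration: squared error plus a cost proportional to the action
loss <- function(a, theta) (a - theta)^2 + 0.1 * a
expected_loss <- sapply(actions, function(a) mean(loss(a, theta)))
# optimal action: the one with the smallest Monte Carlo expected loss
actions[which.min(expected_loss)]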

The introductions of the Talking Machines episodes The Church of Bayes and Collecting Data have a concise discussion by Neil Lawrence of the pros and cons of Bayesian decision making.