Bayesian point estimators use the following recipe:
Let $\theta$ be a parameter with a prior distribution $\pi$. Let $L(\theta, \hat{\theta})$ be a loss function. Examples of loss functions include:

- squared error: $(\theta - \hat{\theta})^2$
- absolute error: $|\theta - \hat{\theta}|$
Let $\hat{\theta}(x)$ be an estimator. The Bayes risk of $\hat{\theta}$ is the expected value of the loss function over the probability distribution of $\theta$.
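In symbols, writing $p(\theta)$ for the distribution of $\theta$ (in applications, the posterior):

$$
R(\hat{\theta}) = \mathrm{E}_{\theta}\left[ L(\theta, \hat{\theta}) \right] = \int L(\theta, \hat{\theta}) \, p(\theta) \, d\theta .
$$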
An estimator is a Bayes estimator if it minimizes the Bayes risk over all estimators.
| Estimator | Loss Function |
| --------- | ------------- |
| Mean | $(\theta - \hat{\theta})^2$ |
| Median | $\vert \theta - \hat{\theta} \vert$ |
| $p$-Quantile | $\begin{cases} p \vert \theta - \hat{\theta} \vert & \text{for } \theta - \hat{\theta} \geq 0 \\ (1 - p) \vert \theta - \hat{\theta} \vert & \text{for } \theta - \hat{\theta} < 0 \end{cases}$ |
| Mode | $\begin{cases} 0 & \text{for } \vert \theta - \hat{\theta} \vert < \epsilon \\ 1 & \text{for } \vert \theta - \hat{\theta} \vert \geq \epsilon \end{cases}$ |
The posterior mode can often be estimated directly by maximizing the posterior density. This is called maximum a posteriori (MAP) estimation.
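As a minimal sketch (the model here is hypothetical: a normal likelihood with known standard deviation and a normal prior on the mean), the MAP estimate can be found by numerically maximizing the unnormalized log posterior:

```r
# MAP estimation by numerical optimization (hypothetical normal model).
set.seed(123)
y <- rnorm(50, mean = 2, sd = 1)  # observed data

log_posterior <- function(mu) {
  sum(dnorm(y, mean = mu, sd = 1, log = TRUE)) +  # log likelihood
    dnorm(mu, mean = 0, sd = 10, log = TRUE)      # log prior on mu
}

# maximize the (unnormalized) log posterior over a bracketing interval
optimize(log_posterior, interval = c(-10, 10), maximum = TRUE)$maximum
```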
Given any other loss function, a Bayes estimator could be computed by plugging that loss function into the minimization above. However, this still requires integrating over the distribution of $\theta$. In cases where the form of $p(\theta)$ is known, the minimizer may have a closed form. For most posterior distributions, it requires some sort of approximation of $p(\theta)$, such as a sample of posterior draws.
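For example, given draws from a posterior distribution (simulated normal draws stand in for MCMC output here), the estimators in the table above are approximated by their sample analogues:

```r
set.seed(123)
theta <- rnorm(10000, mean = 1, sd = 2)  # stand-in for posterior draws

mean(theta)                   # estimator under squared error loss
median(theta)                 # estimator under absolute error loss
quantile(theta, probs = 0.9)  # estimator under the 0.9-quantile loss
```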
A $p \times 100\%$ credible interval of a parameter $\theta$ with distribution $f(\theta)$ is an interval $(a, b)$ such that

$$
\int_a^b f(\theta) \, d\theta = p .
$$
The credible interval is not uniquely defined: multiple intervals can satisfy this definition. The two most common are:
- equal-tailed interval: $(F^{-1}((1 - p) / 2), F^{-1}(1 - (1 - p) / 2))$, where $F^{-1}$ is the quantile function of $\theta$. The 95% credible interval uses the 2.5% and 97.5% quantiles.
- highest posterior density (HPD) interval: the shortest credible interval. Unless the distribution is unimodal and symmetric, the HPD interval is not the same as the equal-tailed interval.
Generally it is fine to use the equal-tailed interval. This is what Stan reports by default.
The HPD interval:

- is harder to calculate.
- may not be a single contiguous interval if the distribution is multimodal. (Though that may be a desirable feature.)
- is, unlike the equal-tailed interval, not invariant under monotonic transformations. For a monotonically increasing function $g$, if $CI_{HPD}(\theta) = (a, b)$, then generally $CI_{HPD}(g(\theta)) \neq (g(a), g(b))$. For the equal-tailed interval, however, $CI(g(\theta)) = (g(a), g(b))$, as sketched below.
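A quick demonstration of the invariance property for the equal-tailed interval, using simulated draws and $g = \exp$ (a monotonically increasing transformation):

```r
set.seed(123)
theta <- rnorm(10000)
q <- quantile(theta, probs = c(0.025, 0.975))  # equal-tailed interval for theta
exp(q)                                         # g applied to the endpoints
quantile(exp(theta), probs = c(0.025, 0.975))  # interval for g(theta):
                                               # agrees (up to interpolation)
```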
How to calculate?
The equal-tailed credible interval is computed from quantiles. For example, the 95% credible interval for a standard normal distribution is,
p <- 0.95
qnorm(c( (1 - p) / 2, 1 - (1 - p) / 2))
#> [1] -1.96 1.96
The 95% credible interval estimated from a sample drawn from a standard normal distribution is,
quantile(rnorm(100), probs = c( (1 - p) / 2, 1 - (1 - p) / 2))
#> 2.5% 97.5%
#> -1.78 1.79
There are multiple functions in multiple packages that calculate the HPD interval, for example:

- coda::HPDinterval
- HDInterval::hdi
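For example, a minimal sketch with coda (simulated draws coerced to an mcmc object stand in for sampler output):

```r
library(coda)
set.seed(123)
draws <- as.mcmc(rnorm(10000))  # pretend these are MCMC draws
HPDinterval(draws, prob = 0.95)
```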
One aspect of Bayesian inference is that it separates inference from decision.
1. Inference: estimate a posterior distribution $p(\theta | y)$.
2. Decision: choose actions that minimize expected loss under that posterior.
In this framework, inference is a subset of decisions and estimators are a subset of decision rules. Choosing an estimate is an action that aims to minimize the loss from guessing a parameter value.
Given $\theta$ with distribution $p(\theta)$ and a loss function $L(a, \theta)$, the optimal action $a^*$ from a set of actions $\mathcal{A}$ is

$$
a^* = \arg \min_{a \in \mathcal{A}} \int L(a, \theta) \, p(\theta) \, d\theta .
$$

If we only have a sample of size $S$ from $p(\theta)$ (as in a posterior distribution estimated by MCMC), the optimal decision would be calculated as

$$
a^* = \arg \min_{a \in \mathcal{A}} \frac{1}{S} \sum_{s = 1}^{S} L(a, \theta^{(s)}) .
$$
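A minimal sketch of this calculation from draws, using a grid of candidate actions and absolute error loss (under which the optimal action should be close to the sample median):

```r
set.seed(123)
theta_draws <- rnorm(10000, mean = 1, sd = 2)  # stand-in for posterior draws

loss <- function(a, theta) abs(theta - a)  # absolute error loss

actions <- seq(-5, 5, by = 0.01)  # grid of candidate actions
expected_loss <- vapply(actions, function(a) mean(loss(a, theta_draws)),
                        numeric(1))
actions[which.min(expected_loss)]  # compare with median(theta_draws)
```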
The introductions of the Talking Machines episodes "The Church of Bayes" and "Collecting Data" have a concise discussion by Neil Lawrence of the pros and cons of Bayesian decision making.