\[ \DeclareMathOperator{\E}{E} \DeclareMathOperator{\mean}{mean} \DeclareMathOperator{\Var}{Var} \DeclareMathOperator{\Cov}{Cov} \DeclareMathOperator{\Cor}{Cor} \DeclareMathOperator{\Bias}{Bias} \DeclareMathOperator{\MSE}{MSE} \DeclareMathOperator{\RMSE}{RMSE} \DeclareMathOperator{\sd}{sd} \DeclareMathOperator{\se}{se} \DeclareMathOperator{\tr}{tr} \DeclareMathOperator{\median}{median} \DeclareMathOperator{\rank}{rank} \DeclareMathOperator*{\argmin}{arg\,min} \DeclareMathOperator*{\argmax}{arg\,max} \DeclareMathOperator{\logistic}{Logistic} \DeclareMathOperator{\logit}{Logit} \newcommand{\mat}[1]{\boldsymbol{#1}} \newcommand{\vec}[1]{\boldsymbol{#1}} \newcommand{\T}{'} % This follows BDA \newcommand{\dunif}{\mathsf{Uniform}} \newcommand{\dnorm}{\mathsf{Normal}} \newcommand{\dhalfnorm}{\mathrm{HalfNormal}} \newcommand{\dlnorm}{\mathsf{LogNormal}} \newcommand{\dmvnorm}{\mathsf{Normal}} \newcommand{\dgamma}{\mathsf{Gamma}} \newcommand{\dinvgamma}{\mathsf{InvGamma}} \newcommand{\dchisq}{\mathsf{ChiSquared}} \newcommand{\dinvchisq}{\mathsf{InvChiSquared}} \newcommand{\dexp}{\mathsf{Exponential}} \newcommand{\dlaplace}{\mathsf{Laplace}} \newcommand{\dweibull}{\mathsf{Weibull}} \newcommand{\dwishart}{\mathsf{Wishart}} \newcommand{\dinvwishart}{\mathsf{InvWishart}} \newcommand{\dlkj}{\mathsf{LkjCorr}} \newcommand{\dt}{\mathsf{StudentT}} \newcommand{\dhalft}{\mathsf{HalfStudentT}} \newcommand{\dbeta}{\mathsf{Beta}} \newcommand{\ddirichlet}{\mathsf{Dirichlet}} \newcommand{\dlogistic}{\mathsf{Logistic}} \newcommand{\dllogistic}{\mathsf{LogLogistic}} \newcommand{\dpois}{\mathsf{Poisson}} \newcommand{\dBinom}{\mathsf{Binomial}} \newcommand{\dmultinom}{\mathsf{Multinom}} \newcommand{\dnbinom}{\mathsf{NegativeBinomial}} \newcommand{\dnbinomalt}{\mathsf{NegativeBinomial2}} \newcommand{\dbetabinom}{\mathsf{BetaBinomial}} \newcommand{\dcauchy}{\mathsf{Cauchy}} \newcommand{\dhalfcauchy}{\mathsf{HalfCauchy}} \newcommand{\dbernoulli}{\mathsf{Bernoulli}} \newcommand{\R}{\mathbb{R}} \newcommand{\Reals}{\R} \newcommand{\RealPos}{\R^{+}} \newcommand{\N}{\mathbb{N}} \newcommand{\Nats}{\N} \newcommand{\cia}{\perp\!\!\!\perp} \DeclareMathOperator*{\plim}{plim} \DeclareMathOperator{\invlogit}{Inv-Logit} \DeclareMathOperator{\logit}{Logit} \DeclareMathOperator{\diag}{diag} \]

12 Generalized Linear Models

Prerequisites

library("rstan")
library("tidyverse")

12.1 Introduction

Generalized linear models (GLMs) are a class of commonly used models. In GLMs, the mean of the response is specified as a function of a linear combination of predictors, \[ \E(Y) = \mu = g^{-1}(\mat{X} \vec{\beta}) . \] GLMs generalize linear regression from an unbounded continuous outcome variable to other types of data: binary, count, categorical, and bounded continuous outcomes.

A GLM consists of three components:

  1. A probability distribution (family) specifying the conditional distribution of the response variable. In GLMs, the distribution is in the exponential family: Normal, Binomial, Poisson, Categorical, Multinomial, Beta.

  2. A linear predictor, which is a linear function of the predictors, \[ \eta = \mat{X} \vec{\beta}. \]

  3. A link function (\(g(.)\)) which maps the expected value to the linear predictor, \[ g(\mu) = \eta . \] The link function is smooth and invertible, and the inverse link function (or mean function) maps the linear predictor to the mean, \[ \mu = g^{-1}(\eta) . \] The link function \(g\) translates \(\mu\) from the proper range for the probability distribution to \((-\infty, +\infty)\), and the inverse link \(g^{-1}\) translates \(\eta\) back to that range.

These models are often estimated with maximum likelihood, as with the glm() function in the R stats package. They are also easily estimated in a Bayesian setting.
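As a quick illustration (an addition, not from the original text), the same Poisson GLM can be fit by maximum likelihood with glm() and by Bayesian estimation with rstanarm; the simulated data and default priors are arbitrary choices for this sketch.

# A minimal sketch: the same GLM fit by MLE and by Bayesian estimation.
# The simulated data are arbitrary; any data frame with a count outcome works.
library("rstanarm")

set.seed(123)
n <- 200
x <- rnorm(n)
dat <- data.frame(x = x, y = rpois(n, lambda = exp(0.5 + 0.8 * x)))

# Maximum likelihood estimation with stats::glm()
fit_mle <- glm(y ~ x, data = dat, family = poisson())

# Bayesian estimation with rstanarm (weakly informative default priors)
fit_bayes <- stan_glm(y ~ x, data = dat, family = poisson(), seed = 123)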

See the help for stats::family for the common probability distributions (families) and link functions, and the Wikipedia page on generalized linear models for a table of common GLMs. See the VGAM package for even more examples of link functions and probability distributions.

12.2 Count Models

12.2.1 Poisson

The Poisson model is used for unbounded count data, \[ Y = 0, 1, \dots, \infty \] The outcome is modeled as a Poisson distribution \[ y_i \sim \dpois(\lambda_i) \] with positive mean parameter \(\lambda_i \in (0, \infty)\). Since \(\lambda_i\) has to be positive, the most common link function is the log, \[ \log(\lambda_i) = \vec{x}_i' \vec{\beta} , \] which has the inverse, \[ \lambda_i = \exp(\vec{x}_i' \vec{\beta}) . \]

In Stan, the Poisson distribution has two implementations:

  • poisson_lpmf(y | lambda): Poisson with mean parameter lambda.
  • poisson_log_lpmf(y | eta): Poisson parameterized by the log-mean eta, i.e. with the log link applied internally. This is for numerical stability (see the sketch after this list).
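A minimal Stan program using the log parameterization, written as a string and compiled with rstan, is sketched below; the priors and variable names are illustrative assumptions, not code from the original.

# A minimal sketch of a Poisson regression using Stan's poisson_log
# parameterization; priors and data names are illustrative assumptions.
poisson_log_code <- "
data {
  int<lower=1> N;              // number of observations
  int<lower=1> K;              // number of predictors
  matrix[N, K] X;              // design matrix
  int<lower=0> y[N];           // count outcome
}
parameters {
  real alpha;                  // intercept
  vector[K] beta;              // coefficients
}
model {
  alpha ~ normal(0, 10);
  beta ~ normal(0, 2.5);
  // same model as y ~ poisson(exp(alpha + X * beta)), but more stable
  y ~ poisson_log(alpha + X * beta);
}
"
# fit <- rstan::stan(model_code = poisson_log_code,
#                    data = list(N = nrow(X), K = ncol(X), X = X, y = y))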

Also, the rstanarm function stan_glm() supports Poisson regression with family = poisson().

12.3 Example

A regression model of bilateral sanctions for the period 1939 to 1983. The outcome variable is the number of countries imposing sanctions.

data("sanction", package = "Zelig")

TODO
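Until that example is filled in, here is a hedged sketch of a Poisson regression for these data with rstanarm; the outcome column num and the predictors coop and target are assumptions about the Zelig sanction data, not choices taken from the original text.

# A hedged sketch: Poisson regression of the sanction counts.
# The column names num, coop, and target are assumptions and may need adjusting.
library("rstanarm")

data("sanction", package = "Zelig")
fit_pois <- stan_glm(num ~ coop + target, data = sanction,
                     family = poisson(), seed = 123)
summary(fit_pois)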

12.4 Negative Binomial

The Negative Binomial model is also used for unbounded count data, \[ Y = 0, 1, \dots, \infty \] The Poisson distribution has the restriction that the mean is equal to the variance, \(\E(Y) = \Var(Y) = \lambda\). The Negative Binomial distribution has an additional parameter that allows the variance to differ from the mean (though the variance is always larger than the mean).

The outcome is modeled as a negative binomial distribution, \[ y_i \sim \dnbinom(\alpha_i, \beta) \] with shape \(\alpha_i \in \R^{+}\) and inverse scale \(\beta \in \R^{+}\), where \(\E(y_i) = \alpha_i / \beta\) and \(\Var(y_i) = \frac{\alpha_i}{\beta^2}(\beta + 1)\). The mean can then be modeled on the log scale and transformed to the shape parameter, \[ \begin{aligned}[t] \mu_i &= \exp( \vec{x}_i' \vec{\gamma} ) , \\ \alpha_i &= \mu_i \beta . \end{aligned} \]
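As a quick check (an addition to the original text), substituting \(\alpha_i = \mu_i \beta\) into the mean and variance above shows that this parameterization makes the variance linear in the mean, \[ \begin{aligned}[t] \E(y_i) &= \frac{\alpha_i}{\beta} = \mu_i , \\ \Var(y_i) &= \frac{\alpha_i}{\beta^2}(\beta + 1) = \mu_i \frac{\beta + 1}{\beta} = \mu_i + \frac{\mu_i}{\beta} , \end{aligned} \] whereas the mean and over-dispersion parameterization described next makes the variance quadratic in the mean, \(\mu_i + \mu_i^2 / \phi\).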

Important: The negative binomial distribution has many different parameterizations. An alternative parameterization uses the mean and an over-dispersion parameter, \[ y_i \sim \dnbinomalt(\mu_i, \phi) \] with location parameter \(\mu_i \in \R^{+}\) and over-dispersion parameter \(\phi \in \R^{+}\), where \(\E(y_i) = \mu_i\) and \(\Var(y_i) = \mu_i + \frac{\mu_i^2}{\phi}\). The mean can then be modeled directly on the log scale, \[ \mu_i = \exp( \vec{x}_i' \vec{\gamma} ) . \]

In Stan, there are multiple parameterizations of the negative binomial distribution:

  • neg_binomial_lpmf(y | alpha, beta) with shape parameter alpha and inverse scale parameter beta.
  • neg_binomial_2_lpmf(y | mu, phi) with mean mu and over-dispersion parameter phi.
  • neg_binomial_2_log_lpmf(y | eta, phi) with log-mean eta and over-dispersion parameter phi.

Also, rstanarm supports Poisson and negative binomial models.
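A minimal sketch of a negative binomial fit with rstanarm follows; it reuses the hypothetical sanction variables from the Poisson example above, so the column names remain assumptions.

# A hedged sketch: negative binomial regression with rstanarm.
# stan_glm.nb() uses the mean / over-dispersion (neg_binomial_2) parameterization.
library("rstanarm")

fit_nb <- stan_glm.nb(num ~ coop + target, data = sanction,
                      link = "log", seed = 123)

# Equivalently, with stan_glm() and the neg_binomial_2 family
fit_nb2 <- stan_glm(num ~ coop + target, data = sanction,
                    family = neg_binomial_2(link = "log"), seed = 123)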

  • A. Gelman, Carlin, et al. (2013 Ch 16)

12.5 Multinomial / Categorical Models

12.6 Gamma Regression

The response variable is continuous and positive. In gamma regression, the coefficient of variation is constant rather than the variance. \[ y_i \sim \dgamma(\alpha_i, \beta) \] with \[ \begin{aligned}[t] \mu_i &= \exp( \vec{x}_i' \vec{\gamma} ) , \\ \alpha_i &= \mu_i \beta , \end{aligned} \] where the log link keeps the mean positive and \(\E(y_i) = \alpha_i / \beta = \mu_i\).

In Stan:

  • gamma_lpdf(y | alpha, beta) with shape parameter \(\alpha > 0\) and inverse scale parameter \(\beta > 0\). Then \(\E(Y) = \alpha / \beta\) and \(\Var(Y) = \alpha / \beta^2\). A short rstanarm sketch follows this list.
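A minimal sketch of a Bayesian gamma regression with rstanarm is below; the data frame dat and its columns are hypothetical, and the log link is one reasonable choice among several.

# A hedged sketch: gamma regression for a positive continuous outcome.
# The data frame `dat` and its columns `y` and `x` are hypothetical.
library("rstanarm")

fit_gamma <- stan_glm(y ~ x, data = dat,
                      family = Gamma(link = "log"), seed = 123)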

12.7 Beta Regression

This is for a response variable that is a proportion, \(y_i \in (0, 1)\), \[ y_i \sim \dbeta(\alpha_i, \beta_i) \] and \[ \begin{aligned}[t] \mu_i &= g^{-1}(\vec{x}_i' \vec{\gamma}) \\ \alpha_i &= \mu_i \phi \\ \beta_i &= (1 - \mu_i) \phi \end{aligned} \] where \(g^{-1}\) maps the linear predictor to \((0, 1)\) (e.g., the inverse logit). The \(\phi\) parameter can also be modeled with its own predictors.
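As a quick check (an addition to the original text), with \(\alpha_i = \mu_i \phi\) and \(\beta_i = (1 - \mu_i) \phi\) the standard Beta moments give \[ \begin{aligned}[t] \E(y_i) &= \frac{\alpha_i}{\alpha_i + \beta_i} = \mu_i , \\ \Var(y_i) &= \frac{\alpha_i \beta_i}{(\alpha_i + \beta_i)^2 (\alpha_i + \beta_i + 1)} = \frac{\mu_i (1 - \mu_i)}{1 + \phi} , \end{aligned} \] so \(\phi\) acts as a precision parameter: larger values of \(\phi\) imply less variance around \(\mu_i\).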

In Stan:

  • beta_lpdf(y | alpha, beta) with shape parameters \(\alpha > 0\) and \(\beta > 0\), interpretable as prior successes plus one and prior failures plus one. Then \(\E(Y) = \alpha / (\alpha + \beta)\) and \(\Var(Y) = \alpha\beta / ((\alpha + \beta)^2 (\alpha + \beta + 1))\).

The rstanarm function stan_betareg() estimates beta regression models, as sketched below.
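A minimal sketch with stan_betareg() follows; the data frame and column names are hypothetical, and the logit link for the mean and log link for the precision are assumptions.

# A hedged sketch: beta regression for a proportion outcome in (0, 1).
# The data frame `dat` and its columns `prop`, `x`, and `z` are hypothetical;
# the part of the formula after `|` models the precision parameter phi.
library("rstanarm")

fit_beta <- stan_betareg(prop ~ x | z, data = dat,
                         link = "logit", link.phi = "log", seed = 123)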


12.8 References

For general references on count models see

  • A. Gelman and Hill (2007, 109–16)
  • McElreath (2016 Ch 10)
  • Fox (2016 Ch. 14)
  • A. Gelman, Carlin, et al. (2013 Ch. 16)

For GLMs more generally, see A. Gelman, Carlin, et al. (2013, Ch. 16), A. Gelman and Hill (2007, Ch. 5-6), and McElreath (2016, Ch. 9). King (1998) discusses MLE estimation of many common GLM models.

Many econometrics and statistics textbooks, e.g. Fox (2016), discuss GLMs. Though these treatments are not derived from a Bayesian context, they can easily be transferred to a Bayesian setting.