
Chapter 12 Formatting Tables

12.1 Overview of Packages

R has multiple packages and functions for directly producing formatted tables for LaTeX, HTML, and other output formats.

See the Reproducible Research Task View for an overview of various options.

xtable is a general-purpose package for creating LaTeX, HTML, or plain text tables in R.
texreg is more specifically geared to regression tables. It outputs results in LaTeX (texreg), HTML (htmlreg), and plain text (screenreg).

The packages stargazer and apsrtable are other popular packages for formatting regression output. However, they are less well maintained and have less functionality than texreg. For example, at the time of writing, apsrtable had not been updated since 2012, and stargazer since 2015.

The texreg vignette is a good introduction to texreg. Blog posts by Will Lowe cover many of the options.

Additionally, for simple tables, knitr, the package that does the heavy lifting for R Markdown, has a function kable(). knitr can also customize how R objects are printed through the knit_print() function.
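As a minimal sketch of the knit_print() mechanism, following the S3-method pattern in knitr's knit_print vignette, the method below makes data frames render as kable() tables when defined in a chunk of an R Markdown document. The use of mtcars and the "pipe" format are my choices for illustration.

```r
library("knitr")

# S3 method: knitr calls knit_print() on every visible object in a chunk,
# so defining a data.frame method changes how data frames are rendered.
knit_print.data.frame <- function(x, ...) {
  # Render the data frame as a markdown (pipe) table
  tbl <- kable(x, format = "pipe")
  # asis_output() tells knitr to insert the text verbatim into the document
  asis_output(paste(c("", "", tbl), collapse = "\n"))
}

# Calling the method directly shows the text that would be inserted
out <- knit_print.data.frame(head(mtcars[, 1:3], 3))
```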

Other notable packages are:

pander creates output in markdown for export to other formats.
tables uses a formula syntax to define tables.
ReporteRs has the most complete support for creating Word documents, but is likely more than most projects need.

For a political science perspective on why automating the research process is important see:

Nicholas Eubank. "Embrace Your Fallibility: Thoughts on Code Integrity," based on this article.
Matthew Gentzkow and Jesse M. Shapiro. Code and Data for the Social Sciences: A Practitioner’s Guide. March 10, 2014.
The Political Methodologist issue on Workflow Management.

12.2 Summary Statistic Table Example

The xtable package has methods to convert many types of R objects to tables.

library("gapminder")
library("tidyverse")

gapminder_summary <-
  gapminder %>%
  # Keep numeric variables
  select_if(is.numeric) %>%
  # Reshape to long format: one row per (variable, value) pair
  gather(variable, value) %>%
  # Summarize by variable
  group_by(variable) %>%
  # Compute the summary statistics for each variable
  summarise(n = sum(!is.na(value)),
            `Mean` = mean(value),
            `Std. Dev.` = sd(value),
            `Median` = median(value),
            `Min.` = min(value),
            `Max.` = max(value))
gapminder_summary

## # A tibble: 4 x 7
##   variable      n       Mean `Std. Dev.`    Median    Min.         Max.
##   <chr>     <int>      <dbl>       <dbl>     <dbl>   <dbl>        <dbl>
## 1 gdpPercap  1704     7215.       9857.     3532.    241.      113523. 
## 2 lifeExp    1704       59.5        12.9      60.7    23.6         82.6
## 3 pop        1704 29601212.  106157897.  7023596.  60011.  1318683096. 
## 4 year       1704     1980.         17.3    1980.   1952.        2007.

Now that we have a data frame with the table we want, use xtable to create it:

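library("xtable")
# Build an xtable object, then print it as an HTML fragment
foo <- xtable(gapminder_summary, digits = 0) %>%
  print(type = "html",
        html.table.attributes = "",           # drop the default border=1
        include.rownames = FALSE,             # omit row numbers
        format.args = list(big.mark = ","))   # thousands separators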

variable	n	Mean	Std. Dev.	Median	Min.	Max.
gdpPercap	1,704	7,215	9,857	3,532	241	113,523
lifeExp	1,704	59	13	61	24	83
pop	1,704	29,601,212	106,157,897	7,023,596	60,011	1,318,683,096
year	1,704	1,980	17	1,980	1,952	2,007

Note that there are two functions involved in getting HTML. The function xtable() creates an xtable R object, and the method print.xtable() (called as print()) prints the xtable object as HTML (or LaTeX). The default HTML does not look nice, and would need to be formatted with CSS. If you are copying and pasting it into Word, you would need to do some post-processing cleanup anyway.
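The same two-step workflow (build the object with xtable(), then render it with print()) also produces LaTeX. Below is a minimal sketch that uses the built-in mtcars data as a stand-in for gapminder_summary; the columns and the booktabs = TRUE styling are my assumptions for illustration. Setting print.results = FALSE makes print() return the LaTeX as a string instead of printing it, which is convenient if you want to write it to a file yourself.

```r
library("xtable")

# A small summary table from the built-in mtcars data
# (a self-contained stand-in for gapminder_summary)
vars <- c("mpg", "hp", "wt")
summ <- data.frame(
  variable = vars,
  Mean = vapply(mtcars[vars], mean, numeric(1)),
  `Std. Dev.` = vapply(mtcars[vars], sd, numeric(1)),
  check.names = FALSE,
  row.names = NULL
)

# Step 1: xtable() builds an "xtable" object; digits sets decimal places
xt <- xtable(summ, digits = 2)

# Step 2: print() dispatches to print.xtable() to render it.
# booktabs = TRUE uses \toprule/\midrule/\bottomrule instead of \hline.
latex <- print(xt, type = "latex", include.rownames = FALSE,
               booktabs = TRUE, print.results = FALSE)
cat(latex)
```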

Another alternative is the kable() function in the knitr package, which outputs R Markdown tables.

knitr::kable(gapminder_summary)

variable	n	Mean	Std. Dev.	Median	Min.	Max.
gdpPercap	1704	7.215327e+03	9.857455e+03	3531.8470	241.1659	1.135231e+05
lifeExp	1704	5.947444e+01	1.291711e+01	60.7125	23.5990	8.260300e+01
pop	1704	2.960121e+07	1.061579e+08	7023595.5000	60011.0000	1.318683e+09
year	1704	1.979500e+03	1.726533e+01	1979.5000	1952.0000	2.007000e+03

This is useful for producing quick tables.
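The scientific notation in the kable() output above can be tamed with its digits and format.args arguments. A minimal sketch, using the built-in state.x77 data as a stand-in for gapminder_summary because it has similarly large numbers (the data and columns are my choices, not the original's):

```r
library("knitr")

# Summary of three columns of the built-in state.x77 matrix
states <- as.data.frame(state.x77)
vars <- c("Population", "Income", "Area")
summ <- data.frame(
  variable = vars,
  Mean = vapply(states[vars], mean, numeric(1)),
  Max. = vapply(states[vars], max, numeric(1)),
  row.names = NULL
)

# digits rounds the numeric columns; format.args is passed on to format(),
# here adding a thousands separator
tbl <- kable(summ, format = "pipe", digits = 0,
             format.args = list(big.mark = ","))
tbl
```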

Finally, the htmlTable package, unsurprisingly, produces HTML tables.

library("htmlTable")
htmlTable(txtRound(gapminder_summary, 0),
          align = "lrrrr")

	variable	n	Mean	Std. Dev.	Median	Min.	Max.
1	gdpPercap	1704	7	10	3532	241	1
2	lifeExp	1704	6	1	61	24	8
3	pop	1704	3	1	7023596	60011	1
4	year	1704	2	2	1980	1952	2

It has more features for producing HTML tables than xtable, but does not output LaTeX.

12.3 Regression Table Example

library("tidyverse")
library("texreg")

We will run several regression models with the Duncan data from the carData package:

data("Duncan", package = "carData")

Since I’m running several regressions, I will save them to a list. If you know that you will be creating multiple objects, and programming with them, always put them in a list.

First, create a list of the regression formulas,

formulae <- list(
  prestige ~ type,
  prestige ~ income,
  prestige ~ education,
  prestige ~ type + education + income
)

Now use map to run a regression with each of these formulae, and save the results to a list:

prestige_mods <- map(formulae, ~ lm(.x, data = Duncan, model = FALSE))

This is a list of lm objects,

map(prestige_mods, class)

## [[1]]
## [1] "lm"
## 
## [[2]]
## [1] "lm"
## 
## [[3]]
## [1] "lm"
## 
## [[4]]
## [1] "lm"

We can look at the first model,

prestige_mods[[1]]

## 
## Call:
## lm(formula = .x, data = Duncan, model = FALSE)
## 
## Coefficients:
## (Intercept)     typeprof       typewc  
##       22.76        57.68        13.90
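The fit-one-model-per-formula pattern used above with map() can also be written in base R. This is a minimal sketch with lapply(), using the built-in mtcars data as a stand-in for Duncan; naming the formula list is my addition, not something the original code does, and it lets you pull models out by name.

```r
# Base R sketch: fit one model per formula and keep the results in one list.
# mtcars stands in for the Duncan data here.
formulae <- list(
  weight = mpg ~ wt,
  horsepower = mpg ~ hp,
  both = mpg ~ wt + hp
)

# lapply() plays the role of purrr::map(); each element is an "lm" object
mods <- lapply(formulae, function(f) lm(f, data = mtcars, model = FALSE))

# Names carry over from the formula list
coef(mods[["both"]])
```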

Now we can format the regression table in HTML using htmlreg. The first argument of htmlreg is a list of models:

htmlreg(prestige_mods)


Statistical models
	Model 1	Model 2	Model 3	Model 4
(Intercept)	22.76^***	2.46	0.28	-0.19
	(3.47)	(5.19)	(5.09)	(3.71)
typeprof	57.68^***			16.66^*
	(5.10)			(6.99)
typewc	13.90			-14.66^*
	(7.35)			(6.11)
income		1.08^***		0.60^***
		(0.11)		(0.09)
education			0.90^***	0.35^**
			(0.08)	(0.11)
R²	0.76	0.70	0.73	0.91
Adj. R²	0.75	0.69	0.72	0.90
Num. obs.	45	45	45	45
RMSE	15.88	17.40	16.69	9.74
***p < 0.001; **p < 0.01; *p < 0.05

By default, htmlreg() prints out HTML, which is exactly what I want in an R markdown document. To save the output to a file, specify a non-null file argument. For example, to save the table to the file prestige.html,

htmlreg(prestige_mods, file = "prestige.html")

Since this function outputs raw HTML to the console, it can be hard to tell what’s going on. If you want to preview the table in RStudio while working on it, this snippet of code uses the htmltools package to do so:

library("htmltools")
htmlreg(prestige_mods) %>% HTML() %>% browsable()
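Another way to check a table while iterating is texreg's plain-text function, screenreg(), which takes the same list of models as htmlreg(). A minimal sketch, with two models on the built-in mtcars data standing in for prestige_mods:

```r
library("texreg")

# Two small models on mtcars (stand-ins for prestige_mods)
mods <- list(lm(mpg ~ wt, data = mtcars),
             lm(mpg ~ wt + hp, data = mtcars))

# screenreg() writes a plain-text table that is easy to read in the console
screenreg(mods)
```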

The htmlreg function has many options to adjust the table formatting. Below, I clean up the table.

I remove stars using stars = NULL. It is a growing convention to avoid the use of stars indicating significance in regression tables (see AJPS and Political Analysis guidelines).
The arguments doctype, html.tag, head.tag, and body.tag control what sort of HTML is created. Generally, all these functions (whether for LaTeX or HTML output) have some arguments that determine whether they create a standalone, complete document or a fragment that will be copied into another document.
The arguments include.rsquared, include.adjrs, include.rmse, and include.nobs are passed to the function extract(), which determines what information the texreg package extracts from a model to put into the table. I get rid of \(R^2\) and the RMSE, but keep adjusted \(R^2\) and the number of observations.

library("stringr")
coefnames <- c("Professional",
               "Working Class",
               "Income",
               "Education")
note <- "OLS regressions with prestige as the response variable."
htmlreg(prestige_mods, stars = NULL,
        custom.model.names = str_c("(", seq_along(prestige_mods), ")"),
        omit.coef = "\\(Intercept\\)",
        custom.coef.names = coefnames,
        custom.note = str_c("Note: ", note),
        caption.above = TRUE,
        caption = "Regressions of Occupational Prestige",
        # better for markdown
        doctype = FALSE,
        html.tag = FALSE,
        head.tag = FALSE,
        body.tag = FALSE,
        # passed to extract() method for "lm"
        include.adjrs = TRUE,
        include.rsquared = FALSE,
        include.rmse = FALSE,
        include.nobs = TRUE)

Regressions of Occupational Prestige
	(1)	(2)	(3)	(4)
Professional	57.68			16.66
	(5.10)			(6.99)
Working Class	13.90			-14.66
	(7.35)			(6.11)
Income		1.08		0.60
		(0.11)		(0.09)
Education			0.90	0.35
			(0.08)	(0.11)
Adj. R²	0.75	0.69	0.72	0.90
Num. obs.	45	45	45	45
Note: OLS regressions with prestige as the response variable.
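To see what the include.* arguments control, you can call texreg's extract() generic on a single model and inspect the goodness-of-fit rows it returns. A minimal sketch on the built-in mtcars data (a stand-in for the Duncan models); the gof.names slot holds the labels that would appear at the bottom of the table.

```r
library("texreg")

m <- lm(mpg ~ wt + hp, data = mtcars)

# extract() pulls coefficients and goodness-of-fit statistics out of a model;
# the include.* arguments given to htmlreg() are passed on to it
tr <- extract(m, include.rsquared = FALSE, include.adjrs = TRUE,
              include.nobs = TRUE, include.rmse = FALSE)

# Labels of the goodness-of-fit rows that would appear in the table
tr@gof.names
```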

Once you find a set of options that are common across your tables, make a function so you do not need to retype them.

my_reg_table <- function(mods, ..., note = NULL) {
  htmlreg(mods,
          stars = NULL,
          custom.note = if (!is.null(note)) str_c("Note: ", note) else NULL,
          caption.above = TRUE,
          # better for markdown
          doctype = FALSE,
          html.tag = FALSE,
          head.tag = FALSE,
          body.tag = FALSE,
          # forward table-specific options (names, caption, ordering)
          ...)
}
my_reg_table(prestige_mods,
             custom.model.names = str_c("(", seq_along(prestige_mods), ")"),
             # the intercept is not omitted here, so it needs a name too
             custom.coef.names = c("Intercept", coefnames),
             note = note,
             # put intercept at the bottom
             reorder.coef = c(2, 3, 4, 5, 1),
             caption = "Regressions of Occupational Prestige")

Statistical models
	Model 1	Model 2	Model 3	Model 4
(Intercept)	22.76	2.46	0.28	-0.19
	(3.47)	(5.19)	(5.09)	(3.71)
typeprof	57.68			16.66
	(5.10)			(6.99)
typewc	13.90			-14.66
	(7.35)			(6.11)
income		1.08		0.60
		(0.11)		(0.09)
education			0.90	0.35
			(0.08)	(0.11)
R²	0.76	0.70	0.73	0.91
Adj. R²	0.75	0.69	0.72	0.90
Num. obs.	45	45	45	45
RMSE	15.88	17.40	16.69	9.74
Note: OLS regressions with prestige as the response variable.

Note that I didn’t include every option in my_reg_table, only those arguments that will be common across tables. I use ... to pass arguments to htmlreg. Then when I call my_reg_table the only arguments are those specific to the content of the table, not the formatting, making it easier to understand what each table is saying.
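One pitfall with wrappers like my_reg_table: R accepts arguments matched to ... even when the function body never passes ... on, so table-specific options can be silently ignored. A base R illustration with two hypothetical helpers built on format():

```r
# Hypothetical helpers: both fix a formatting default (big.mark) and accept
# extra arguments through `...`, but only the second passes them along.
fmt_dropped   <- function(x, ...) format(x, big.mark = ",")
fmt_forwarded <- function(x, ...) format(x, big.mark = ",", ...)

fmt_dropped(1234.5, nsmall = 2)    # nsmall is silently ignored
fmt_forwarded(1234.5, nsmall = 2)  # nsmall reaches format()
```

Neither call raises an error, which is why a missing ... in the body is easy to overlook.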

Of course, texreg also produces LaTeX output, with the function texreg(). Almost all of its options are the same as those of htmlreg().
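A minimal sketch of the LaTeX version, again with mtcars models standing in for prestige_mods. The booktabs = TRUE and use.packages = FALSE settings are my choices: the first uses \toprule and friends (so the LaTeX preamble needs \usepackage{booktabs}), the second omits the \usepackage lines from the output.

```r
library("texreg")

mods <- list(lm(mpg ~ wt, data = mtcars),
             lm(mpg ~ wt + hp, data = mtcars))

# texreg() takes the same model list as htmlreg() but writes LaTeX;
# capture.output() collects the printed table as a single string
latex <- paste(capture.output(
  texreg(mods, caption = "Fuel economy regressions", caption.above = TRUE,
         booktabs = TRUE, use.packages = FALSE)
), collapse = "\n")
cat(latex)
```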