Chapter 12 Formatting Tables
12.1 Overview of Packages
R has multiple packages and functions for directly producing formatted tables for LaTeX, HTML, and other output formats.
See the Reproducible Research Task View for an overview of various options.
xtable is a general purpose package for creating LaTeX, HTML, or plain text tables in R.
texreg is more specifically geared to regression tables. It outputs results in LaTeX (texreg), HTML (htmlreg), and plain text (screenreg).
The packages stargazer and apsrtable are other popular packages for formatting regression output. However, they are less well maintained and offer less functionality than texreg. For example, apsrtable hasn’t been updated since 2012, stargazer since 2015.
The texreg vignette is a good introduction to texreg. These blog posts by Will Lowe cover many of the options.
Additionally, for simple tables, knitr, the package which does the heavy lifting for R markdown, has a function kable. knitr can also customize how R objects are printed, using the knit_print function.
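As a rough sketch of what customizing printing with knit_print can look like, the following defines a method so that data frames are rendered with kable inside a knitr document. It follows the pattern described in knitr's own documentation; the use of the built-in mtcars data is only an assumption for illustration.

```r
library("knitr")

# S3 method for the knit_print() generic: inside a knitr document,
# data frames would be rendered with kable() instead of as plain text.
knit_print.data.frame <- function(x, ...) {
  # kable() returns the table as a character vector of lines
  res <- paste(c("", "", kable(x)), collapse = "\n")
  # asis_output() marks the text to be written to the output verbatim
  asis_output(res)
}

# Call the method directly to inspect what would be emitted
# (mtcars is stand-in data, not part of this chapter)
out <- knit_print.data.frame(head(mtcars, 3))
cat(out)
```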
Other notable packages are:
- pander creates output in markdown for export to other formats.
- tables uses a formula syntax to define tables
- ReporteRs has the most complete support for creating Word documents, but is likely more than most users need.
For a political science perspective on why automating the research process is important see:
- Nicholas Eubank. Embrace Your Fallibility: Thoughts on Code Integrity, based on this article
- Matthew Gentzkow and Jesse M. Shapiro. Code and Data for the Social Sciences: A Practitioner’s Guide. March 10, 2014.
- Political Methodologist issue on Workflow Management
12.2 Summary Statistic Table Example
The xtable package has methods to convert many types of R objects to tables.
library("dplyr")
library("tidyr")
library("gapminder")
gapminder_summary <-
  gapminder %>%
  # Keep numeric variables
  select_if(is.numeric) %>%
  # Gather variables into long format: one row per variable-value pair
  gather(variable, value) %>%
  # Summarize by variable
  group_by(variable) %>%
  # Compute the summary statistics for each variable
  summarise(n = sum(!is.na(value)),
            `Mean` = mean(value),
            `Std. Dev.` = sd(value),
            `Median` = median(value),
            `Min.` = min(value),
            `Max.` = max(value))
gapminder_summary
## # A tibble: 4 x 7
## variable n Mean `Std. Dev.` Median Min. Max.
## <chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 gdpPercap 1704 7215. 9857. 3532. 241. 113523.
## 2 lifeExp 1704 59.5 12.9 60.7 23.6 82.6
## 3 pop 1704 29601212. 106157897. 7023596. 60011. 1318683096.
## 4 year 1704 1980. 17.3 1980. 1952. 2007.
Now that we have a data frame with the table we want, use xtable to create it:
library("xtable")
foo <- xtable(gapminder_summary, digits = 0) %>%
  print(type = "html",
        html.table.attributes = "",
        include.rownames = FALSE,
        format.args = list(big.mark = ","))
variable | n | Mean | Std. Dev. | Median | Min. | Max. |
---|---|---|---|---|---|---|
gdpPercap | 1,704 | 7,215 | 9,857 | 3,532 | 241 | 113,523 |
lifeExp | 1,704 | 59 | 13 | 61 | 24 | 83 |
pop | 1,704 | 29,601,212 | 106,157,897 | 7,023,596 | 60,011 | 1,318,683,096 |
year | 1,704 | 1,980 | 17 | 1,980 | 1,952 | 2,007 |
Note that two functions were used to get the HTML. The function xtable() creates an xtable R object, and the function print.xtable() (called as print()) prints the xtable object as HTML (or LaTeX).
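To make the two steps more concrete, here is a minimal sketch that separates them. The small data frame and its values are hypothetical stand-ins, not the gapminder summary.

```r
library("xtable")

# Hypothetical stand-in data, for illustration only
df <- data.frame(variable = c("a", "b"),
                 n = c(10L, 20L),
                 Mean = c(1.5, 2.25))

# Step 1: xtable() builds an object of class "xtable"; nothing is printed yet
xt <- xtable(df, digits = 1)
class(xt)

# Step 2: print() dispatches to print.xtable(), which writes the markup.
# The same object can be printed as HTML or as LaTeX.
print(xt, type = "html", include.rownames = FALSE)
print(xt, type = "latex", include.rownames = FALSE)
```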
The default HTML does not look nice, and would need to be formatted with CSS.
If you are copying and pasting it into Word, you would need to do some post-processing cleanup anyway.
Another alternative is the kable function in the knitr package, which outputs R markdown tables.
variable | n | Mean | Std. Dev. | Median | Min. | Max. |
---|---|---|---|---|---|---|
gdpPercap | 1704 | 7.215327e+03 | 9.857455e+03 | 3531.8470 | 241.1659 | 1.135231e+05 |
lifeExp | 1704 | 5.947444e+01 | 1.291711e+01 | 60.7125 | 23.5990 | 8.260300e+01 |
pop | 1704 | 2.960121e+07 | 1.061579e+08 | 7023595.5000 | 60011.0000 | 1.318683e+09 |
year | 1704 | 1.979500e+03 | 1.726533e+01 | 1979.5000 | 1952.0000 | 2.007000e+03 |
This is useful for producing quick tables.
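A table like the one above would come from a call along the lines of kable(gapminder_summary). The sketch below uses a two-row stand-in (values copied, rounded, from the summary printed earlier) so that it runs on its own; the name summary_tbl is an assumption. It also shows the digits and format.args arguments, which may help avoid the scientific notation seen above.

```r
library("knitr")

# Stand-in for gapminder_summary: two rows copied (rounded) from the
# summary printed earlier in this section
summary_tbl <- data.frame(
  variable = c("gdpPercap", "lifeExp"),
  n = c(1704L, 1704L),
  Mean = c(7215, 59.5)
)

# Default call: quick markdown table
kable(summary_tbl)

# digits controls rounding; format.args is passed on to format(),
# here to add a thousands separator
kable(summary_tbl, digits = 1, format.args = list(big.mark = ","))
```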
Finally, the htmlTable package unsurprisingly produces HTML tables.
variable | n | Mean | Std. Dev. | Median | Min. | Max. | |
---|---|---|---|---|---|---|---|
1 | gdpPercap | 1704 | 7 | 10 | 3532 | 241 | 1 |
2 | lifeExp | 1704 | 6 | 1 | 61 | 24 | 8 |
3 | pop | 1704 | 3 | 1 | 7023596 | 60011 | 1 |
4 | year | 1704 | 2 | 2 | 1980 | 1952 | 2 |
It has more features for producing HTML tables than xtable, but does not output LaTeX.
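A minimal sketch of an htmlTable call follows. It is not necessarily the call that produced the table above: the stand-in data frame, the caption text, and the choice to format numbers as text before building the table are all assumptions made for illustration.

```r
library("htmlTable")

# Stand-in for gapminder_summary: two rows copied (rounded) from the
# summary printed earlier in this section
summary_tbl <- data.frame(
  variable = c("gdpPercap", "lifeExp"),
  n = c(1704L, 1704L),
  Mean = c(7215, 59.5)
)

# Format the numeric column as text first, so the table shows exactly
# the rounding and separators we want
summary_txt <- summary_tbl
summary_txt$Mean <- formatC(summary_tbl$Mean, format = "f",
                            digits = 1, big.mark = ",")

# rnames = FALSE drops the row-number column
html <- htmlTable(summary_txt,
                  rnames = FALSE,
                  caption = "Summary statistics (stand-in values)")
html
```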
12.3 Regression Table Example
We will run several regression models with the Duncan data.
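One way to load the data is sketched below. It assumes the Duncan data set is taken from the carData package, which the text does not state explicitly.

```r
# Assumption: Duncan is loaded from the carData package
data("Duncan", package = "carData")

# Occupational prestige data: type of occupation, income, education,
# and prestige for 45 occupations
str(Duncan)
```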
Since I’m running several regressions, I will save them to a list. If you know that you will be creating multiple objects, and programming with them, always put them in a list.
First, create a list of the regression formulas,
formulae <- list(
  prestige ~ type,
  prestige ~ income,
  prestige ~ education,
  prestige ~ type + education + income
)
Write a function to run a single model,
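A function for running a single model might look like the following sketch. The name run_reg is hypothetical, and the data are assumed to come from the carData package; the model = FALSE argument matches the call shown in the printed model output in this section.

```r
# Assumption: Duncan is loaded from the carData package
data("Duncan", package = "carData")

# run_reg is a hypothetical helper name: it fits one linear model for a
# given formula. model = FALSE drops the stored model frame.
run_reg <- function(f, data = Duncan) {
  lm(f, data = data, model = FALSE)
}

# Try it on a single formula
mod <- run_reg(prestige ~ income)
coef(mod)
```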
Now use map to run a regression with each of these formulae, and save them to a list,
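A sketch of this step, using map from the purrr package with a formula-style anonymous function, is given below. The form of the lm call is inferred from the model output printed in this section; loading Duncan from carData is an assumption.

```r
library("purrr")

# Assumption: Duncan is loaded from the carData package
data("Duncan", package = "carData")

formulae <- list(
  prestige ~ type,
  prestige ~ income,
  prestige ~ education,
  prestige ~ type + education + income
)

# One lm fit per formula; the result is a list with one model per formula
prestige_mods <- map(formulae, ~ lm(.x, data = Duncan, model = FALSE))

# Each element is an "lm" object; use [[ ]] to pull out a single model
map(prestige_mods, class)
prestige_mods[[1]]
```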
This is a list of lm objects,
## [[1]]
## [1] "lm"
##
## [[2]]
## [1] "lm"
##
## [[3]]
## [1] "lm"
##
## [[4]]
## [1] "lm"
We can look at the first model,
##
## Call:
## lm(formula = .x, data = Duncan, model = FALSE)
##
## Coefficients:
## (Intercept) typeprof typewc
## 22.76 57.68 13.90
Now we can format the regression table in HTML using htmlreg. The first argument of htmlreg is a list of models:
Model 1 | Model 2 | Model 3 | Model 4 | ||
---|---|---|---|---|---|
(Intercept) | 22.76*** | 2.46 | 0.28 | -0.19 | |
(3.47) | (5.19) | (5.09) | (3.71) | ||
typeprof | 57.68*** | 16.66* | |||
(5.10) | (6.99) | ||||
typewc | 13.90 | -14.66* | |||
(7.35) | (6.11) | ||||
income | 1.08*** | 0.60*** | |||
(0.11) | (0.09) | ||||
education | 0.90*** | 0.35** | |||
(0.08) | (0.11) | ||||
R2 | 0.76 | 0.70 | 0.73 | 0.91 | |
Adj. R2 | 0.75 | 0.69 | 0.72 | 0.90 | |
Num. obs. | 45 | 45 | 45 | 45 | |
RMSE | 15.88 | 17.40 | 16.69 | 9.74 | |
***p < 0.001, **p < 0.01, *p < 0.05 |
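A table like the one above would come from a call along the lines of htmlreg(prestige_mods). The sketch below refits the four models with base R's lapply (swapped in for map only to keep the example short) and assumes Duncan comes from the carData package.

```r
library("texreg")

# Assumption: Duncan is loaded from the carData package
data("Duncan", package = "carData")

formulae <- list(
  prestige ~ type,
  prestige ~ income,
  prestige ~ education,
  prestige ~ type + education + income
)
# lapply() is used here in place of map() to avoid an extra dependency
prestige_mods <- lapply(formulae, function(f) lm(f, data = Duncan))

# The list of models is the first argument; all other options are defaults
htmlreg(prestige_mods)
```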
By default, htmlreg() prints out HTML, which is exactly what I want in an R markdown document.
To save the output to a file, specify a non-null file argument.
For example, passing file = "prestige.html" saves the table to the file prestige.html.
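As a small sketch of saving to a file, the example below fits two of the models directly (so that it runs on its own) and writes the table to prestige.html in the working directory. Loading Duncan from carData is an assumption.

```r
library("texreg")

# Assumption: Duncan is loaded from the carData package
data("Duncan", package = "carData")

mods <- list(
  lm(prestige ~ income, data = Duncan),
  lm(prestige ~ education, data = Duncan)
)

# With a non-NULL file argument the table is written to disk
# instead of being printed
htmlreg(mods, file = "prestige.html")

# Confirm that the file was created
file.exists("prestige.html")
```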
Since this function outputs HTML directly to the console, it can be hard to tell what’s going on. If you want to preview the table in RStudio while working on it, the htmltools package can be used to do so.
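One possible way to preview the table with htmltools is sketched below. It assumes a recent texreg version in which htmlreg() returns the table as a character string, and it again assumes Duncan comes from the carData package.

```r
library("texreg")
library("htmltools")

# Assumption: Duncan is loaded from the carData package
data("Duncan", package = "carData")

mods <- list(
  lm(prestige ~ income, data = Duncan),
  lm(prestige ~ education, data = Duncan)
)

# Assumption: htmlreg() returns the HTML as a character string
tab <- htmlreg(mods)

# HTML() marks the string as markup rather than plain text;
# browsable() flags it so that printing it in an interactive session
# opens it in the viewer or browser
preview <- browsable(HTML(as.character(tab)))
if (interactive()) print(preview)
```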
The htmlreg function has many options to adjust the table formatting.
Below, I clean up the table.
- I remove stars using stars = NULL. It is a growing convention to avoid the use of stars indicating significance in regression tables (see AJPS and Political Analysis guidelines).
- The arguments doctype, html.tag, head.tag, and body.tag control what sort of HTML is created. Generally all these functions (whether LaTeX or HTML output) have some arguments that determine whether it is creating a standalone, complete document, or a fragment that will be copied into another document.
- The arguments include.rsquared, include.adjrs, and include.nobs are passed to the function extract(), which determines what information the texreg package extracts from a model to put into the table. I get rid of \(R^2\), but keep adjusted \(R^2\) and the number of observations.
library("stringr")
coefnames <- c("Professional",
               "Working Class",
               "Income",
               "Education")
note <- "OLS regressions with prestige as the response variable."
htmlreg(prestige_mods, stars = NULL,
        custom.model.names = str_c("(", seq_along(prestige_mods), ")"),
        omit.coef = "\\(Intercept\\)",
        custom.coef.names = coefnames,
        custom.note = str_c("Note: ", note),
        caption.above = TRUE,
        caption = "Regressions of Occupational Prestige",
        # better for markdown
        doctype = FALSE,
        html.tag = FALSE,
        head.tag = FALSE,
        body.tag = FALSE,
        # passed to extract() method for "lm"
        include.adjrs = TRUE,
        include.rsquared = FALSE,
        include.rmse = FALSE,
        include.nobs = TRUE)
(1) | (2) | (3) | (4) | ||
---|---|---|---|---|---|
Professional | 57.68 | 16.66 | |||
(5.10) | (6.99) | ||||
Working Class | 13.90 | -14.66 | |||
(7.35) | (6.11) | ||||
Income | 1.08 | 0.60 | |||
(0.11) | (0.09) | ||||
Education | 0.90 | 0.35 | |||
(0.08) | (0.11) | ||||
Adj. R2 | 0.75 | 0.69 | 0.72 | 0.90 | |
Num. obs. | 45 | 45 | 45 | 45 | |
Note: OLS regressions with prestige as the response variable. |
Once you find a set of options that are common across your tables, make a function so you do not need to retype them.
my_reg_table <- function(mods, ..., note = NULL) {
  htmlreg(mods,
          stars = NULL,
          custom.note = if (!is.null(note)) str_c("Note: ", note) else NULL,
          caption.above = TRUE,
          # better for markdown
          doctype = FALSE,
          html.tag = FALSE,
          head.tag = FALSE,
          # forward table-specific arguments to htmlreg()
          ...)
}
my_reg_table(prestige_mods,
             custom.model.names = str_c("(", seq_along(prestige_mods), ")"),
             # one name per coefficient, including the intercept
             custom.coef.names = c("Intercept", coefnames),
             note = note,
             # put intercept at the bottom
             reorder.coef = c(2, 3, 4, 5, 1),
             caption = "Regressions of Occupational Prestige")
(1) | (2) | (3) | (4) | ||
---|---|---|---|---|---|
Professional | 57.68 | 16.66 | |||
(5.10) | (6.99) | ||||
Working Class | 13.90 | -14.66 | |||
(7.35) | (6.11) | ||||
Income | 1.08 | 0.60 | |||
(0.11) | (0.09) | ||||
Education | 0.90 | 0.35 | |||
(0.08) | (0.11) | ||||
Intercept | 22.76 | 2.46 | 0.28 | -0.19 | |
(3.47) | (5.19) | (5.09) | (3.71) | ||
R2 | 0.76 | 0.70 | 0.73 | 0.91 | |
Adj. R2 | 0.75 | 0.69 | 0.72 | 0.90 | |
Num. obs. | 45 | 45 | 45 | 45 | |
RMSE | 15.88 | 17.40 | 16.69 | 9.74 | |
Note: OLS regressions with prestige as the response variable. |
Note that I didn’t include every option in my_reg_table, only those arguments that will be common across tables.
I use ... to pass arguments to htmlreg.
Then when I call my_reg_table the only arguments are those specific to the content of the table, not the formatting, making it easier to understand what each table is saying.
Of course, texreg also produces LaTeX output, with the function texreg().
Almost all the options are the same as htmlreg.
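As a brief sketch of the LaTeX side, the call below reuses a few of the options shown for htmlreg. The two models are refit here so the example runs on its own, loading Duncan from carData is an assumption, and stars = numeric(0) is used as the documented way to suppress stars in recent texreg versions.

```r
library("texreg")

# Assumption: Duncan is loaded from the carData package
data("Duncan", package = "carData")

mods <- list(
  lm(prestige ~ income, data = Duncan),
  lm(prestige ~ type + education + income, data = Duncan)
)

# Same kinds of options as htmlreg(), but the output is a LaTeX table
texreg(mods,
       stars = numeric(0),
       custom.model.names = c("(1)", "(2)"),
       caption = "Regressions of Occupational Prestige",
       caption.above = TRUE)
```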