25 Many models

25.2 gapminder

Exercise 25.2.1

A linear trend seems to be slightly too simple for the overall trend. Can you do better with a quadratic polynomial? How can you interpret the coefficients of the quadratic? (Hint: you might want to transform year so that it has mean zero.)

The solution replicates the analysis in the chapter but replaces the function country_model() with a regression that also includes the year squared.
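
The code for this solution is not reproduced in this copy of the text. A minimal sketch of what such a replacement could look like follows, assuming the gapminder, dplyr, tidyr, and purrr packages are installed. Centering year on its median and using raw = TRUE are illustrative choices that follow the hint, not necessarily the original solution.

```r
library(gapminder)
library(dplyr)
library(tidyr)
library(purrr)

# Quadratic trend in year. Centering year on its median makes the
# intercept the fitted life expectancy in the middle of the period;
# raw = TRUE keeps the coefficients on the scale of (centered) years.
country_model <- function(df) {
  lm(lifeExp ~ poly(year - median(year), 2, raw = TRUE), data = df)
}

by_country <- gapminder %>%
  group_by(country, continent) %>%
  nest() %>%
  mutate(
    model = map(data, country_model),
    r_squared = map_dbl(model, ~ summary(.x)$r.squared)
  ) %>%
  ungroup()

# Countries where even the quadratic fits poorly come first
by_country %>%
  select(country, continent, r_squared) %>%
  arrange(r_squared)
```

Under this parameterization the intercept is the fitted life expectancy at the median year, the linear coefficient is the slope at that point (years of life expectancy per calendar year), and the quadratic coefficient is half the rate at which that slope changes, so a negative value means that gains are slowing down.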

Exercise 25.2.2

Explore other methods for visualizing the distribution of \(R^2\) per continent. You might want to try the ggbeeswarm package, which provides similar methods for avoiding overlaps as jitter, but uses deterministic methods.
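
One possible approach, sketched under the assumption that the ggbeeswarm package is installed and that the chapter's linear model (lifeExp ~ year) is the one being summarized. The name r2_by_country is made up for this illustration.

```r
library(gapminder)
library(dplyr)
library(tidyr)
library(purrr)
library(ggplot2)
library(ggbeeswarm)

# One R^2 per country from the linear model lifeExp ~ year
r2_by_country <- gapminder %>%
  group_by(country, continent) %>%
  nest() %>%
  mutate(
    r_squared = map_dbl(
      data,
      ~ summary(lm(lifeExp ~ year, data = .x))$r.squared
    )
  ) %>%
  ungroup()

# Beeswarm plot: points are offset deterministically instead of randomly,
# so the plot is identical every time it is drawn
p <- ggplot(r2_by_country, aes(x = continent, y = r_squared)) +
  geom_beeswarm()
p
```

Swapping geom_beeswarm() for geom_quasirandom(), also from ggbeeswarm, gives a denser layout that may read better for continents with many countries.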

Exercise 25.2.3

To create the last plot (showing the data for the countries with the worst model fits), we needed two steps: we created a data frame with one row per country and then semi-joined it to the original dataset. It’s possible to avoid this join if we use unnest() instead of unnest(.drop = TRUE). How?
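
A sketch of one way to do this, assuming tidyr 1.0 or later (where unnest() keeps other list-columns and .drop is deprecated) and the broom package. The column name fit_stats and the 0.25 cutoff for a poor fit are choices made for this illustration. Because unnesting the model summaries keeps the data list-column, the original observations can be recovered with a second unnest() rather than a join.

```r
library(gapminder)
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

worst_fit <- gapminder %>%
  group_by(country, continent) %>%
  nest() %>%
  mutate(
    model = map(data, ~ lm(lifeExp ~ year, data = .x)),
    fit_stats = map(model, broom::glance)
  ) %>%
  ungroup() %>%
  select(-model) %>%
  unnest(fit_stats) %>%        # the `data` list-column is still present
  filter(r.squared < 0.25) %>% # keep only the poorly fitting countries
  unnest(data)                 # back to one row per country-year

head(worst_fit)
```

The result has one row per country and year for the poorly fitting countries, so it can be passed straight to ggplot().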

25.3 List-columns

No exercises

25.4 Creating list-columns

Exercise 25.4.2

Brainstorm useful summary functions that, like quantile(), return multiple values.
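
A few base R candidates, shown as a quick sketch on mtcars$mpg. The list is illustrative rather than exhaustive.

```r
x <- mtcars$mpg

rng   <- range(x)     # minimum and maximum
five  <- fivenum(x)   # Tukey's five-number summary
qs    <- quantile(x)  # named vector of quantiles
smry  <- summary(x)   # minimum, quartiles, mean, and maximum
bstat <- boxplot.stats(x)$stats  # the five statistics used to draw a box plot

rng
five
qs
smry
bstat
```

Each of these returns more than one value per group, so inside summarise() the result has to be wrapped in list() to form a list-column.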

Exercise 25.4.3

The data frame is missing the labels that say which quantile each value is, e.g. 0%, 25%, 50%, 75%, 100%. quantile() returns these as the names of the vector.

Since unnest() drops the names of the vector, that information is lost here.
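
One way around this, sketched here assuming dplyr and tidyr 1.0 or later, is to store the probabilities in their own list-column so that they survive unnesting. The names probs, p, and q are arbitrary.

```r
library(dplyr)
library(tidyr)

probs <- c(0, 0.25, 0.5, 0.75, 1)

quantiles <- mtcars %>%
  group_by(cyl) %>%
  summarise(
    p = list(probs),                 # which quantile each value is
    q = list(quantile(mpg, probs))   # the quantile values themselves
  ) %>%
  unnest(c(p, q))                    # unnest both columns in parallel

quantiles
```

Each row of the result now pairs a quantile value with the probability it corresponds to.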

Exercise 25.4.4

What does this code do? Why might it be useful?

It creates a data frame in which each row corresponds to a value of cyl, and each cell of every other column is a vector of all the values of that column for that value of cyl. It seems like it should be useful to have all the observations of each variable for each group, but off the top of my head, I can’t think of a specific use for this. That said, it seems to cover many of the same uses as dplyr::do().
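
The exercise's code is not reproduced in this copy of the text. The following sketch produces a data frame matching the description above, assuming dplyr 1.0 or later. It is not necessarily the code from the exercise, and the name by_cyl is made up for this illustration.

```r
library(dplyr)

# Collapse every non-grouping column into a list-column:
# one row per value of cyl, one vector per cell
by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(across(everything(), ~ list(.x)))

by_cyl

# All mpg values for the first group (cyl == 4)
by_cyl$mpg[[1]]
```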

25.5 Simplifying list-columns

Exercise 25.5.1

Why might the lengths() function be useful for creating atomic vector columns from list-columns?

The lengths() function returns the length of each element in a list. It could be useful for testing whether all elements in a list-column are the same length. You could also take the maximum length to determine how many atomic vector columns to create. It is a replacement for something like map_int(x, length) or sapply(x, length).
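
A small base R sketch of these uses, with a made-up list x:

```r
x <- list(a = 1:3, b = letters[1:2], c = 10)

# Length of each element
elem_lengths <- lengths(x)
elem_lengths

# Are all elements the same length?
same_length <- length(unique(elem_lengths)) == 1
same_length

# How many atomic vector columns would be needed to spread the elements out?
n_cols <- max(elem_lengths)
n_cols
```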

Exercise 25.5.2

List the most common types of vector found in a data frame. What makes lists different?

The common types of vectors in data frames are:

  • logical
  • numeric (double)
  • integer
  • character
  • factor

All of the common types of vectors in data frames are atomic. Lists are not atomic: their elements can be of different types and can themselves be lists or other vectors.
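
This can be checked with is.atomic(). The data frame df below is made up for this illustration.

```r
df <- data.frame(
  lgl = c(TRUE, FALSE),
  dbl = c(1.5, 2),
  int = 1:2,
  chr = c("a", "b"),
  fct = factor(c("x", "y")),
  stringsAsFactors = FALSE
)

# Every ordinary column is an atomic vector
col_is_atomic <- vapply(df, is.atomic, logical(1))
col_is_atomic

# A list is not atomic, and its elements need not share a type
mixed <- list(1, "a", list(TRUE, 2:3))
is.atomic(mixed)
```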

25.6 Making tidy data with broom

No exercises