16 Dates and times

16.3 Date-time components

The solutions below reuse code from the chapter, in particular the make_datetime_100() function from section 16.2.2.
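For reference, here is a minimal sketch of that helper as I recall it from the chapter, assuming lubridate is loaded. The example call and the name example_dt are my own illustration, not part of the chapter.

```r
library(lubridate)

# Turn a date plus an HHMM-style integer (e.g. 517 means 5:17) into a date-time.
make_datetime_100 <- function(year, month, day, time) {
  make_datetime(year, month, day, time %/% 100, time %% 100)
}

# Illustrative call: 1 January 2013 at 05:17 (UTC is the make_datetime() default)
example_dt <- make_datetime_100(2013, 1, 1, 517)
example_dt
```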

The difference between a date-time and the same date-time rounded down to the start of a period gives the time elapsed within that period.
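A small sketch of this idea, using floor_date() to round down to the start of the day. The timestamp is an arbitrary value chosen for illustration.

```r
library(lubridate)

dt <- ymd_hms("2013-07-04 15:42:10")

# Round down to the start of the day, then subtract to get the time within the day
start_of_day <- floor_date(dt, unit = "day")
secs_into_day <- as.numeric(difftime(dt, start_of_day, units = "secs"))

secs_into_day / 3600   # hours elapsed since midnight
```

The same pattern works for other units, for example unit = "week" or unit = "month".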

Exercise 16.3.1

How does the distribution of flight times within a day change over the course of the year?

Let’s try plotting this by month:

The months will be easier to compare if the counts are normalized within each group. February is lower in the raw counts because it has fewer days and thus fewer flights.
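To illustrate what normalizing within groups means, here is a sketch on a tiny made-up data frame (toy, with hypothetical month and hour columns) rather than the real flights data. Each month's counts are divided by that month's total, so months with fewer flights are still comparable.

```r
# Hypothetical toy data: month and departure hour for a handful of flights
toy <- data.frame(
  month = c(1, 1, 1, 1, 2, 2),
  hour  = c(6, 6, 12, 18, 6, 12)
)

# Raw counts: one row per month, one column per hour
counts <- table(toy$month, toy$hour)

# Divide each row by its total so that every month sums to 1
props <- prop.table(counts, margin = 1)
props
```

In a plot, the same effect can be had by drawing densities or proportions instead of counts.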

At least to me, there doesn’t appear to be much difference in the within-day distribution over the year, but I may be thinking about it incorrectly.

Exercise 16.3.2

Compare dep_time, sched_dep_time and dep_delay. Are they consistent? Explain your findings.

If they are consistent, then dep_time = sched_dep_time + dep_delay.

There are discrepancies, and they look like mistakes in the dates. These are flights in which the actual departure time is on the next day relative to the scheduled departure time. We forgot to account for this when creating the date-times using the make_datetime_100() function in section 16.2.2, “From individual components”. The code would have had to check whether the departure time is less than the scheduled departure time plus the departure delay (in minutes). Alternatively, simply adding the departure delay to the scheduled departure time is a more robust way to construct the departure time, because it automatically accounts for crossing into the next day.
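A sketch of the problem with one made-up flight: scheduled at 23:30, delayed 45 minutes, so the recorded clock time of departure is 00:15. The values and variable names are assumptions for illustration only.

```r
library(lubridate)

sched_dep_time <- ymd_hm("2013-01-01 23:30")
dep_delay <- 45   # minutes

# Naive construction: scheduled date combined with the recorded clock time 00:15
naive_dep_time <- make_datetime(2013, 1, 1, 0, 15)

# Robust construction: scheduled time plus the delay, which crosses midnight
robust_dep_time <- sched_dep_time + minutes(dep_delay)

naive_dep_time == robust_dep_time   # FALSE: the naive value is a day too early
robust_dep_time
```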

Exercise 16.3.4

How does the average delay time change over the course of a day? Should you use dep_time or sched_dep_time? Why?

Exercise 16.3.6

What makes the distribution of diamonds$carat and flights$sched_dep_time similar?

In both carat and sched_dep_time, abnormally large numbers of values fall at nice “human” numbers. In carat, the spikes are at values such as 0, 1/3, 1/2, and 2/3. In sched_dep_time, they are at 00 and 30 minutes, and more generally at minutes ending in 0 and 5.

Exercise 16.3.7

Confirm my hypothesis that the early departures of flights in minutes 20-30 and 50-60 are caused by scheduled flights that leave early. Hint: create a binary variable that tells you whether or not a flight was delayed.

First, I create a binary variable early that is equal to 1 if a flight leaves early, and 0 if it does not. Then, I group flights by the minute of departure. This shows that the proportion of flights that are early departures is highest between minutes 20–30 and 50–60.
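The steps above can be sketched on a small made-up data frame. Here toy stands in for the real flights data, and its values are invented for illustration; with the full data the same logic would typically be written with group_by() and summarise().

```r
library(lubridate)

# Hypothetical toy data standing in for the flights data
toy <- data.frame(
  dep_time  = ymd_hm(c("2013-01-01 08:25", "2013-01-01 09:25",
                       "2013-01-01 10:05", "2013-01-01 11:55",
                       "2013-01-01 12:05")),
  dep_delay = c(-5, -2, 5, -5, 10)
)

# Binary indicator: 1 if the flight left early, 0 otherwise
toy$early <- as.integer(toy$dep_delay < 0)

# Minute of the hour at which the flight actually departed
toy$minute <- minute(toy$dep_time)

# Proportion of early departures for each departure minute
early_share <- tapply(toy$early, toy$minute, mean)
early_share
```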

16.4 Time spans

Exercise 16.4.1

Why is there months() but no dmonths()?

There is no unambiguous value of months in terms of seconds since months have differing numbers of days.

  • 31 days: January, March, May, July, August, October, December
  • 30 days: April, June, September, November
  • 28 or 29 days: February

The month is not a duration of time defined independently of when it occurs, but a special interval between two dates.
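A quick sketch of why a fixed-length month is ambiguous: adding months(1) to two different start dates spans a different number of days. The dates are arbitrary examples from 2015, which is not a leap year.

```r
library(lubridate)

jan1 <- ymd("2015-01-01")
feb1 <- ymd("2015-02-01")

# Number of days covered by "one month" starting on each date
len_jan <- as.numeric((jan1 + months(1)) - jan1)
len_feb <- as.numeric((feb1 + months(1)) - feb1)

c(january = len_jan, february = len_feb)
```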

Exercise 16.4.2

Explain days(overnight * 1) to someone who has just started learning R. How does it work?

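The variable overnight is equal to TRUE or FALSE. Multiplying it by 1 converts it to a number: TRUE becomes 1 and FALSE becomes 0. So for an overnight flight, days(overnight * 1) is a period of 1 day, and for any other flight it is a period of 0 days, so no days are added to the date.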
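A sketch with two made-up arrival times, where only the first flight is overnight. The values are assumptions chosen for illustration.

```r
library(lubridate)

overnight <- c(TRUE, FALSE)
overnight * 1   # logical values become the numbers 1 and 0

arr_time <- ymd_hm(c("2013-01-01 00:30", "2013-01-01 14:00"))

# Add one day to the overnight flight and zero days to the other
fixed_arr_time <- arr_time + days(overnight * 1)
fixed_arr_time
```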

Exercise 16.4.3

Create a vector of dates giving the first day of every month in 2015. Create a vector of dates giving the first day of every month in the current year.

A vector of the first day of the month for every month in 2015:
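One way to build it, as a sketch assuming lubridate is loaded: start from January 1st and add 0 to 11 months.

```r
library(lubridate)

# January 1st, 2015, plus 0, 1, ..., 11 months
first_days_2015 <- ymd("2015-01-01") + months(0:11)
first_days_2015
```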

To get the vector of the first day of the month for this year, we first need to figure out what this year is, and get January 1st of it. I can do that by taking today() and truncating it to the year using floor_date():
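A sketch of that approach. The result depends on the date on which the code is run.

```r
library(lubridate)

# Truncate today's date to January 1st of the current year, then add 0 to 11 months
first_days_this_year <- floor_date(today(), unit = "year") + months(0:11)
first_days_this_year
```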

Exercise 16.4.4

Write a function that given your birthday (as a date), returns how old you are in years.
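One possible approach, offered as a sketch rather than a definitive answer: build an interval from the birthday to the reference date and count the whole years it contains. The as_of argument is my own addition so that the function can be checked against a fixed date; it defaults to today().

```r
library(lubridate)

# Age in whole years. `as_of` is a hypothetical extra argument for reproducibility.
age <- function(birthday, as_of = today()) {
  (birthday %--% as_of) %/% years(1)
}

age(ymd("1990-06-15"), as_of = ymd("2020-12-01"))   # 30
age(ymd("1990-06-15"), as_of = ymd("2020-01-01"))   # 29
```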

Exercise 16.4.5

Why can’t (today() %--% (today() + years(1)) / months(1) work?

The code in the question is missing a closing parenthesis. I will assume that the intended code is (today() %--% (today() + years(1))) / months(1).

While this code will not display a warning or message, it does not work exactly as expected. The problem is discussed in the Intervals section.

The numerator of the expression, today() %--% (today() + years(1)), is an interval, which includes both a duration of time and a starting point. The interval has an exact number of seconds. The denominator of the expression, months(1), is a period, which is meaningful to humans but not defined in terms of an exact number of seconds. Months can be 28, 29, 30, or 31 days long, so it is not clear how long a span months(1) should represent in the division. The code does not produce a warning message, but it will not always produce the correct result.
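To make the distinction concrete, here is a sketch that uses a fixed start date in place of today() so that the numbers are reproducible. The date and the variable names are assumptions for illustration.

```r
library(lubridate)

start <- ymd("2015-01-01")   # fixed stand-in for today()
one_year <- start %--% (start + years(1))

# The interval has an exact length in seconds (365 days for this start date)
secs_in_interval <- int_length(one_year)
secs_in_interval

# Dividing the interval by a period; the value depends on how months(1) is interpreted
months_ratio <- one_year / months(1)
months_ratio
```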

To find the number of whole months within an interval, use %/% instead of /:
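A sketch of the integer-division version, again with a fixed start date standing in for today():

```r
library(lubridate)

start <- ymd("2015-01-01")   # fixed stand-in for today()

# Count the whole months contained in the one-year interval
whole_months <- (start %--% (start + years(1))) %/% months(1)
whole_months
```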

Alternatively, we could define a “month” as 30 days, and run:
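A sketch of the 30-day version with a fixed start date standing in for today(). Because 2015 has 365 days, the result is a little over 12.

```r
library(lubridate)

start <- ymd("2015-01-01")   # fixed stand-in for today()

# Treat a "month" as exactly 30 days
thirty_day_months <- (start %--% (start + years(1))) / days(30)
thirty_day_months
```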

These approaches will not work when today() is February 29th of a leap year, because today() + years(1) is then not a valid date:
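A sketch of the failure using a fixed leap day. The %m+% operator shown at the end is a lubridate alternative that rolls back to the last valid day of the month; it is my suggestion, not part of the original solution.

```r
library(lubridate)

leap_day <- ymd("2016-02-29")

# 2017-02-29 does not exist, so ordinary period arithmetic gives NA
plain_sum <- leap_day + years(1)
plain_sum

# %m+% rolls back to the last valid day of the month instead
rolled_sum <- leap_day %m+% years(1)
rolled_sum
```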

16.5 Time zones

No exercises