Analysis

Goals of this notebook

Exploring the Austin Camp Mabry weather data.

  • Which days had the most rain, and how much rain are we talking about?
  • What are the hottest and coldest days in history?
  • Which years had the most days of 100+ temperature?
  • Which years had the most total rainfall and how much? Which had the least?
  • Which years had the most total snowfall and how much?
  • What is the average rainfall for each month across time?

Setup

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.2     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.3     ✔ tibble    3.2.1
✔ lubridate 1.9.2     ✔ tidyr     1.3.0
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(janitor)

Attaching package: 'janitor'

The following objects are masked from 'package:stats':

    chisq.test, fisher.test
library(dplyr)

Importing Clean Data

weather <- read_rds("data-processed/01-weather.rds")

weather |> glimpse()
Rows: 31,296
Columns: 8
$ station <chr> "USW00013958", "USW00013958", "USW00013958", "USW00013958", "U…
$ name    <chr> "AUSTIN CAMP MABRY, TX US", "AUSTIN CAMP MABRY, TX US", "AUSTI…
$ date    <date> 1938-06-01, 1938-06-02, 1938-06-03, 1938-06-04, 1938-06-05, 1…
$ prcp    <dbl> 0.00, 0.00, 0.00, 0.40, 0.02, 0.00, 0.00, 0.00, 1.60, 0.01, 0.…
$ snow    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ snwd    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ tmax    <dbl> 91, 94, 94, 90, 94, 92, 95, 92, 87, 90, 92, 91, 91, 91, 89, 89…
$ tmin    <dbl> 72, 67, 70, 68, 68, 70, 70, 76, 64, 76, 75, 71, 70, 68, 71, 70…

Most Rain

Which days had the most rain, and how much rain are we talking about?

weather |>
  arrange(desc(prcp)) |>
  select(prcp, date) |>
  filter(prcp >= 5)

Data Takeaway: The days with the most rain were Nov. 15, 2001; Sep. 7, 2010; and Oct. 17, 1998. Together, the three days accounted for about 21 inches of rain. Nov. 15, 2001 had the most, at about 8 inches.
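As an alternative to sorting and then filtering on a hand-picked threshold, dplyr's slice_max() returns the top rows directly. The sketch below runs on a small made-up tibble; the `toy` values are invented for illustration and are not taken from the Camp Mabry data.

```r
library(dplyr)

# Invented sample data, for illustration only
toy <- tibble(
  date = as.Date(c("2020-01-01", "2020-01-02", "2020-01-03",
                   "2020-01-04", "2020-01-05")),
  prcp = c(0.10, 6.20, 0.00, 7.50, 5.10)
)

# Keep the three wettest days, ordered from most to least rain
wettest <- toy |>
  slice_max(prcp, n = 3) |>
  select(date, prcp)

wettest
```

By default slice_max() keeps ties, so it can return more than `n` rows when several days share the same total.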

Hottest Day

What are the hottest days in history?

weather |>
  arrange(desc(tmax)) |>
  select(date, tmax) |>
  filter(tmax >= 100) |>
  slice_head(n = 10)

Data Takeaway: The hottest days on record were Sep. 5, 2000, and Aug. 28, 2011, each reaching a record 112°F.

Coldest Days

What are the coldest days in history?

weather |>
  arrange(tmin) |>
  select(date, tmin) |>
  filter(tmin <= 10)

Data Takeaway: The coldest day in history was Jan. 31, 1949 with a temperature of -2°F. The next coldest were Jan. 30, 1949 and Dec. 23, 1989 with low temperatures of 4°F.

Hottest Years

Which years had the most days of 100+ temperature?

hottest_years <- weather |>
  mutate(year = lubridate::year(date)) |>
  filter(tmax >= 100) |>
  group_by(year) |>
  summarize(hottest_year = n()) |>
  arrange(desc(hottest_year)) |>
  head(5)

Data Takeaway: 2011 had the most 100-degree days, with more than 90 days reaching 100°F or higher. The next four years, in descending order, were 2023, 2009, 2022 and 2019.
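The group-and-count step can also be written with dplyr's count(), which groups, tallies and sorts in one call. This is a minimal sketch on invented data; the `toy` tibble and the column name `days_100_plus` are assumptions made for the example.

```r
library(dplyr)
library(lubridate)

# Invented sample data, for illustration only
toy <- tibble(
  date = as.Date(c("2011-07-01", "2011-07-02", "2011-08-15",
                   "2012-07-04", "2012-07-05", "2013-06-30")),
  tmax = c(101, 104, 100, 99, 102, 95)
)

# Count days at or above 100 degrees per year, largest count first
hot_days <- toy |>
  filter(tmax >= 100) |>
  count(year = year(date), name = "days_100_plus", sort = TRUE)

hot_days
```

Years with no qualifying days (2013 in the toy data) drop out of the result, because they are filtered away before counting.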

hottest_year_plot <- ggplot(hottest_years, 
    aes(x = reorder(year, hottest_year), y = hottest_year )) +
  geom_col() + 
  coord_flip() + 
  geom_text(aes(label = hottest_year),   hjust = 1.5, color = "white") + 
  labs(
    title = "Years With the Most Days at 100°F or Higher",
    subtitle = str_wrap("2011 had the most 100-degree days, with more than 90 days reaching 100°F or higher. The next four years were 2023, 2009, 2022 and 2019."),
    caption = "By Ryland Russell",
    x = "",
    y = "Number of Days") +
  theme_bw()

hottest_year_plot

Total Rainfall by Year

Which years had the most total rainfall and how much? Which had the least?

weather |>
  mutate(yr = year(date)) |>
  group_by(yr) |>
  summarize(rainfall = sum(prcp)) |>
  arrange(desc(rainfall)) |> 
  print() |> 
  filter(rainfall == max(rainfall))
# A tibble: 87 × 2
      yr rainfall
   <dbl>    <dbl>
 1  2015     60.0
 2  2004     52.3
 3  1991     52.2
 4  1957     51.3
 5  1946     47.3
 6  2007     47.0
 7  1997     46.8
 8  1992     46.0
 9  1981     45.7
10  1941     45.4
# ℹ 77 more rows
weather |>
  mutate(yr = year(date)) |>
  filter(yr != 2024) |>
  group_by(yr) |>
  summarize(rainfall = sum(prcp)) |>
  arrange(rainfall) |>
  print() |> 
  filter(rainfall == min(rainfall))
# A tibble: 86 × 2
      yr rainfall
   <dbl>    <dbl>
 1  1938     11.3
 2  1954     11.4
 3  1956     15.4
 4  2008     16.1
 5  1963     17.3
 6  1988     19.2
 7  2011     19.7
 8  1948     21.0
 9  2003     21.4
10  1947     21.6
# ℹ 76 more rows

Data Takeaway: 2015 had the most rainfall, with about 60 inches. 2004 and 1991 followed with about 52 inches each. 1938 shows the lowest total, about 11.3 inches, but the data shown above begins on June 1, 1938, so that figure covers only part of the year. Among the remaining years, 1954 was the driest with about 11.4 inches, followed by 1956 with about 15.4 inches.
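Because the glimpse of the data shows the record starting on June 1, 1938, yearly totals are only comparable for years with full coverage. One way to check is to count the days recorded per year alongside the total. The sketch below uses an invented daily record; the 365-day cutoff is an assumption that would also drop any year with missing days.

```r
library(dplyr)
library(lubridate)

# Invented daily record: starts mid-1938 and runs through the end of 1939
toy <- tibble(
  date = seq(as.Date("1938-06-01"), as.Date("1939-12-31"), by = "day"),
  prcp = 0.1
)

# Days recorded and total rainfall for each year
coverage <- toy |>
  mutate(yr = year(date)) |>
  group_by(yr) |>
  summarize(days = n(), rainfall = sum(prcp, na.rm = TRUE))

# Keep only years with a full calendar of observations
complete_years <- coverage |>
  filter(days >= 365)

coverage
complete_years
```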

Total Snowfall by Year

Which years had the most total snowfall and how much?

weather |>
  mutate(yr = year(date)) |>
  group_by(yr) |>
  summarize(snowfall = sum(snow)) |>
  arrange(desc(snowfall)) |> 
  print() |>
  filter(snowfall >= 2)
# A tibble: 87 × 2
      yr snowfall
   <dbl>    <dbl>
 1  1985      8.7
 2  2021      7.9
 3  1944      7  
 4  1949      6.5
 5  1966      6  
 6  1964      4.6
 7  1965      2.5
 8  1940      2  
 9  1967      2  
10  1980      2  
# ℹ 77 more rows

Data Takeaway: 1985 received the most snowfall, with about 8.7 inches. 2021 followed with 7.9 inches and 1944 with 7 inches.

Average Rainfall by Month

What is the average rainfall for each month across time?

rainfall_by_month <- weather |>
  mutate(yr = year(date), month = month(date, label = TRUE))  |>
  group_by(yr, month) |> 
  summarize(total_rainfall = sum(prcp)) |> 
  group_by(month) |> 
  summarize(avg_rain = mean(total_rainfall))
`summarise()` has grouped output by 'yr'. You can override using the `.groups`
argument.
rainfall_by_month

Data Takeaway: Since 1938, May has had the highest average rainfall, at more than 5 inches for the month.
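The summarise() message printed above appears because the first summarize() call leaves the result grouped by `yr`. Passing `.groups = "drop"` makes the ungrouping explicit and silences the message. A minimal sketch on invented data (the `toy` values are made up for illustration):

```r
library(dplyr)
library(lubridate)

# Invented sample data, for illustration only
toy <- tibble(
  date = as.Date(c("2020-05-01", "2020-05-20", "2021-05-10",
                   "2020-06-03", "2021-06-15")),
  prcp = c(2, 4, 4, 1, 3)
)

monthly_avg <- toy |>
  mutate(yr = year(date), month = month(date, label = TRUE)) |>
  group_by(yr, month) |>
  # Total rain per year-month, then drop the grouping explicitly
  summarize(total_rainfall = sum(prcp), .groups = "drop") |>
  group_by(month) |>
  # Average of the monthly totals across years
  summarize(avg_rain = mean(total_rainfall))

monthly_avg
```

Averaging the per-year monthly totals, rather than the daily values, is what makes the result read as "typical rainfall in that month".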

monthly_rainfall_chart <- ggplot(rainfall_by_month, 
    aes(x = month, y = avg_rain)) +
  geom_col() + 
  labs(
    title = "Average Rain by Month Since 1938",
    subtitle = str_wrap("Since 1938, May has had the highest average rainfall, at more than 5 inches for the month."),
    caption = "By Ryland Russell",
    x = "",
    y = "Inches of Rain") +
  theme_minimal()

monthly_rainfall_chart

Yearly Average

Calculating yearly average high and low temperatures.

average_temps <- weather |> 
  mutate(yr = year(date)) |>
  filter(yr != 1938 & yr != 2024) |>
  group_by(yr) |> 
  summarize(avg_high_temp = mean(tmax, na.rm = TRUE), avg_low_temp = mean(tmin, na.rm = TRUE ))

average_temps
average_temps_long <- average_temps |> 
   pivot_longer(cols = c(avg_high_temp, avg_low_temp),
               names_to = "temperature_type",
               values_to = "temperature")

average_temps_long
average_temps_line <- ggplot(average_temps_long,
    aes(x = yr, y = temperature, color = temperature_type)) +
  geom_line() +
  labs(
    title = "Average High and Low Temperatures Through the Years",
    subtitle = str_wrap("Average low and high temperatures have been consistently rising over the years, indicating a possible warming trend in the Austin area."),
    caption = "By Ryland Russell",
    x = "Year",
    y = "Temperature (°F)") +
  scale_color_manual(
    values = c("avg_high_temp" = "orange", "avg_low_temp" = "purple"),
    labels = c("avg_high_temp" = "Average High Temperature",
               "avg_low_temp" = "Average Low Temperature")) +
  theme(legend.title = element_blank())

average_temps_line

Data Takeaway: Average low and high temperatures have been consistently rising over the years, indicating a possible warming trend in the Austin area.
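A line chart suggests a trend, and a simple linear fit can put a rough number on it. The sketch below fits lm() to invented yearly averages and converts the slope to degrees per decade. The `toy` values are made up, and a straight-line fit is only one possible way to summarize the change; it is not a result from the notebook's data.

```r
# Invented yearly averages rising 0.05 degrees per year, for illustration only
toy <- data.frame(
  yr = 2000:2009,
  avg_high_temp = 79 + 0.05 * (0:9)
)

# Fit a straight line: average high temperature as a function of year
fit <- lm(avg_high_temp ~ yr, data = toy)

# The slope is degrees per year; multiply by 10 for degrees per decade
slope_per_decade <- unname(coef(fit)["yr"]) * 10

slope_per_decade
```

Applied to a table shaped like `average_temps`, the same call could be run once for `avg_high_temp` and once for `avg_low_temp`.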