Tidy Data: Candy

Setup

Loading the tidyverse and janitor packages.

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.2     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.3     ✔ tibble    3.2.1
✔ lubridate 1.9.2     ✔ tidyr     1.3.0
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(janitor)


Attaching package: 'janitor'

The following objects are masked from 'package:stats':

    chisq.test, fisher.test

Importing

Importing the candy data from a published Google Sheet and standardizing the column names with clean_names().

candy_raw <- read_csv("https://docs.google.com/spreadsheets/d/e/2PACX-1vRCGayKLOy-52gKmEoPOj3ZKnOQVtCiooSloiCr-i_ci27e4n1CMPL0Z9s6MeFX9oQuN9E-HCFJnWjD/pub?gid=1456715839&single=true&output=csv") |> clean_names()

Rows: 175 Columns: 11
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (5): Timestamp, First name, Last name, Candy type, Box code
dbl (6): Red, Green, Orange, Yellow, Blue, Brown

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

candy_raw
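As a minimal sketch of what the import step does to the column names, the example below reads a tiny inline CSV instead of the live sheet. The headers are a subset of the ones listed in the column specification above; the two data rows are made up for illustration. It also sets show_col_types = FALSE, which the readr message suggests for quieting the column specification output.

```r
library(readr)
library(janitor)

# Hypothetical two-row sample: real header names, made-up values.
csv_text <- I("Timestamp,First name,Candy type,Box code,Red\n9/1/2023 10:00:00,Ana,Plain,ABC123,4\n9/1/2023 10:05:00,Ben,Plain,ABC123,6\n")

# show_col_types = FALSE quiets the column specification message.
sample_raw <- read_csv(csv_text, show_col_types = FALSE) |>
  clean_names()

# clean_names() converts the headers to lowercase snake_case.
names(sample_raw)
```

After cleaning, "First name" becomes first_name and "Box code" becomes box_code, which is why the later steps refer to the columns in lowercase.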

Removing Columns

Removing the timestamp, candy_type, and box_code columns.

candy <- candy_raw |>
  select(!c(timestamp, candy_type, box_code))

candy |> head()
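A small sketch of how the negated selection behaves, using a hypothetical one-row table with made-up values. It compares dropping columns with !c(...) against naming the columns to keep; both should leave the same columns behind.

```r
library(dplyr)
library(tibble)

# Hypothetical one-row table using the cleaned column names; values are made up.
sample_raw <- tibble(
  timestamp = "9/1/2023 10:00:00",
  first_name = "Ana",
  last_name = "Lopez",
  candy_type = "Plain",
  box_code = "ABC123",
  red = 4,
  green = 7
)

# !c(...) keeps every column except the ones listed.
dropped <- sample_raw |> select(!c(timestamp, candy_type, box_code))

# The same result, written as a positive selection of the columns to keep.
kept <- sample_raw |> select(first_name, last_name, red:green)

identical(names(dropped), names(kept))
```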

Pivoting Longer

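Pivoting the six color columns (red through brown) into two columns, color and count_candies, so each row holds one color count.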

candy_long <- candy |> 
  pivot_longer(
    cols = red:brown,
    names_to = "color",
    values_to = "count_candies"
  )

candy_long |> head()
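To see the reshaping on a small scale, here is a sketch with a hypothetical table of two students and three color columns (the values are made up). Each color column becomes its own row, so the long table has one row per student per color.

```r
library(tidyr)
library(tibble)

# Hypothetical counts for two students; the values are made up.
sample_wide <- tibble(
  first_name = c("Ana", "Ben"),
  red = c(4, 6),
  green = c(7, 5),
  brown = c(3, 8)
)

sample_long <- sample_wide |>
  pivot_longer(
    cols = red:brown,
    names_to = "color",
    values_to = "count_candies"
  )

# 2 students x 3 color columns = 6 rows
nrow(sample_long)
```

By the same logic, the real data with 175 rows and six color columns should produce 175 x 6 = 1,050 rows in candy_long.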

Average Candies by Color

Calculating the average number of candies per color, ignoring missing counts and rounding to one decimal place.

candy_avg <- candy_long |> 
  group_by(color) |> 
  summarize(avg_candies = mean(count_candies, na.rm = TRUE)) |> 
  mutate(avg_candies = round(avg_candies, 1))

candy_avg
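The na.rm = TRUE argument matters whenever a count is missing. The sketch below uses a hypothetical long table with one NA (values are made up) and computes the mean both ways for comparison.

```r
library(dplyr)
library(tibble)

# Hypothetical long-format counts with one missing value; the values are made up.
sample_long <- tibble(
  color = c("red", "red", "red", "blue", "blue"),
  count_candies = c(4, 6, NA, 9, 10)
)

sample_avg <- sample_long |>
  group_by(color) |>
  summarize(
    avg_keep_na = mean(count_candies),               # NA if any count is missing
    avg_drop_na = mean(count_candies, na.rm = TRUE)  # skips missing counts
  )

sample_avg
```

For red, the first column is NA while the second averages only the two recorded counts.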

Plotting

Plotting the average count per color as a horizontal bar chart, with bars sorted by average.

candy_plot <- ggplot(
    candy_avg,
    aes(x = reorder(color, avg_candies), y = avg_candies)) +
  geom_col() +
  coord_flip() +
  geom_text(aes(label = avg_candies), hjust = 1.5, color = "white") +
  labs(
    title = "Sweet Stats",
    subtitle = str_wrap("Students in Reporting with Data each counted the candies of each color in their M&M's package. Data is taken from a long-term dataset."),
    caption = "By Ryland Russell",
    x = "",
    y = "Average Count") +
  theme_bw()

candy_plot
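The bar order comes from reorder(color, avg_candies) in the aes() call. A minimal sketch with made-up averages shows the factor levels it produces:

```r
# Hypothetical averages; the values are made up.
color <- c("red", "blue", "green")
avg_candies <- c(5, 9, 7)

# reorder() sorts the levels by the second argument, smallest first.
ordered_color <- reorder(color, avg_candies)
levels(ordered_color)
```

The levels run from the smallest average to the largest. Because coord_flip() places the first level at the bottom of the chart, the color with the highest average ends up as the top bar.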

Exporting

Exporting candy_avg as a CSV file.

candy_avg |> write_csv("data-processed/candy_avg.csv")
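write_csv() does not create missing folders, so the export fails if data-processed does not exist yet. The sketch below shows one way to guard against that. It writes to a temporary folder rather than the project folder, and the table values are made up.

```r
library(readr)
library(tibble)

# Hypothetical averages; the values are made up.
sample_avg <- tibble(color = c("blue", "red"), avg_candies = c(9.5, 5))

# A temporary folder stands in for the project's data-processed folder.
out_dir <- file.path(tempdir(), "data-processed")

# Create the folder if needed; showWarnings = FALSE stays quiet if it already exists.
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)

out_file <- file.path(out_dir, "candy_avg.csv")
sample_avg |> write_csv(out_file)

file.exists(out_file)
```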

Datawrapper Graphic

Link to datawrapper graphic

Pivoting Wider

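Averaging candies by box code and color, after filtering out two box codes, then pivoting wider so each box code becomes its own column.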

candy_box_color <- candy_raw |>
  # drop the two excluded box codes
  filter(box_code != "130FXWAC0336", box_code != "1524976SE") |>
  select(!c(timestamp, candy_type)) |>
  pivot_longer(cols = red:brown, names_to = "candy_color", values_to = "candy_count") |>
  group_by(box_code, candy_color) |>
  # .groups = "drop" returns an ungrouped result and quiets the grouping message
  summarise(avg_candies = mean(candy_count) |> round(1), .groups = "drop")

candy_box_color
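When summarise() follows a group_by() on two columns, the result keeps the first grouping column unless told otherwise. The sketch below, on a hypothetical table with made-up box codes and counts, compares the default behavior (written out explicitly as .groups = "drop_last") with .groups = "drop".

```r
library(dplyr)
library(tibble)

# Hypothetical long-format data; box codes and counts are made up.
sample_long <- tibble(
  box_code = c("BOX_A", "BOX_A", "BOX_B", "BOX_B"),
  candy_color = c("red", "red", "red", "blue"),
  candy_count = c(4, 6, 5, 9)
)

# "drop_last" is what summarise() does by default here: only the last
# grouping column (candy_color) is removed from the grouping.
default_groups <- sample_long |>
  group_by(box_code, candy_color) |>
  summarise(avg_candies = mean(candy_count), .groups = "drop_last")

# "drop" removes all grouping from the result.
dropped_groups <- sample_long |>
  group_by(box_code, candy_color) |>
  summarise(avg_candies = mean(candy_count), .groups = "drop")

group_vars(default_groups)
group_vars(dropped_groups)
```

Either version can be pivoted wider afterward; the choice mostly matters if more grouped operations follow.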

# Pivot wider: one row per color, one column per box code

candy_box_color |> 
  pivot_wider(names_from = box_code, values_from = avg_candies)

Pivoting on my own

Pivoting wider the other way: one row per box code, one column per candy color.

candy_box_color |>
  pivot_wider(names_from = candy_color, values_from = avg_candies)
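To compare the two orientations side by side, here is a sketch on a hypothetical summary table (box codes and averages are made up). The only difference between the two calls is which column supplies the new column names.

```r
library(tidyr)
library(tibble)

# Hypothetical averages by box and color; the values are made up.
sample_box_color <- tibble(
  box_code = c("BOX_A", "BOX_A", "BOX_B", "BOX_B"),
  candy_color = c("blue", "red", "blue", "red"),
  avg_candies = c(9.5, 5, 8, 6.5)
)

# One row per color, one column per box code.
by_box <- sample_box_color |>
  pivot_wider(names_from = box_code, values_from = avg_candies)

# One row per box code, one column per color.
by_color <- sample_box_color |>
  pivot_wider(names_from = candy_color, values_from = avg_candies)

names(by_box)
names(by_color)
```

Both results hold the same four averages; the second layout is usually easier to read when there are many box codes and only a few colors.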