Go Back

Overview

This project is part of the bigger project of which I thoroughly walked through steps in data cleaning & wrangling process. I then did some descriptive explorations of the dataset. In this project, I used the data from Mr. Trash Wheel. This project utilized R version 4.3.1, and R packages tidyverse and dplyr.

library(tidyverse)
library(dplyr)

STEP 1: read & clean Mr. Trash Wheel sheet

Directions:

  • specify the sheet in the Excel file and to omit non-data entries (rows with notes / figures; columns containing notes) using arguments in read_excel
  • use reasonable variable names
  • omit rows that do not include dumpster-specific data
  • The data include a column for the (approximate) number of homes powered. This calculation is described in the Homes powered note, but not applied to every row in the dataset. Update the data to include a new homes_powered variable based on this calculation.
mr_trash_wheel <- 
  readxl::read_excel("202309\ Trash\ Wheel\ Collection\ Data.xlsx", 
                     sheet = "Mr. Trash Wheel", col_names = TRUE, range = "A2:N586") %>%
  janitor::clean_names() %>%
  mutate("day" = format(date, format="%d")) %>%
  mutate("year" = format(year, format="%Y")) %>%
  select(dumpster, year, month, day, everything(), -date) %>%
  mutate("homes_powered" = (weight_tons*500)/30) %>%
  mutate("dumpster" = paste("mr", dumpster, sep="_"))
head(mr_trash_wheel)
## # A tibble: 6 × 14
##   dumpster year  month day   weight_tons volume_cubic_yards plastic_bottles
##   <chr>    <chr> <chr> <chr>       <dbl>              <dbl>           <dbl>
## 1 mr_1     2014  May   16           4.31                 18            1450
## 2 mr_2     2014  May   16           2.74                 13            1120
## 3 mr_3     2014  May   16           3.45                 15            2450
## 4 mr_4     2014  May   17           3.1                  15            2380
## 5 mr_5     2014  May   17           4.06                 18             980
## 6 mr_6     2014  May   20           2.71                 13            1430
## # ℹ 7 more variables: polystyrene <dbl>, cigarette_butts <dbl>,
## #   glass_bottles <dbl>, plastic_bags <dbl>, wrappers <dbl>,
## #   sports_balls <dbl>, homes_powered <dbl>

I changed the date variable in this sheet to day as there are year and month variables. I also changed the format of year to character. The Mr. Trash Wheel sheet contains 14 variables and 584 observations. Variables in this sheet include: dumpster, year, month, day, weight_tons, volume_cubic_yards, plastic_bottles, polystyrene, cigarette_butts, glass_bottles, plastic_bags, wrappers, sports_balls, homes_powered. Variable dumpster is the primary key that contains values unique to each observation.

STEP 2: read & clean Professor Trash Wheel sheet

professor_trash_wheel <- 
  readxl::read_excel("202309\ Trash\ Wheel\ Collection\ Data.xlsx", 
                     sheet = "Professor Trash Wheel", col_names = TRUE, range = "A2:M108") %>%
  janitor::clean_names() %>%
  mutate("day" = format(date, format="%d")) %>%
  mutate("year" = format(year, format="%Y")) %>%
  select(dumpster, year, month, day, everything(), -date) %>%
  mutate("homes_powered" = (weight_tons*500)/30) %>%
  mutate("dumpster" = paste("professor", dumpster, sep="_"))
head(professor_trash_wheel)
## # A tibble: 6 × 13
##   dumpster    year  month   day   weight_tons volume_cubic_yards plastic_bottles
##   <chr>       <chr> <chr>   <chr>       <dbl>              <dbl>           <dbl>
## 1 professor_1 2017  January 02           1.79                 15            1950
## 2 professor_2 2017  January 30           1.58                 15            9540
## 3 professor_3 2017  Februa… 26           2.32                 18            8350
## 4 professor_4 2017  Februa… 26           3.72                 15            8590
## 5 professor_5 2017  Februa… 28           1.45                 15            7830
## 6 professor_6 2017  March   30           1.71                 15            8210
## # ℹ 6 more variables: polystyrene <dbl>, cigarette_butts <dbl>,
## #   glass_bottles <dbl>, plastic_bags <dbl>, wrappers <dbl>,
## #   homes_powered <dbl>

I changed the date variable in this sheet to day as there are year and month variables. I also changed the format of year to character. The Professor Trash Wheel sheet contains 13 variables and 106 observations. Variables in this sheet include: dumpster, year, month, day, weight_tons, volume_cubic_yards, plastic_bottles, polystyrene, cigarette_butts, glass_bottles, plastic_bags, wrappers, homes_powered. Variable dumpster is the primary key that contains values unique to each observation.

STEP 3: read & clean Gwynnda Trash Wheel sheet

gwynnda_trash_wheel <- 
  readxl::read_excel("202309\ Trash\ Wheel\ Collection\ Data.xlsx", 
                     sheet = "Gwynnda Trash Wheel", col_names = TRUE, range = "A2:L157") %>%
  janitor::clean_names() %>%
  mutate("day" = format(date, format="%d")) %>%
  mutate("year" = format(year, format="%Y")) %>%
  select(dumpster, year, month, day, everything(), -date) %>%
  mutate("homes_powered" = (weight_tons*500)/30) %>%
  mutate("dumpster" = paste("gwynnda",  dumpster, sep="_"))
head(gwynnda_trash_wheel)
## # A tibble: 6 × 12
##   dumpster  year  month  day   weight_tons volume_cubic_yards plastic_bottles
##   <chr>     <chr> <chr>  <chr>       <dbl>              <dbl>           <dbl>
## 1 gwynnda_1 2021  July   03           0.93                 15            1200
## 2 gwynnda_2 2021  July   07           2.26                 15            2000
## 3 gwynnda_3 2021  July   07           1.62                 15            1800
## 4 gwynnda_4 2021  July   16           1.76                 15            1000
## 5 gwynnda_5 2021  July   30           1.53                 15            2100
## 6 gwynnda_6 2021  August 11           2.06                 15            2400
## # ℹ 5 more variables: polystyrene <dbl>, cigarette_butts <dbl>,
## #   plastic_bags <dbl>, wrappers <dbl>, homes_powered <dbl>

I changed the date variable in this sheet to day as there are year and month variables. I also changed the format of year to character. The Professor Trash Wheel sheet contains 12 variables and 155 observations. Variables in this sheet include: dumpster, year, month, day, weight_tons, volume_cubic_yards, plastic_bottles, polystyrene, cigarette_butts, plastic_bags, wrappers, homes_powered. Variable dumpster is the primary key that contains values unique to each observation.

STEP 4: merge the three sheets

data_q2 <-
  full_join(mr_trash_wheel, professor_trash_wheel) %>%
  full_join(., gwynnda_trash_wheel)
summary(data_q2)
##    dumpster             year              month               day           
##  Length:845         Length:845         Length:845         Length:845        
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##   weight_tons    volume_cubic_yards plastic_bottles  polystyrene   
##  Min.   :0.610   Min.   : 5.00      Min.   :   0    Min.   :    0  
##  1st Qu.:2.490   1st Qu.:15.00      1st Qu.:1000    1st Qu.:  280  
##  Median :3.070   Median :15.00      Median :1980    Median :  950  
##  Mean   :3.009   Mean   :15.13      Mean   :2296    Mean   : 1631  
##  3rd Qu.:3.540   3rd Qu.:15.00      3rd Qu.:2900    3rd Qu.: 2400  
##  Max.   :5.620   Max.   :20.00      Max.   :9830    Max.   :11528  
##                                     NA's   :1       NA's   :1      
##  cigarette_butts  glass_bottles     plastic_bags      wrappers    
##  Min.   :     0   Min.   :  0.00   Min.   :    0   Min.   :  180  
##  1st Qu.:  3200   1st Qu.: 10.00   1st Qu.:  280   1st Qu.:  840  
##  Median :  5500   Median : 18.00   Median :  680   Median : 1380  
##  Mean   : 15592   Mean   : 20.89   Mean   : 1082   Mean   : 2330  
##  3rd Qu.: 16000   3rd Qu.: 28.00   3rd Qu.: 1400   3rd Qu.: 2635  
##  Max.   :310000   Max.   :110.00   Max.   :13450   Max.   :20100  
##  NA's   :1        NA's   :156      NA's   :1       NA's   :118    
##   sports_balls   homes_powered  
##  Min.   : 0.00   Min.   :10.17  
##  1st Qu.: 6.00   1st Qu.:41.50  
##  Median :11.00   Median :51.17  
##  Mean   :13.17   Mean   :50.16  
##  3rd Qu.:18.25   3rd Qu.:59.00  
##  Max.   :56.00   Max.   :93.67  
##  NA's   :261

Note that I used the full_join function here. This way, all data entries from all tables will be included into the final merged dataset. The merged data contains 14 variables and 845 observations. Variables in this sheet include: dumpster, year, month, day, weight_tons, volume_cubic_yards, plastic_bottles, polystyrene, cigarette_butts, glass_bottles, plastic_bags, wrappers, sports_balls, homes_powered. Variable dumpster is the primary key that contains values unique to each observation.

The total weight of trash collected by Professor Trash Wheel is 216.26. The total number of cigarette butts collected by Gwynnda in July of 2021 is 16300.

STEP 5: Save cleaned file

We can save our cleaned dataset as follows:

write.csv(data_q2, file = "data_q2.csv")