Project Overview

This project uses “The Instacart Online Grocery Shopping Dataset 2017” (the instacart dataset from the p8105.datasets package) to explore trends in product offerings and purchases in online grocery shopping. The analysis was carried out in R version 4.3.1 with the packages dplyr, ggplot2, tidyverse, and p8105.datasets.

The instacart dataset has 1,384,617 observations and 15 variables. Each observation is one product in an order. The variables are:

  • order_id: order identifier, class integer.
  • product_id: product identifier, class integer.
  • add_to_cart_order: order in which each product was added to cart, class integer.
  • reordered: 1 if this product has been ordered by this user in the past, 0 otherwise, class integer.
  • user_id: customer identifier, class integer.
  • eval_set: which evaluation set this order belongs in (Note that the data for use in this class is exclusively from the “train” eval_set), class character.
  • order_number: the order sequence number for this user (1=first, n=nth), class integer.
  • order_dow: the day of the week on which the order was placed, class integer.
  • order_hour_of_day: the hour of the day on which the order was placed, class integer.
  • days_since_prior_order: days since the last order, capped at 30, NA if order_number=1, class integer.
  • product_name: name of the product, class character.
  • aisle_id: aisle identifier, class integer.
  • department_id: department identifier, class integer.
  • aisle: the name of the aisle, class character.
  • department: the name of the department, class character.
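The dimensions and column classes listed above can be checked with a few lines of R. The sketch below runs on a small hand-made stand-in (toy_instacart is a hypothetical name, its values are made up, and it holds only 6 of the 15 columns); with p8105.datasets installed, the real data could be loaded with data("instacart") and passed through the same calls.

```r
library(dplyr)

# Toy stand-in for the instacart data (hypothetical values, subset of columns).
toy_instacart <- tibble(
  order_id = c(1L, 1L, 36L),
  product_id = c(49302L, 11109L, 39612L),
  reordered = c(1L, 1L, 0L),
  eval_set = c("train", "train", "train"),
  days_since_prior_order = c(9L, 9L, NA),
  aisle = c("yogurt", "other creams cheeses", "specialty cheeses")
)

# Number of observations (rows) and variables (columns).
dims <- dim(toy_instacart)

# Class of every column, as a named character vector.
col_classes <- sapply(toy_instacart, class)

print(dims)
print(col_classes)
```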

Visualizations and EDA

Table for Number of Items Ordered, by Day of Week

order_dow n_obs
0 324026
1 205978
2 160562
3 154381
4 155481
5 176910
6 207279
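The n_obs column counts rows of the dataset, and each row is one product within an order, so the seven counts add up to the 1384617 observations. A minimal sketch of how such a table could be produced with dplyr is shown below, next to a count of distinct orders for comparison. The data frame toy_instacart and its values are hypothetical stand-ins, not the real data.

```r
library(dplyr)

# Toy stand-in: one row per item in an order (hypothetical values).
toy_instacart <- tibble(
  order_id  = c(1L, 1L, 2L, 3L, 3L, 3L),
  order_dow = c(0L, 0L, 0L, 1L, 1L, 1L)
)

# Rows (items) per day of week, named n_obs as in the table.
items_by_dow <- toy_instacart %>%
  count(order_dow, name = "n_obs")

# Distinct orders per day of week, for comparison.
orders_by_dow <- toy_instacart %>%
  group_by(order_dow) %>%
  summarize(n_orders = n_distinct(order_id))

print(items_by_dow)
print(orders_by_dow)
```

In the toy data, day 0 has 3 items but only 2 orders, which is why the two counts should not be read interchangeably.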

There are 131,209 unique users and 131,209 unique orders, so each user contributes exactly one order. On average, 197,802.43 items are ordered per day of the week (1,384,617 items across 7 days). There are 134 aisles and 21 departments, and the most items are ordered from the fresh vegetables aisle.
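These summary figures can be reproduced with n_distinct() and count(). The sketch below is one possible way to do it, run on a hypothetical toy data frame (toy_instacart and its values are made up for illustration); on the full data the same average works out to 1384617 / 7, or about 197802.43.

```r
library(dplyr)

# Toy stand-in for the instacart data (hypothetical values).
toy_instacart <- tibble(
  user_id   = c(10L, 10L, 11L, 12L, 12L, 12L),
  order_id  = c(1L, 1L, 2L, 3L, 3L, 3L),
  order_dow = c(0L, 0L, 1L, 2L, 2L, 2L),
  aisle     = c("fresh vegetables", "yogurt", "fresh vegetables",
                "fresh fruits", "fresh vegetables", "yogurt")
)

# Distinct users, orders, and aisles.
overview <- toy_instacart %>%
  summarize(
    n_users  = n_distinct(user_id),
    n_orders = n_distinct(order_id),
    n_aisles = n_distinct(aisle)
  )

# Mean number of items (rows) per day of week.
mean_items_per_dow <- toy_instacart %>%
  count(order_dow) %>%
  summarize(mean_items = mean(n)) %>%
  pull(mean_items)

# Aisle with the most items ordered.
top_aisle <- toy_instacart %>%
  count(aisle, sort = TRUE) %>%
  slice_head(n = 1) %>%
  pull(aisle)

print(overview)
print(mean_items_per_dow)
print(top_aisle)
```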

Department Barplot

Aisles Barplot, colored by departments

Number of items ordered in each aisle, for aisles with more than 10,000 items ordered.

There are 39 aisles with more than 10,000 items ordered. Most of these aisles have fewer than 40,000 items ordered. Five aisles have more than 40,000 items ordered: fresh fruits, fresh vegetables, packaged cheese, packaged vegetables fruits, and yogurt.
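A bar plot of this kind could be built by counting items per aisle, keeping the aisles above a threshold, and mapping department to the fill colour. The following is a sketch under assumptions, not the project's actual plotting code: toy_items, min_items, and the toy values are hypothetical, and the threshold is lowered to 2 so the tiny example has something to filter (the text uses 10000 on the full data).

```r
library(dplyr)
library(ggplot2)

# Toy stand-in: one row per item ordered (hypothetical values).
toy_items <- tibble(
  aisle = c(rep("fresh fruits", 5), rep("yogurt", 3),
            rep("tofu meat alternatives", 1)),
  department = c(rep("produce", 5), rep("dairy eggs", 3), rep("deli", 1))
)

# Hypothetical threshold for the toy data; the text uses 10000.
min_items <- 2

# Items per aisle, keeping only aisles above the threshold.
popular_aisles <- toy_items %>%
  count(aisle, department, name = "n_items") %>%
  filter(n_items > min_items) %>%
  arrange(desc(n_items))

# Horizontal bars ordered by item count, coloured by department.
aisle_plot <- ggplot(
  popular_aisles,
  aes(x = reorder(aisle, n_items), y = n_items, fill = department)
) +
  geom_col() +
  coord_flip() +
  labs(x = "Aisle", y = "Number of items ordered", fill = "Department")

print(popular_aisles)
```

Calling print(aisle_plot) would draw the figure. Flipping the coordinates keeps long aisle names readable.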

Mean hour of day at which Pink Lady Apples and Coffee Ice Cream are ordered on each day of the week

Sunday   Monday    Tuesday   Wednesday  Thursday  Friday    Saturday
13.6     12.17391  12.83824  14.68519   13.17308  12.64286  13.25

Wednesday has the latest mean hour of day at which Pink Lady Apples and Coffee Ice Cream are ordered, at about 14.69 (roughly 2:41 PM). Monday has the earliest, at about 12.17.
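A one-row table like the one above could be produced by filtering to the two products, averaging order_hour_of_day within each day, and spreading the days into columns. The sketch below makes two assumptions that the source does not state: that the single row pools both products, and that order_dow 0 means Sunday through 6 meaning Saturday. The names toy_instacart and dow_labels and all toy values are hypothetical.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for the instacart data (hypothetical values).
toy_instacart <- tibble(
  product_name = c("Pink Lady Apples", "Coffee Ice Cream",
                   "Pink Lady Apples", "Coffee Ice Cream", "Bananas"),
  order_dow = c(0L, 0L, 1L, 1L, 0L),
  order_hour_of_day = c(10L, 14L, 9L, 16L, 23L)
)

# ASSUMPTION: 0 = Sunday, ..., 6 = Saturday (not documented in the data).
dow_labels <- c("Sunday", "Monday", "Tuesday", "Wednesday",
                "Thursday", "Friday", "Saturday")

mean_hour_by_dow <- toy_instacart %>%
  filter(product_name %in% c("Pink Lady Apples", "Coffee Ice Cream")) %>%
  mutate(day = factor(dow_labels[order_dow + 1], levels = dow_labels)) %>%
  group_by(day) %>%
  summarize(mean_hour = mean(order_hour_of_day)) %>%
  pivot_wider(names_from = day, values_from = mean_hour)

print(mean_hour_by_dow)
```

Adding product_name to group_by() would instead give one row per product, which makes the two products directly comparable.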