GitHub Projects


Project 1: Glioblastoma Multiforme (GBM) Market Analysis (repository link)

This project examines the structure of the Glioblastoma Multiforme (GBM) market and existing treatment gaps using information collected from GBM patients. The data were provided by the Michael Allen Company. For this project, I used R version 4.3.1 and the R packages dplyr, summarytools, forcats, tidyverse, psych, broom, knitr, and ggplot2.

You can access the project report HERE.

Project 2: NYC ZIP Code-Level Population Changes (repository link)

This project examines NYC ZIP code-level population changes using USPS Change of Address (COA) data. You can access the project report HERE.

Project 3: Visualization and EDA (repository link)

This project is broken down into three smaller projects, each using a different dataset: “The Instacart Online Grocery Shopping Dataset 2017” (instacart), “Behavioral Risk Factor Surveillance System for Selected Metropolitan Area Risk Trends (SMART) for 2002-2010” (smart), and accelerometer data collected from 250 participants in the NHANES study (accel).

For a more detailed description of each project, please visit:

instacart project REPORT
smart project REPORT
accel project REPORT

Project 4: Data Cleaning SOP (repository link)

This project is broken down into three smaller projects, each using a different dataset: “FiveThirtyEight” (538), “Mr. Trash Wheel” (trashwheel), and data collected in an observational study to understand the trajectory of Alzheimer’s disease (AD) biomarkers (amyloid).

For a more detailed description of each project, please visit:

538 project REPORT
trashwheel project REPORT
amyloid project REPORT

Project 5: Flexdashboard

For this project, I created a flexdashboard using a random sample of 500 observations from the instacart dataset. Click HERE to view.

Project 6: Investigating the Association Between Depression and Hypertension (repository link)

This is a group project for the Application of Epidemiological Research Methods class. Using NHANES 2017-2018 data, we analyzed the crude association between a binary exposure, Major Depressive Disorder, and a binary outcome, Hypertension, using bivariate logistic regression and the Pearson chi-square test. We then included potential confounding variables to assess the adjusted association using multivariable logistic regression and the Mantel-Haenszel chi-square test. For both the crude and adjusted associations, we calculated odds ratios and their 95% confidence intervals.
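The crude and stratified measures described above can be illustrated with a short sketch. This is not the project's SAS pipeline: it is a hypothetical Python illustration on made-up counts (not NHANES data) that computes a crude odds ratio with a Woolf (log-based Wald) 95% confidence interval, the Pearson chi-square statistic, and the Mantel-Haenszel pooled odds ratio and chi-square across confounder strata. The names `Table2x2`, `crude_or_ci`, `pearson_chi2`, and `mantel_haenszel` are assumptions introduced for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Table2x2:
    """2x2 table: rows = exposure (yes/no), columns = outcome (yes/no)."""
    a: int  # exposed, outcome present
    b: int  # exposed, outcome absent
    c: int  # unexposed, outcome present
    d: int  # unexposed, outcome absent

    @property
    def n(self) -> int:
        return self.a + self.b + self.c + self.d


def crude_or_ci(t: Table2x2, z: float = 1.96) -> tuple[float, float, float]:
    """Crude odds ratio ad/bc with a Woolf (log-based Wald) confidence interval."""
    odds_ratio = (t.a * t.d) / (t.b * t.c)
    se_log = math.sqrt(1 / t.a + 1 / t.b + 1 / t.c + 1 / t.d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


def pearson_chi2(t: Table2x2) -> float:
    """Pearson chi-square statistic for a 2x2 table (no continuity correction)."""
    numerator = t.n * (t.a * t.d - t.b * t.c) ** 2
    denominator = (t.a + t.b) * (t.c + t.d) * (t.a + t.c) * (t.b + t.d)
    return numerator / denominator


def mantel_haenszel(strata: list[Table2x2]) -> tuple[float, float]:
    """Mantel-Haenszel pooled OR and CMH chi-square (no continuity correction)."""
    or_num = sum(s.a * s.d / s.n for s in strata)
    or_den = sum(s.b * s.c / s.n for s in strata)
    observed_minus_expected = 0.0
    variance = 0.0
    for s in strata:
        row1, row2 = s.a + s.b, s.c + s.d
        col1, col2 = s.a + s.c, s.b + s.d
        # Expected count and hypergeometric variance of cell a in this stratum
        observed_minus_expected += s.a - row1 * col1 / s.n
        variance += row1 * row2 * col1 * col2 / (s.n ** 2 * (s.n - 1))
    return or_num / or_den, observed_minus_expected ** 2 / variance


if __name__ == "__main__":
    # Hypothetical counts for illustration only (not NHANES data)
    crude = Table2x2(a=40, b=60, c=200, d=700)
    strata = [Table2x2(a=20, b=30, c=100, d=350), Table2x2(a=20, b=30, c=100, d=350)]

    or_, lo, hi = crude_or_ci(crude)
    print(f"Crude OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
    print(f"Pearson chi-square = {pearson_chi2(crude):.2f}")
    mh_or, cmh = mantel_haenszel(strata)
    print(f"Mantel-Haenszel OR = {mh_or:.2f}, CMH chi-square = {cmh:.2f}")
```

Because the two hypothetical strata are identical, the Mantel-Haenszel odds ratio here equals the crude odds ratio; when a confounder distorts the crude estimate, the two diverge. The project also used multivariable logistic regression in SAS, which this sketch does not reproduce.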

I was responsible for conceptualizing and formalizing the research question, conducting the literature review, writing the SAS programs for data cleaning and analysis, and error-checking throughout the coding and analysis process.

You may access the final project abstract HERE and the SAS coding pipeline HERE.