GitHub Projects

mmpR5 Drug Resistance Pipeline (repository)

An end-to-end bioinformatics pipeline for Mycobacterium tuberculosis mmpR5 (Rv0678) variant discovery, structural annotation, and bedaquiline (BDQ) resistance classification. The pipeline is organized as six modular Jupyter notebooks orchestrated by a controller notebook; each module can be run standalone or automated with Papermill. The modules cover SRA data acquisition, fastp quality control, whole-genome assembly and alignment, variant extraction, structural annotation, and resistance classification.

Languages/tools: Python, Jupyter Notebook, Google Colab, ColabFold
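
As a rough illustration of how a controller might drive six notebooks with Papermill, the sketch below builds an ordered execution plan and then runs it. The notebook file names, directory layout, and the `sra_accession` parameter are hypothetical placeholders and are not taken from the repository; only `papermill.execute_notebook` is a real library call.

```python
from pathlib import Path

# Hypothetical notebook names, one per module described above; the
# repository's actual file names may differ.
MODULES = [
    "01_sra_acquisition.ipynb",
    "02_fastp_qc.ipynb",
    "03_assembly_alignment.ipynb",
    "04_variant_extraction.ipynb",
    "05_structural_annotation.ipynb",
    "06_resistance_classification.ipynb",
]


def build_plan(notebook_dir, output_dir, parameters):
    """Return (input, output, parameters) tuples in pipeline order."""
    notebook_dir, output_dir = Path(notebook_dir), Path(output_dir)
    return [
        (str(notebook_dir / name), str(output_dir / f"executed_{name}"), dict(parameters))
        for name in MODULES
    ]


def run_pipeline(plan):
    """Execute each notebook in turn; an execution error stops the run."""
    import papermill as pm  # imported lazily so the plan can be built without it

    for input_path, output_path, params in plan:
        Path(output_path).parent.mkdir(parents=True, exist_ok=True)
        pm.execute_notebook(input_path, output_path, parameters=params)


# "SRR0000000" is a placeholder accession, not a real run.
plan = build_plan("notebooks", "runs/sample_001", {"sra_accession": "SRR0000000"})
# run_pipeline(plan)  # uncomment once the notebooks exist locally
```

Papermill injects the supplied parameters after a cell tagged `parameters`, so each notebook in a setup like this would need such a cell to pick up the values. Writing each executed copy to a per-sample output directory keeps the source notebooks unchanged.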

MDR-TB/HIV Treatment Adherence — NLP & Predictive Modeling (repository)

This project applies natural language processing and machine learning to de-identified monthly counseling session notes from the PRAXIS study to identify latent barriers to, and protective factors for, adherence to bedaquiline (BDQ) and antiretroviral therapy (ART) among people with MDR-TB and HIV co-infection. The pipeline uses SBERT sentence embeddings, semantic similarity search, and K-means clustering to derive thematic groups, followed by predictive modeling to assess how those groups relate to adherence outcomes. A first-author manuscript based on this work is under review at PLOS One.

Languages/tools: Python 3.12 (sentence-transformers, scikit-learn, pandas, statsmodels)
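
The embedding, similarity search, and clustering steps described above can be sketched in a few lines. This is a minimal illustration on synthetic vectors, not the project's code: the helper `cosine_top_k`, the toy data, and the choice of two clusters are all assumptions made for the example. With real notes, the embeddings would instead come from a sentence-transformers model, as indicated in the comment.

```python
import numpy as np
from sklearn.cluster import KMeans


def cosine_top_k(query_vec, embeddings, k=3):
    """Indices of the k embeddings most similar to query_vec (cosine)."""
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    q = query_vec / np.linalg.norm(query_vec)
    scores = emb @ q
    return np.argsort(-scores)[:k]


# Synthetic stand-ins for sentence embeddings: two well-separated groups.
# With real text one would use something like (model name is an assumption):
#   from sentence_transformers import SentenceTransformer
#   embeddings = SentenceTransformer("all-MiniLM-L6-v2").encode(notes)
rng = np.random.default_rng(0)
group_a = rng.normal(loc=[1.0, 0.0, 0.0, 0.0], scale=0.05, size=(3, 4))
group_b = rng.normal(loc=[0.0, 0.0, 1.0, 0.0], scale=0.05, size=(3, 4))
embeddings = np.vstack([group_a, group_b])

# K-means assigns each embedding to one of two thematic groups.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(embeddings)

# Semantic search: which embeddings lie closest to a query direction?
top = cosine_top_k(np.array([1.0, 0.0, 0.0, 0.0]), embeddings, k=3)
```

In a setup like this, the cluster labels could then serve as candidate predictors in a downstream adherence model; how the project actually encodes them is not specified here.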

Multi-omic Systemic Host-Response Score (MoSS) (repository)

Analysis code for the derivation, validation, and external projection of the Multi-omic Systemic Host-Response Score (MoSS), a multi-omic latent factor derived from integrated RNA-seq, proteomics, and immune cell composition data. The score captures a coordinated host-response axis of severe infection and related critical illness. The repository includes self-contained scripts for external MoSS scoring, so the score can be projected onto new cohorts. Manuscript under review (2026).

Languages/tools: R
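
To make the idea of projecting a latent-factor score onto a new cohort concrete, here is one common approach, sketched in Python for illustration only (the repository itself is in R). It assumes the score is a loading-weighted sum of features standardized with derivation-cohort means and standard deviations. The function `project_score` and all numbers are hypothetical; the actual MoSS scoring procedure may differ.

```python
import numpy as np


def project_score(X_new, ref_mean, ref_sd, loadings):
    """Standardize new-cohort features with reference statistics, then
    take the loading-weighted sum to get one score per sample."""
    Z = (np.asarray(X_new, dtype=float) - ref_mean) / ref_sd
    return Z @ loadings


# Hypothetical reference statistics and loadings for three features.
ref_mean = np.array([10.0, 5.0, 0.2])
ref_sd = np.array([2.0, 1.0, 0.1])
loadings = np.array([0.5, -0.25, 1.0])

# Two new samples (rows) measured on the same three features (columns).
X_new = np.array([
    [10.0, 5.0, 0.2],   # identical to the reference mean
    [12.0, 4.0, 0.3],   # one reference SD away on each feature
])

scores = project_score(X_new, ref_mean, ref_sd, loadings)
# scores is approximately [0.0, 1.75]
```

Reusing the derivation cohort's means and standard deviations, rather than re-estimating them in the new cohort, keeps projected scores on the original scale.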

Citation: Cummings MJ*, Lu X, et al. A multi-omic host-response axis for stratification of severe infection and critical illness. Under review, 2026.