Research Themes
What does the logo symbolize?
The pink downward-sloping lines are Kaplan-Meier survival curves. If you aren't familiar with those, a higher curve means that patients survive longer. The gear and microscope represent the use of computational tools in diagnostic pathology. So overall, the logo symbolizes a vision for computational pathology to improve patient survival outcomes.
P.S. if you squint, it looks like a Pac-Man, eating a smaller Pac-Man, who's munching on the microscope :)
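For readers who have not met Kaplan-Meier curves before, here is a minimal sketch of how such a curve can be estimated. The function name `kaplan_meier` and the two toy cohorts are hypothetical and purely illustrative; they are not data or code from my dissertation.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Estimate a Kaplan-Meier survival curve.

    durations: follow-up time for each patient
    events: 1 if death was observed at that time, 0 if the patient was censored
    Returns a list of (time, survival_probability) at each observed death time.
    """
    if len(durations) != len(events):
        raise ValueError("durations and events must have the same length")

    deaths = Counter(t for t, e in zip(durations, events) if e)
    exits = Counter(durations)  # every patient leaves the risk set at their time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Multiply by the fraction of at-risk patients who survive time t
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]  # remove deaths and censored patients
    return curve


# Hypothetical toy cohorts: months of follow-up, 1 = death observed, 0 = censored
group_a = kaplan_meier([5, 8, 12, 20, 24], [1, 1, 0, 1, 0])
group_b = kaplan_meier([10, 15, 22, 30, 36], [1, 0, 0, 1, 0])

for name, curve in (("A", group_a), ("B", group_b)):
    print(name, [(t, round(s, 2)) for t, s in curve])
```

In this toy example the curve for group B stays above the curve for group A at the later time points, which is what "higher curve, longer survival" looks like in practice.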
Dissertation Research Overview
My PhD dissertation is titled "Computational discovery of interpretable histopathologic prognostic biomarkers in invasive carcinomas of the breast". In particular, I develop explainable low-bias machine learning with concept bottleneck models. In the context of healthcare, and histopathology in particular, this task is not straightforward given the lack of large-scale datasets for model validation, and the complex logistical and regulatory environment surrounding data acquisition, standardization and sharing. Below is a thematic summary of some of my doctoral dissertation work.
Crowdsourcing for scalable curation of histology datasets
Training accurate artificial intelligence models requires an enormous amount of labeled data. Generating a meaningful number of annotations requires engaging with multiple experts, and even experienced pathologists will exhibit some inter-rater discordance. In phase 1, "structured" crowdsourcing and data management approaches were used to collaborate with over 30 medical students, pathology residents and attendings to create over 20,000 annotations of histologic region boundaries. In phase 2, we used similar procedures to generate over 200,000 annotations of nucleus locations and boundaries. See datasets for details. These annotations are used to train convolutional neural networks for tissue region delineation (semantic segmentation) and nucleus localization, classification and segmentation. Some of the results have been published in Bioinformatics. This talk and this talk provide a nice summary of this theme.
This work was partly done in collaboration with Roche Tissue Diagnostics (Ventana Medical Systems).
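To make the idea of inter-rater discordance concrete, here is a minimal sketch of Cohen's kappa, one common way to measure agreement between two annotators beyond what chance alone would produce. This is an illustration only: the function name, the label names, and the toy annotations are hypothetical, and I am not claiming this is the agreement measure used in the published work.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both raters must label the same non-empty set of items")

    n = len(labels_a)
    # Fraction of items on which the two raters agree
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Agreement expected by chance, from each rater's label frequencies
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)

    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical region labels from two annotators for the same six regions
rater_1 = ["tumor", "tumor", "stroma", "stroma", "til", "tumor"]
rater_2 = ["tumor", "stroma", "stroma", "stroma", "til", "tumor"]

kappa = cohens_kappa(rater_1, rater_2)
print(f"kappa = {kappa:.3f}")
```

A value of 1 means perfect agreement and a value near 0 means agreement no better than chance, so even one disagreement out of six regions pulls the score noticeably below 1.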
Computational assessment of Tumor Infiltrating Lymphocytes
Tumor-infiltrating lymphocytes (TILs) are an important emerging diagnostic and predictive biomarker in breast cancer and other solid tumors. Some of my work involves the development of tools and algorithms to reliably assess TILs computationally, in order to reduce inter-observer variability and correlate complex, computationally derived spatial patterns of TILs with patient outcomes. This work was published in Nature npj Breast Cancer, SPIE Medical Imaging (Digital Pathology), and won the best poster award at the Association for Pathology Informatics 2021 Summit.
This work was partly done in collaboration with the International Immuno-Oncology Working Group (TILs Working Group) and Roche Tissue Diagnostics (Ventana Medical Systems).
Integrative machine learning from histology and genomics
Scanned whole-slide images (WSIs) contain a very diverse array of visual features, which contribute in complex ways to observed patient-level features, including clinical outcome (e.g., survival or treatment response) and gene expression. WSIs from a single patient typically contain ~10^6 cells. These come together to form tissue regions that may be cohesive or sparse. My goals include analyzing these data using machine learning to gain insights into morphological and genomic correlates of patient outcomes. This work is complementary to work we published in PNAS and Scientific Reports.
This is a work-in-progress in collaboration with the American Cancer Society.