Research Themes

Dissertation Research Overview

The main theme of my dissertation work is enabling explainable, low-bias machine learning from histopathology images with concept bottleneck models. The term "explainability" refers to the amenability of machine learning models to a semi-intuitive understanding of why they make the decisions they do. In the context of healthcare, and histopathology in particular, this task is not straightforward given the lack of large-scale datasets for model validation and the complex logistical and regulatory environment surrounding data acquisition, standardization and sharing. Below is a thematic summary of some of my doctoral dissertation work.

Crowdsourcing for scalable curation of histology datasets

Training accurate artificial intelligence models requires an enormous amount of labeled data. Generating a meaningful number of annotations requires engaging with multiple experts, and even experienced pathologists exhibit some inter-rater discordance. In phase 1, we used "structured" crowdsourcing and data management approaches to collaborate with over 30 medical students, pathology residents and attendings, creating over 20,000 annotations of histologic region boundaries. In phase 2, we used similar procedures to generate over 200,000 annotations of nucleus locations and boundaries. See datasets for details. These annotations are used to train convolutional neural networks for tissue region delineation (semantic segmentation) and for nucleus localization, classification and segmentation. Some of the results have been published in Bioinformatics. This talk and this talk provide a nice summary of this theme.


This work was partly done in collaboration with Roche Tissue Diagnostics (Ventana Medical Systems).
This was a thoughtful gift to celebrate the publication of our paper.

Computational assessment of Tumor Infiltrating Lymphocytes

Tumor-infiltrating lymphocytes (TILs) are an important emerging diagnostic and predictive biomarker in breast cancer and other solid tumors. Some of my work involves developing tools and algorithms to assess TILs computationally and reliably, in order to reduce inter-observer variability and to correlate complex, computationally derived spatial patterns of TILs with patient outcomes. This work was published in Nature npj Breast Cancer and SPIE Medical Imaging (Digital Pathology), and won the best poster award at the Association for Pathology Informatics 2021 Summit.


This work was partly done in collaboration with the International Immuno-Oncology Working Group (TILs Working Group) and Roche Tissue Diagnostics (Ventana Medical Systems).

Integrative machine learning from histology and genomics

Scanned whole-slide images (WSIs) contain a very diverse array of visual features, which contribute in complex ways to observed patient-level features, including clinical outcome (e.g. survival or treatment response) and gene expression. WSIs from a single patient typically contain 10^5 to 10^6 individual cells, including cancer cells, normal cells, and immune response elements. These elements come together to form tissue regions that may be cohesive or sparse in nature. One of my goals is to combine these data using high-fidelity machine learning models, potentially providing insights into morphological and genomic correlates of patient outcomes and response to therapy. This work is complementary to work we published in PNAS and Scientific Reports.


This is a work-in-progress in collaboration with the American Cancer Society.