Research Themes

Dissertation Research Overview

The main theme of my dissertation work is enabling Interpretable Machine Learning from Gigapixel Histopathology Images. The term "interpretable" refers to the amenability of machine learning models to explanation and a semi-intuitive understanding of why they make the decisions they do. In the context of healthcare, and histopathology in particular, this task is not straightforward given the lack of large-scale datasets for model validation, and the complex logistical and regulatory environment surrounding data acquisition, standardization and sharing. Below is a thematic summary of some of my doctoral dissertation work.


Crowdsourcing for scalable curation of histology datasets

Training accurate artificial intelligence models requires an enormous amount of labeled data. Generating a meaningful number of annotations requires engaging with multiple experts, and even experienced pathologists will exhibit some inter-rater discordance. In phase 1, "structured" crowdsourcing and data management approaches were used to collaborate with over 30 medical students, pathology residents and attendings to create over 50,000 annotations of histologic region boundaries (which can be downloaded here). In phase 2, we used similar procedures to generate more than 100,000 annotations of nucleus locations and boundaries. These annotations were used to train convolutional neural networks for tissue region delineation (semantic segmentation) and for nucleus localization, classification and segmentation. Some of the results have been published in Bioinformatics; the rest is work-in-progress. This talk and this presentation provide a good summary of this theme.
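
Inter-rater discordance of this kind is commonly quantified with a chance-corrected agreement statistic such as Cohen's kappa. The sketch below is illustrative only: it assumes (hypothetically) that two annotators' region boundaries have been rasterized into integer label masks of the same size, which is not necessarily how the annotations above are stored, and it computes pixelwise kappa with NumPy.

```python
import numpy as np


def cohens_kappa(labels_a: np.ndarray, labels_b: np.ndarray) -> float:
    """Pixelwise Cohen's kappa between two annotators' label masks.

    Each array holds one integer class label per pixel (hypothetical
    encoding, e.g. 0 = background, 1 = tumor, 2 = stroma).
    """
    if labels_a.shape != labels_b.shape:
        raise ValueError("label masks must have the same shape")
    a = labels_a.ravel()
    b = labels_b.ravel()
    classes = np.union1d(a, b)
    # Observed agreement: fraction of pixels given the same label by both raters.
    p_observed = np.mean(a == b)
    # Chance agreement: product of each rater's marginal class frequencies.
    p_expected = sum(np.mean(a == c) * np.mean(b == c) for c in classes)
    if p_expected == 1.0:
        return 1.0  # both raters used one identical class for every pixel
    return float((p_observed - p_expected) / (1.0 - p_expected))


rater_1 = np.array([[0, 1, 1], [2, 2, 0]])
rater_2 = np.array([[0, 1, 2], [2, 2, 0]])
print(round(cohens_kappa(rater_1, rater_2), 3))  # 0.75
```

A kappa of 1 means perfect agreement and 0 means agreement no better than chance; region-level metrics such as the Dice coefficient are a common complement when boundaries, rather than labels, are the main source of disagreement.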


This work was partly done in collaboration with Roche Tissue Diagnostics (Ventana Medical Systems).

Computational assessment of Tumor Infiltrating Lymphocytes (TILs)

TILs are an important emerging diagnostic and predictive biomarker in breast cancer and other solid tumors. Some of my work involves the development of tools and algorithms to reliably assess TILs computationally, in order to reduce inter-observer variability and to correlate complex, computationally derived spatial patterns of TILs with patient outcomes. Parts of this work were published in Nature npj Breast Cancer and SPIE Medical Imaging (Digital Pathology). The rest is work-in-progress.
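
The TILs working group's guidelines score stromal TILs as the percentage of stromal area occupied by mononuclear inflammatory cells. A minimal computational analogue, assuming hypothetical boolean masks for stroma (from a region segmentation model) and lymphocytes (from a nucleus segmentation and classification model), is the area ratio below. This is an illustrative sketch, not the specific scoring method used in the publications above.

```python
import numpy as np


def stromal_tils_percent(stroma_mask: np.ndarray, lymphocyte_mask: np.ndarray) -> float:
    """Percentage of stromal area covered by lymphocyte pixels.

    Both inputs are masks of the same shape (hypothetical outputs of
    tissue-region and nucleus segmentation models).
    """
    if stroma_mask.shape != lymphocyte_mask.shape:
        raise ValueError("masks must have the same shape")
    stroma = stroma_mask.astype(bool)
    lymph = lymphocyte_mask.astype(bool)
    stromal_area = np.count_nonzero(stroma)
    if stromal_area == 0:
        return float("nan")  # score is undefined when no stroma is present
    # Only lymphocyte pixels that fall inside stroma count toward stromal TILs.
    return 100.0 * np.count_nonzero(stroma & lymph) / stromal_area


stroma = np.zeros((4, 4), dtype=bool)
stroma[:, :2] = True   # 8 stromal pixels
lymph = np.zeros((4, 4), dtype=bool)
lymph[0, :] = True     # 2 of these 4 pixels fall inside stroma
print(stromal_tils_percent(stroma, lymph))  # 25.0
```

Restricting the numerator to lymphocytes inside stroma mirrors the visual guideline, which excludes intratumoral lymphocytes from the stromal score.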


This work was partly done in collaboration with the International Immuno-Oncology Working Group (TILs Working Group) and Roche Tissue Diagnostics (Ventana Medical Systems).

Integrative machine learning from histology and genomics

Scanned whole-slide images (WSIs) contain a very diverse array of visual features, which contribute in complex ways to observed patient-level variables, including clinical outcome (e.g., survival or treatment response) and gene expression. WSIs from a single patient typically contain 10⁵ to 10⁶ individual cells, including cancer cells, normal cells, and immune response elements. These elements come together to form tissue regions that may be cohesive or sparse in nature. One of my goals is to combine these data using high-fidelity machine learning models, potentially providing insights into morphological and genomic correlates of patient outcomes and response to therapy. This work is complementary to work we published in PNAS and Scientific Reports.
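
As a rough illustration of how cell-level output can be related to patient-level data, one simple baseline summarizes each slide by the fraction of cells in each predicted class and concatenates that vector with the patient's gene-expression profile (so-called early fusion). The class vocabulary and inputs below are hypothetical, and this sketch is not meant to represent the high-fidelity models mentioned above.

```python
from collections import Counter

import numpy as np

# Hypothetical cell-class vocabulary produced by a nucleus classifier.
CELL_CLASSES = ("tumor", "normal", "immune")


def composition_features(cell_labels):
    """Fraction of classified cells in each class for one whole-slide image."""
    counts = Counter(cell_labels)  # missing classes count as 0
    total = sum(counts[c] for c in CELL_CLASSES)
    if total == 0:
        return np.zeros(len(CELL_CLASSES))
    return np.array([counts[c] / total for c in CELL_CLASSES])


def early_fusion(cell_labels, expression):
    """Concatenate slide-level composition features with a gene-expression vector."""
    histology = composition_features(cell_labels)
    return np.concatenate([histology, np.asarray(expression, dtype=float)])


labels = ["tumor"] * 6 + ["immune"] * 3 + ["normal"]
print(early_fusion(labels, [1.5, -0.2]))  # [ 0.6  0.1  0.3  1.5 -0.2]
```

Counting with `Counter` is a single pass over the labels, which matters when a slide contributes hundreds of thousands of cells; richer approaches would replace these global fractions with spatial or region-level descriptors.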


This is a work-in-progress in collaboration with the American Cancer Society.