Software & Datasets


HistomicsTK / Digital Slide Archive

I am an active contributor to the open-source software package HistomicsTK (see this talk), a Python toolkit for organizing, annotating, and analyzing whole-slide image (WSI) data. My contributions include workflows for handling annotations and segmentation masks, color normalization and augmentation, and image processing workflows for efficient detection of tissue region boundaries. I also develop workflows that use the Girder RESTful API to visualize and interact with data.

HistomicsTK is built and maintained by the company Kitware.


HistomicsML, where "ML" stands for machine learning, is a software tool developed by Michael Nalisnik, PhD, and Sanghoon Lee, PhD, for the interactive learning of histological patterns by biologists and physicians. I was involved in validating both iterations of the software. HistomicsML enables rapid training of machine learning models (e.g., to identify vascular endothelial cells in glioma) in a few learning cycles by focusing the user's attention on regions with high model uncertainty.
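To illustrate the idea of steering the user toward uncertain regions, here is a minimal sketch of uncertainty sampling for a binary classifier. It is not HistomicsML's actual code: the function name `most_uncertain` and the choice of "closeness to 0.5" as the uncertainty measure are assumptions made for illustration only.

```python
import numpy as np


def most_uncertain(probabilities, k):
    """Return indices of the k samples whose predicted positive-class
    probability is closest to 0.5 (hypothetical helper, for illustration)."""
    probabilities = np.asarray(probabilities, dtype=float)
    # Uncertainty is 1 when p = 0.5 and 0 when p is exactly 0 or 1.
    uncertainty = 1.0 - 2.0 * np.abs(probabilities - 0.5)
    # Sort from most to least uncertain; a stable sort keeps ties in input order.
    order = np.argsort(-uncertainty, kind="stable")
    return order[:k]


# Example: model outputs for five candidate regions.
picked = most_uncertain([0.90, 0.52, 0.10, 0.45, 0.99], k=2)
```

In an interactive loop, the regions selected this way would be shown to the user for labeling, the model retrained, and the selection repeated.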

Software download and usage instructions can be found below:

> Segmentation-free system (general approach): Version 2.0.

> Using segmentation boundaries (image analysis expertise required): Version 1.0.

A demo of the HistomicsML v2.0 software.

Ripley's K for image clustering analysis

This MATLAB tool lets biologists calculate Ripley's K function for grayscale images; it can be downloaded here. The detailed methodology and validation are described in:

Amgad M, Itoh A, Tsui MM. Extending Ripley’s K-function to quantify aggregation in 2-D grayscale images. PLoS One. 2015;10(12):e0144404.

Sample use: quantifying the aggregation of proteins in fluorescent microscopic images.
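For readers unfamiliar with Ripley's K, the sketch below computes the classic point-pattern estimator in Python. It is not the grayscale extension described in the paper, it is not a port of the MATLAB tool, and it applies no edge correction; the function name `ripley_k` and its parameters are assumptions for illustration.

```python
import numpy as np


def ripley_k(points, radii, area):
    """Naive Ripley's K estimate for 2-D points, without edge correction.

    K(r) = area / (n * (n - 1)) * number of ordered pairs (i, j), i != j,
    whose distance is at most r.
    """
    points = np.asarray(points, dtype=float)
    radii = np.asarray(radii, dtype=float)
    n = len(points)
    # Pairwise Euclidean distances between all points.
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # Drop the diagonal so a point is never paired with itself.
    pair_dists = dists[~np.eye(n, dtype=bool)]
    # For each radius, count the ordered pairs that fall within it.
    counts = (pair_dists[None, :] <= radii[:, None]).sum(axis=1)
    return area * counts / (n * (n - 1))


# Example: four points on the corners of a unit square, in a window of area 4.
corners = [[0, 0], [1, 0], [0, 1], [1, 1]]
k_values = ripley_k(corners, radii=[0.5, 1.0, 2.0], area=4.0)
```

Under complete spatial randomness K(r) is approximately πr², so estimates well above that value suggest aggregation at scale r.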


Region segmentations for TNBC (Amgad et al., 2019)

Use this repo to download all elements of the dataset described in:

Amgad M, Elfandy H, ..., Gutman DA, Cooper LAD. Structured crowdsourcing enables convolutional segmentation of histology images. Bioinformatics. 2019.

The data can be visualized in this public instance of the Digital Slide Archive (DSA). The dataset contains manually annotated boundaries of histologic regions from 150 triple-negative breast cancer (TNBC) slides from The Cancer Genome Atlas.

Adult Rhabdomyosarcoma (Elsebaie et al., 2018)

Use this link to download the dataset used in:

Elsebaie M, Amgad M, …, Elsayed Z. Management of low and intermediate risk adult rhabdomyosarcoma: A pooled survival analysis of 553 patients. Scientific Reports. 2018 Jun 19;8(1):9337.

The dataset contains retrospective individual patient data from ~550 patients with adult rhabdomyosarcoma, collected from published case series and reports. The original authors were contacted for complete records.