A Dataset and a Technique for Generalized Nuclear
Segmentation in Histological Images
Nuclear segmentation in digital microscopic tissue images can enable extraction of high-quality features for nuclear morphometric and other analyses in computational pathology. However, conventional image processing techniques, such as Otsu thresholding and watershed segmentation, do not work effectively on challenging cases, such as chromatin-sparse and crowded nuclei. In contrast, machine learning-based segmentation techniques are able to generalize over nuclear appearances. However, training machine learning algorithms requires datasets of images in which a vast number of nuclei have been annotated. Publicly accessible annotated datasets, along with widely agreed-upon metrics to compare techniques, have catalyzed tremendous innovation and progress on other image classification problems, particularly in object recognition. Inspired by their success, we first introduce a large publicly accessible dataset of H&E stained tissue images with painstakingly annotated nuclear boundaries. The quality of the annotations was validated by a medical doctor. Because our dataset includes a diversity of nuclear appearances from several patients, disease states, and organs, techniques trained on it are likely to generalize well and work right out of the box on other H&E stained images. Second, we propose a new metric to evaluate nuclear segmentation results that penalizes object- and pixel-level errors in a unified manner, unlike previous metrics that penalize only one of these types of error. Finally, we propose a segmentation technique based on deep learning that lays special emphasis on identifying nuclear boundaries, including those between crowded nuclei. Consequently, our technique works well on diverse test images.
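
The abstract does not spell out the formula of the proposed unified metric, so the following is only a minimal sketch of one way a single score can penalize both pixel-level errors (imprecise boundaries) and object-level errors (missed or spurious nuclei): an aggregated Jaccard-style index over instance label maps. The function name `aggregated_jaccard_index`, the max-IoU matching rule, and the handling of unmatched predictions are illustrative assumptions, not necessarily the paper's exact definition.

```python
import numpy as np

def aggregated_jaccard_index(gt_labels, pred_labels):
    """Illustrative unified object/pixel-level score (AJI-style sketch; an assumption,
    not necessarily the paper's metric).

    gt_labels, pred_labels: 2-D integer arrays where 0 is background and each
    positive integer labels one nucleus instance.
    """
    gt_ids = [i for i in np.unique(gt_labels) if i != 0]
    pred_ids = [j for j in np.unique(pred_labels) if j != 0]
    pred_masks = {j: (pred_labels == j) for j in pred_ids}
    pred_areas = {j: int(m.sum()) for j, m in pred_masks.items()}

    used_pred = set()
    intersection_sum = 0
    union_sum = 0

    for i in gt_ids:
        gt_mask = (gt_labels == i)
        gt_area = int(gt_mask.sum())
        # Match this ground-truth nucleus to the predicted nucleus with maximum IoU.
        best_j, best_iou, best_inter, best_union = None, 0.0, 0, gt_area
        for j, pm in pred_masks.items():
            inter = int(np.logical_and(gt_mask, pm).sum())
            if inter == 0:
                continue
            union = gt_area + pred_areas[j] - inter
            iou = inter / union
            if iou > best_iou:
                best_j, best_iou, best_inter, best_union = j, iou, inter, union
        # Pixel-level penalty: imprecise boundaries shrink the intersection and
        # grow the union; a completely missed nucleus adds only its area to the union.
        intersection_sum += best_inter
        union_sum += best_union
        if best_j is not None:
            used_pred.add(best_j)

    # Object-level penalty: predicted nuclei that matched no ground-truth nucleus
    # add all their pixels to the union, so spurious detections lower the score.
    for j in pred_ids:
        if j not in used_pred:
            union_sum += pred_areas[j]

    return intersection_sum / union_sum if union_sum > 0 else 0.0
```

Under this sketch, a perfect instance segmentation scores 1, while both boundary inaccuracies and detection errors (false negatives and false positives) pull the score below the ordinary pixel-wise Jaccard index, which is the sense in which the two error types are penalized in a unified manner.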