Validation of Image Segmentation by Estimating Rater Bias and Variance

Simon K Warfield, Kelly H Zou, and William M Wells. 2008. Validation of Image Segmentation by Estimating Rater Bias and Variance. Philos Trans A Math Phys Eng Sci, 366, 1874, Pp. 2361-75.

The accuracy and precision of segmentations of medical images have been difficult to quantify in the absence of a ground truth or reference standard segmentation for clinical data. Although physical or digital phantoms can help by providing a reference standard, they do not allow the reproduction of the full range of imaging and anatomical characteristics observed in clinical data. An alternative assessment approach is to compare with segmentations generated by domain experts. Segmentations may be generated by raters who are trained experts or by automated image analysis algorithms. Typically, these segmentations differ due to intra-rater and inter-rater variability. The most appropriate way to compare such segmentations has been unclear. We present here a new algorithm to enable the estimation of performance characteristics, and a true labelling, from observations of segmentations of imaging data where segmentation labels may be ordered or continuous measures. This approach may be used with, among others, surface, distance transform or level-set representations of segmentations, and can be used to assess whether or not a rater consistently overestimates or underestimates the position of a boundary.
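
The idea of estimating each rater's bias and variance together with a hidden true labelling can be illustrated with a small sketch. It is not the algorithm from the paper: it assumes a simple additive Gaussian model in which each continuous label (for example a signed distance to a boundary) equals the true value plus a per-rater bias plus per-rater noise, places a Gaussian prior on the true value, and constrains the biases to sum to zero so that they are identifiable. The function name `estimate_rater_bias_variance`, the model and the synthetic data are all assumptions made for illustration.

```python
import random


def estimate_rater_bias_variance(obs, n_iter=50):
    """EM sketch for an ASSUMED model (not taken from the paper):

        obs[i][j] = truth[i] + bias[j] + noise,  noise ~ N(0, var[j])
        truth[i] ~ N(mu, tau2),  sum(bias) == 0

    obs[i][j] is the continuous label at voxel i from rater j.
    Returns (truth_estimate, bias, var). Needs labels that vary across voxels.
    """
    n, r = len(obs), len(obs[0])
    bias = [0.0] * r
    var = [1.0] * r
    row_means = [sum(row) / r for row in obs]
    mu = sum(row_means) / n
    tau2 = sum((m - mu) ** 2 for m in row_means) / n

    def e_step():
        # Gaussian posterior of each hidden true value given current parameters.
        prec = 1.0 / tau2 + sum(1.0 / v for v in var)
        means = [
            (mu / tau2 + sum((row[j] - bias[j]) / var[j] for j in range(r))) / prec
            for row in obs
        ]
        return means, 1.0 / prec

    for _ in range(n_iter):
        post_mean, post_var = e_step()
        # M-step: per-rater bias and variance, then the prior on the truth.
        bias = [sum(obs[i][j] - post_mean[i] for i in range(n)) / n for j in range(r)]
        var = [
            sum((obs[i][j] - post_mean[i] - bias[j]) ** 2 for i in range(n)) / n + post_var
            for j in range(r)
        ]
        mu = sum(post_mean) / n
        tau2 = sum((m - mu) ** 2 for m in post_mean) / n + post_var
        # Re-centre: move the common offset from the biases into the prior mean.
        shift = sum(bias) / r
        bias = [b - shift for b in bias]
        mu += shift

    truth, _ = e_step()
    return truth, bias, var


# Synthetic example: three hypothetical raters with known bias and noise level.
random.seed(0)
true_bias = [0.6, -0.2, -0.4]  # sums to zero, matching the constraint above
true_sd = [0.5, 1.0, 1.5]
obs = []
for _ in range(2000):
    t = random.gauss(0.0, 5.0)
    obs.append([t + b + random.gauss(0.0, s) for b, s in zip(true_bias, true_sd)])

truth_hat, bias_hat, var_hat = estimate_rater_bias_variance(obs)
```

Under these assumptions, a positive entry in `bias_hat` would indicate a rater who tends to place the boundary further out than the consensus, and a negative entry one who tends to place it further in. Only relative bias is recoverable here, since a shift common to all raters cannot be separated from the true labelling without a reference standard.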
Last updated on 02/24/2023