The challenge of auto-contouring assessment

Assessing the quality of auto-contouring in a systematic, unbiased way is difficult.

The most common method of assessment is quantitative comparison to manual contouring (e.g. [1]). This approach has the benefit that it requires limited additional contouring work on the part of investigators. However, it has a number of drawbacks. It assumes that the manual contour is correct, failing to account for the potential of human error. In addition, the quantitative scores used are often blunt, in that they do not adequately distinguish between plausible contour variation and definitely wrong contouring [2].
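The bluntness of such scores can be illustrated with a small sketch. It assumes the Dice similarity coefficient as the quantitative score (one widely used overlap measure; the text above does not name a specific one) and uses hypothetical toy contours represented as sets of pixel coordinates. The names `dice`, `rectangle`, `manual`, `shifted` and `wrong` are illustrative only.

```python
def dice(a, b):
    """Dice similarity coefficient between two sets of (x, y) pixel coordinates."""
    if not a and not b:
        return 1.0  # two empty contours agree trivially
    return 2 * len(a & b) / (len(a) + len(b))


def rectangle(x0, y0, x1, y1):
    """Filled rectangle of pixels, covering x0 <= x < x1 and y0 <= y < y1."""
    return {(x, y) for x in range(x0, x1) for y in range(y0, y1)}


# Hypothetical "manual" contour: a 20 x 20 pixel square (400 pixels).
manual = rectangle(0, 0, 20, 20)

# Plausible variation: the same square shifted sideways by one pixel.
shifted = rectangle(1, 0, 21, 20)

# Definite error: a 4 x 5 corner is missing and a spurious 4 x 5 island
# appears far away from the structure.
wrong = (manual - rectangle(0, 0, 4, 5)) | rectangle(40, 40, 44, 45)

print(dice(manual, shifted))
print(dice(manual, wrong))
```

In this toy case both comparisons give a Dice score of 0.95, even though the first contour differs only by a one-pixel shift while the second contains an isolated region that a clinician would be likely to reject outright.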

An alternative approach is to consider the impact on the clinical task that the auto-contouring seeks to improve. To this end, the clinical time taken with auto-contouring is compared to the time taken without assistance (e.g. [3]). Such studies have the drawback that they require additional clinical time to perform the assessment. Furthermore, bias may be introduced if the investigator knows that they are working with an automatically generated contour, which may influence their decision to edit it.

The imitation game

In 1950, Alan Turing sought to answer the question "Can machines think?" [4]. This turned out to be perhaps a little too challenging, and he reformulated the question as whether machines can display intelligence through imitation. The Imitation Game that Turing describes is relatively complex, involving an interrogator judging the sex of subjects with whom they are communicating blindly.

The Loebner Prize, started in 1991, simplified this test to a one-to-one conversation between the interrogator and the subject, in which the interrogator is asked to judge whether the subject is human or not.

This set-up, in which an interrogator "talks" to a subject and tries to identify whether that subject is a human or a computer, does not necessarily assess intelligence. Rather, it assesses whether the computer can be judged to have performed a particular task to the same level as a human [5,6].

Contouring assessment by imitation

Given that there is variability in manual contouring, the best an auto-contouring solution might be expected to do is to imitate this human behaviour. This website has been set up to perform simplified variations of Alan Turing's Imitation Game. We seek to answer the question "Can machines contour?"
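As a rough sketch of how responses from such a game might be summarised, the example below counts how often an observer correctly identifies the source of a contour and compares that count with chance using an exact two-sided binomial test. This is an illustrative assumption and not a description of the analysis used by this website; the function names and the response data are hypothetical.

```python
from math import comb


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value.

    Sums the probabilities of all outcomes that are no more likely than
    the observed count k, under success probability p.
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    threshold = pmf[k] * (1 + 1e-9)  # small tolerance for floating-point ties
    return min(1.0, sum(q for q in pmf if q <= threshold))


def score(responses):
    """Summarise (true_source, guessed_source) pairs, each 'human' or 'machine'.

    Returns the fraction of correct identifications and the p-value
    against the hypothesis that the observer is guessing at random.
    """
    responses = list(responses)
    if not responses:
        raise ValueError("at least one response is required")
    n = len(responses)
    correct = sum(1 for truth, guess in responses if truth == guess)
    return correct / n, binomial_two_sided_p(correct, n)


# Hypothetical responses: 20 contours judged, 11 identified correctly.
responses = (
    [("human", "human")] * 6
    + [("machine", "machine")] * 5
    + [("human", "machine")] * 4
    + [("machine", "human")] * 5
)

accuracy, p_value = score(responses)
print(accuracy, p_value)
```

Under this reading, an identification rate that cannot be distinguished from 50% would suggest that the observer could not tell the automatic contours from the manual ones. A real study would also need to account for sample size and for differences between observers.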


[1] Delpon G, et al. Comparison of automated atlas-based segmentation software for postoperative prostate cancer radiotherapy. Frontiers in Oncology. 2016;6.

[2] Sharp G, et al. Vision 20/20: Perspectives on automated image segmentation for radiotherapy. Medical Physics. 2014;41(5).

[3] Yang J, et al. Automatic contouring of brachial plexus using a multi-atlas approach for lung cancer radiation therapy. Practical Radiation Oncology. 2013;3(4):e139-47.

[4] Turing AM. Computing machinery and intelligence. Mind. 1950;59(236):433-460.

[5] Gunderson K. The imitation game. Mind. 73(290):234-245.

[6] Saygin AP, Cicekli I, Akman V. Turing Test: 50 Years Later. Minds and Machines. 2000;10(4):463-518.

Research team

This website is run as a scientific research project by the Science and Medical Technology group of Mirada Medical Ltd, as part of the Eurostars "Cloud Atlas" project (Ref 9297). The data obtained may be used for scientific publications. No participants will be explicitly named unless otherwise agreed.