Learning from noisy data: how to teach machines when doctors disagree with each other


For many machine learning applications, access to large, clean datasets is a luxury largely confined to academic research. In practice, such datasets are hard to come by, and their scarcity limits the performance of deployed machine learning systems. The problem is pervasive in medical imaging, where acquiring and labelling data is costly. In this talk, I will present a method that learns more effectively from noisy labels by explicitly modelling the human annotation process. This is particularly relevant when data is labelled by multiple annotators with varying skill levels and biases.
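The abstract does not specify how the annotation process is modelled. One widely used approach, shown here only as an illustrative assumption and not necessarily the method presented in the talk, gives each annotator a confusion matrix. Entry `[r][c]` of that matrix is the probability that the annotator reports class `c` when the true class is `r`. Combining the classifier's estimate of the true class with the annotator's confusion matrix gives the distribution over labels that this annotator is expected to produce. Training then maximises the likelihood of the labels actually observed. All function names, matrices, and numbers below are hypothetical.

```python
import math

def noisy_label_distribution(true_probs, confusion):
    """Distribution over the label an annotator is expected to report.

    true_probs[r]   : classifier's estimate of P(true class = r)
    confusion[r][c] : assumed P(annotator reports c | true class = r)
    """
    n = len(true_probs)
    return [sum(true_probs[r] * confusion[r][c] for r in range(n)) for c in range(n)]

def annotator_nll(true_probs, confusion, observed_label):
    """Negative log-likelihood of one annotator's label under the noise model."""
    q = noisy_label_distribution(true_probs, confusion)
    return -math.log(q[observed_label])

# Hypothetical binary example: a reliable expert and a novice biased toward class 0.
expert = [[0.95, 0.05], [0.05, 0.95]]
novice = [[0.9, 0.1], [0.4, 0.6]]
probs = [0.2, 0.8]  # classifier believes class 1 is more likely

# A "class 0" label is more surprising from the expert than from the biased novice.
print(annotator_nll(probs, expert, 0))  # about 1.47
print(annotator_nll(probs, novice, 0))  # about 0.69
```

Under this model, a disagreeing label from an annotator with a known bias is penalised less than the same label from a reliable annotator. In practice, the confusion matrices would be learned jointly with the classifier rather than fixed by hand.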

New York, USA

Invited by Prof. Krzysztof J. Geras and Prof. Linda Moy