Neural Networks for Mutation Calling in Cancer

NeurIPS 2019 fact-sheets: overviews of the research we presented, and its future impact

Geoffroy Dubourg-Felonneau
January 16, 2020
February 10, 2020


Due to the way modern sequencing technologies work, somatic variant calling (identifying non-inherited genetic mutations) is an important challenge for any precision oncology platform. In order to sequence DNA at a feasible throughput and price point, it is necessary to use next-generation sequencing (NGS) technology. NGS produces overlapping short reads of a small section of DNA in a massively parallel fashion, with much greater throughput than traditional methods, but at the cost of reduced accuracy.

By comparing the sequencing results for tumour and normal samples at a given location, it should be possible to check how the genome of the tumour is different from the healthy genome present in the rest of the cells of the patient’s body. This is essentially what the problem of somatic variant calling is: finding the locations in a tumour genome where the base (T, C, G or A) differs from that location in the corresponding normal genome.
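To make the comparison concrete, here is a deliberately naive sketch in Python. It is not the caller described in this post: it simply takes the most common base at each position and ignores sequencing noise, which is exactly why real callers need something better. The function names and the toy pileup data are hypothetical.

```python
from collections import Counter


def consensus_base(bases_at_position):
    """Return the most common base among the reads covering one position."""
    return Counter(bases_at_position).most_common(1)[0][0]


def naive_somatic_candidates(tumour_pileup, normal_pileup):
    """List positions where the tumour consensus base differs from the normal one.

    Both arguments map a genomic position to the bases observed there.
    Returns tuples of (position, normal_base, tumour_base).
    """
    candidates = []
    # Only compare positions covered in both samples.
    for pos in sorted(tumour_pileup.keys() & normal_pileup.keys()):
        tumour_base = consensus_base(tumour_pileup[pos])
        normal_base = consensus_base(normal_pileup[pos])
        if tumour_base != normal_base:
            candidates.append((pos, normal_base, tumour_base))
    return candidates


# Hypothetical toy data: position -> bases seen in the overlapping reads.
normal = {100: "AAAAAA", 101: "CCCCCC", 102: "GGGGGT"}
tumour = {100: "AAAAAA", 101: "CTTTTT", 102: "GGGGGG"}

candidates = naive_somatic_candidates(tumour, normal)
```

In this toy data, position 102 has a single stray T in the normal sample. A majority vote shrugs it off, but with real coverage and real error rates the line between noise and a true variant is far less clear.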

However, the data from the sequencing machines are noisy, and the steps required to reach an aligned genome from raw sequencing data introduce further uncertainty. The volume of data produced is enormous (there are 3 billion bases in a whole human genome) and somatic variant calling is therefore rather challenging.

Creating a well-validated generalized somatic variant caller is an open problem. There are a number of approaches, which often produce conflicting results, and it is important for oncologists to have a confidence interval so that they know how certain the caller is about its decision.

Enter machine learning. A large body of research exists on how to use big datasets to train machine learning models to make predictions, even in the face of noise and uncertainty. However, many of the recent advances in model accuracy have come at the cost of poorer model calibration. In practice, this means that although a model might classify more samples correctly, when it is wrong it tends to be very confident that it is right. In this paper, we present a machine learning model for somatic variant calling that is both accurate and well-calibrated.
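One common way to put a number on calibration is the expected calibration error (ECE): group predictions by confidence, then compare the average confidence in each group with the accuracy actually achieved there. The Python sketch below is a minimal illustration for a binary classifier, not the evaluation used in the paper; the bin count and the example probabilities are assumptions.

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """Expected calibration error for a binary classifier.

    probs:  predicted probability of the positive class for each sample
    labels: true class (0 or 1) for each sample
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        predicted = 1 if p >= 0.5 else 0
        # Confidence in the predicted class, always between 0.5 and 1.
        confidence = p if predicted == 1 else 1.0 - p
        index = min(int(confidence * n_bins), n_bins - 1)
        bins[index].append((confidence, 1.0 if predicted == y else 0.0))

    total = len(probs)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_confidence = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(a for _, a in bucket) / len(bucket)
        # Weight each bin's confidence/accuracy gap by its share of samples.
        ece += len(bucket) / total * abs(avg_confidence - accuracy)
    return ece


# Hypothetical model A: 99% confident every time, but right only half the time.
overconfident_ece = expected_calibration_error([0.99, 0.99, 0.99, 0.99], [1, 1, 0, 0])

# Hypothetical model B: 75% confident and right three times out of four.
calibrated_ece = expected_calibration_error([0.75, 0.75, 0.75, 0.75], [1, 1, 1, 0])
```

Model A scores an ECE of about 0.49, while model B scores 0: its stated confidence matches how often it is right.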

How Does It Work?

When a model is trained on a given dataset, we can only guarantee its performance on new data if that data has been drawn from the same distribution. For example, if an image classification model were trained to distinguish between dogs and cats, it would not be able to do the same for pictures of microwaves and ovens.

In many cases, particularly in healthcare, the data the model will see in a live system is not guaranteed to be identically distributed to the training data. Moreover, there is no easy way to assess beforehand whether this is the case. Data that falls outside the training distribution is often referred to as out-of-distribution (OOD) data. Whilst it might be acceptable for a model to perform worse on OOD data, we would at least like the model to be uncertain in its predictions when asked to work with data of this type.

However, standard deep learning models do not capture this uncertainty [2]. Most classification models return only a class probability vector, which carries no information about how uncertain the model itself is.

Bayesian Deep Learning:

Bayesian deep learning is a field that combines Bayesian theory with deep learning techniques. There are numerous advantages of doing so, but the one we are most interested in here is that the resulting models retain a measure of the uncertainty in their predictions.

We used a Bayesian deep learning technique known as Variational Inference (VI) and trained a model on data from the Multi-Center Mutation Calling in Multiple Cancers (MC3) [3] dataset.

Using VI we were able to present a model that classifies somatic variants while also returning the uncertainty when it is used on new genomic samples. In this way, the uncertainty can either be used by an oncologist or propagated to a decision-making system to be considered when producing a treatment strategy.
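The prediction step of a model fitted with VI can be sketched as follows. VI replaces each fixed weight with a distribution over plausible weights; at prediction time we sample several sets of weights and look at how much the outputs disagree. The Python below is a toy logistic model under stated assumptions, not the network from the paper: the Gaussian posterior, the function names and the posterior means and standard deviations are all hypothetical values chosen for illustration, not trained ones.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_with_uncertainty(x, weight_means, weight_stds, n_samples=200, seed=0):
    """Monte Carlo prediction under an independent Gaussian posterior over weights.

    Returns the mean predicted probability and the standard deviation of the
    predicted probability across the sampled weight vectors.
    """
    rng = random.Random(seed)
    probs = []
    for _ in range(n_samples):
        # Draw one plausible set of weights from the approximate posterior.
        weights = [rng.gauss(m, s) for m, s in zip(weight_means, weight_stds)]
        probs.append(sigmoid(sum(w * xi for w, xi in zip(weights, x))))
    mean = sum(probs) / n_samples
    std = math.sqrt(sum((p - mean) ** 2 for p in probs) / n_samples)
    return mean, std


# Hypothetical posterior parameters (illustrative, not trained values).
means = [2.0, -1.0]
stds = [0.1, 0.1]

# An input on the scale the model might have seen during training...
mean_familiar, std_familiar = predict_with_uncertainty([1.0, 0.5], means, stds)
# ...and one far outside that scale.
mean_unusual, std_unusual = predict_with_uncertainty([10.0, 20.0], means, stds)
```

For the unusual input, small differences between sampled weights are amplified, so the sampled predictions spread out and the reported standard deviation is larger. That spread is the kind of signal an oncologist or a downstream system could take into account.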

Out-of-distribution testing:

In our paper, we demonstrated the use of variational inference for variant calling and compared it with the classical approach.

After training on the pileup images, which are generated from the genomic data, we artificially distort the image distribution by adding either Gaussian noise or a black mask to the test data. The aim is to show that the model's uncertainty increases when the images have been distorted.
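The two distortions can be sketched in a few lines of Python. This is an illustration only: it assumes a greyscale image stored as a list of rows with pixel values between 0 and 1, and the noise level and mask position are hypothetical, since the exact settings are not given here.

```python
import random


def add_gaussian_noise(image, sigma, seed=0):
    """Return a copy of the image with zero-mean Gaussian noise added to each pixel."""
    rng = random.Random(seed)
    # Clip so that noisy pixels stay in the valid [0, 1] range.
    return [
        [min(1.0, max(0.0, pixel + rng.gauss(0.0, sigma))) for pixel in row]
        for row in image
    ]


def add_black_mask(image, top, left, height, width):
    """Return a copy of the image with a rectangular region set to black (0.0)."""
    masked = [row[:] for row in image]
    for r in range(top, min(top + height, len(masked))):
        for c in range(left, min(left + width, len(masked[r]))):
            masked[r][c] = 0.0
    return masked


# Hypothetical 4x4 mid-grey "pileup image".
image = [[0.5] * 4 for _ in range(4)]

noisy = add_gaussian_noise(image, sigma=0.1)
masked = add_black_mask(image, top=1, left=1, height=2, width=2)
```

Both functions return new images and leave the original untouched, so the same test set can be evaluated clean, noisy and masked.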

Histogram of the output probabilities of VI (left) and classical (right) models when applied on in-distribution (top), OOD with noise (middle) and OOD with mask (bottom).

The figure above shows the histogram of output probabilities. When they are close to 0 or 1, the model is confident. When they are close to 0.5, the model is not confident. When the data has been distorted (the last two rows), the VI approach is clearly much less confident than the classical approach. Given that the accuracy is the same in both cases, Bayesian deep learning additionally tells us how much trust to place in the variant calls the model makes.
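As a rough illustration of how to read such a histogram, the Python sketch below bins a handful of output probabilities and scores each one with binary entropy, one common way to express "how unsure" a probability is. The choice of entropy, the bin count and the example probabilities are assumptions made for illustration, not taken from the paper.

```python
import math


def binary_entropy(p):
    """Uncertainty of a binary prediction in bits: 0 when certain, 1 at p = 0.5."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def probability_histogram(probs, n_bins=10):
    """Count how many output probabilities fall into each equal-width bin on [0, 1]."""
    counts = [0] * n_bins
    for p in probs:
        counts[min(int(p * n_bins), n_bins - 1)] += 1
    return counts


# Hypothetical outputs: some near the extremes, some near 0.5.
outputs = [0.02, 0.5, 0.55, 0.98, 1.0]

counts = probability_histogram(outputs)
most_uncertain = max(outputs, key=binary_entropy)
```

A confident model piles its counts into the first and last bins; an uncertain one fills the middle, where entropy is highest.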

We hope that this technique will help clinicians make more informed and better treatment decisions, and eventually show clinical impact in the advancement of precision oncology.

What’s the Impact?

As far as we are aware, no work has been done on applying OOD machine learning techniques to the somatic variant calling problem. This work shows that by taking a principled approach to quantifying the uncertainty in model predictions, we can produce a model that is of far greater utility in downstream applications.

It will enable clinicians, and possibly automated systems, to reach better conclusions about treatment strategies. It is also a step toward a more well-rounded approach to machine learning in healthcare generally. Very often, models are published with results on a given dataset, with no attempt to gauge how they might perform when faced with real-world data. This paper begins to address this concern, which should be of the utmost importance when the consequences of misclassification can be so serious.

Find out more

This blog post gives a high-level overview of a paper presented at the NeurIPS 2019 workshop on Bayesian Deep Learning.

To learn more about this research, read the full paper. We discuss patient perceptions of AI-based systems in this blog post.

We published 5 papers in total at NeurIPS 2019. Check out our press release to learn about our other Machine Learning advances.

  • Written by Geoffroy Dubourg-Felonneau, Lead Machine Learning Engineer
  • Edited by Belle Taylor, Strategic Communications and Partnerships Manager at and Christopher Parsons, ML Engineer at
  • Thanks to Christopher Parsons for working hard on making the article more accessible.
