NeurIPS 2019

December 13, 2019: Cambridge Cancer Genomics present five papers at NeurIPS 2019 on machine learning advances in identifying cancer mutations and predicting tumor growth

Cambridge Cancer Genomics, together with researchers from the University of Edinburgh and the University of Cambridge, today present five papers at NeurIPS 2019, outlining their research into predicting tumor evolution and identifying cancer-causing mutations.

NeurIPS, one of the world’s largest AI summits, is an industry highlight. This year’s conference has attracted 14,000 Machine Learning (ML) and Artificial Intelligence (AI) professionals, drawn to NeurIPS to hear about research at the cutting edge of machine learning and data science. This year 1429 papers have been selected for publication out of 4854 submitted papers. Cambridge Cancer Genomics, a Cambridge (UK) based startup using AI to enable oncologists to provide more effective personalised treatment for cancer patients, are presenting five papers across four workshops. The papers outline their machine-learning-led advances in identifying cancer-causing genetic mutations and in predicting tumor growth.

Precision Oncology and Genetic Variants

The emerging field of precision oncology relies on accurately pinpointing genetic changes in the molecular make-up of a tumor in order to provide personalized, targeted treatments. To “read” a strand of DNA, next-generation sequencing technologies are applied to a tumor sample. This is followed by analytical techniques to identify non-inherited mutations in the DNA: these genetic mutations are known as somatic variants. The company’s NeurIPS papers outline approaches to improve the accuracy and efficiency of “variant calling” in tumor-derived genetic data:

  • Paper 1 outlines a machine learning technique that accurately calls genetic mutations in cancer and is well suited to real-world data sets
  • Paper 2 presents an approach that results in safe, robust, and statistically confident somatic mutation calls for precision oncology treatment choices
  • Paper 3 presents a technique that allows for compression of somatic variants without losing the predictive power of the uncompressed original
  • Paper 4 represents somatic mutation data in lower dimensions to allow for uptake and analysis in precision oncology research

Predicting Tumor Growth

Cancer is not one disease. Instead, a tumor is made up of many different sub-types (sub-clones) of cancer, each due to different genetic mutations. In fact, the majority of cancer treatments end in failure due to this variation of sub-clones across the tumor, termed Intra-Tumor Heterogeneity (ITH).

Predicting the growth of the sub-clones within a tumor is among the key challenges of modern cancer research. Successful modelling of tumor behavior allows oncologists to select the best treatments for their patients by targeting “high-risk” (more likely to grow) sub-clones in their tumor first.

A Clear Need

Harry Clifford, Chief Technology Officer and Study Lead, says:

“Here at Cambridge Cancer Genomics, we’re continuously working to bring the latest in machine learning and AI research into cancer care. The need for this has never been clearer: the average cost of cancer treatment is now at >$150,000, meaning more than 20% of U.S. cancer patients are now declaring bankruptcy, and yet up to 64% of patients still do not respond to the first line therapy.
Analyzing and understanding the exact molecular dynamics underlying an individual’s tumor is key in getting each patient the right treatment, at the right time, to beat their cancer. Publication of our work at NeurIPS is a testament to the hard work our developers and engineers have been putting into the groundbreaking analytical methods that will help achieve this, and we are very much looking forward to presenting our latest tools in the fight against cancer.”

Press Enquiries:

For further information, or to arrange an interview, please contact:

Belle Taylor, Cambridge Cancer Genomics (UK, PST+8hrs)

Harry Clifford (Co-founder and Chief Technology Officer) will be at NeurIPS, Vancouver (PST) for in-person interviews.


Paper Summaries:

  1. Deep Bayesian Recurrent Neural Networks for Somatic Variant Calling in Cancer

Summary: Non-inheritable genetic mutations, or somatic variants, occur at incredibly low numbers in DNA sequences. Differentiating these variants from errors picked up in the sequencing process poses a classification problem which supervised machine learning methods, such as neural networks, look to solve. By applying deep Bayesian neural networks to next-generation sequencing data, we have found that they perform similarly to standard neural networks in “calling” somatic mutations in cancer. Additionally, they are better suited to the disparate and highly variable sequencing datasets these models are likely to encounter in the real world.

  2. Safety and Robustness in Decision Making: Deep Bayesian Recurrent Neural Networks for Somatic Variant Calling in Cancer

Summary: Identification of somatic mutations in cancer DNA is a difficult task: differentiating somatic (non-inherited) variants from germline (inherited) variants, as well as separating true results from background noise, are both tricky classification problems. Here we present a technique that relies on deep Bayesian recurrent neural networks for cancer variant calling. This approach is shown to be high-performing and flexible, and it avoids the problem of over-fitting to a single dataset. Our results show we can obtain safe, robust, and statistically confident somatic mutation calls for precision oncology treatment choices.

  3. Flatsomatic: A Method for Compression of Somatic Mutation Profiles in Cancer

Summary: Identifying somatic variants in cancer requires analyzing incredibly large datasets, a process which is computationally demanding. In this study, we propose a technique to compress these datasets by reducing the number of datapoints needed to describe a patient’s unique features, without losing information. We show that this technique works for multiple neural network architectures and confirm that the compressed dataset has the same predictive power as the original: it is able to predict drug response to the same accuracy as with the original number of datapoints.

  4. Learning Embeddings from Cancer Mutation Sets for Classification Tasks

Summary: The low frequency and varying rates of somatic mutations across patients make the data extremely challenging to analyze statistically, as well as difficult to learn useful information from. In this paper we present a technique based upon low-dimensional representations of somatic mutations, which will allow datasets containing information on the DNA of cancer cells to be more easily analyzed. The success of this technique on a variety of classification tasks shows great potential for use in data-driven precision oncology research.

  5. Effective Sub-clonal Cancer Representation to Predict Tumor Evolution

Summary: Predicting the growth of sub-clones within a tumor is among the key challenges of modern cancer research. Currently, research focuses on mathematical modeling techniques which quantify the selective advantage of sub-clones, and thus predict which are more likely to grow. We present a novel approach for predicting cancer evolution using a data-driven machine learning method. By capturing the real-world characteristics of sub-clones in a tumor, and representing them in the form of features in our machine learning models, a sophisticated algorithm can be trained to predict tumor growth and behavior.
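
For readers curious how a Bayesian model can yield “statistically confident” calls, as the first two summaries describe, the general idea can be sketched as follows. This is an illustrative sketch only, not the method from the papers: the function name `call_variant`, the thresholds, and the sampled probabilities are all hypothetical. It assumes a model that can be queried several times for the same candidate variant, returning a slightly different probability each time, so that the spread of those probabilities acts as a measure of uncertainty.

```python
from statistics import mean, stdev


def call_variant(sampled_probs, call_threshold=0.5, max_spread=0.1):
    """Combine repeated stochastic predictions into a call plus uncertainty.

    sampled_probs: probabilities that the candidate is a true somatic variant,
    one per stochastic forward pass of a (hypothetical) Bayesian model.
    call_threshold and max_spread are illustrative values, not tuned ones.
    """
    p = mean(sampled_probs)         # average predicted probability
    spread = stdev(sampled_probs)   # disagreement between the samples
    if spread > max_spread:
        # The samples disagree too much: flag for review instead of calling.
        return "uncertain", p, spread
    label = "somatic" if p >= call_threshold else "not_somatic"
    return label, p, spread


# Made-up samples: one candidate the model agrees on, one it does not.
confident = call_variant([0.93, 0.95, 0.91, 0.94])
ambiguous = call_variant([0.15, 0.85, 0.40, 0.70])
print(confident[0], ambiguous[0])
```

In this sketch a candidate is only called when the sampled predictions agree with one another; otherwise it is set aside as uncertain, which is one simple way a model's confidence could feed into a treatment decision.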


About Cambridge Cancer Genomics:

Cambridge Cancer Genomics is a Y Combinator-backed startup building software to enable data-driven precision oncology. Our precision Artificial Intelligence (AI) platform enables oncologists to provide more effective, personalised treatment for cancer patients. We’re on a mission to ensure that each patient has the right treatment, at the right time, to beat their cancer. Our intelligently designed algorithms analyze and interpret DNA from a cancer patient, providing genomic insights into individual tumors as they change and grow. Our technology gives actionable insights into treatment effectiveness and provides personalised treatment and trial recommendations; it also powers intelligent clinical trial design and drug development.