Predicting Cancer Evolution: A Data-Driven Approach

Adnan Akbar
January 16, 2020
February 10, 2020

Cancer is an evolutionary process

Forty years ago, Peter Nowell first formally described cancer as an evolutionary process driven by natural selection of mutations¹. This hypothesis has since been substantiated by the rapid expansion of research in cancer genomics. Recent advances in both single-cell analysis and multi-regional biopsies have revealed how cancer cells within the same tumour diversify genetically across space and time, a phenomenon more commonly known as Intra-Tumour Heterogeneity (ITH).

"Nothing in biology makes sense except in the light of evolution" — Theodosius Dobzhansky, 1973

ITH reflects the presence of different types of cancer cells within the same tumour. These different cancer cells reside in the form of sub-clones, competing with each other for resources under conditions of Darwinian natural selection. At any stage of cancer, ITH can be viewed as the combination of different sub-clones evolving as shown in the figure below. The majority of cancer treatments end in failure due to ITH and its evolution during treatment.

Clonal Evolution Model, Nowell et al.¹

Predicting sub-clonal evolution is important

The ability to precisely predict the sub-clonal evolution of cancer over time would be highly beneficial in reducing cancer treatment failures. Such a capability would enable oncologists to create risk profiles for patients and develop optimised treatments by therapeutically targeting the sub-clones that are more likely to grow. It could also help to predict cancer relapse, enabling oncologists to take proactive measures to address it. Furthermore, the ability to foresee such growth would help make therapeutic decisions faster, sparing the patient the horrible side effects associated with ineffective cancer therapies.

What makes it a challenging problem?

Predicting the growth of these sub-clones within a tumour is among the key challenges of modern cancer research. Under Darwinian evolutionary theory, cancer evolution is governed by three basic processes³:

  • The generation of heritable variation, i.e. random mutations
  • The influence of random birth and death events on the fate of new genotypes, referred to as genetic drift
  • Darwinian selection, which changes the frequency of genotypes in the population based on their relative fitness advantage

Cancer Evolution Processes, Kamil et al.³
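The three processes listed above can be illustrated with a toy simulation. The sketch below is not taken from the cited work: it assumes a Wright-Fisher-style model with a fixed population size, hypothetical fitness values, and a simplification in which "mutation" moves a daughter cell into another, already existing sub-clone. The function names `wright_fisher_step` and `simulate` are made up for this illustration.

```python
import random


def wright_fisher_step(counts, fitness, mutation_rate, rng):
    """Advance a fixed-size population of sub-clones by one generation.

    counts[i]  : number of cells currently in sub-clone i
    fitness[i] : relative fitness of sub-clone i (hypothetical values)
    """
    total = sum(counts)
    n_clones = len(counts)
    # Darwinian selection: a sub-clone's chance of contributing a daughter
    # cell is proportional to its size multiplied by its relative fitness.
    weights = [c * f for c, f in zip(counts, fitness)]
    # Genetic drift: the next generation is a finite random sample, so
    # frequencies fluctuate even when all fitness values are equal.
    offspring = rng.choices(range(n_clones), weights=weights, k=total)
    new_counts = [0] * n_clones
    for clone in offspring:
        # Heritable variation (simplified): with a small probability the
        # daughter cell switches to a different, randomly chosen sub-clone.
        if n_clones > 1 and rng.random() < mutation_rate:
            clone = rng.choice([j for j in range(n_clones) if j != clone])
        new_counts[clone] += 1
    return new_counts


def simulate(counts, fitness, mutation_rate, generations, seed=0):
    """Return the list of sub-clone counts for each generation."""
    rng = random.Random(seed)  # seeded so that a run can be repeated
    history = [list(counts)]
    for _ in range(generations):
        counts = wright_fisher_step(counts, fitness, mutation_rate, rng)
        history.append(counts)
    return history


# Two sub-clones; the smaller one has an assumed 10% fitness advantage.
history = simulate([900, 100], [1.0, 1.1], mutation_rate=0.001, generations=50)
print(history[0], history[-1])
```

Running the simulation with different seeds gives different trajectories from the same starting point, which is the practical difficulty: selection pushes the fitter sub-clone upwards on average, while drift and mutation add noise around that trend.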

The first two processes are stochastic in nature and impossible to capture with current technology. However, the third process, Darwinian selection, is deterministic to some extent. Different research efforts estimate Darwinian selection by quantifying the selective advantage of sub-clones in the tumour micro-environment and using this selective advantage to predict sub-clonal growth. A nice review on the topic is presented here.
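To make the deterministic part concrete, here is a minimal sketch of how a constant selective advantage translates into predicted growth. It assumes the textbook haploid selection model from population genetics, in which each generation multiplies the odds x / (1 - x) of a sub-clone by (1 + s); this is not necessarily the model used in the work cited here, and the function name `expected_frequency` and the example values are hypothetical.

```python
def expected_frequency(x0, s, generations):
    """Expected frequency of a sub-clone after a number of generations.

    x0 : starting frequency of the sub-clone, between 0 and 1
    s  : constant selective advantage over the rest of the tumour (s > -1)
    """
    if not 0.0 <= x0 <= 1.0:
        raise ValueError("x0 must be a frequency between 0 and 1")
    if s <= -1.0:
        raise ValueError("s must be greater than -1")
    if x0 in (0.0, 1.0):
        # Without new mutations, an absent or fixed sub-clone does not change.
        return x0
    # One generation of selection maps x to x * (1 + s) / (1 + s * x),
    # which is the same as multiplying the odds x / (1 - x) by (1 + s).
    odds = x0 / (1.0 - x0) * (1.0 + s) ** generations
    return odds / (1.0 + odds)


# A sub-clone starting at 1% with an assumed 10% advantage per generation.
for t in (0, 10, 50, 100):
    print(t, round(expected_frequency(0.01, 0.1, t), 3))
```

The prediction depends only on the starting frequency and on s, so it is only as good as the estimate of s and the assumption that s stays constant over time.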

Current approaches

Current research efforts in this space are focused on quantifying the selective advantage of sub-clones using mathematical models from population genetics⁴. These models assume that the intra-tumour environment remains unchanged, and thus focus entirely on quantifying the selective advantage in a static environment. They do not take into account the location and biological characteristics of the underlying mutations, so their predictions depend entirely on the frequency distribution of mutations. Many of these assumptions do not hold for real-world tumour micro-environments, and the models remain far from clinical application.

Our data-driven approach

In contrast to existing approaches, at Cambridge Cancer Genomics we are exploring a novel data-driven method for predicting cancer evolution, based on machine learning. Our approach is based on the intuition that if we can represent the sub-clonal population with the right features, ones which truly encapsulate the characteristics of a tumour, we can use this representation to train machine learning models to predict cancer evolution. Machine learning models based on deep learning have the potential to capture the randomness introduced into tumour evolution by genetic drift and random mutations, if trained with enough data using the right set of features.

Our work is the first step towards a fully data-driven approach, and initial results have shown great potential. With the help of oncologists, biologists and experts in translational genomics, our team is quickly building a unique feature set for the precise representation of tumours. Our partners are helping us to gather an extensive longitudinal database which will help us to unlock the true potential of deep learning in this domain and make cancer evolution prediction a reality.

  • Written by Adnan Akbar, Data Scientist
  • Edited by Belle Taylor, Strategic Communications and Partnerships Manager
  • Thanks to Harry Clifford, Geoffroy Dubourg-Felonneau and Philip Beer for valuable discussions
