Identification of Driver Mutations for Precision Oncology

What are driver mutations and why are they important?

Adnan Akbar
January 2, 2020
February 17, 2020

Cancer is a disease of the genome, caused by aberrations in the coding regions of the genome. These aberrations are more commonly called genomic mutations. Currently, cancers are commonly treated using a generic approach based on the primary tumour location rather than on the underlying profile of genomic mutations. As technology progresses, our understanding of cancer is changing as well: no two tumours in the world are the same, and hence they cannot all be treated in the same way. Prescribing the same treatment based only on cancer type is inefficient and can lead to treatment failure and to the development of resistance to treatment.

A more personalised approach based on the individual genetic profile is required, one that targets the specific genomic mutations conferring a selective advantage on an individual tumour. These key mutations, which are responsible for driving the cancer, are called "driver mutations". Identifying them is the first and most important step towards personalised cancer treatment.


Current landscape for identification of driver mutations

The current landscape for identifying driver mutations is multi-directional, with enough methods to confuse anyone. To make matters worse, there are no standard evaluation criteria: there are literally hundreds of thousands of unique mutations, and clinically validating the driver status of each one is an almost impossible task. Furthermore, many methods build on similar concepts but implement them in different ways. Broadly speaking, these approaches fall into three categories:

  • Statistical models based on the recurrence (frequency) of mutations: These methods score each mutation by how often it recurs across a given cohort of patients. Because driver mutations confer a selective advantage, they are more likely to recur than chance alone would predict, which is why frequency-based statistical models are generally considered the most effective approach. However, they are poorly suited to rare, low-frequency driver mutations.
  • Functional-impact based methods: An alternative approach that addresses this limitation is to use functional-impact based methods. These methods compute a pathogenicity score from the location of a mutation to estimate its effect. This approach is highly prone to false positives, as many mutations may have a high pathogenicity score without being involved in driving the tumour.
  • Features-based methods: Methods in this category extract different features from the location of each mutation, mainly biological, ratio-metric or structural features. These features are then used to train machine learning algorithms to predict the driver status of mutations. These methods aim to predict rare driver mutations that occur less frequently and lack clinical evidence.
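To make the recurrence idea concrete, here is a minimal Python sketch assuming the simplest possible background model: each of n samples independently carries a mutation at a given site with a fixed background probability p. The observed count is compared with that expectation using an exact one-sided binomial test, and the Benjamini-Hochberg procedure controls the false discovery rate across sites. The function names, site labels, cohort size and background rate are all hypothetical; real tools estimate mutability per gene or per sequence context, and that estimate is exactly where they differ.

```python
from math import exp, lgamma, log, log1p
from typing import List


def binom_log_pmf(i: int, n: int, p: float) -> float:
    """Log of the Binomial(n, p) mass at i, computed in log space to avoid overflow."""
    return (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
            + i * log(p) + (n - i) * log1p(-p))


def recurrence_pvalue(k: int, n: int, p: float) -> float:
    """One-sided p-value P(X >= k) for X ~ Binomial(n, p).

    k: number of samples in the cohort mutated at this site
    n: cohort size
    p: assumed (hypothetical) per-sample background mutation rate at this site
    """
    if not 0.0 < p < 1.0:
        raise ValueError("background rate p must lie strictly between 0 and 1")
    if k <= 0:
        return 1.0
    tail = sum(exp(binom_log_pmf(i, n, p)) for i in range(k, n + 1))
    return min(1.0, tail)


def benjamini_hochberg(pvalues: List[float], alpha: float = 0.05) -> List[bool]:
    """Flag p-values that remain significant at false discovery rate alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda idx: pvalues[idx])
    cutoff = 0  # largest rank r with p_(r) <= r / m * alpha
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            cutoff = rank
    significant = set(order[:cutoff])
    return [idx in significant for idx in range(m)]


# Hypothetical cohort: 500 samples, background rate 1e-3 at every site.
N_SAMPLES, BACKGROUND_RATE = 500, 1e-3
observed = {"site_1": 40, "site_2": 2, "site_3": 0}

pvals = [recurrence_pvalue(k, N_SAMPLES, BACKGROUND_RATE) for k in observed.values()]
flags = benjamini_hochberg(pvals)
for site, pval, flag in zip(observed, pvals, flags):
    print(f"{site}: p = {pval:.3g}, candidate driver = {flag}")
```

With 500 samples and p = 0.001, only the site mutated in 40 samples is flagged; two hits (P ≈ 0.09) is well within what chance alone produces. This is also why such tests struggle with rare drivers: a genuinely important mutation seen in two patients looks exactly like noise.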

Our take on the current landscape

At Cambridge Cancer Genomics (CCG), we have spent a fair amount of time implementing many of the algorithms from the above-mentioned categories and evaluating their performance on data from different patients, with the help of our in-house team of molecular oncologists. In brief, we arrived at the following conclusions:

  • Frequency-based methods are relatively simple and are the most efficient way to detect the driver status of mutations; they are also widely accepted within the clinical community. Frequency-based approaches differ from one another mainly in how they estimate the expected mutability (or background mutation rate). Due to the nature of the problem, most of these methods rely on intuition rather than on theoretical grounding, and we feel they can be further improved. In fact, we have been working in this direction for some time now, with encouraging results. One drawback of this approach is that it requires a large cohort of patient data and cannot detect rare, low-frequency driver mutations.
  • Functional-impact methods vary a lot in performance, and they often disagree with one another when predicting driver mutations. Some research studies propose ensemble approaches that combine different functional-impact methods, but these still tend to produce many false positives. The impact of a mutation is not always indicative of whether it drives the cancer: many mutations can have a high pathogenicity score without contributing to carcinogenesis or tumorigenesis.
  • Features-based approaches have shown good potential for identifying rare, low-frequency driver mutations. Here, identifying the right features to truly capture the characteristics of a mutation is an important yet challenging task. These methods rely on machine learning models trained on those features, and as the well-known machine learning saying goes, "garbage in, garbage out": the choice of features is critically important. We believe these methods will come into the spotlight in the near future.
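A features-based method can be sketched with the simplest possible classifier. The Python example below is a minimal sketch, not any published or in-house model: it trains a logistic regression by batch gradient descent on hypothetical, pre-scaled feature vectors (a recurrence score, a structural score and a ratio-metric score per mutation) with hypothetical driver/passenger labels. The function names, feature choices and hyperparameters are assumptions for illustration; in practice one would use a well-tested library and far richer features.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Predicted probability that a mutation with features x is a driver."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic_regression(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lr: float = 0.5,
    epochs: int = 2000,
    l2: float = 1e-3,
) -> Tuple[List[float], float]:
    """Fit weights and bias by batch gradient descent on L2-regularised log loss."""
    n_samples, n_features = len(X), len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log loss w.r.t. the linear score is (prediction - label).
            err = predict_proba(w, b, xi) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        for j in range(n_features):
            w[j] -= lr * (grad_w[j] / n_samples + l2 * w[j])
        b -= lr * grad_b / n_samples
    return w, b


# Hypothetical features scaled to [0, 1]: [recurrence, structural, ratio-metric]
X = [
    [0.9, 0.8, 0.7],
    [0.8, 0.9, 0.6],
    [0.7, 0.6, 0.9],
    [0.2, 0.1, 0.3],
    [0.1, 0.3, 0.2],
    [0.3, 0.2, 0.1],
]
y = [1, 1, 1, 0, 0, 0]  # hypothetical labels: 1 = driver, 0 = passenger

w, b = train_logistic_regression(X, y)
print(f"P(driver) for an unseen mutation: {predict_proba(w, b, [0.85, 0.75, 0.8]):.2f}")
```

The model itself is deliberately trivial; whatever it learns is bounded by what the features encode, which is the "garbage in, garbage out" point above.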

How is CCG.ai addressing the problem?

At CCG.ai, our aim is to detect driver mutations accurately, with as few false positives as possible, so that clinicians can trust the results. At the same time, we don't want to miss rare driver mutations, which might play an important role in the treatment of cancer patients. To achieve this, we propose a novel approach that uses machine learning to combine the strengths of all three of the methods described above. We have developed a statistical model that scores each mutation based on its frequency, and we use this score as a feature. It is combined with functional-impact scores and with other features based on the location of mutations, such as gene betweenness, gene degree and structural features. All of these features are then used by advanced machine learning algorithms to detect driver mutations. By combining statistical methods with a features-based approach, we are able to accurately detect frequently occurring driver mutations while also detecting rare ones. We are currently evaluating our approach comprehensively against existing methods to demonstrate its gain in accuracy when detecting driver mutations.
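Gene degree and gene betweenness are graph centrality measures. The Python sketch below assumes, purely for illustration, an undirected gene interaction network given as adjacency lists; it computes degree directly and betweenness with Brandes' algorithm, then concatenates both with a per-mutation frequency score and functional-impact score into one feature vector. The network, gene names, mutation IDs and score values are hypothetical, structural features are omitted, and this is not CCG.ai's actual pipeline.

```python
from collections import deque
from typing import Dict, List


def betweenness_centrality(graph: Dict[str, List[str]]) -> Dict[str, float]:
    """Unnormalised betweenness for an unweighted, undirected graph (Brandes' algorithm)."""
    bc = dict.fromkeys(graph, 0.0)
    for source in graph:
        # Breadth-first search from source, counting shortest paths (sigma).
        stack: List[str] = []
        preds: Dict[str, List[str]] = {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0)
        dist = dict.fromkeys(graph, -1)
        sigma[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies, farthest nodes first.
        delta = dict.fromkeys(graph, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != source:
                bc[w] += delta[w]
    # In an undirected graph each shortest path is counted once from each end.
    return {v: c / 2.0 for v, c in bc.items()}


# Hypothetical undirected gene interaction network (adjacency lists).
network = {
    "HUB": ["A", "B", "C"],
    "A": ["HUB"],
    "B": ["HUB"],
    "C": ["HUB", "D"],
    "D": ["C"],
}
degree = {gene: len(neighbours) for gene, neighbours in network.items()}
betweenness = betweenness_centrality(network)

# Hypothetical mutations: (id, gene, frequency score, functional-impact score).
mutations = [
    ("mut_1", "HUB", 4.2, 0.91),
    ("mut_2", "D", 0.3, 0.85),
]
features = [
    [freq, impact, float(degree[gene]), betweenness[gene]]
    for _, gene, freq, impact in mutations
]
print(features)  # [[4.2, 0.91, 3.0, 5.0], [0.3, 0.85, 1.0, 0.0]]
```

The resulting vectors could then be passed to any classifier. Betweenness is left unnormalised here; libraries such as NetworkX can rescale it by the number of node pairs so that values are comparable across networks of different sizes.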


  • Written by Adnan Akbar, Data Scientist at CCG.ai
  • Edited by Belle Taylor, Strategic Communications and Partnerships Manager at CCG.ai
  • Thanks to Philip Beer, Harry Clifford and Geoffroy Dubourg-Felonneau for valuable discussions
