Machine Learning in Genomics Research and Healthcare

Machine Learning can help provide valuable insights into the role genes have to play in the onset and treatment of disease

Belle Taylor
January 24, 2020
February 10, 2020

Genomics, the field of science associated with the study of the structure and function of the genome, is an incredibly exciting part of the research space. Through the study of genetic data, we can, for example, get valuable insights into the role genes have to play in the onset and treatment of disease.

In order to extract the genetic information required for genome studies, DNA must be sampled and sequenced (analysed to work out its unique letter “string” of As, Ts, Gs and Cs). This in itself produces a huge amount of data. As technologies become ever more sophisticated, they create ever deeper, richer (and bigger) datasets, and the same is true for genomic data.
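To make the letter “string” concrete, here is a minimal sketch of one common way a sequence can be turned into numbers before any machine learning is applied: one-hot encoding, where each base becomes a row of four 0/1 values. The function names and the example read are hypothetical and chosen purely for illustration; real pipelines work on far longer sequences and typically use specialised libraries.

```python
# Illustrative sketch only: encode a short DNA sequence numerically.
# BASES, one_hot_encode and base_counts are hypothetical names for this example.
BASES = "ACGT"


def one_hot_encode(sequence):
    """Return one 4-element row per base, in A, C, G, T order.

    Any other character (e.g. 'N' for an uncalled base) becomes all zeros.
    """
    rows = []
    for base in sequence.upper():
        rows.append([1 if base == b else 0 for b in BASES])
    return rows


def base_counts(sequence):
    """Count how often each of the four bases appears in the sequence."""
    seq = sequence.upper()
    return {b: seq.count(b) for b in BASES}


if __name__ == "__main__":
    read = "GATTACA"  # made-up example read, not real data
    for base, row in zip(read, one_hot_encode(read)):
        print(base, row)
    print(base_counts(read))
```

Even this toy encoding hints at the scale problem: every base becomes four numbers, so the encoded data grows quickly as sequences get longer.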

The problem with these datasets, particularly if you want to combine them, is that they are too large for traditional theoretical and applied statistical techniques. Additionally, the most important signals in genomics datasets are often incredibly small and masked by technical noise, and thus require far more sophisticated analysis techniques. It is for these reasons that machine learning (ML) is being used so successfully to draw clinically useful information from the datasets generated by genomic sequencing.

Due to advances in ML and genomics research, coupled with better access to processing power, there has been an explosion in startups being founded to work in the cross-disciplinary gaps of genomics, medicine and machine learning. Some particularly interesting examples are highlighted below:

  • Cambridge Cancer Genomics: utilises a precision artificial intelligence (AI) platform to empower oncologists to provide personalised cancer treatment. Ongoing genomic profiling and prediction are used to tailor FDA-approved treatment strategies based on the molecular drivers of each cancer and each patient’s unique genomic profile. Combining this analysis with non-invasive liquid biopsies allows monitoring of treatment response over time.
  • uses computational medicine and AI to help improve the way medicines are designed, developed, tested and brought to market. Their platform helps identify molecular targets and design the drugs to reach them.
  • Deep Genomics: uses machine learning to help analyse and interpret genetic variation, specifically how patterns of SNPs (Single Nucleotide Polymorphisms) can help in the understanding of crucial cellular processes, such as metabolism and DNA repair, across populations.
  • Freenome: uses machine learning in the identification of multi-omic cancer risk signatures from blood samples. It is hoped that this will develop into a minimally invasive screening test for multiple cancers.

In addition to its potential in healthcare analyses, there is incredible potential for machine learning to help streamline clinical systems. Current healthcare systems are dated and, because patient data is collected and formatted in different ways, clinical data is non-uniform and often difficult to interpret and compare. Integration of machine learning into the clinical workflow would help to remove gaps in the data available to healthcare professionals and allow integration of other datasets (such as genetic information). This is a vital step in enhancing the value of medical data and in better understanding patient treatment and care.

Another exciting and valuable area of machine learning applications is in the direct-to-consumer genomics space. The success of 23andMe shows the huge potential of consumer-led genomics analysis for scientific research. For example, they recently uncovered a link between genetics and BMI by using ML to collate 600,000 customers’ personalised genetic evaluations. With over 2m customers, and a recent strategic partnership with GlaxoSmithKline, 23andMe’s next step is bound to be interesting.

Machine learning, together with enhanced statistical analysis, is absolutely essential to exploit the value of large healthcare datasets. Its ability to quickly carry out complex analyses on rich genomics databases, to modernise and standardise clinical systems, and to empower individuals to know their own genome shows that the future of genomics and healthcare will rely heavily on working with technology.

  • Written and Edited by Belle Taylor, Strategic Communications and Partnerships Manager at
