Machine Learning can help provide valuable insights into the role genes have to play in the onset and treatment of disease
Genomics, the field of science associated with the study of the structure and function of the genome, is an incredibly exciting part of the research space. Through the study of genetic data, we can, for example, get valuable insights into the role genes have to play in the onset and treatment of disease.
In order to extract the genetic information required for genome studies, DNA must be sampled and sequenced (analysed to work out its unique letter “string” of As, Ts, Gs and Cs), which in itself produces a huge amount of data. As technologies become ever more sophisticated, they create ever deeper, richer (and bigger) datasets, and the same is true of genomic data.
The problem with these datasets, particularly if you want to combine them, is that they are too large for traditional theoretical and applied statistical techniques. Additionally, the important signals in genomics datasets are often incredibly small and masked by technical noise, and thus require far more sophisticated analysis techniques. It is for these reasons that machine learning (ML) is being used so successfully to draw clinically useful information from the datasets generated by genomic sequencing.
Due to advances in ML and genomics research, coupled with better access to processing power, there has been an explosion in the number of startups founded to work in the cross-disciplinary gaps between genomics, medicine and machine learning. Some particularly interesting examples are highlighted below:
In addition to its potential in healthcare analyses, machine learning has incredible potential to help streamline clinical systems. Current healthcare systems are dated and, because patient data is collected and formatted in different ways, clinical data is non-uniform and often difficult to interpret and compare. Integrating machine learning into the clinical workflow would help to remove gaps in the data available to healthcare professionals and allow other datasets (such as genetic information) to be integrated, a vital step in enhancing the value of medical data and in better understanding patient treatment and care.
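To make the non-uniformity problem concrete, here is a minimal sketch in Python of what bringing records into one shared format might look like. It is not machine learning in itself, only an illustration of the standardisation that has to happen before records can be compared or combined with genetic information. Everything in it is an assumption made up for the example: the two clinic formats, the field names, the `normalise_a`, `normalise_b` and `combine` helpers, and the placeholder genotype values.

```python
from datetime import datetime

# Hypothetical records: the same kind of patient data exported by two
# systems that use different field names, date formats and units.
clinic_a = {"patient_id": "P001", "dob": "1980-03-14", "weight_kg": 72.5}
clinic_b = {"PatientID": "P002", "DateOfBirth": "14/03/1975", "weight_lb": 180.0}

# Made-up genotype lookup keyed by patient ID (placeholder variant name).
genotypes = {"P001": {"variant_1": "AG"}, "P002": {"variant_1": "GG"}}

LB_TO_KG = 0.45359237  # pounds to kilograms


def normalise_a(record):
    """Map a clinic A record onto the shared schema."""
    return {
        "patient_id": record["patient_id"],
        "date_of_birth": datetime.strptime(record["dob"], "%Y-%m-%d").date(),
        "weight_kg": round(record["weight_kg"], 1),
    }


def normalise_b(record):
    """Map a clinic B record onto the shared schema, converting units."""
    return {
        "patient_id": record["PatientID"],
        "date_of_birth": datetime.strptime(record["DateOfBirth"], "%d/%m/%Y").date(),
        "weight_kg": round(record["weight_lb"] * LB_TO_KG, 1),
    }


def combine(records, genotype_lookup):
    """Attach genotype data (or None if missing) to each uniform record."""
    return [
        {**record, "genotype": genotype_lookup.get(record["patient_id"])}
        for record in records
    ]


uniform = [normalise_a(clinic_a), normalise_b(clinic_b)]
combined = combine(uniform, genotypes)

for row in combined:
    print(row)
```

Once every record shares the same field names, date type and units, patients from different systems can be compared directly, and a missing genotype shows up as an explicit `None` rather than a silent gap.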
Another exciting and valuable area for machine learning applications is the direct-to-consumer genomics space. The success of 23andMe shows the huge potential of consumer-led genomics analysis for scientific research. For example, the company recently uncovered a link between genetics and BMI by using ML to collate 600,000 customers’ personalised genetic evaluations. With over 2 million customers, and a recent strategic partnership with GlaxoSmithKline, 23andMe’s next step is bound to be interesting.
Machine learning, together with enhanced statistical analysis, is essential for exploiting the value of large healthcare datasets. Its ability to carry out complex analyses quickly on rich genomics databases, to modernise and standardise clinical systems, and to empower individuals to know their own genome shows that the future of genomics and healthcare will rely heavily on working with technology.