Predicting Incidence Of Relapse


PRIOR is available for use on GitHub.


Outcome prediction at >80% accuracy. Results published at NeurIPS 2018 and here.


Build a preprocessing pipeline to enable machine learning researchers to work on genomics datasets.


Use our pipelines to understand the risk of relapse in breast cancer.

The potential benefits of applying machine learning methods to -omics data are becoming increasingly apparent, especially in clinical settings. However, the unique characteristics of these data are not always well suited to machine learning techniques. These data are often generated with different technologies in different labs, and are frequently high-dimensional. We present a framework for combining -omics datasets and for handling high-dimensional data, making -omics research more accessible to machine learning applications. We demonstrate the success of this framework through integration and analysis of multi-analyte data for a set of 3,533 breast cancers. We used this processed dataset to predict breast cancer patient survival for individuals at risk of an impending event, with higher accuracy and lower variance than methods trained on individual datasets. We hope that our pipelines for dataset generation and transformation will open up -omics data to machine learning researchers.
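To make the idea of combining -omics datasets concrete, here is a minimal sketch in plain Python. It is not the PRIOR pipeline. It assumes each source is a small table keyed by sample ID, and it uses three generic steps: keep the most variable features per source, standardise each feature within its source, and keep only the samples present in every source. The function names, the sample IDs, and the feature names (`GENE_A`, `LOCUS_X`) are all hypothetical.

```python
from statistics import mean, pstdev

def top_variance_features(table, k):
    """Return the k features with the largest spread across samples."""
    features = list(next(iter(table.values())))
    spread = {f: pstdev([row[f] for row in table.values()]) for f in features}
    return sorted(features, key=lambda f: spread[f], reverse=True)[:k]

def zscore(table, features):
    """Standardise each chosen feature within a single source."""
    stats = {}
    for f in features:
        values = [row[f] for row in table.values()]
        stats[f] = (mean(values), pstdev(values))
    scaled = {}
    for sample, row in table.items():
        scaled[sample] = {
            # A constant feature carries no signal, so map it to 0.0.
            f: (row[f] - stats[f][0]) / stats[f][1] if stats[f][1] > 0 else 0.0
            for f in features
        }
    return scaled

def combine_sources(sources, k):
    """Join sources on shared sample IDs, prefixing features by source name."""
    shared = set.intersection(*(set(table) for table in sources.values()))
    combined = {sample: {} for sample in sorted(shared)}
    for name, table in sources.items():
        scaled = zscore(table, top_variance_features(table, k))
        for sample in combined:
            for f, value in scaled[sample].items():
                combined[sample][f"{name}:{f}"] = value
    return combined

# Hypothetical toy data: two sources measured on overlapping samples.
expression = {
    "s1": {"GENE_A": 2.0, "GENE_B": 5.0},
    "s2": {"GENE_A": 4.0, "GENE_B": 5.0},
    "s3": {"GENE_A": 6.0, "GENE_B": 5.1},
}
copy_number = {
    "s1": {"LOCUS_X": 1.0},
    "s2": {"LOCUS_X": 3.0},
    "s4": {"LOCUS_X": 2.0},
}

combined = combine_sources(
    {"expression": expression, "copy_number": copy_number}, k=1
)
print(combined)
```

Standardising within each source before joining is one simple way to put measurements from different technologies on a comparable scale. Real multi-lab data usually needs more careful batch correction than this.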

Our deep learning pipeline enables the use of high-dimensional -omics data from disparate sources to predict clinical outcomes. We demonstrate this by predicting short-term survival in breast cancer patients, with the hope of enabling closer monitoring and care for patients at high risk. Additionally, we believe the preprocessing pipelines made available to the community will be especially beneficial in opening up -omics data to machine learning researchers.

View the peer-reviewed paper →