A large-scale and extensible platform for precision medicine research

Fodil Belghait, Alain April, Pavel Hamet, Johanne Tremblay, Christian Desrosiers

ACM International Conference Proceeding Series, 2019

Abstract: The massive adoption of high-throughput genomics, deep sequencing technologies and big data technologies has ushered in the era of precision medicine. However, the volume and complexity of the data remain major challenges for precision medicine research and hinder development in this field. The literature describes a few platforms that support specific types of precision medicine studies, but none of them offers researchers the level of customization required to meet their specific needs [1].

Methods: We propose a platform that can import and integrate very large volumes of genetic, clinical, demographic and environmental data on a cloud computing infrastructure. In a previous publication, we presented an approach for customizing existing data models to fit the data requirements of any precision medicine study, together with the requirements of future large-scale precision medicine platforms in terms of data extensibility and on-demand processing scalability [1]. We also proposed a framework to meet the specific requirements of any precision medicine study [2]. In this paper, we describe how this framework was implemented and trialed by precision medicine researchers at the Centre Hospitalier Universitaire de l’Université de Montréal (CHUM).

Results: In the data analysis simulations, the random forest algorithm performed best, with an F1-score of 72%, compared with 69% for linear regression and 62% for the neural network.

Conclusion: The results suggest that the proposed precision medicine data analysis platform allows researchers to configure and prepare the analysis environment, and to customize the platform data model to their specific research, with short turnaround times, at very low cost and with minimal technical skills.
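The Results compare the algorithms by F1-score, which is the harmonic mean of precision P = TP / (TP + FP) and recall R = TP / (TP + FN), that is F1 = 2PR / (P + R). The minimal Python sketch below shows how such a score can be computed for a binary classification task. It is an illustration only: the function names `confusion_counts` and `f1_score` and the example labels are hypothetical and are not taken from the platform described in the paper, which may compute its metrics differently (for example with a library such as scikit-learn, or with a multi-class averaging scheme).

```python
from typing import Sequence, Tuple


def confusion_counts(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Tuple[int, int, int]:
    """Return (true positives, false positives, false negatives) for one class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1  # correctly predicted positive case
        elif pred == positive:
            fp += 1  # predicted positive, actually negative
        elif truth == positive:
            fn += 1  # missed positive case
    return tp, fp, fn


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are undefined or zero."""
    tp, fp, fn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical labels: 1 = outcome present, 0 = outcome absent.
    y_true = [1, 1, 1, 0, 0, 1, 0, 1]
    y_pred = [1, 0, 1, 0, 1, 1, 0, 0]
    print(confusion_counts(y_true, y_pred))  # (3, 1, 2)
    print(round(f1_score(y_true, y_pred), 4))  # 0.6667
```

In this toy example, precision is 3/4 and recall is 3/5, so F1 = 2(0.75)(0.6) / 1.35 = 2/3. Unlike plain accuracy, the F1-score ignores true negatives, which makes it more informative when the positive class is rare, as is common in clinical cohorts.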