EMC is providing the data lake required to support the large-scale data collection and analytics generated by the project’s genome sequences
EMC Corporation today announced that it is Genomics England’s official IT storage supplier, supporting the organisation in the completion of the 100,000 Genomes Project. Using VCE vScale, with EMC Isilon and EMC XtremIO, Genomics England will store the genome data securely for analysis.
A ground-breaking project announced by the former British Prime Minister David Cameron in 2012, the 100,000 Genomes Project is sequencing 100,000 whole genomes from 70,000 NHS cancer and rare disease patients and their families. Genome sequencing has the potential to shift the way in which we approach healthcare. The project aims to deliver a new genomic medicine service with the NHS, to support better diagnosis and more personalised treatments for patients.
Once a genome has been sequenced, the information, amounting to hundreds of gigabytes per genome sequence, is stored digitally. Data in the Project will increase tenfold over the next two years, so providing the agility needed to analyse and compare these immense data sets will be key. De-identified data from the 100,000 Genomes Project will also be made available to approved researchers from academia and industry to help accelerate the development of new treatments and diagnostic tests that are targeted at the genetic characteristics of individual patients.
In the past, Genomics England was using EMC Isilon for storage of its sequence library alone. The organisation has now chosen to use an Isilon data lake for all the data collected during genome sequencing. Once captured at the sequencing centre in Cambridge, UK, the data is stored on Genomics England’s secure IT infrastructure. The Isilon data lake will initially allow 17PB of data to be stored and made available for multi-protocol analytics, including Hadoop. Alongside the Isilon data lake, 24 X-Bricks of all-flash XtremIO are in place to support the organisation’s virtualised applications. EMC’s Data Domain and NetWorker are also used to provide backup services. The net result is a resilient infrastructure that supports massively scalable data storage with robust analytics.
The Genomics England computing environment is delivered by both on-premises servers and Infrastructure-as-a-Service, provided by Cloud Service Providers (CSPs) on G-Cloud. One of the key legacies Genomics England will create is an ecosystem of CSPs providing low-cost, elastic compute on demand through G-Cloud, bringing the benefits of scale to smaller research groups.
“There are few better examples of the fundamental impact that analysis of data sets can have on society. It’s a privilege to be chosen as the IT storage provider for Genomics England and to be part of a revolutionary time for genome analysis. Genomics has the potential to transform healthcare and redefine the way the NHS operates, uncovering medical treatments, benefitting patient experiences and transforming the economics of universal healthcare in the UK,” comments Ross Fraser, Vice President and Managing Director, UK&I, EMC. “Delivering the platform for this large-scale analytics in a hybrid cloud model will help accelerate the impact Big Data analytics could have on the NHS, potentially delivering billions in efficiencies in care delivery and improving patient outcomes immeasurably.”
Dave Brown, Head of Informatics Infrastructure at Genomics England, said: “This project is at the cutting edge of science and technology. EMC’s data lake platform provides the secure data storage that we need, with the flexibility and power to undertake complex analysis.”