£4M to develop next-generation data science approach

Scientists at the University of Liverpool are leading a £4M data science research project that aims to harness the power of emerging hardware (such as graphics cards) to significantly reduce the time it takes to unlock the latent value present in difficult data sets.

With ever more complex data sets being generated by science, society, government and industry, new approaches are needed so that this data can be used more efficiently and effectively to make decisions and take appropriate action.

Researchers at the University’s Liverpool Big Data Network, supported by staff from the Science and Technology Facilities Council’s (STFC) Hartree Centre, have been awarded funding by the Engineering and Physical Sciences Research Council (EPSRC) to explore a new way of doing this. The method uses a type of algorithm, the Sequential Monte Carlo (SMC) sampler, that already exists but is currently largely overlooked by data scientists.

SMC samplers will replace Markov Chain Monte Carlo (MCMC), a generic data modelling method that is commonly used to power data science across academia, industry and government.

This change will make it possible to exploit the power of emerging hardware, such as graphics cards and modern many-core vector processors, to solve problems which are currently largely incompatible with such hardware.

It is hoped that this will dramatically increase the generic ability to solve tough inverse problems: problems that involve identifying the parameters of a model that best explain the observed data. Such problems include, for example, biomarker discovery, data assimilation and behavioural analytics.
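For readers curious about what such an algorithm looks like, the following is a minimal illustrative sketch of a tempered SMC sampler applied to a toy inverse problem: estimating the mean of a Gaussian from noisy observations. It is not the project's method or code. The function names, the Gaussian prior and likelihood, the temperature schedule and all parameter values are assumptions chosen for illustration. Note that the reweighting and move steps treat each particle independently, which is the kind of structure that lends itself to parallel hardware.

```python
import math
import random


def log_likelihood(mu, data, sigma=1.0):
    # Gaussian log-likelihood of the data given mean mu (additive constant dropped).
    return -sum((x - mu) ** 2 for x in data) / (2.0 * sigma ** 2)


def log_prior(mu, prior_sd=10.0):
    # Zero-mean Gaussian prior on mu (additive constant dropped).
    return -(mu ** 2) / (2.0 * prior_sd ** 2)


def smc_sampler(data, n_particles=500, n_steps=20, step_sd=0.5, seed=0):
    """Hypothetical tempered SMC sampler: moves particles from the prior
    to the posterior through targets prior * likelihood**t, t from 0 to 1."""
    rng = random.Random(seed)
    # Start with particles drawn from the prior (t = 0).
    particles = [rng.gauss(0.0, 10.0) for _ in range(n_particles)]
    loglik = [log_likelihood(p, data) for p in particles]
    temps = [k / n_steps for k in range(n_steps + 1)]

    for t_prev, t in zip(temps, temps[1:]):
        # Reweight: incremental weight is likelihood**(t - t_prev).
        logw = [(t - t_prev) * ll for ll in loglik]
        max_logw = max(logw)
        weights = [math.exp(lw - max_logw) for lw in logw]

        # Resample (multinomial) so that all particles carry equal weight again.
        idx = rng.choices(range(n_particles), weights=weights, k=n_particles)
        particles = [particles[i] for i in idx]
        loglik = [loglik[i] for i in idx]

        # Move: one random-walk Metropolis step per particle at temperature t.
        for i in range(n_particles):
            proposal = particles[i] + rng.gauss(0.0, step_sd)
            proposal_ll = log_likelihood(proposal, data)
            log_accept = (log_prior(proposal) + t * proposal_ll) - (
                log_prior(particles[i]) + t * loglik[i]
            )
            if rng.random() < math.exp(min(0.0, log_accept)):
                particles[i] = proposal
                loglik[i] = proposal_ll

    return particles


# Toy usage: synthetic observations around an assumed true mean of 3.0.
data_rng = random.Random(1)
observations = [data_rng.gauss(3.0, 1.0) for _ in range(50)]
posterior = smc_sampler(observations)
print("estimated mean:", sum(posterior) / len(posterior))
```

With a wide prior, the average of the returned particles should sit close to the sample mean of the observations. A practical implementation would typically resample only when the effective sample size drops and would vectorise the per-particle work.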

The project will focus on the following industrial and academic areas, all of which generate and manipulate difficult data sets: pharma, nuclear security, defence, manufacturing, biology, chemistry, physics and psychology.

Professor of Autonomous Systems, Simon Maskell, said: “This is an innovative research project that aims to develop a new approach to data science by enabling Bayesian inference to achieve what neural networks have recently achieved using Deep Learning. Ultimately, we want this advance to really speed up the time it takes to get value from data so that something which currently takes days will take only seconds.”

Deputy Director of STFC Hartree Centre, Michael Gleaves, said: “This project will strengthen the ability of the team, as a key pillar within the UK community, to take the lead in developing a next-generation solution to problems encountered across a vast range of industrial sectors.”

IBM Research, with over 24 researchers in the STFC Hartree Centre in Daresbury, is a key industrial partner in the project. As well as working as an integral part of the team, it will provide access to supercomputing facilities supported by computational science and engineering expertise. IBM Research, together with the UK government, recently invested £300M in establishing a Research and Development Collaboration that targets multiple industries in the UK, helping them to exploit new algorithmic approaches on advanced computing architectures in their businesses.

IBM Research's Chief Science Officer for the collaboration with STFC, Kirk Jordan, said: “IBM Research is excited about this particular project as it is yet another example coming out of the Hartree Centre demonstrating how the STFC and IBM collaboration will provide significant benefits and value to UK industrial competitiveness and will create reusable assets to help multiple industries. In addition, working with Liverpool University, we are pleased to help develop the computational skills of the next generation of scientists.”

The project, one of five funded through the EPSRC ‘New Approaches to Data Science’ programme, also involves AstraZeneca, Unilever, DSTL, Intel, NVidia, Maths Knowledge Transfer Network and Columbia University.
