Recent research project exemplifying Distributed Algorithms

Big Hypotheses: A Fully Parallelised Bayesian Inference Solution

Bayesian inference is a process that allows us to extract information from data, using prior knowledge articulated as statistical models for the data. This project focuses on developing a transformational solution to Data Science problems that can be posed as Bayesian inference tasks.

Markov chain Monte Carlo (MCMC) algorithms are an established family of methods for these tasks. They offer impressive accuracy but carry a significant computational load. That load limits how much of MCMC's potential can be realised in practice and points to the need for a practical alternative. Potential users of such an alternative include academics working in science and practitioners in government and industry.

The challenge

A successful alternative must match the accuracy offered by MCMC while being accessible at a fraction of the computational cost.

The solution

Replace MCMC with a more recently developed family of algorithms, Sequential Monte Carlo (SMC) samplers.

Results

The crucial result is the potential that SMC samplers offer as a "New Approach for Data Science". SMC samplers are inherently population-based: they manipulate a population of samples, which makes them well suited to implementations that exploit parallel computational resources.
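
To make the population-based structure concrete, here is a minimal sketch of a tempered SMC sampler for a toy problem (inferring the mean of Gaussian data). It is not the project's implementation: the model, the linear tempering schedule, multinomial resampling, the random-walk Metropolis move and all names (`smc_sampler`, `PRIOR_SD`, `NOISE_SD`) are assumptions chosen for illustration, and it uses only the Python standard library.

```python
import math
import random

PRIOR_SD = 5.0   # assumed prior: theta ~ Normal(0, PRIOR_SD^2)
NOISE_SD = 1.0   # assumed likelihood: y ~ Normal(theta, NOISE_SD^2)


def log_prior(theta):
    # Log prior density, up to an additive constant.
    return -0.5 * (theta / PRIOR_SD) ** 2


def log_likelihood(theta, data):
    # Log likelihood of all observations, up to an additive constant.
    return sum(-0.5 * ((y - theta) / NOISE_SD) ** 2 for y in data)


def smc_sampler(data, n_particles=500, n_steps=20, step_sd=0.5, seed=0):
    """Move a population of samples from the prior to the posterior."""
    rng = random.Random(seed)
    # Start with a population drawn from the prior.
    particles = [rng.gauss(0.0, PRIOR_SD) for _ in range(n_particles)]
    # Tempering schedule: beta = 0 is the prior, beta = 1 is the posterior.
    betas = [k / n_steps for k in range(n_steps + 1)]
    for beta_prev, beta in zip(betas, betas[1:]):
        # Reweight: each particle's weight depends only on that particle.
        log_w = [(beta - beta_prev) * log_likelihood(t, data) for t in particles]
        top = max(log_w)  # subtract the maximum for numerical stability
        weights = [math.exp(lw - top) for lw in log_w]
        # Resample: the one step that needs the whole population at once.
        particles = rng.choices(particles, weights=weights, k=n_particles)
        # Move: one Metropolis step per particle, targeting
        # prior * likelihood^beta; again independent across particles.
        moved = []
        for t in particles:
            proposal = t + rng.gauss(0.0, step_sd)
            log_accept = (log_prior(proposal) + beta * log_likelihood(proposal, data)
                          - log_prior(t) - beta * log_likelihood(t, data))
            if rng.random() < math.exp(min(0.0, log_accept)):
                t = proposal
            moved.append(t)
        particles = moved
    return particles


if __name__ == "__main__":
    data = [1.8, 2.1, 2.4, 1.9, 2.3]
    particles = smc_sampler(data)
    # For this conjugate model the exact posterior mean is about 2.08.
    print(sum(particles) / len(particles))
```

In this sketch the reweight and move steps treat every particle independently, so they could in principle be spread across many cores or devices; only the resampling step requires communication across the population.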

The work in this project has shown that SMC samplers can offer accuracy similar to MCMC, with implementations that are better suited to emerging parallel hardware. We conclude that such hardware can be used to make SMC samplers run much faster than MCMC.

The benefits of using an SMC sampler (rather than MCMC) go beyond what is covered above and there are many avenues to be explored further.

What’s next?

This project points to the value of a larger programme of work to understand the extent to which users will benefit from replacing MCMC with SMC samplers.

To achieve the desired impact, the core of our plan is to:

  • Have identified users act as "evangelists" in each of their domains.
  • Engage with the developer team for Stan (the most widely used generic MCMC implementation) to provide a back-end for Stan that uses SMC samplers.
