Alex Hill’s industry placement at IBM Research in Daresbury

A central part of the LIV.DAT Centre for Doctoral Training PhD scheme is the industry placement, during which a LIV.DAT student is placed in a data-intensive company for a 6-month secondment. LJMU-based LIV.DAT student Alex Hill has recently completed his placement at IBM Research, based at the Hartree Centre in Sci-tech Daresbury, working on surrogate modelling. The early stages of the placement consisted of introductory classes in git, shell scripting and high-performance computing, online lectures, studying relevant papers and researching the fundamentals of surrogate methods, predominantly Polynomial Chaos Expansion and Gaussian Process Emulation.

In brief, Polynomial Chaos Expansion is a method for approximating a simulator as a linear combination of basis functions, while Gaussian Process Emulation models a simulation as a Gaussian process with a mean and a covariance which are updated with training runs, the covariance determining the strength of correlation in the model outputs. These methods originate from different disciplines, yet address similar challenges. The particular motivation of Alex’s research was rare event modelling, where the simulation takes input values sampled from a Pareto distribution. These distributions are extremely heavy-tailed: the bulk of samples have low values, though a non-zero sampling probability continues up to infinity. The challenge was to produce a surrogate which not only reproduced the expected output statistics of the simulation, i.e. closely approximated the simulator at low values, but also accurately reproduced the simulator outputs along the tail. These two considerations are often in tension with each other, and posed a significant challenge.
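
To make the idea of a "linear combination of basis functions" concrete, here is a minimal sketch of a one-dimensional Polynomial Chaos Expansion in Python. It is an illustration only, not the code used on the placement: the toy `simulator`, the uniform input on [-1, 1], the Legendre basis, the polynomial degree and the number of quadrature nodes are all assumptions chosen for simplicity (a Pareto input would call for a different basis or a change of variables).

```python
import numpy as np
from numpy.polynomial import legendre as L


def simulator(x):
    """Toy stand-in for an expensive simulation (hypothetical)."""
    return np.exp(x)


def fit_pce(f, degree, n_nodes):
    """Project f onto Legendre polynomials P_0..P_degree for x ~ Uniform(-1, 1).

    The training inputs are Gauss-Legendre quadrature nodes, so each
    coefficient is a weighted sum of simulator runs.
    """
    nodes, weights = L.leggauss(n_nodes)           # quadrature rule on [-1, 1]
    basis = L.legvander(nodes, degree)             # basis[i, k] = P_k(nodes[i])
    norms = (2 * np.arange(degree + 1) + 1) / 2.0  # 1 / integral of P_k^2 over [-1, 1]
    return norms * (basis.T @ (weights * f(nodes)))


# Assumed settings: degree-5 expansion trained on 8 simulator runs.
coeffs = fit_pce(simulator, degree=5, n_nodes=8)

# The surrogate is the cheap polynomial sum_k coeffs[k] * P_k(x).
x_test = np.linspace(-1.0, 1.0, 201)
max_error = np.max(np.abs(L.legval(x_test, coeffs) - simulator(x_test)))

# For a uniform input, the output mean is simply the P_0 coefficient.
pce_mean = coeffs[0]
```

Once the coefficients are known, output statistics such as the mean come directly from them, which is one reason this kind of surrogate is attractive when the simulator itself is expensive to run.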

Despite the lockdown, Alex submitted a report whose key finding was the decomposition of the Pareto input distribution into ‘peak’ and ‘tail’ components, followed by the creation of two separate surrogates which are used in tandem to approximate the full model. This was made necessary by the choice of training inputs: quadrature nodes, which tended to cluster at the lower and upper extremities of the input range. Significant improvements were found in the approximation of the simulator output across the support range, as well as in the approximation of output statistics.
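
The peak-and-tail idea can be sketched with a toy example. The code below is a hedged illustration rather than a reconstruction of the report: the Pareto parameters, the toy `simulator`, the 95% split point, the truncation of the tail at the 99.99% quantile and the use of simple polynomial interpolation at quadrature nodes are all assumptions.

```python
import numpy as np
from numpy.polynomial import legendre as L

ALPHA, X_MIN = 2.0, 1.0  # assumed Pareto shape and scale parameters


def pareto_quantile(q):
    """Inverse CDF of a Pareto(ALPHA, X_MIN) distribution."""
    return X_MIN * (1.0 - q) ** (-1.0 / ALPHA)


def simulator(x):
    """Toy saturating response standing in for the real simulation."""
    return x / (1.0 + x)


def fit_on_interval(f, lo, hi, n_nodes=5):
    """Interpolate f at Gauss-Legendre nodes mapped onto [lo, hi]."""
    z, _ = L.leggauss(n_nodes)                  # nodes on [-1, 1]
    x = 0.5 * (hi - lo) * z + 0.5 * (hi + lo)   # training inputs on [lo, hi]
    coeffs = L.legfit(z, f(x), n_nodes - 1)     # degree n_nodes - 1 interpolant
    return lambda t: L.legval((2.0 * t - (hi + lo)) / (hi - lo), coeffs)


split = pareto_quantile(0.95)    # assumed boundary between 'peak' and 'tail'
upper = pareto_quantile(0.9999)  # the infinite tail is truncated for training

peak = fit_on_interval(simulator, X_MIN, split)
tail = fit_on_interval(simulator, split, upper)
single = fit_on_interval(simulator, X_MIN, upper)  # one surrogate for everything


def combined(t):
    """Use the peak surrogate below the split and the tail surrogate above it."""
    t = np.asarray(t, dtype=float)
    return np.where(t < split, peak(t), tail(t))


x_test = np.geomspace(X_MIN, upper, 2000)
err_single = np.max(np.abs(single(x_test) - simulator(x_test)))
err_combined = np.max(np.abs(combined(x_test) - simulator(x_test)))
```

In this toy setup the single surrogate places none of its five training points near the low values where most samples fall, so its worst-case error is markedly larger than that of the two surrogates used in tandem.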

Alex Hill in front of the Hartree Centre.

Alex then worked on the ‘Curse of Dimensionality’, which may sound like a discarded Harry Potter title but actually refers to the unfortunate fact that as the number of input variables of a simulation increases, the computational cost of approximating it increases more dramatically still. The aim was to optimise the selection of training inputs in higher dimensions, as well as to visualise and quantify uncertainty in the simulation and surrogate outputs. Alex had made good progress in applying this to more complicated simulations for the client, as well as to epidemiological models for Covid-19, when the internship drew to a close. Alex said of the placement: “I fully enjoyed my time at IBM, and only wish that I could have experienced the full person-to-person programme envisaged. Despite the challenges, it has changed the way I view approaches to coding, the potential for cross-subject collaboration, and the possibilities in working outside academia.”