Multi-Fidelity AI for Accelerated Chemical Discovery (ICASE studentship with IBM Research)

Description

This PhD project is part of the CDT in Distributed Algorithms: The What, How and Where of Next-Generation Data Science.

The University of Liverpool’s Centre for Doctoral Training in Distributed Algorithms (CDT) is working in partnership with the STFC Hartree Centre and 20+ external partners from the manufacturing, defence and security sectors to provide a 4-year innovative PhD training programme that will equip up to 60 students with: the essential skills needed to become future leaders in distributed algorithms; the technical and professional networks needed to launch a career in next generation data science and future computing; and the confidence to make a positive difference in society, the economy and beyond.

This studentship is open to UK/Home Students only.

The last two decades have seen the emergence of the Fourth Paradigm of big-data-driven science, dominated by an exa-flood of data and the associated systems and analytics needed to process it. The Fourth Paradigm has definitively made science a big-data problem. With the maturation of AI and robotic technology, aided by HPC and cloud computing, we are entering a new paradigm in which the key is not a single technology but the fusion of heterogeneous capabilities that work together to achieve results greater than the sum of their parts. As the number of modalities through which data can be collected rises, relating these modalities to one another becomes increasingly difficult. In the context of materials discovery, these modalities might be different laboratory experiments, data extracted from the literature, or different simulation protocols run on HPC systems. The relationships between these modalities are complex and mostly unknown, although they can sometimes be informed by knowledge of the underlying chemistry. We are offering a fully funded PhD, part-funded by IBM, that aims to address the emerging question of how information from these modalities can be fused for automated scientific decision making, through the adaptation and creation of distributed AI algorithms.

This PhD focuses on developing new AI algorithms and tools for this purpose, specifically the application of Bayesian machine learning models such as Gaussian processes and Bayesian neural networks to fuse information from two different types of source: data collected from automated experiments in the laboratory, and data generated by computer simulations. The approach relies on Multi-Output Bayesian methods, which link a set of descriptors (here, a featurisation of a chemical structure) to a set of outcomes (here, a physical or chemical property of a molecule, either measured or calculated). In this paradigm, both the correlations between outcomes and the biases between them can be learned from the data. When applied to predicting physical or chemical properties from experiments and simulations, this will enable information gained in one way (e.g., from simulations) to be fused with other, related information (e.g., from experiments), and will drive automated decision making about the optimal design of discovery approaches (e.g., what should I make, and how should I measure it?).
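To make the idea of learning both a correlation and a bias between simulation and experiment more concrete, here is a minimal sketch of one standard multi-output Gaussian process construction, the autoregressive co-kriging model of Kennedy and O'Hagan. It is an illustration only, not the method this project will develop. The experimental output is modelled as f_exp(x) = rho * f_sim(x) + delta(x), where the scaling rho captures how strongly simulation and experiment are correlated and the discrepancy GP delta captures the bias between them. Everything here is an assumption chosen for illustration: a one-dimensional descriptor x stands in for a molecular featurisation, the data are synthetic, the kernel hyperparameters are fixed, and rho is learned by a simple grid search over the log marginal likelihood using NumPy only.

```python
import numpy as np


def rbf(a, b, lengthscale, variance):
    """Squared-exponential kernel between two 1-D arrays of descriptors."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)


# Hypothetical fixed hyperparameters; a real model would learn these as well.
SIGNAL = dict(lengthscale=0.3, variance=1.0)       # shared signal f_sim
DISCREPANCY = dict(lengthscale=0.3, variance=0.1)  # bias term delta
NOISE = 1e-4                                       # observation noise variance


def cross_cov(xa, xb, rho, a_is_exp, b_is_exp):
    """Covariance between outputs at xa and xb under f_exp = rho * f_sim + delta."""
    scale = (rho if a_is_exp else 1.0) * (rho if b_is_exp else 1.0)
    k = scale * rbf(xa, xb, **SIGNAL)
    if a_is_exp and b_is_exp:
        k = k + rbf(xa, xb, **DISCREPANCY)  # delta only enters the exp-exp block
    return k


def joint_cov(x_sim, x_exp, rho):
    """Covariance of the stacked observations [y_sim, y_exp], including noise."""
    K = np.block([
        [cross_cov(x_sim, x_sim, rho, False, False), cross_cov(x_sim, x_exp, rho, False, True)],
        [cross_cov(x_exp, x_sim, rho, True, False), cross_cov(x_exp, x_exp, rho, True, True)],
    ])
    return K + NOISE * np.eye(len(K))


def log_marginal_likelihood(y, K):
    """Gaussian log evidence log N(y | 0, K) computed via a Cholesky factor."""
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return -0.5 * y @ alpha - np.log(np.diag(L)).sum() - 0.5 * len(y) * np.log(2 * np.pi)


def predict_exp(x_star, x_sim, x_exp, y, rho):
    """Posterior mean of the experimental output f_exp at new descriptors x_star."""
    K = joint_cov(x_sim, x_exp, rho)
    k_star = np.hstack([cross_cov(x_star, x_sim, rho, True, False),
                        cross_cov(x_star, x_exp, rho, True, True)])
    return k_star @ np.linalg.solve(K, y)


# Synthetic stand-in data: cheap simulations are dense, experiments are sparse,
# and the "experiment" is a scaled and offset version of the "simulation".
x_sim = np.linspace(0.0, 1.0, 25)
x_exp = np.array([0.1, 0.4, 0.7, 0.95])
y_sim = np.sin(6 * x_sim)
y_exp = 1.5 * np.sin(6 * x_exp) + 0.3
y = np.concatenate([y_sim, y_exp])

# Learn the inter-source scaling rho by maximising the marginal likelihood on a grid.
rhos = np.linspace(-2.0, 3.0, 51)
scores = [log_marginal_likelihood(y, joint_cov(x_sim, x_exp, r)) for r in rhos]
rho_hat = rhos[int(np.argmax(scores))]

x_new = np.array([0.25, 0.55, 0.85])
print("learned rho:", rho_hat)
print("fused prediction of experiment:", predict_exp(x_new, x_sim, x_exp, y, rho_hat))
```

Because the dense simulation data pin down f_sim, only four experimental points are needed to estimate rho and the bias. In a real setting, x would be a vector of chemical descriptors, and the grid search would typically be replaced by gradient-based optimisation or by sampling the hyperparameters (for example with a Monte Carlo method).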

To support this naturally cross-disciplinary project, we will draw on two world-leading research groups. The first part of the project will leverage the Maskell group's strong expertise in distributed algorithms, specifically Sequential Monte Carlo (SMC). The project will include training in advanced probabilistic programming languages such as Stan to build and run SMC samplers, and will then advance these algorithms for use in building AI models, e.g., multi-output Gaussian processes and Bayesian neural networks. The second part of the project will leverage the Cooper group's expertise in AI-accelerated materials discovery at the Materials Innovation Factory. The student will draw on this local domain knowledge to build and deploy the algorithms developed in the first phase in the context of organic, solid-state materials for photocatalytic water splitting. The two phases will not be purely sequential: we envisage that lessons learned in each phase will feed back into the other.

The project will be supervised by the leads of these two groups, Prof. Andy Cooper and Prof. Simon Maskell. It is an EPSRC ICASE award co-funded by IBM, which has an ongoing strategic collaboration with the University of Liverpool centred on these two groups. The IBM co-supervisor, Ed Pyzer-Knapp, is IBM’s Global Lead for AI-Enriched Modelling and Simulation and has worked closely with both professors in recent years; this PhD is the culmination of significant collaboration between IBM and Liverpool to date. It is anticipated that the PhD will include a 3–6 month secondment to IBM’s Daresbury site (approximately 40 minutes by car from the University of Liverpool). The project and student will be aligned to the EPSRC-funded Centre for Doctoral Training in Distributed Algorithms, which already supports several PhD students working in neighbouring topic areas and provides a cohort-based research environment offering technical and professional training pertinent to the PhD.

The project will require a strong background in either chemistry or AI/ML, some experience of the other discipline, and a desire to learn about both.

Students will be based at the University of Liverpool and will be part of the CDT and Signal Processing research community, a large, social and creative research group that works together to solve tough research problems. Students have two academic supervisors and an industrial partner who provide co-supervision, placements and the opportunity to work on real-world challenges. In addition, students attend technical and professional training to gain unparalleled expertise to make a difference now and in the future.

The CDT is committed to providing an inclusive environment in which diverse students can thrive. The CDT particularly encourages applications from women, disabled and Black, Asian and Minority Ethnic candidates, who are currently under-represented in the sector. We can also consider part-time PhD students. We also encourage talented individuals from various backgrounds with either an undergraduate degree or an MSc in a numerate subject, and people with ambition and an interest in making a difference.

For enquiries, please contact Professor Andy Cooper or Professor Simon Maskell in the first instance, or visit www.liverpool.ac.uk/distributed-algorithms-cdt for Director, Student Ambassador and Centre Manager details.