Overview
This project aims to develop an AI-powered, “context-aware” pipeline to automate metabolite annotation in untargeted metabolomics data. By combining Large Language Models (LLMs) with Bayesian statistics, the project will help turn complex metabolomics data into biological insight.
About this opportunity
Metabolite annotation is one of the most pressing challenges in untargeted metabolomics data analysis. Current annotation tools often rely on simple mass-matching against static databases, leading to high false-positive rates. This project builds upon the Integrated Probabilistic Annotation (IPA) framework (Del Carratore et al., 2019; Del Carratore et al., 2023) to move beyond simple matching toward a “context-aware” system.
The PhD candidate will lead three key objectives:
- AI-Driven Database Curation: You will utilize Large Language Models (LLMs) to mine scientific literature and existing repositories to create a “context-aware” database. This database will encode vital metadata like retention times and biological likelihood to filter out false positives.
- Platform Expansion: You will extend the computational framework to integrate data from emerging analytical platforms beyond LC-MS, including GC-MS, Ion Mobility-MS, and MALDI.
- Software Engineering & GUI: To ensure global community adoption, you will develop a user-friendly Graphical User Interface (GUI), empowering non-bioinformaticians to utilize these advanced probabilistic methods.
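To illustrate what “context-aware” means in practice, here is a minimal sketch of how contextual prior knowledge could reweight candidates that simple mass-matching cannot tell apart. It is not the IPA implementation: the function name, the candidate names, the m/z values, the Gaussian error model with a 5 ppm standard deviation, and the prior weights are all assumptions chosen for illustration.

```python
import math


def annotation_posterior(observed_mz, candidates, priors=None, ppm_sd=5.0):
    """Return posterior probabilities for candidate annotations of one feature.

    candidates: dict mapping a candidate name to its theoretical m/z.
    priors:     dict mapping a candidate name to a prior weight
                (hypothetical "context" such as biological likelihood).
                If None, every candidate gets the same prior weight.
    ppm_sd:     assumed standard deviation of the mass error, in ppm.
    """
    if priors is None:
        priors = {name: 1.0 for name in candidates}

    weights = {}
    for name, theoretical_mz in candidates.items():
        # Mass error in parts per million between observed and theoretical m/z.
        ppm_error = (observed_mz - theoretical_mz) / theoretical_mz * 1e6
        # Unnormalised Gaussian likelihood of observing that error.
        likelihood = math.exp(-0.5 * (ppm_error / ppm_sd) ** 2)
        weights[name] = priors[name] * likelihood

    total = sum(weights.values())
    if total == 0.0:
        raise ValueError("no candidate is compatible with the observed m/z")
    # Bayes' rule: posterior is prior times likelihood, normalised to sum to 1.
    return {name: weight / total for name, weight in weights.items()}


# Two hypothetical isomers with the same theoretical m/z (illustrative values).
candidates = {"candidate_A": 132.1019, "candidate_B": 132.1019}

# Mass-matching alone (flat prior) cannot separate the two candidates.
flat = annotation_posterior(132.1021, candidates)

# A hypothetical context prior, e.g. candidate_A is far more plausible
# in the organism being studied.
informed = annotation_posterior(
    132.1021, candidates, priors={"candidate_A": 0.9, "candidate_B": 0.1}
)
```

With a flat prior the two isomers receive equal probability, while the informed prior shifts the probability toward the candidate that the context supports. In the project itself, the prior information would come from the curated database rather than hand-written numbers.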
Training and Collaboration
You will be embedded in Dr Del Carratore's lab, which focuses on Bioinformatics and Computational Biology. You will also collaborate closely with two world-class research facilities at the University of Liverpool, benefiting from a unique dual academic setting:
- Computational Biology Facility (CBF): You will work within the CBF to develop high-quality code and AI models, gaining expertise in software engineering and LLM implementation.
- Centre for Metabolomics Research (CMR): You will have direct access to data from state-of-the-art analytical platforms, supporting the generation and validation of experimental data. Prof. Warwick Dunn will provide mentorship on analytical chemistry and user requirements.
Project Structure
The 3.5-year PhD is designed to transition you from a trainee to an independent leader in computational biology:
- Year 1: Foundation and Advanced Training. Your first year focuses on mastering the computational skillsets required for the project, including bioinformatics, Bayesian statistics, and AI/LLM implementation. You will begin the initial development of the “context-aware” database by mining existing repositories.
- Years 2-3: Implementation and Engagement. During this period, you will move into independent research, expanding the IPA framework to new analytical technologies like Ion Mobility-MS. You will also lead the development of the software GUI and present your findings at major international conferences, such as the annual meeting of the Metabolomics Society.
- Final Phase: Thesis and Independent Research. The final six months are dedicated to completing your independent research, finalizing the open-source software for community release on GitHub, and writing your doctoral thesis.