Overview
The volume and diversity of open access data related to metabolism in humans and other organisms are rapidly increasing and open new avenues for impactful science. To achieve the greatest impact from the thousands of openly published studies, both metabolite names and the computational tools used to derive biological knowledge from data must be standardised. This PhD offers the opportunity to drive forward research that will benefit not only your career but also metabolism researchers globally.
About this opportunity
Scientific rationale and excellence
Metabolites play many important roles in biological systems, and metabolomics is widely applied across the biosciences to investigate metabolism, its dynamics and the molecular phenotype of different organisms. Impactful application areas include biotechnology, agriculture and mammalian health across the life course. In the last five years, the volume of data in metabolomics data repositories, including the BBSRC-funded MetaboLights, has started to grow exponentially, especially for untargeted studies, where the metabolites studied are not known before data collection and must instead be identified from the acquired data.
The reuse of these data following FAIR principles is limited by a number of issues in data deposition and data descriptors. One core requirement for data reuse within and between deposited studies is to structurally identify metabolites and harmonise metabolite ontologies across data repositories. Currently, the reporting of metabolite names and unique identifiers is not harmonised across the research community, and the methods and resources applied to identify metabolites vary and provide different levels of accuracy. However, all the data required to identify metabolites and report them in a harmonised way are available. Integrating data across studies could enhance statistical robustness and power, allowing deposited data to be reused for greater impact, but tools for harmonising and reusing data across studies are not currently available.
The project will develop new computational tools and resources to be shared with, and applied by, the metabolomics and biological sciences research community. This will allow publicly available metabolomics datasets to be reused within the biological research environment.
Objectives
To develop and validate new computational tools for metabolite identification and inter-study data integration and apply these tools to enhance the reusability of open access metabolomics data.
Methods
The PhD student will develop their computational skills as a bioinformatician during the PhD programme by developing, validating and applying tools in R, Python and Galaxy, and by developing new machine learning/AI functions. These skills will be used to develop and apply computational workflows related to (a) construction and application of an in-source fragmentation and adduct library for use of MS1 data for metabolite identification, (b) construction of organism- and sample-specific metabolome libraries to enhance identification confidence and (c) development of false discovery rate models for use of MS1, retention time and MS/MS data for metabolite identification. Potential outcomes: (1) open source programs will be developed and applied to enhance interpretation and integration of metabolomics data available in data repositories and (2) new machine learning/AI functions will be developed and made accessible to the research community.
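To give a flavour of workflow (a), the sketch below shows how a small adduct and in-source fragment library might be used to annotate MS1 peaks for one candidate metabolite. It is an illustration only, not the project's method: the names IonRule, ION_RULES, expected_mz and annotate_peaks are hypothetical, the library holds just three singly charged positive-mode ions, and the 5 ppm tolerance is an assumed default. Mass shifts are standard monoisotopic values rounded to six decimal places.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IonRule:
    """One entry of a (hypothetical) adduct / in-source fragment library."""
    name: str
    mass_shift: float  # Da added to the neutral monoisotopic mass
    charge: int        # absolute charge state of the resulting ion


# Illustrative positive-mode entries; a real library would be far larger.
ION_RULES = [
    IonRule("[M+H]+", 1.007276, 1),        # protonation
    IonRule("[M+Na]+", 22.989221, 1),      # sodium adduct
    IonRule("[M+H-H2O]+", -17.003289, 1),  # in-source water loss
]


def expected_mz(neutral_mass: float, rule: IonRule) -> float:
    """Theoretical m/z of an ion formed from a neutral molecule."""
    return (neutral_mass + rule.mass_shift) / rule.charge


def ppm_error(observed: float, theoretical: float) -> float:
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def annotate_peaks(observed_mz, neutral_mass, rules=ION_RULES, tol_ppm=5.0):
    """Return (m/z, ion name, ppm error) for peaks matching any rule."""
    annotations = []
    for mz in observed_mz:
        for rule in rules:
            err = ppm_error(mz, expected_mz(neutral_mass, rule))
            if abs(err) <= tol_ppm:
                annotations.append((mz, rule.name, round(err, 2)))
    return annotations


if __name__ == "__main__":
    glucose_mass = 180.06339  # monoisotopic mass of C6H12O6
    peaks = [181.0707, 203.0526, 163.0601, 250.0000]
    for mz, ion, err in annotate_peaks(peaks, glucose_mass):
        print(f"{mz:.4f}  {ion:<12} {err:+.2f} ppm")
```

Grouping several MS1 features under one neutral mass in this way is one reason such a library could raise identification confidence before MS/MS data are considered; a production tool would also need to handle negative mode, multiple charges, isotopes and retention time agreement.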