Engineering - AI-Driven Extraction of Key Information from Scientific Literature
Supervisor: Dr Xue Yong
Bio: Dr Xue Yong is a Lecturer and Leverhulme Early Career Fellow at the University of Liverpool. Her research focuses on the computational design of functional materials for sustainable energy applications, including catalyst development, CO₂ utilisation, and liquid-fuel modelling. She combines computational chemistry, machine learning, and catalytic science to accelerate the discovery of new materials and reaction pathways. Dr Yong works across theory, simulation, and data-driven methods, and has extensive experience supervising undergraduate and postgraduate research projects. She previously held an NSERC Postdoctoral Fellowship in Canada and has led or co-led eight major research projects, including the EPSRC HPC Access programme, EPSRC Hub Flexible Grant, Fusion CDT, and Leverhulme Trust Fellowship.
Email: xue.yong@liverpool.ac.uk
School: Engineering
Department: Engineering
Module code: ENGG290
Suitable for students of: Chemistry, Chemical Engineering, Materials Science, Computer Science, Data Science / AI, Electrical or Electronic Engineering
Desired experience or requirements: There are no mandatory requirements. Curiosity, willingness to learn, and basic computer literacy are enough. All technical skills will be taught during the project.
Places available: 2
Start dates: Session 1 (15th June 2026)
Project length: 8 weeks
Virtual option: Yes
Hybrid option: Yes
Project description:
Scientific literature is growing at an extraordinary pace, and important experimental details—such as reaction conditions, reactor designs, conversion efficiencies, and performance metrics—are often buried within millions of PDF articles. Extracting this information manually is slow, inconsistent, and highly error-prone, creating a major bottleneck for modern data-driven research. Reliable, structured datasets are essential for machine learning and for developing predictive models that can reveal trends, optimise conditions, and guide the design of new catalytic systems. Without accurate data extracted from the literature, it is impossible to fully exploit machine learning to accelerate scientific discovery. Automating this process therefore serves both efficiency and deeper scientific understanding.
This project aims to create a small automated workflow that uses Artificial Intelligence to read scientific articles and extract structured information, reducing reliance on manual summarisation. The student will work with a well-defined scientific case study: plasma-assisted ammonia (NH₃) decomposition for hydrogen production. This reaction is simple, widely reported, and uses consistent experimental descriptors (reactor type, power input, gas flow rate, NH₃ conversion, H₂ production rate, and energy efficiency), making it an ideal testbed for developing and testing automated extraction tools.
Using n8n, an open-source automation platform that connects digital tools with minimal coding, the student will design a workflow that:
Retrieves scientific articles or summaries using standard APIs or institutional access.
Sends text to a Large Language Model (LLM), similar to ChatGPT, for automated extraction of key experimental fields.
Organises the extracted information into a structured spreadsheet or database.
Runs with minimal human input, generating a dataset suitable for immediate analysis.
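The extraction and organising steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's actual n8n workflow: the field names, the prompt wording, and the helper functions build_prompt, parse_reply, and rows_to_csv are all hypothetical, and no real LLM service is called. The sketch assumes the model has been asked to reply with a single JSON object.

```python
import csv
import io
import json

# Hypothetical field names, chosen to mirror the descriptors named in the
# project description; the real workflow may use different names or units.
FIELDS = [
    "reactor_type",
    "power_input_w",
    "gas_flow_rate_sccm",
    "nh3_conversion_pct",
    "h2_production_rate",
    "energy_efficiency",
]


def build_prompt(article_text: str) -> str:
    """Build a prompt asking for one JSON object with exactly these fields."""
    field_list = ", ".join(FIELDS)
    return (
        "Extract the following fields from the article text and reply with "
        "a single JSON object. Use null for anything not reported.\n"
        f"Fields: {field_list}\n\n"
        f"Article text:\n{article_text}"
    )


def parse_reply(reply_text: str) -> dict:
    """Turn the model's JSON reply into one row.

    Fields the model left out become None; unexpected keys are dropped.
    """
    data = json.loads(reply_text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    return {name: data.get(name) for name in FIELDS}


def rows_to_csv(rows: list) -> str:
    """Serialise the rows as CSV text with a fixed header order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Placeholder reply for illustration only; not taken from any paper.
    example_reply = '{"reactor_type": "example reactor", "power_input_w": 1}'
    print(rows_to_csv([parse_reply(example_reply)]))
```

Fixing the column list in one place means every article yields a row with the same shape, so missing values show up as empty cells rather than as shifted columns.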
Once the automated workflow is operational, the student will perform basic data analysis in Python, such as identifying common reactor designs, typical operating conditions, or factors correlated with higher hydrogen output. These tasks provide a gentle introduction to data analysis while producing clear visualisations for the student’s poster presentation at the end of the programme.
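As a rough idea of what this basic analysis might look like, the sketch below counts the most common reactor types and computes a Pearson correlation between two numeric fields using only the Python standard library. The row layout (a list of dictionaries) and the field names are assumptions made for illustration, and the helper functions are hypothetical; in practice a library such as pandas may be more convenient.

```python
from collections import Counter
from math import sqrt


def most_common_reactors(rows, top_n=3):
    """Count how often each reactor type appears, ignoring missing values."""
    counts = Counter(r["reactor_type"] for r in rows if r.get("reactor_type"))
    return counts.most_common(top_n)


def paired_values(rows, x_field, y_field):
    """Keep only rows where both fields were reported."""
    pairs = [
        (r[x_field], r[y_field])
        for r in rows
        if r.get(x_field) is not None and r.get(y_field) is not None
    ]
    return [p[0] for p in pairs], [p[1] for p in pairs]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / sqrt(var_x * var_y)
```

For example, paired_values(rows, "power_input_w", "h2_production_rate") followed by pearson(xs, ys) would give a first indication of whether higher power input goes with higher hydrogen output in the extracted dataset. A correlation on a small, automatically extracted sample is only a starting point for closer inspection.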
This project is designed to be accessible to undergraduate students from all academic backgrounds. Curiosity and willingness to learn are more important than prior specialist knowledge; full guidance will be provided on workflow design, prompt creation, and data interpretation. The student will gain practical experience with AI tools, automation platforms, literature searching, and scientific data analysis—skills that are increasingly valuable across chemistry, engineering, and data science.
By the end of the placement, the student will have produced a functional prototype that demonstrates how AI can accelerate scientific discovery by transforming unstructured literature into high-quality datasets. The system can later be expanded to other scientific topics, illustrating a scalable approach to improving access to scientific knowledge and enabling future machine-learning-driven research.
Additional requirements: N/A