Overview
Deep Learning methods have had a huge impact on biology in recent years: for example, AlphaFold 2 and 3 (AF2/3) can predict the structure of most proteins with unprecedented accuracy. However, the limits of AF2/3 structures are increasingly evident, so experimental structure determination retains a key role, especially X-ray crystallography, which in 2024 still accounts for 60% of deposits. This project will apply Deep Learning methods to improve the structure solution pipeline at several distinct points.
About this opportunity
Crystallography requires the target to crystallise, yet sometimes, for poorly understood reasons, a target fails to form the necessary 3D lattice. Conformational variability at the surface is known to be entropically unfavourable for crystallisation. The student will explore the use of explicit 3D structure models from AF2/3 to predict crystallisation propensity. Crucially, a collaborator has access to large amounts of crystallisation data, both positive and negative, enabling an open-ended search for structural readouts from protein models that distinguish proteins that ultimately crystallise from those that don’t. The work will cover both flexible N- and C-termini and their potential truncation for construct design, and homologue scanning (using resources such as the AlphaFold Database and FoldSeek Clusters) to identify members of a family with fewer problematic flexible surface loops.
While 3D models allow for sophisticated spatial analysis, the student will also explore Deep Learning-based inverse folding methods such as ProteinMPNN. These methods can discover sequences that fit a given backbone better than the native sequence, and it has been found (and we have seen; unpublished data) that improvements in stability, expression and activity can routinely be achieved by changing surface residues in particular, while retaining key functional determinants. Importantly, the student will be co-supervised by structural biology experts and will have access to high-performance robotic crystallisation facilities to test their predictions. Specific proteins to study will be chosen from those of interest at the time, but will likely include sulfation enzymes (sulfotransferases).
While the availability of accurate AF2 models has enabled solution of the phase problem for most proteins by Molecular Replacement (MR), RNA-containing targets lag behind: RNA structure predictions are still of comparatively poor quality, and RNA follows different principles of secondary structure formation and of packing of the resulting motifs. The student will therefore adapt our MR/cryo-EM map fitting software Slice’N’Dice to introduce bespoke RNA-specific processing.
Training: The student will have access to a range of training throughout the four years of the project. The student’s annually updated Developmental Needs Analysis form will be the basis of discussions around transferable skills with supervisors and independent assessors. Depending on the student’s background, taught modules in bioinformatics and programming will likely be required. Technical skills in software development will be acquired from post-docs in the research labs of the Primary Supervisor. This element addresses BBSRC’s Developing Skills priorities in computation, data resources and statistics (https://www.ukri.org/what-we-do/developing-people-and-skills/bbsrc/developing-skills/ways-of-working/). The student will also benefit from integration into the CCP4 community. CCP4 sponsors practical courses in structure solution (e.g. https://www.diamond.ac.uk/Home/Events/2024/DLS-CCP4.html), providing broader structure solution training, as well as annual conferences (https://studyweekend.ccp4.ac.uk/). In the lab of Co-Supervisor 2, the computational skills acquired elsewhere will be complemented by training in experimental methods related to structural biology, such as protein expression, purification and crystallisation.