Overview
Deep Learning methods have had a huge impact on biology in recent years: for example, AlphaFold 2 and 3 (AF2/3) can predict the structure of most proteins with unprecedented accuracy. However, the limitations of AF2/3 structures are increasingly evident, meaning an ongoing role for experimental structure determination, especially X-ray crystallography, which currently accounts for ~60% of deposits. The student will apply Deep Learning methods to improve the structure solution pipeline at distinct points.
About this opportunity
Crystallography requires the target to crystallise, yet a target may not form the necessary 3D lattice. Conformational variability at the surface is known to be entropically unfavourable for crystallisation. The student will explore the use of explicit 3D structure models from AF2/3 to predict crystallisation propensity: crucially, a collaborator has access to large amounts of crystallisation data, both positive and negative. The work will encompass both truncation of flexible N- and C-termini and homologue scanning to identify family members with fewer problematic flexible surface loops. The student will also explore the use of Deep Learning-based inverse folding methods such as ProteinMPNN. Importantly, the student will be co-supervised by structural biology experts and will have access to high-performance robotic crystallisation facilities to test their predictions. Specific proteins to study will be chosen from those of interest at the time but will likely include sulfation enzymes (sulfotransferases).
Since modern synchrotrons can collect diffraction data at an astonishing rate, it is advantageous to learn as much as possible about the composition of a crystal from unphased diffraction data. An important proof of principle has been demonstrated recently by co-supervisors Dr Ronan Keegan and Dr David McDonagh: a Machine Learning (ML) approach applied to Patterson maps enables more accurate prediction of solvent content (doi 10.1101/2025.09.24.678396). This has important consequences for the speed and carbon intensity of subsequent Molecular Replacement (MR) efforts. The student will develop these ML methods further to, for example, improve methods in multi-crystal experiments and detect ligand binding.
Finally, while the availability of accurate AF2 models has enabled solution of the phase problem for most proteins by MR, RNA-containing targets lag behind: structure predictions are still of comparatively poor quality, and RNA structures follow different principles of secondary structure formation and motif packing. The student will therefore adapt our MR/cryo-EM map fitting software Slice’N’Dice to introduce bespoke RNA-specific processing.
Training
The student will access a range of training throughout the 4 years of the project. Taught modules in bioinformatics and programming may be appropriate while technical skills in software development will be acquired from post-docs in the friendly and supportive Rigden lab. Through time spent at the ALC with David McDonagh, the student will benefit from a variety of ML courses, such as NVIDIA courses in deep learning. The student will also benefit from integration into the CCP4 community. Finally, in the lab of Dr Igor Barsukov, the computational skills acquired elsewhere will be complemented by training in experimental methods related to structural biology.
Further reading
1. Agirre, J., Atanasova, M., Bagdonas, H., Ballard, C. B., Baslé, A., Beilsten-Edmands, J., … Keegan, R. M. … Rigden, D. J. … & Yamashita, K. (2023). The CCP4 suite: integrative software for macromolecular crystallography. Acta Crystallographica Section D: Structural Biology, 79(6), 449-461.
2. Simpkin, A. J., Elliott, L. G., Joseph, A. P., Burnley, T., Stevenson, K., Sanchez Rodriguez, F., Fando, M., Krissinel, E., McNicholas, S., Rigden, D. J., & Keegan, R. M. (2025). Slice’N’Dice: Maximising the value of predicted models for structural biologists. Acta Crystallographica Section D: Structural Biology, 81(3), 105-121.
3. Das, R., Kretsch, R. C., Simpkin, A. J., Mulvaney, T., Pham, P., Rangan, R., … Keegan, R. M. … Rigden, D. J. … & Westhof, E. (2023). Assessment of three‐dimensional RNA structure prediction in CASP15. Proteins: Structure, Function, and Bioinformatics, 91(12), 1747-1770.
4. Mistry, R., Byrne, D. P., Starns, D., Barsukov, I. L., Yates, E. A., & Fernig, D. G. (2024). Polysaccharide sulfotransferases: the identification of putative sequences and respective functional characterisation. Essays in Biochemistry, EBC20230094.
5. McDonagh, D., Skylaris, C. K., & Day, G. M. (2019). Machine-learned fragment-based energies for crystal structure prediction. Journal of Chemical Theory and Computation, 15(4), 2743-2758.