E-Thesis

Generative AI in Protein Design: De novo Protein Design and Multi-Motif Scaffolding / ZHENG SUN

Swansea University Author: ZHENG SUN

Published: Swansea University, Wales, UK 2025
Institution: Swansea University
Degree level: Master of Research
Degree name: MRes
Supervisor: Wu, L. Y.
URI: https://cronfa.swan.ac.uk/Record/cronfa69219
Abstract: Proteins are fundamental molecules of life, playing critical roles in nearly all biological processes. The ability to design proteins with specific structures and functions holds immense potential for applications in drug discovery, enzyme engineering, and synthetic biology. However, traditional methods such as rational design and directed evolution face significant limitations, including reliance on existing templates, low computational efficiency, and difficulties in handling complex multi-motif scaffolding tasks. Recent advancements in generative artificial intelligence have opened new possibilities for protein design, yet challenges remain in geometric feature modeling and implicit motif positioning. This study presents an integrated framework that combines the strengths of Geometric Vector Field Networks (VFN) and a protein diffusion model, MoDiff, to address these challenges. The VFN approach redefines frame modeling in de novo protein design by leveraging vector computations between coordinates of frame-anchored virtual atoms, enabling superior performance in designability and diversity compared to traditional methods. Specifically, VFN demonstrates significant improvements over Invariant Point Attention (IPA) in designability and diversity, and outperforms PiFold in sequence recovery rates during inverse folding. Meanwhile, MoDiff tackles the multi-motif scaffolding problem by facilitating the implicit positioning of motifs along the protein backbone, a capability absent in current scaffolding approaches. Our results show that MoDiff can generate diverse scaffolds and solve the multi-motif scaffolding problem even when motif positions are unknown, making it a versatile solution for complex protein design tasks. The synergy between VFN and MoDiff not only highlights the individual strengths of each model but also demonstrates their combined potential to revolutionize protein engineering.
This work opens new avenues for designing complex proteins and scaffolds, significantly advancing the field of synthetic biology. The findings have broad implications for pharmaceutical and biotechnological applications, paving the way for the development of novel therapeutic proteins, enzymes, and biomaterials.
Item Description: A selection of content is redacted or is partially redacted from this thesis to protect sensitive and personal information.
Keywords: Deep Learning, Protein Design, Diffusion Model
College: Faculty of Science and Engineering