
E-Thesis

Generative AI in Protein Design: De novo Protein Design and Multi-Motif Scaffolding / ZHENG SUN

Swansea University Author: ZHENG SUN

Abstract

Proteins are fundamental molecules of life, playing critical roles in nearly all biological processes. The ability to design proteins with specific structures and functions holds immense potential for applications in drug discovery, enzyme engineering, and synthetic biology. However, traditional meth...


Published: Swansea University, Wales, UK 2025
Institution: Swansea University
Degree level: Master of Research
Degree name: MRes
Supervisor: Wu, L. Y.
URI: https://cronfa.swan.ac.uk/Record/cronfa69219
first_indexed 2025-04-03T12:55:40Z
last_indexed 2025-04-04T05:16:25Z
id cronfa69219
recordtype RisThesis
fullrecord <?xml version="1.0"?><rfc1807><datestamp>2025-04-03T13:55:39.1748915</datestamp><bib-version>v2</bib-version><id>69219</id><entry>2025-04-03</entry><title>Generative AI in Protein Design: De novo Protein Design and Multi-Motif Scaffolding</title><swanseaauthors><author><sid>4f7a8ae443d8fc5c4095ddfabc27c526</sid><firstname>ZHENG</firstname><surname>SUN</surname><name>ZHENG SUN</name><active>true</active><ethesisStudent>false</ethesisStudent></author></swanseaauthors><date>2025-04-03</date><abstract>Proteins are fundamental molecules of life, playing critical roles in nearly all biological processes. The ability to design proteins with specific structures and functions holds immense potential for applications in drug discovery, enzyme engineering, and synthetic biology. However, traditional methods such as rational design and directed evolution face significant limitations, including reliance on existing templates, low computational efficiency, and difficulties in handling complex multi-motif scaffolding tasks. Recent advancements in generative artificial intelligence have opened new possibilities for protein design, yet challenges remain in geometric feature modeling and implicit motif positioning. This study presents an integrated framework that combines the strengths of Geometric Vector Field Networks (VFN) and a protein diffusion model, MoDiff, to address these challenges. The VFN approach redefines frame modeling in de novo protein design by leveraging vector computations between coordinates of frame-anchored virtual atoms, enabling superior performance in designability and diversity compared to traditional methods. Specifically, VFN demonstrates significant improvements over Invariant Point Attention (IPA) in designability and diversity, and outperforms PiFold in sequence recovery rates during inverse folding. Meanwhile, MoDiff tackles the multi-motif scaffolding problem by facilitating the implicit positioning of motifs along the protein backbone, a capability absent in current scaffolding approaches. Our results show that MoDiff can generate diverse scaffolds and solve the multi-motif scaffolding problem even when motif positions are unknown, making it a versatile solution for complex protein design tasks. The synergy between VFN and MoDiff not only highlights the individual strengths of each model but also demonstrates their combined potential to revolutionize protein engineering. This work opens new avenues for designing complex proteins and scaffolds, significantly advancing the field of synthetic biology. The findings have broad implications for pharmaceutical and biotechnological applications, paving the way for the development of novel therapeutic proteins, enzymes, and biomaterials.</abstract><type>E-Thesis</type><journal/><volume/><journalNumber/><paginationStart/><paginationEnd/><publisher/><placeOfPublication>Swansea University, Wales, UK</placeOfPublication><isbnPrint/><isbnElectronic/><issnPrint/><issnElectronic/><keywords>Deep Learning, Protein Design, Diffusion Model</keywords><publishedDay>14</publishedDay><publishedMonth>2</publishedMonth><publishedYear>2025</publishedYear><publishedDate>2025-02-14</publishedDate><doi/><url/><notes>A selection of content is redacted or is partially redacted from this thesis to protect sensitive and personal information.</notes><college>COLLEGE NANME</college><CollegeCode>COLLEGE CODE</CollegeCode><institution>Swansea University</institution><supervisor>Wu, L. Y.</supervisor><degreelevel>Master of Research</degreelevel><degreename>MRes</degreename><apcterm/><funders/><projectreference/><lastEdited>2025-04-03T13:55:39.1748915</lastEdited><Created>2025-04-03T13:48:41.6702600</Created><path><level id="1">Faculty of Science and Engineering</level><level id="2">School of Mathematics and Computer Science - Computer Science</level></path><authors><author><firstname>ZHENG</firstname><surname>SUN</surname><order>1</order></author></authors><documents><document><filename>69219__33938__6d5eaf1b943c4bc7a8b4b3f2047da3e1.pdf</filename><originalFilename>2024_Sun_Z.final.69219.pdf</originalFilename><uploaded>2025-04-03T13:54:50.6687572</uploaded><type>Output</type><contentLength>4548576</contentLength><contentType>application/pdf</contentType><version>E-Thesis &#x2013; open access</version><cronfaStatus>true</cronfaStatus><documentNotes>Copyright: The Author, Zheng Sun, 2024</documentNotes><copyrightCorrect>true</copyrightCorrect><language>eng</language></document></documents><OutputDurs/></rfc1807>
title Generative AI in Protein Design: De novo Protein Design and Multi-Motif Scaffolding
spellingShingle Generative AI in Protein Design: De novo Protein Design and Multi-Motif Scaffolding
ZHENG SUN
title_short Generative AI in Protein Design: De novo Protein Design and Multi-Motif Scaffolding
title_full Generative AI in Protein Design: De novo Protein Design and Multi-Motif Scaffolding
title_fullStr Generative AI in Protein Design: De novo Protein Design and Multi-Motif Scaffolding
title_full_unstemmed Generative AI in Protein Design: De novo Protein Design and Multi-Motif Scaffolding
title_sort Generative AI in Protein Design: De novo Protein Design and Multi-Motif Scaffolding
author_id_str_mv 4f7a8ae443d8fc5c4095ddfabc27c526
author_id_fullname_str_mv 4f7a8ae443d8fc5c4095ddfabc27c526_***_ZHENG SUN
author ZHENG SUN
author2 ZHENG SUN
format E-Thesis
publishDate 2025
institution Swansea University
college_str Faculty of Science and Engineering
hierarchytype
hierarchy_top_id facultyofscienceandengineering
hierarchy_top_title Faculty of Science and Engineering
hierarchy_parent_id facultyofscienceandengineering
hierarchy_parent_title Faculty of Science and Engineering
department_str School of Mathematics and Computer Science - Computer Science{{{_:::_}}}Faculty of Science and Engineering{{{_:::_}}}School of Mathematics and Computer Science - Computer Science
document_store_str 1
active_str 0
description Proteins are fundamental molecules of life, playing critical roles in nearly all biological processes. The ability to design proteins with specific structures and functions holds immense potential for applications in drug discovery, enzyme engineering, and synthetic biology. However, traditional methods such as rational design and directed evolution face significant limitations, including reliance on existing templates, low computational efficiency, and difficulties in handling complex multi-motif scaffolding tasks. Recent advancements in generative artificial intelligence have opened new possibilities for protein design, yet challenges remain in geometric feature modeling and implicit motif positioning. This study presents an integrated framework that combines the strengths of Geometric Vector Field Networks (VFN) and a protein diffusion model, MoDiff, to address these challenges. The VFN approach redefines frame modeling in de novo protein design by leveraging vector computations between coordinates of frame-anchored virtual atoms, enabling superior performance in designability and diversity compared to traditional methods. Specifically, VFN demonstrates significant improvements over Invariant Point Attention (IPA) in designability and diversity, and outperforms PiFold in sequence recovery rates during inverse folding. Meanwhile, MoDiff tackles the multi-motif scaffolding problem by facilitating the implicit positioning of motifs along the protein backbone, a capability absent in current scaffolding approaches. Our results show that MoDiff can generate diverse scaffolds and solve the multi-motif scaffolding problem even when motif positions are unknown, making it a versatile solution for complex protein design tasks. The synergy between VFN and MoDiff not only highlights the individual strengths of each model but also demonstrates their combined potential to revolutionize protein engineering. This work opens new avenues for designing complex proteins and scaffolds, significantly advancing the field of synthetic biology. The findings have broad implications for pharmaceutical and biotechnological applications, paving the way for the development of novel therapeutic proteins, enzymes, and biomaterials.
published_date 2025-02-14T05:23:21Z