E-Thesis
Generative AI in Protein Design: De novo Protein Design and Multi-Motif Scaffolding / ZHENG SUN
Swansea University Author: ZHENG SUN
Abstract
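Proteins are fundamental molecules of life, playing critical roles in nearly all biological processes. The ability to design proteins with specific structures and functions holds immense potential for applications in drug discovery, enzyme engineering, and synthetic biology. However, traditional methods such as rational design and directed evolution face significant limitations, including reliance on existing templates, low computational efficiency, and difficulties in handling complex multi-motif scaffolding tasks. Recent advancements in generative artificial intelligence have opened new possibilities for protein design, yet challenges remain in geometric feature modeling and implicit motif positioning. This study presents an integrated framework that combines the strengths of Geometric Vector Field Networks (VFN) and a protein diffusion model, MoDiff, to address these challenges. The VFN approach redefines frame modeling in de novo protein design by leveraging vector computations between coordinates of frame-anchored virtual atoms, enabling superior performance in designability and diversity compared to traditional methods. Specifically, VFN demonstrates significant improvements over Invariant Point Attention (IPA) in designability and diversity, and outperforms PiFold in sequence recovery rates during inverse folding. Meanwhile, MoDiff tackles the multi-motif scaffolding problem by facilitating the implicit positioning of motifs along the protein backbone, a capability absent in current scaffolding approaches. Our results show that MoDiff can generate diverse scaffolds and solve the multi-motif scaffolding problem even when motif positions are unknown, making it a versatile solution for complex protein design tasks. The synergy between VFN and MoDiff not only highlights the individual strengths of each model but also demonstrates their combined potential to revolutionize protein engineering. This work opens new avenues for designing complex proteins and scaffolds, significantly advancing the field of synthetic biology. The findings have broad implications for pharmaceutical and biotechnological applications, paving the way for the development of novel therapeutic proteins, enzymes, and biomaterials.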
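The abstract describes VFN as working through "vector computations between coordinates of frame-anchored virtual atoms". The sketch below illustrates that idea only, under stated assumptions; it is not the thesis's implementation. Each residue is assumed to carry a rigid frame (rotation R, translation t) and a few virtual atoms given in local coordinates (fixed hypothetical values here; in the published VFN method they are, to our understanding, learnable). For each pair of atoms, the vector from atom a of residue i to atom b of residue j is rotated back into residue i's frame. All function names (`virtual_atom_vectors`, `apply_global`, and the small matrix helpers) are hypothetical.

```python
import math
from typing import List, Tuple

Vec = Tuple[float, float, float]
Mat = Tuple[Vec, Vec, Vec]      # 3x3 rotation matrix, row-major
Frame = Tuple[Mat, Vec]         # (rotation R, translation t) of one residue


def matvec(r: Mat, v: Vec) -> Vec:
    """Multiply a 3x3 matrix by a 3-vector."""
    return tuple(sum(r[i][k] * v[k] for k in range(3)) for i in range(3))


def matmul(a: Mat, b: Mat) -> Mat:
    """Multiply two 3x3 matrices."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3))
        for i in range(3)
    )


def transpose(r: Mat) -> Mat:
    """Transpose of a rotation matrix, which is also its inverse."""
    return tuple(tuple(r[k][i] for k in range(3)) for i in range(3))


def add(a: Vec, b: Vec) -> Vec:
    return tuple(x + y for x, y in zip(a, b))


def sub(a: Vec, b: Vec) -> Vec:
    return tuple(x - y for x, y in zip(a, b))


def rot_z(theta: float) -> Mat:
    c, s = math.cos(theta), math.sin(theta)
    return ((c, -s, 0.0), (s, c, 0.0), (0.0, 0.0, 1.0))


def rot_x(theta: float) -> Mat:
    c, s = math.cos(theta), math.sin(theta)
    return ((1.0, 0.0, 0.0), (0.0, c, -s), (0.0, s, c))


def virtual_atom_vectors(frame_i: Frame, frame_j: Frame,
                         atoms_i: List[Vec], atoms_j: List[Vec]) -> List[Vec]:
    """For every pair (a, b), return the vector from virtual atom a of residue i
    to virtual atom b of residue j, expressed in residue i's local frame."""
    rot_i, t_i = frame_i
    rot_j, t_j = frame_j
    rot_i_inv = transpose(rot_i)
    vectors = []
    for q_a in atoms_i:
        x_a = add(matvec(rot_i, q_a), t_i)      # local -> global coordinates
        for q_b in atoms_j:
            x_b = add(matvec(rot_j, q_b), t_j)
            # Rotate the global difference vector back into frame i.
            vectors.append(matvec(rot_i_inv, sub(x_b, x_a)))
    return vectors


def apply_global(rot_g: Mat, t_g: Vec, frame: Frame) -> Frame:
    """Rigidly move a residue frame by the global transform (rot_g, t_g)."""
    rot, t = frame
    return matmul(rot_g, rot), add(matvec(rot_g, t), t_g)


# Hypothetical example: two residues about 3.8 Å apart, two virtual atoms each.
frame_i: Frame = (rot_z(0.3), (0.0, 0.0, 0.0))
frame_j: Frame = (rot_x(1.1), (3.8, 0.0, 0.0))
atoms = [(0.0, 0.0, 0.0), (1.0, 0.5, -0.2)]

before = virtual_atom_vectors(frame_i, frame_j, atoms, atoms)

# Rotate and translate the whole structure; the local-frame vectors should not change.
rot_g, t_g = matmul(rot_z(0.7), rot_x(-0.4)), (1.0, 2.0, 3.0)
after = virtual_atom_vectors(apply_global(rot_g, t_g, frame_i),
                             apply_global(rot_g, t_g, frame_j), atoms, atoms)

print(all(math.isclose(u, v, abs_tol=1e-9)
          for p, q in zip(before, after) for u, v in zip(p, q)))
```

Expressing every vector in residue i's own frame is what makes the result invariant to a global rotation and translation of the structure, which the final check demonstrates. A full VFN layer would, as we understand it, go further by learning the virtual-atom positions and feeding these vectors into subsequent network layers; this sketch omits all of that.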
| Published: | Swansea University, Wales, UK, 2025 |
|---|---|
| Institution: | Swansea University |
| Degree level: | Master of Research |
| Degree name: | MRes |
| Supervisor: | Wu, L. Y. |
| URI: | https://cronfa.swan.ac.uk/Record/cronfa69219 |
| first_indexed | 2025-04-03T12:55:40Z |
|---|---|
| last_indexed | 2025-04-04T05:16:25Z |
| id | cronfa69219 |
| recordtype | RisThesis |
| fullrecord | <?xml version="1.0"?><rfc1807><datestamp>2025-04-03T13:55:39.1748915</datestamp><bib-version>v2</bib-version><id>69219</id><entry>2025-04-03</entry><title>Generative AI in Protein Design: De novo Protein Design and Multi-Motif Scaffolding</title><swanseaauthors><author><sid>4f7a8ae443d8fc5c4095ddfabc27c526</sid><firstname>ZHENG</firstname><surname>SUN</surname><name>ZHENG SUN</name><active>true</active><ethesisStudent>false</ethesisStudent></author></swanseaauthors><date>2025-04-03</date><abstract>Proteins are fundamental molecules of life, playing critical roles in nearly all biological processes. The ability to design proteins with specific structures and functions holds immense potential for applications in drug discovery, enzyme engineering, and synthetic biology. However, traditional methods such as rational design and directed evolution face significant limitations, including reliance on existing templates, low computational efficiency, and difficulties in handling complex multi-motif scaffolding tasks. Recent advancements in generative artificial intelligence have opened new possibilities for protein design, yet challenges remain in geometric feature modeling and implicit motif positioning. This study presents an integrated framework that combines the strengths of Geometric Vector Field Networks (VFN) and a protein diffusion model, MoDiff, to address these challenges. The VFN approach redefines frame modeling in de novo protein design by leveraging vector computations between coordinates of frame-anchored virtual atoms, enabling superior performance in designability and diversity compared to traditional methods. Specifically, VFN demonstrates significant improvements over Invariant Point Attention (IPA) in designability and diversity, and outperforms PiFold in sequence recovery rates during inverse folding. Meanwhile, MoDiff tackles the multi-motif scaffolding problem by facilitating the implicit positioning of motifs along the protein backbone, a capability absent in current scaffolding approaches. Our results show that MoDiff can generate diverse scaffolds and solve the multi-motif scaffolding problem even when motif positions are unknown, making it a versatile solution for complex protein design tasks. The synergy between VFN and MoDiff not only highlights the individual strengths of each model but also demonstrates their combined potential to revolutionize protein engineering. This work opens new avenues for designing complex proteins and scaffolds, significantly advancing the field of synthetic biology. The findings have broad implications for pharmaceutical and biotechnological applications, paving the way for the development of novel therapeutic proteins, enzymes, and biomaterials.</abstract><type>E-Thesis</type><journal/><volume/><journalNumber/><paginationStart/><paginationEnd/><publisher/><placeOfPublication>Swansea University, Wales, UK</placeOfPublication><isbnPrint/><isbnElectronic/><issnPrint/><issnElectronic/><keywords>Deep Learning, Protein Design, Diffusion Model</keywords><publishedDay>14</publishedDay><publishedMonth>2</publishedMonth><publishedYear>2025</publishedYear><publishedDate>2025-02-14</publishedDate><doi/><url/><notes>A selection of content is redacted or is partially redacted from this thesis to protect sensitive and personal information.</notes><college>COLLEGE NANME</college><CollegeCode>COLLEGE CODE</CollegeCode><institution>Swansea University</institution><supervisor>Wu, L. Y.</supervisor><degreelevel>Master of Research</degreelevel><degreename>MRes</degreename><apcterm/><funders/><projectreference/><lastEdited>2025-04-03T13:55:39.1748915</lastEdited><Created>2025-04-03T13:48:41.6702600</Created><path><level id="1">Faculty of Science and Engineering</level><level id="2">School of Mathematics and Computer Science - Computer Science</level></path><authors><author><firstname>ZHENG</firstname><surname>SUN</surname><order>1</order></author></authors><documents><document><filename>69219__33938__6d5eaf1b943c4bc7a8b4b3f2047da3e1.pdf</filename><originalFilename>2024_Sun_Z.final.69219.pdf</originalFilename><uploaded>2025-04-03T13:54:50.6687572</uploaded><type>Output</type><contentLength>4548576</contentLength><contentType>application/pdf</contentType><version>E-Thesis – open access</version><cronfaStatus>true</cronfaStatus><documentNotes>Copyright: The Author, Zheng Sun, 2024</documentNotes><copyrightCorrect>true</copyrightCorrect><language>eng</language></document></documents><OutputDurs/></rfc1807> |
| title | Generative AI in Protein Design: De novo Protein Design and Multi-Motif Scaffolding |
| author_id_str_mv | 4f7a8ae443d8fc5c4095ddfabc27c526 |
| author_id_fullname_str_mv | 4f7a8ae443d8fc5c4095ddfabc27c526_***_ZHENG SUN |
| author | ZHENG SUN |
| format | E-Thesis |
| publishDate | 2025 |
| institution | Swansea University |
| college_str | Faculty of Science and Engineering |
| hierarchytype | |
| hierarchy_top_id | facultyofscienceandengineering |
| hierarchy_top_title | Faculty of Science and Engineering |
| hierarchy_parent_id | facultyofscienceandengineering |
| hierarchy_parent_title | Faculty of Science and Engineering |
| department_str | School of Mathematics and Computer Science - Computer Science; Faculty of Science and Engineering |
| document_store_str | 1 |
| active_str | 0 |
| published_date | 2025-02-14T05:23:21Z |

