
Journal article

Principled, practical, flexible, fast: A new approach to phylogenetic factor analysis

Gabriel W. Hassler, Brigida Gallone, Leandro Aristide, William Allen, Max R. Tolkoff, Andrew J. Holbrook, Guy Baele, Philippe Lemey, Marc A. Suchard

Methods in Ecology and Evolution, Volume: 13, Issue: 10, Pages: 2181 - 2197

Swansea University Author: William Allen

Abstract

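Biological phenotypes are products of complex evolutionary processes in which selective forces influence multiple biological trait measurements in unknown ways. Phylogenetic comparative methods seek to disentangle these relationships across the evolutionary history of a group of organisms. Unfortunately, most existing methods fail to accommodate high-dimensional data with dozens or even thousands of observations per taxon. Phylogenetic factor analysis offers a solution to the challenge of dimensionality. However, scientists seeking to employ this modeling framework confront numerous modeling and implementation decisions, the details of which pose computational and replicability challenges.

We develop new inference techniques that increase both the computational efficiency and modeling flexibility of phylogenetic factor analysis. To facilitate adoption of these new methods, we present a practical analysis plan that guides researchers through the web of complex modeling decisions. We codify this analysis plan in an automated pipeline that distills the potentially overwhelming array of decisions into a small handful of (typically binary) choices.

We demonstrate the utility of these methods and analysis plan in four real-world problems of varying scales. Specifically, we study floral phenotype and pollination in columbines, domestication in industrial yeast, life history in mammals, and brain morphology in New World monkeys.

General and impactful community employment of these methods requires a data scientific analysis plan that balances flexibility, speed and ease of use, while minimizing model and algorithm tuning. Even in the presence of non-trivial phylogenetic model constraints, we show that one may analytically address latent factor uncertainty in a way that (a) aids model flexibility, (b) accelerates computation (by as much as 500-fold) and (c) decreases required tuning. These efforts coalesce to create an accessible Bayesian approach to high-dimensional phylogenetic comparative methods on large trees.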


Published in: Methods in Ecology and Evolution
ISSN: 2041-210X
Published: Wiley 2022

URI: https://cronfa.swan.ac.uk/Record/cronfa60450
fullrecord <?xml version="1.0" encoding="utf-8"?><rfc1807 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"><bib-version>v2</bib-version><id>60450</id><entry>2022-07-11</entry><title>Principled, practical, flexible, fast: A new approach to phylogenetic factor analysis</title><swanseaauthors><author><sid>d6f01dd06d25fa8804daad86e251b8a5</sid><ORCID>0000-0003-2654-0438</ORCID><firstname>William</firstname><surname>Allen</surname><name>William Allen</name><active>true</active><ethesisStudent>false</ethesisStudent></author></swanseaauthors><date>2022-07-11</date><deptcode>BGPS</deptcode><abstract>Biological phenotypes are products of complex evolutionary processes in which selective forces influence multiple biological trait measurements in unknown ways. Phylogenetic comparative methods seek to disentangle these relationships across the evolutionary history of a group of organisms. Unfortunately, most existing methods fail to accommodate high-dimensional data with dozens or even thousands of observations per taxon. Phylogenetic factor analysis offers a solution to the challenge of dimensionality. However, scientists seeking to employ this modeling framework confront numerous modeling and implementation decisions, the details of which pose computational and replicability challenges. We develop new inference techniques that increase both the computational efficiency and modeling flexibility of phylogenetic factor analysis. To facilitate adoption of these new methods, we present a practical analysis plan that guides researchers through the web of complex modeling decisions. We codify this analysis plan in an automated pipeline that distills the potentially overwhelming array of decisions into a small handful of (typically binary) choices. We demonstrate the utility of these methods and analysis plan in four real-world problems of varying scales. 
Specifically, we study floral phenotype and pollination in columbines, domestication in industrial yeast, life history in mammals, and brain morphology in New World monkeys. General and impactful community employment of these methods requires a data scientific analysis plan that balances flexibility, speed and ease of use, while minimizing model and algorithm tuning. Even in the presence of non-trivial phylogenetic model constraints, we show that one may analytically address latent factor uncertainty in a way that (a) aids model flexibility, (b) accelerates computation (by as much as 500-fold) and (c) decreases required tuning. These efforts coalesce to create an accessible Bayesian approach to high-dimensional phylogenetic comparative methods on large trees.</abstract><type>Journal Article</type><journal>Methods in Ecology and Evolution</journal><volume>13</volume><journalNumber>10</journalNumber><paginationStart>2181</paginationStart><paginationEnd>2197</paginationEnd><publisher>Wiley</publisher><placeOfPublication/><isbnPrint/><isbnElectronic/><issnPrint>2041-210X</issnPrint><issnElectronic>2041-210X</issnElectronic><keywords>Bayesian inference, BEAST, latent factor model, Geodesic Hamiltonian Monte Carlo, phylogenetic comparative methods, Stiefel manifold</keywords><publishedDay>1</publishedDay><publishedMonth>8</publishedMonth><publishedYear>2022</publishedYear><publishedDate>2022-08-01</publishedDate><doi>10.1111/2041-210x.13920</doi><url/><notes/><college>COLLEGE NANME</college><department>Biosciences Geography and Physics School</department><CollegeCode>COLLEGE CODE</CollegeCode><DepartmentCode>BGPS</DepartmentCode><institution>Swansea University</institution><apcterm/><funders>Research Foundation - Flanders. Grant Numbers: G098321N, G0E1420N, G051322N; H2020 European Research Council. Grant Number: 725422-ReservoirDOCS; Internal Funds KU Leuven. Grant Number: C14/18/094; National Human Genome Research Institute. 
Grant Number: T32HG002536; National Institute of Allergy and Infectious Diseases. Grant Numbers: F31AI154824, K25AI153816, R01AI153044; National Science Foundation. Grant Number: DMS 2152774; Wellcome Trust. Grant Number: 206298/Z/17/Z; MCIN/AEI/10.13039/501100011033. Grant Number: FJC2019-042184-I</funders><projectreference/><lastEdited>2024-07-10T13:28:21.7171375</lastEdited><Created>2022-07-11T15:08:53.8401721</Created><path><level id="1">Faculty of Science and Engineering</level><level id="2">School of Biosciences, Geography and Physics - Biosciences</level></path><authors><author><firstname>Gabriel W.</firstname><surname>Hassler</surname><orcid>0000-0001-6951-5254</orcid><order>1</order></author><author><firstname>Brigida</firstname><surname>Gallone</surname><order>2</order></author><author><firstname>Leandro</firstname><surname>Aristide</surname><order>3</order></author><author><firstname>William</firstname><surname>Allen</surname><orcid>0000-0003-2654-0438</orcid><order>4</order></author><author><firstname>Max R.</firstname><surname>Tolkoff</surname><order>5</order></author><author><firstname>Andrew J.</firstname><surname>Holbrook</surname><order>6</order></author><author><firstname>Guy</firstname><surname>Baele</surname><order>7</order></author><author><firstname>Philippe</firstname><surname>Lemey</surname><order>8</order></author><author><firstname>Marc A.</firstname><surname>Suchard</surname><order>9</order></author></authors><documents><document><filename>60450__24752__200c229e61dc4e1ab577684cd99aea11.pdf</filename><originalFilename>60450.pdf</originalFilename><uploaded>2022-07-26T14:14:23.6289790</uploaded><type>Output</type><contentLength>5151027</contentLength><contentType>application/pdf</contentType><version>Accepted Manuscript</version><cronfaStatus>true</cronfaStatus><embargoDate>2023-06-19T00:00:00.0000000</embargoDate><copyrightCorrect>true</copyrightCorrect><language>eng</language></document></documents><OutputDurs/></rfc1807>