Apropos of last week's post on models and data, here is a preprint study of the amount of morphological data available for living mammal taxa. The focus was on coverage within each order, where coverage was defined as the fraction of OTUs with available data, and "available" seems to mean appearing in a character matrix with more than 100 characters (which doesn't necessarily mean that more than 100 characters were scored for each OTU). I can't judge whether the three databases and the literature search constitute a sufficient search, but their results aren't implausible. Using appearance in a sufficiently character-rich matrix as the criterion, they counted 1074 mammalian OTUs in 126 matrices. Without the matrix-size filter, there were 5228 OTUs in 286 matrices. This is apparently less than the morphological data available for fossil mammal taxa.
I expect there are very few character matrices for any group with more than 100 behavioral characters. This is probably a combination of the difficulty of extracting 100 behavioral characters in a single study and the rarity of this sort of comparative/phylogenetic analysis in behavior.
Thanks to everyone who responded to last week's post.
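The coverage criterion described above is simple enough to sketch. This is a minimal illustration, not the preprint's code: the data layout (a list of matrices, each with a character count and a list of (order, OTU) pairs), the function name, and the toy OTU names are all hypothetical; only the more-than-100-characters threshold comes from the description of the study. Like the criterion it mimics, it counts matrix membership, not how many characters were actually scored for each OTU.

```python
from collections import defaultdict

# Threshold taken from the study's criterion: matrices with MORE than 100 characters.
MIN_CHARACTERS = 100


def coverage_by_order(matrices, species_by_order, min_characters=MIN_CHARACTERS):
    """Fraction of OTUs in each order that appear in at least one matrix
    with more than `min_characters` characters (hypothetical data layout)."""
    covered = defaultdict(set)
    for matrix in matrices:
        if matrix["n_characters"] <= min_characters:
            continue  # matrix too small to count as 'available' data
        for order, otu in matrix["otus"]:
            # Membership only: says nothing about how many cells were scored for this OTU.
            if otu in species_by_order.get(order, set()):
                covered[order].add(otu)
    return {
        order: len(covered[order]) / len(otus)
        for order, otus in species_by_order.items()
        if otus
    }


# Hypothetical toy data, not drawn from the study.
species_by_order = {
    "Carnivora": {"otu_1", "otu_2", "otu_3", "otu_4"},
    "Chiroptera": {"otu_5", "otu_6", "otu_7", "otu_8"},
}
matrices = [
    {"n_characters": 150, "otus": [("Carnivora", "otu_1"), ("Carnivora", "otu_2"), ("Chiroptera", "otu_5")]},
    {"n_characters": 40, "otus": [("Carnivora", "otu_3")]},
]
filtered = coverage_by_order(matrices, species_by_order)
unfiltered = coverage_by_order(matrices, species_by_order, min_characters=0)
```

Here `filtered` gives 0.5 for Carnivora and 0.25 for Chiroptera, while dropping the threshold lets the small matrix count and raises Carnivora to 0.75, which mirrors the jump between the filtered and unfiltered totals reported above.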
Friday, July 24, 2015
Wednesday, July 15, 2015
Models vs. Data - not a choice for behavior
These are some thoughts that have been rattling around in my head since the SSB 2015 standalone meeting back in May. They are somewhat depressing, so I'm open to suggestions for more positive ways to look at the situation in comparative studies of behavior.
The main part of the meeting was bracketed by two 'panel' discussions (I scare quote the word panel because the panel was two people in each case). The first panel consisted of David Hillis and Antonis Rokas arguing whether models or data would be more important going forward. Not surprisingly for this meeting, the focus was on molecular methods, particularly genomic comparative analysis. Interesting question, but of course both are viable research paths at this point - gathering and managing rapidly growing data sets and refining the models of molecular evolution to more realistically mimic the actual processes are both worth pursuing.
The question of data vs. models came up several times, including during another panel discussion of putting dates on time trees. This seems to be a very fertile area for developing improved models and statistical methods at the moment, while the corresponding fossil data is accumulating at a steadier pace.
The second panel was Wayne Maddison and Cécile Ané discussing the limits of comparative methods. The issues that Wayne raised are ones I know firsthand, mostly from my stint in his UBC lab. The core problem is that most or all comparative methods for discrete trait values suffer from phylogenetic non-independence, despite 25-year-old claims to the contrary. I spent some time thinking about these issues, but the best discussion at this point is the Maddison and FitzJohn (2014; doi:10.1093/sysbio/syu070) paper. The situation is somewhat better for continuous traits, though there are always questions of model adequacy. I'll admit I don't remember a great deal of Ané's discussion of the limits of OU methods, though she seemed somewhat more optimistic about the continuous-trait cases that OU applies to. Wayne did point out that tip data won't help answer the question of trends in evolution - you'll need fossil data for that.
Thinking about the situation after the meeting wrapped up, I was struck by the differences between trait domains. In molecular biology we have lots of data and a selection of models that are at least plausible representations of things that actually happen (not perfect by any means, but GTR is a reasonable stochastic approximation of what happens at a single site). Lots of data allow a certain freedom to run around the non-independence problems mentioned in the last paragraph: with enough sites, you could, in principle, assume independent changes. Molecular data also include a small but real sample of fossil data (ancient DNA), and useful amounts of molecular evolution can occur over experimental time scales. Thus, molecules provide lots of data, a solid starting point for models, and some temporal depth. Morphology has fossils, a reasonable and slowly growing dataset, and a mix of continuous and discrete trait models. Are Brownian motion and OU models good stochastic approximations of morphological change for continuous characters? Not necessarily, but they are definitely a start. As noted above, the situation for discrete characters is rather problematic, even with model-based (likelihood and Bayesian) methods. However, with fossils you aren't limited to contemporary data in your ancestral reconstructions. This won't solve these problems, but it might give you more confidence in your analysis. There is room for optimism here, particularly the hope of better models and statistical methods to make the most of the available data while new data trickle in.
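For readers coming from the behavior side, it may help to see concretely what it means for Brownian motion and OU to be stochastic approximations that yield likelihoods. The sketch below uses the standard single-branch results: under BM, a trait starting at x0 ends up normally distributed with mean x0 and variance sigma2 * t; under OU, with pull strength alpha toward an optimum theta, the mean is theta + (x0 - theta) * exp(-alpha * t) and the variance is sigma2 * (1 - exp(-2 * alpha * t)) / (2 * alpha). The function names and example values are hypothetical, and a real analysis would combine these densities over a whole tree rather than one branch.

```python
import math


def bm_moments(x0, sigma2, t):
    """Mean and variance of a trait after time t under Brownian motion."""
    return x0, sigma2 * t


def ou_moments(x0, sigma2, alpha, theta, t):
    """Mean and variance after time t under an Ornstein-Uhlenbeck process
    pulled toward optimum theta with strength alpha (alpha > 0)."""
    if alpha <= 0:
        raise ValueError("alpha must be positive; alpha -> 0 recovers Brownian motion")
    mean = theta + (x0 - theta) * math.exp(-alpha * t)
    # -expm1(-2*alpha*t) equals 1 - exp(-2*alpha*t), computed accurately for small alpha*t
    var = sigma2 / (2 * alpha) * -math.expm1(-2 * alpha * t)
    return mean, var


def normal_loglik(x, mean, var):
    """Log density of an observed value x under a normal(mean, var)."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


# Hypothetical example: score one descendant value under each model.
x0, x_obs, t, sigma2 = 0.0, 1.5, 2.0, 1.0
ll_bm = normal_loglik(x_obs, *bm_moments(x0, sigma2, t))
ll_ou = normal_loglik(x_obs, *ou_moments(x0, sigma2, alpha=0.5, theta=2.0, t=t))
```

As alpha shrinks toward zero, the OU variance approaches sigma2 * t, so BM is the limiting case. The key property is that both processes assign a probability density to an observed end-of-branch value, which is what lets them be compared or fitted by likelihood; that is exactly what the descriptive models of behavioral evolution discussed later in this post lack.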
Then we come to behavior: no fossil data and not a lot of data at all relative to molecular traits. Behavior also has the problem of much of the data being transient - if not captured on a recording device, it's just an observer's memory. Apart from the lost opportunities to capture data, the animal behavior community has been slow to embrace the culture of data sharing, as discussed in Caetano and Aisenberg's (2014; http://dx.doi.org/10.1016/j.anbehav.2014.09.025) Forgotten Treasures paper. Data sharing is, of course, more than just dumping your raw data in a repository - to be useful, the data require annotation, even if that is little more than plain text labels for columns and a glossary of observation codes. So behavior researchers need to up their game to overcome the challenges of slow data accumulation dribbling into a leaky pipeline.
I became interested in ontologies and knowledge representation for behavior because I thought, rather optimistically as it turned out, that a flood of behavior data would follow the flood of molecular data. Ontologies have played an important role in making sense of genes and proteins and are slowly starting to contribute to morphological studies, but behavior, especially behavioral ecology and ethology, lags behind even other branches of ecology in making use of ontologies. There is some movement towards an ontology (or sub-ontology) for behavioral ecology; follow this blog for updates.
As challenging as the data situation is, I worry that the modeling side is in complete disarray. Models of the evolution of behavior are frequently descriptive and of little use for inference. For example, there is a sizable collection of models for the evolution of sexual signals, but I defy anyone to throw these models onto the branches of a tree and generate likelihood estimates of the history of signaling in any clade. Note that this isn't the same as applying a Brownian or OU model of change to a particular measurement, or set of measurements, sampled from a signal and testing for the presence of selection. Brownian motion is not a model of sexual selection, and it isn't clear (at least to me) that there is a way to go from a descriptive model of sexual selection to something that would yield a likelihood estimate.
If there is work being done here, it is either well ahead of its time or not being recognized for what it is. Please prove me wrong on this. Meanwhile, I can only hope that this theory-to-model link is not too far off.
Rather than end on such a pessimistic note, I will suggest two places to start looking for temporal depth. These won't solve the model problem, but they may yield data that could support some model development and testing. The first is the fossil traces of behavior - this means both behavior inferred from morphological fossils and, secondarily, fossil artifacts and trace fossils. I have some reservations about the latter, simply because it is frequently hard to identify the organism(s) involved.
The second is the study of cultural change, both human and animal. There are a lot of interesting questions here, especially at the group/population level, though linking cultural change to heritable genetic change still has a long way to go.
I know not everyone in the comparative methods community is going through the stages of grief that Wayne Maddison discussed in his SSB talk. The community response to his questionnaire reflected optimism about new methods, though I don't know how many behaviorists were surveyed. I did speak with Emilia Martins at the ABS meeting a few weeks later, and she was more optimistic about the state of comparative methods. I hope she's right, and that the behavior community will find a welcome from the comparative methods community when we manage to shake off the fog of our data amnesia.
Labels: Behavior, Comparative Method, Evolution, Meetings, Ontologies
Thursday, April 16, 2015
Summer meetings 2015
In recent years, the main thing I've been using this blog for is letting people know about meetings I'm attending and, from time to time, posting meeting reports of varying quality. So here are the meetings I'll be attending during Summer 2015:
Society of Systematic Biologists standalone meeting (Ann Arbor, MI, May 20-22). I'm not planning to present here, but I'll be around for face-to-face conversations with other OpenTree of Life people and anyone else who wants to discuss things or simply catch up.
Animal Behavior Society (Anchorage, AK, June 10-14). I'm coming up to Anchorage a couple of days early and currently have the 9th completely open. No sudden exits this year, I promise. I have signed up to give a talk about Arachnolingua, and I'll keep the focus on the spiders (ontologies and reasoning will stay in the background).
American Arachnological Society (Mitchell, SD, June 19-23). I'm not sure whether I'll present a poster here; the plan is mostly to play sponge and catch up with some people.
I'm currently focussing on contributing to the OpenTree taxonomy effort (with Jonathan Rees) and reworking the Arachnolingua tools. Nothing to report on the next Behavior Ontologies workshop at this time.
Saturday, August 9, 2014
Finished up a Behavioral Ontology workshop, now at ABS2014
We finished up the first of our two workshops following up on the one-day session we held in conjunction with the 2013 phenotype RCN summit. We gathered a fairly diverse group of 15 behavior people at Princeton for a day and a half before the ABS 2014 meeting. Our task was to compare the ABO (the ontology constructed over the course of two workshops at Cornell in 2004 and 2005) and the NBO (the ontology of behavior processes and phenotypes developed within the OBO framework). Despite some initial fears, it looks like we have a good chance of coming up with a proposed integration of the two that will allow behavioral ecologists to make use of the NBO without breaking things for its current users, mostly model organism genetics and phenotype investigators.
Thanks to all who attended, to my three co-organizers (Anne Clark, Sue Margulis, Cyndy Parr), and to George Gkoutos, developer and maintainer of the NBO, who listened in on most of our Friday session and took an hour to host a question-and-answer session over Skype.
Tuesday, August 5, 2014
I'll be at ABS2014 next week.
I’ll be presenting a poster at the Animal Behavior Society meeting next week in Princeton, New Jersey. The poster is NE 116. If you aren’t attending, the poster and a supporting script file are now up on figshare.
The poster looks at how well NBO serves as a vocabulary for spider behavior. It turned out to do a little better than I expected. I omitted a couple of statistically non-significant tests I tried over the weekend (comparing term depth across NBO with the depth of the NBO terms used in 40 Arachnolingua claims).
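"Term depth" here presumably means distance from the root of the ontology's is_a hierarchy. A minimal sketch of that comparison might look like the following. It assumes depth is the shortest is_a path from a root, and it uses a toy hierarchy with made-up term labels rather than NBO's actual structure. It is not the script posted on figshare, and the function name is hypothetical.

```python
from collections import deque
from statistics import mean


def term_depths(parents, roots):
    """Shortest is_a distance from a root for every reachable term.
    `parents` maps each term to a list of its is_a parents."""
    children = {}
    for term, term_parents in parents.items():
        for parent in term_parents:
            children.setdefault(parent, []).append(term)
    depth = {root: 0 for root in roots}
    queue = deque(roots)
    while queue:  # breadth-first search, so the first visit is the shortest path
        term = queue.popleft()
        for child in children.get(term, []):
            if child not in depth:
                depth[child] = depth[term] + 1
                queue.append(child)
    return depth


# Hypothetical toy hierarchy; labels are illustrative, not NBO identifiers.
parents = {
    "behavior": [],
    "locomotion": ["behavior"],
    "feeding behavior": ["behavior"],
    "walking": ["locomotion"],
    "jumping": ["locomotion"],
    "prey capture": ["feeding behavior"],
}
depths = term_depths(parents, roots=["behavior"])

# Hypothetical claims, each annotated with a single term.
claim_terms = ["jumping", "prey capture", "jumping", "locomotion"]
mean_all = mean(depths.values())                   # averaged over the whole vocabulary
mean_used = mean(depths[t] for t in claim_terms)   # averaged over claim annotations
```

A higher mean depth for the terms used in claims would hint that annotators reach for more specific terms. Whether such a difference means anything needs a proper statistical test, and for the real data the tests tried were not significant.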
Note: there is a small chance I may not be able to attend the poster on Tuesday evening. I’ll update this if it turns out I won’t be there.
Monday, May 19, 2014
This summer - 2014
I'll be attending Evolution/iEvoBio 2014 in Raleigh in June. I am scheduled for another iEvoBio lightning talk on Arachnolingua, my semantic database of behavior of spiders and other arachnids. I expect the talk will bring in some things I've learned while working with OpenTreeofLife. I'm still blogging the development of Arachnolingua and its associated tools and web presence at the development blog.
At Animal Behavior 2014 I'll be presenting a poster that focuses more on the behavior (and the spiders).
Saturday, August 17, 2013
ABS 2013 Wrap up
It's been two weeks since ABS 2013 finished up, so I definitely need to get the rest of these notes out before I forget.
The question of why there are so few comparative studies continued to pop up in talks, from a brief mention in Patricia Gowaty's Tuesday morning plenary talk to Daniel Caetano's plea for more data sharing, which ironically compared the data sharing situation to that in genomics and, indirectly, in phylogenetics (where the deposit situation isn't as rosy as in genomics, though certainly better than in behavior; shout out to Terry Ord and his growing collection of Dryad deposits). One interesting comparative talk by Odom reconstructed female song as present in the ancestry of songbirds, a conclusion that depended on including a representative sample of taxa outside the northern temperate zone.
I missed a fair number of talks due to ongoing commitments, but managed to catch a number related to spiders, cognition, and a couple of the social learning talks in the final contributed session. One of my favorite spider talks was Elizabeth Jakob's look at visual perception in jumping spiders, in particular a study of biological motion which used moving dot animations constructed by an undergraduate. I spent some time looking at biological motion perception in pigeons a long time ago, so seeing that jumping spiders can distinguish spider-like animations from scrambled dot animations was quite amusing. Another spider talk that caught my attention was Schwartz's Allee talk on spontaneous male death in the Dark Fishing Spider. Although this work had been published a few weeks earlier, the talk filled in a lot of details I missed in the paper. Overall, spiders were well represented, including Andrade's plenary talk on spider sociality.
There were also some good (and not so good) presentations in cognition and social learning. A nice comparative study across 9 families of mammalian Carnivora showed a widespread ability (in 8 of 9 families, Herpestidae being the exception) to solve a box-opening task involving a latch, though not every species in those 8 families could solve the task. The question being asked was whether sociality (as measured by group size) predicted performance on the task; it didn't. What did predict task performance was the size of the repertoire of actions used to manipulate the box and neophobia (negatively). There was also a suggestion that brain size was predictive. I'm very cautious about that last result: there have been too many poorly done brain size studies, and this analysis seemed to be an afterthought. I saw a couple of cephalopod cognition talks that I didn't find convincing. I know cephalopods, especially free-living ones, are hard to study, but that's all the more reason to thoroughly understand the issues involved before spending hundreds of hours collecting data that fail to address all the alternative explanations.
There were a number of good social learning talks on the last day - I unfortunately missed a good one on social learning in Drosophila, but did see Simon Reader's talk on conditions that influence choices of learning strategy, using guppies, which have become another good model system for social learning. Ipek Kulahci looked at the role of social networks (both positive and negative relations) on demonstrator effectiveness in two Corvus species (C. corax and C. corone). She looked at the effect of relationship on both attention to the demonstrator (observers were free to ignore demonstrators completing a box task designed to minimize scrounging) as well as learning effectiveness.
Apart from the talks, this was a good meeting for socializing: it was the 50th anniversary meeting of the ABS, and lots of senior and emeritus people were there, including my advisor Jack Hailman. There was also more focused socializing, especially as several of us continued planning next year's follow-ups to the Behavior Ontologies workshop from last February.