Friday, July 24, 2015

A little more on morphological data

Apropos of last week's post on models and data, here is a preprint study on the amount of morphological data available for living mammal taxa. The focus was on coverage in each order, with coverage defined as the fraction of OTUs with available data; 'available' seems to mean appearing in a character matrix with more than 100 characters (which doesn't necessarily mean that >100 characters were scored for each OTU). I can't judge whether the three databases and the literature search constitute a sufficient search, but their results aren't implausible. Counting only OTUs that appear in a sufficiently character-rich matrix, they found 1074 mammalian OTUs in 126 matrices. Without the matrix size filtering, there were 5228 OTUs in 286 matrices. This is apparently less than the morphological data available for fossil mammal taxa.
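To make the coverage definition concrete, here is a minimal Python sketch of how I read it: an OTU counts as covered if it appears in at least one matrix with more than 100 characters, and coverage for an order is the covered fraction of its OTUs. The function name, the dictionary layout, and the toy taxa and matrices are all my own inventions for illustration; none of it comes from the preprint.

```python
MIN_CHARACTERS = 100  # threshold as I read it: matrices with more than 100 characters


def coverage_by_order(taxa_by_order, matrices, min_characters=MIN_CHARACTERS):
    """Return {order: fraction of its OTUs found in at least one qualifying matrix}."""
    covered = set()
    for matrix in matrices:
        # A matrix qualifies on its total character count, not on how many
        # characters were actually scored for any given OTU.
        if matrix["n_characters"] > min_characters:
            covered.update(matrix["otus"])
    return {
        order: (len(covered & set(otus)) / len(otus) if otus else 0.0)
        for order, otus in taxa_by_order.items()
    }


# Toy, made-up data (hypothetical; not taken from the preprint).
taxa_by_order = {
    "Carnivora": ["Canis lupus", "Felis catus", "Ursus arctos", "Lutra lutra"],
    "Primates": ["Homo sapiens", "Pan troglodytes"],
}
matrices = [
    {"n_characters": 250, "otus": ["Canis lupus", "Homo sapiens"]},
    {"n_characters": 40, "otus": ["Felis catus", "Pan troglodytes"]},
    {"n_characters": 120, "otus": ["Canis lupus", "Ursus arctos"]},
]

coverage = coverage_by_order(taxa_by_order, matrices)
unfiltered = coverage_by_order(taxa_by_order, matrices, min_characters=0)
print(coverage)
print(unfiltered)
```

Dropping the threshold to zero, as in the second call, mimics the unfiltered count and shows how much the size filter alone can change the picture.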

I expect there are very few character matrices for any group with more than 100 behavioral characters. This probably reflects both the difficulty of extracting 100 behavioral characters in a single study and the rarity of this sort of comparative/phylogenetic analysis in behavior.

Thanks to everyone who responded to last week's post.

Wednesday, July 15, 2015

Models vs. Data - not a choice for behavior

These are some thoughts that have been rattling in my head since the SSB 2015 standalone meeting back in May.  They are somewhat depressing, so I'm open to suggestions of more positive ways to look at the situation in comparative studies of behavior.

The main part of the meeting was bracketed by two 'panel' discussions (I scare quote the word panel because the panel was two people in each case).  The first panel consisted of David Hillis and Antonis Rokas arguing whether models or data would be more important going forward.  Not surprisingly for this meeting, the focus was on molecular methods, particularly genomic comparative analysis. Interesting question, but of course both are viable research paths at this point - gathering and managing rapidly growing data sets and refining the models of molecular evolution to more realistically mimic the actual processes are both worth pursuing.

The question of data vs. models came up several times, including during another panel discussion of putting dates on time trees.  This seems to be a very fertile area for developing improved models and statistical methods at the moment, while the corresponding fossil data is accumulating at a steadier pace.

The second panel was Wayne Maddison and Cécile Ané discussing the limits of comparative methods. The issues that Wayne raised are ones I know firsthand, mostly from my stint in his UBC lab. The issue is that most or all comparative methods for discrete trait values suffer from phylogenetic non-independence, despite 25-year-old claims to the contrary. I spent some time thinking about these issues, but the best discussion at this point is the Maddison and FitzJohn (2014; doi:10.1093/sysbio/syu070) paper. The situation is somewhat better for continuous traits, though there are always questions of model adequacy. I'll admit I don't remember a great deal of Ané's discussion of the limits of OU methods, though it seemed somewhat more optimistic for the continuous trait cases that OU applies to. Wayne did point out that tip data won't help answer the question of trends in evolution - you'll need fossil data for that.

Thinking about the situation after the meeting wrapped up, I was struck by the differences between different trait domains. In molecular biology we have lots of data and a selection of models that are at least plausible representations of things that actually happen (not perfect by any means, but GTR is a reasonable stochastic approximation of what happens at a single site). Lots of data allow a certain freedom to 'run about' the non-independence problems mentioned in the last paragraph - with enough sites, you could, in principle, assume independent changes. Molecular data also provide a small but real sampling of fossil data, and useful amounts of molecular evolution can occur over experimental time-scales. Thus, molecules provide lots of data, a solid starting point for models, and some temporal depth. Morphology has fossils, a reasonable and slowly growing dataset, and a mix of continuous and discrete trait models. Are Brownian motion and OU models good stochastic approximations of morphological change for continuous characters? Not necessarily, but they are definitely a start. As noted above, the situation for discrete characters, even using model-based (likelihood and Bayesian) methods, is rather problematic. However, you aren't limited in your ancestral reconstructions to using contemporary data. Fossil data won't solve these problems, but they might give you confidence in your analysis. There is room for optimism here, particularly the hope of better models and statistical methods to make the most of the data available, while new data trickle in.
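For readers who haven't met the two continuous-trait models, here is a rough Python sketch of what they assume along a single lineage. Brownian motion is an unconstrained random walk; OU adds a pull toward an optimum. The parameter names (alpha for the strength of the pull, theta for the optimum, sigma for the noise) follow common usage, but the function, the simple fixed-step (Euler-Maruyama) approximation, and all the numbers are my own illustration, not anything presented at the meeting.

```python
import random


def simulate_ou(x0, alpha, theta, sigma, t_total, n_steps, rng):
    """Simulate one trait trajectory with a fixed-step (Euler-Maruyama) scheme.

    Each step adds a pull toward the optimum, alpha * (theta - x) * dt,
    plus a random increment with standard deviation sigma * sqrt(dt).
    Setting alpha = 0 removes the pull and leaves plain Brownian motion.
    """
    dt = t_total / n_steps
    x = x0
    path = [x]
    for _ in range(n_steps):
        x += alpha * (theta - x) * dt + sigma * rng.gauss(0.0, dt ** 0.5)
        path.append(x)
    return path


rng = random.Random(2015)  # fixed seed so the run is repeatable

# Brownian motion: no optimum, variance keeps growing with time.
bm_path = simulate_ou(x0=0.0, alpha=0.0, theta=0.0, sigma=1.0,
                      t_total=10.0, n_steps=1000, rng=rng)

# OU: same noise, but the trait is pulled toward theta = 5.
ou_path = simulate_ou(x0=0.0, alpha=1.0, theta=5.0, sigma=1.0,
                      t_total=10.0, n_steps=1000, rng=rng)

print("final BM value:", bm_path[-1])
print("final OU value:", ou_path[-1])
```

Neither process knows anything about the biology producing the change, which is the sense in which they are stochastic approximations rather than mechanistic models.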

Then we come to behavior: no fossil data and not a lot of data at all relative to molecular traits.  Behavior also has the problem of much of the data being transient - if not captured on a recording device, it's just an observer's memory.  Apart from the lost opportunities to capture data, the animal behavior community has been slow to embrace the culture of data sharing, as discussed in Caetano and Aisenberg's (2014; http://dx.doi.org/10.1016/j.anbehav.2014.09.025) Forgotten Treasures paper. Data sharing is, of course, more than just dumping your raw data in a repository - to be useful, the data require annotation, even if that is little more than plain text labels for columns and a glossary of observation codes.  So behavior researchers need to up their game to overcome the challenges of slow data accumulation dribbling into a leaky pipeline.
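As a sketch of how little annotation it takes, the Python below pairs a tiny observation file with plain text column labels and a glossary of observation codes, then reports anything left undocumented. The column names, the codes (GR, FD, XX), and the helper function are hypothetical examples of mine, not a community standard or anything from the Forgotten Treasures paper.

```python
import csv
import io

# Plain text labels for each column (hypothetical example).
COLUMN_LABELS = {
    "animal_id": "Identifier of the focal individual",
    "time_s": "Seconds since the start of the observation session",
    "behavior": "Observation code; see CODE_GLOSSARY",
}

# Glossary of observation codes (hypothetical example).
CODE_GLOSSARY = {
    "GR": "Grooming",
    "FD": "Feeding",
}


def undocumented_codes(rows, code_column, glossary):
    """Return the sorted codes used in the data that the glossary does not define."""
    return sorted({row[code_column] for row in rows} - set(glossary))


# A small made-up data file; 'XX' is deliberately missing from the glossary.
raw = "animal_id,time_s,behavior\nA1,0.0,GR\nA1,4.5,FD\nA2,1.2,XX\n"
rows = list(csv.DictReader(io.StringIO(raw)))

missing_columns = sorted(set(rows[0]) - set(COLUMN_LABELS))
missing_codes = undocumented_codes(rows, "behavior", CODE_GLOSSARY)

print("columns without labels:", missing_columns)
print("codes without glossary entries:", missing_codes)
```

A check like this could run before a deposit, so that a stray code gets explained while someone still remembers what it meant.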

I became interested in ontologies and knowledge representation for behavior because I thought (rather optimistically, as it turned out) that a flood of behavior data would follow the flood of molecular data. Ontologies have played an important role in making sense of genes and proteins, and are slowly starting to contribute to morphological studies, but behavior, and especially behavioral ecology and ethology, lags behind even other branches of ecology in making use of ontologies. There is some motion towards an ontology (or sub-ontology) for behavioral ecology; follow me here for updates.

As challenging as the data situation is, I worry that the modeling side is in complete disarray. Models of the evolution of behavior are frequently descriptive and of little use for inference. For example, there is a sizable collection of models for the evolution of sexual signals, but I defy anyone to throw these models on branches of a tree and generate likelihood estimates of the history of signaling in any clade. Note that this isn't the same as applying a Brownian or OU model of change to a particular measurement or set of measurements sampled from a signal and testing for the presence of selection - Brownian motion is not a model of sexual selection, and it isn't clear (at least to me) that there is a way to go from a descriptive model of sexual selection to something that would yield up a likelihood estimate.

If there is work being done here, it is either well ahead of its time or not being recognized for what it is. Please prove me wrong on this. Meanwhile, I can only hope that the time for this theory-to-model link is not too far off.

Rather than end on such a pessimistic note, I will suggest two places to start looking for temporal depth. These won't solve the model problem, but they may yield up data that could support some development and testing. The first is the fossil traces of behavior - this means both behavior inferred from morphological fossils and, secondarily, fossil artifacts and trace fossils. I have some reservations about the latter, simply because it is frequently hard to identify the organism(s) involved.
The second is the study of cultural change, both human and animal. There are a lot of interesting questions here, especially at the group/population level, though the link to heritable genetic change still has a long way to go.

I know not everyone in the comparative methods community is going through the stages of grief that Wayne Maddison discussed in his SSB talk. The community response to his questionnaire reflected optimism for new methods, though I don't know how many behaviorists were surveyed. I did speak with Emilia Martins at the ABS meeting a few weeks later and she was more optimistic about the state of comparative methods. I hope she's right and that the behavior community will find a welcome from the comparative methods community when we manage to shake off the fog of our data amnesia.