The Ontological Ethologist: Ontologies


Wednesday, July 15, 2015

Models vs. Data - not a choice for behavior

These are some thoughts that have been rattling in my head since the SSB 2015 standalone meeting back in May. They are somewhat depressing, so I'm open to suggestions of more positive ways to look at the situation in comparative studies of behavior.

The main part of the meeting was bracketed by two 'panel' discussions (I scare quote the word panel because the panel was two people in each case). The first panel consisted of David Hillis and Antonis Rokas arguing whether models or data would be more important going forward. Not surprisingly for this meeting, the focus was on molecular methods, particularly genomic comparative analysis. Interesting question, but of course both are viable research paths at this point - gathering and managing rapidly growing data sets and refining the models of molecular evolution to more realistically mimic the actual processes are both worth pursuing.

The question of data vs. models came up several times, including during another panel discussion of putting dates on time trees. This seems to be a very fertile area for developing improved models and statistical methods at the moment, while the corresponding fossil data are accumulating at a steadier pace.

The second panel was Wayne Maddison and Cécile Ané discussing the limits of comparative methods. The issues that Wayne raised are ones I know firsthand, mostly from my stint in his UBC lab. The issue is that most or all comparative methods for discrete trait values suffer from phylogenetic non-independence, despite 25-year-old claims to the contrary. I spent some time thinking about these issues, but the best discussion at this point is the Maddison and FitzJohn (2014; doi:10.1093/sysbio/syu070) paper. The situation is somewhat better for continuous traits, though there are always questions of model adequacy. I'll admit I don't remember a great deal of Ané's discussion of the limits of OU methods, though it seemed somewhat more optimistic for the continuous trait cases that OU applies to. Wayne did point out that tip data won't help answer the question of trends in evolution - you'll need fossil data for that.

Thinking about the situation after the meeting wrapped up, I was struck by the differences between different trait domains. In molecular biology we have lots of data and a selection of models that are at least plausible representations of things that actually happen (not perfect by any means, but GTR is a reasonable stochastic approximation of what happens at a single site). Lots of data allow a certain freedom to 'run about' the non-independence problems mentioned in the last paragraph - with enough sites, you could, in principle, assume independent changes. Molecular data also provide a small, but real sampling of fossil data, and useful amounts of molecular evolution can occur over experimental time-scales. Thus, molecules provide lots of data, a solid starting point for models, and some temporal depth.

Morphology has fossils, a reasonable and slowly growing dataset, and a mix of continuous and discrete trait models. Are Brownian motion and OU models good stochastic approximations of morphological change for continuous characters? Not necessarily, but they are definitely a start. As noted above, the situation for discrete characters, even using model-based (Likelihood and Bayesian) methods, is rather problematic. However, your ancestral reconstructions aren't limited to contemporary data: fossils won't solve these problems, but they might give you more confidence in your analysis. There is room for optimism here, particularly the hope of better models and statistical methods to make the most of the data available, while new data trickle in.
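To make the Brownian motion vs. OU contrast concrete, here is a minimal simulation sketch. It assumes independent lineages (a star phylogeny, so no shared history), a simple Euler-Maruyama discretization, and made-up parameter values; the function names `simulate_trait` and `tip_variance` are hypothetical and not taken from any comparative methods package.

```python
import random
import statistics

def simulate_trait(steps, dt, sigma, alpha=0.0, theta=0.0, x0=0.0, rng=None):
    """Euler-Maruyama simulation of dX = alpha * (theta - X) dt + sigma dW.

    alpha = 0 gives Brownian motion; alpha > 0 gives an OU process that is
    pulled back toward the optimum theta.
    """
    rng = rng or random.Random()
    x = x0
    for _ in range(steps):
        drift = alpha * (theta - x) * dt
        noise = sigma * (dt ** 0.5) * rng.gauss(0.0, 1.0)
        x += drift + noise
    return x

def tip_variance(n_lineages, seed=42, **kwargs):
    """Variance of the trait value across independently evolving lineages."""
    rng = random.Random(seed)
    tips = [simulate_trait(rng=rng, **kwargs) for _ in range(n_lineages)]
    return statistics.pvariance(tips)

# Illustrative values only: total time = steps * dt = 10 units.
bm_var = tip_variance(2000, steps=200, dt=0.05, sigma=1.0)
ou_var = tip_variance(2000, steps=200, dt=0.05, sigma=1.0, alpha=1.0)

print(f"Brownian motion tip variance: {bm_var:.2f}")
print(f"OU tip variance:              {ou_var:.2f}")
```

Under Brownian motion the variance among tips keeps growing with time (roughly sigma^2 times elapsed time), while under OU it levels off near sigma^2 / (2 * alpha). Whether either pattern is a good approximation of real morphological change is exactly the model adequacy question.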

Then we come to behavior: no fossil data and not a lot of data at all relative to molecular traits. Behavior also has the problem of much of the data being transient - if not captured on a recording device, it's just an observer's memory. Apart from the lost opportunities to capture data, the animal behavior community has been slow to embrace the culture of data sharing, as discussed in Caetano and Aisenberg's (2014; http://dx.doi.org/10.1016/j.anbehav.2014.09.025) Forgotten Treasures paper. Data sharing is, of course, more than just dumping your raw data in a repository - to be useful, the data require annotation, even if that is little more than plain text labels for columns and a glossary of observation codes. So behavior researchers need to up their game to overcome the challenges of slow data accumulation dribbling into a leaky pipeline.
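As a sketch of how little annotation is needed to get started, the following checks a small observation table against plain-text column labels and a glossary of observation codes. Everything here is hypothetical: the column names, the codes (`APP`, `RET`, `WAV`), and the helper `undocumented` are invented for illustration and do not come from any real dataset or repository standard.

```python
import csv
import io

# Hypothetical plain-text labels for each column.
COLUMNS = {
    "individual": "ID of the focal animal",
    "time_s": "seconds since the start of the observation session",
    "code": "behavior code, defined in the glossary",
}

# Hypothetical glossary of observation codes.
GLOSSARY = {
    "APP": "approach: moves toward the other individual",
    "RET": "retreat: moves away from the other individual",
}

def undocumented(rows, columns, glossary):
    """Return the column names and codes used in rows that lack a description."""
    missing = set()
    for row in rows:
        missing.update(name for name in row if name not in columns)
        if row.get("code") not in glossary:
            missing.add(row.get("code"))
    return missing

# A tiny made-up data file; the last row uses a code with no glossary entry.
raw = "individual,time_s,code\nA1,12.5,APP\nA1,14.0,RET\nA2,20.1,WAV\n"
rows = list(csv.DictReader(io.StringIO(raw)))

print(undocumented(rows, COLUMNS, GLOSSARY))  # prints {'WAV'}
```

A check like this, run before depositing a dataset, would catch the codes that only the original observer remembers the meaning of.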

I became interested in ontologies and knowledge representation for behavior because I, rather optimistically, it turned out, thought a flood of behavior data would follow the flood of molecular data. Ontologies have played an important role in making sense of genes and proteins, and are slowly starting to contribute to morphological studies, but behavior and especially behavioral ecology and ethology lag behind even other branches of ecology in making use of ontologies. There is some motion towards an ontology (or sub-ontology) for behavioral ecology; follow me here for updates.

As challenging as the data situation is, I worry that the modeling side is in complete disarray. Models of the evolution of behavior are frequently descriptive and of little use for inference. For example, there is a sizable collection of models for the evolution of sexual signals, but I defy anyone to throw these models on branches of a tree and generate likelihood estimations of the history of signaling in any clade. Note that this isn't the same as applying a Brownian or OU model of change to a particular measurement or set of measurements sampled from a signal and testing for the presence of selection - Brownian motion is not a model of sexual selection, and it isn't clear (at least to me) that there is a way to go from a descriptive model of sexual selection to something that would yield up a likelihood estimate.

If there is work being done here, it is either well ahead of its time or not being recognized for what it is. Please prove me wrong on this. Meanwhile, I can only hope that the time for this theory-to-model link is not too far off.

Rather than end on such a pessimistic note, I will suggest two places to start looking for temporal depth. These won't solve the model problem, but they may yield up data that could support some development and testing. The first is the fossil traces of behavior - this means both behavior inferred from morphological fossils and, secondarily, fossil artifacts and trace fossils. I have some reservations about the latter, simply because it is frequently hard to identify the organism(s) involved. The second is the study of cultural change, both human and animal. There are a lot of interesting questions here, especially at the group/population level, though the link to heritable genetic change still has a long way to go.

I know not everyone in the comparative methods community is going through the stages of grief that Wayne Maddison discussed in his SSB talk. The community response to his questionnaire reflected optimism for new methods, though I don't know how many behaviorists were surveyed. I did speak with Emilia Martins at the ABS meeting a few weeks later and she was more optimistic about the state of comparative methods. I hope she's right and that the behavior community will find a welcome from the comparative methods community when we manage to shake off the fog of our data amnesia.

Saturday, August 9, 2014

Finished up a Behavioral Ontology workshop, now at ABS2014

We finished up the first of our two workshops to follow up on the one-day session we held in conjunction with the 2013 phenotype RCN summit. We gathered together a fairly diverse group of 15 behavior people at Princeton for a day and a half prior to the ABS 2014 meeting. Our task was to compare the ABO (the ontology constructed over the course of two workshops in 2004 and 2005 at Cornell) and the NBO (the ontology for behavior processes and phenotypes developed within the OBO framework). Despite some initial fears, it looks like we have a good chance of coming up with a proposed integration of these that will allow behavioral ecologists to make use of the NBO while not breaking things for the current users, mostly model organism genetics and phenotype investigators.

Thanks to all who attended, to my three co-organizers (Anne Clark, Sue Margulis, Cyndy Parr), and also to George Gkoutos, developer and maintainer of the NBO, who listened through most of our Friday session and took an hour to host a question and answer session over Skype.

Tuesday, August 5, 2014

I'll be at ABS2014 next week.

I’ll be presenting a poster at the Animal Behavior Society meeting next week in Princeton, New Jersey. The poster is NE 116. If you aren’t attending, the poster and a supporting script file are now up on figshare. The poster looks at how well NBO serves as a vocabulary for spider behavior. It turned out to do a little better than I expected, and I omitted a couple of statistically non-significant tests I tried over the weekend (comparing term depth across all of NBO with the depth of the NBO terms used in 40 arachnolingua claims).
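For readers unfamiliar with the idea of term depth, here is a toy illustration of what such a comparison could look like. It is not the script posted on figshare, and the hierarchy below is hypothetical rather than actual NBO content. It also assumes each term has a single parent, whereas real ontologies often allow several.

```python
import statistics

# Hypothetical toy hierarchy: each term maps to its "is_a" parent.
# The root has no parent. These are NOT real NBO terms or relations.
IS_A = {
    "behavior process": None,
    "locomotion": "behavior process",
    "walking": "locomotion",
    "feeding": "behavior process",
}

def depth(term, is_a):
    """Number of is_a steps from a term up to the root."""
    steps = 0
    while is_a[term] is not None:
        term = is_a[term]
        steps += 1
    return steps

# Depth of every term in the toy ontology vs. only the terms "used"
# in a hypothetical set of annotations.
all_depths = [depth(t, IS_A) for t in IS_A]
used_terms = ["walking", "feeding"]
used_depths = [depth(t, IS_A) for t in used_terms]

print(statistics.mean(all_depths))   # prints 1
print(statistics.mean(used_depths))  # prints 1.5
```

If the terms actually used sit deeper than the ontology average, annotators are finding reasonably specific terms; if they cluster near the root, the vocabulary may be too coarse for the domain.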

Note: there is a small chance I may not be able to attend the poster on Tuesday evening. I’ll update this if it turns out I won’t be there.

Thursday, May 28, 2009

This summer

I'll be attending two meetings this summer: Evolution 2009 (Moscow, ID) 12-16 June and ICBO (International Conference on Biomedical Ontologies - Buffalo, NY) 24-26 July. I'll be presenting some of my recent work with BiSSE at the Evolution meeting (my first non-ontology talk for a while) and representing Phenoscape at ICBO with a poster. Of course the rest of Phenoscape will be at ASIH in Portland while I'm in Buffalo, but it made sense to have Phenoscape represented in both places. I'll miss Portland, but Evolution is there next year.

I'm mentoring another Google Summer of Code project - my student will be developing a Mesquite package that will read and display Phenex annotations to character matrices. Getting Phenex to talk to Mesquite is an important, relatively low-hanging fruit for NeXML to enable, and just the sort of thing I've been trying to do with NeXML for a while now.

I will be leaving Kansas at the end of August and heading (indirectly) to NESCent to start an ontology alignment project. I'm hoping to develop something that might be useful both as a prototype for Phenoscape and as a core component of EthoOntos, the comparative method backend to OwlWatcher.

Sunday, June 15, 2008

Getting ready for Minneapolis

The evolution meetings in Minneapolis start late this week. For me the meeting will start on Friday, with the Ontology workshop. I'll be giving a 20 minute talk on taxonomy ontologies. The workshop has been booked full for several months now. I am also associated with three posters: one from a comparative methods in R workshop I attended last December, another one from Phenoscape, which gives an update on the TTO, TAO and where we are headed with Phenote and the new workflow, and finally my single author poster on the behavior work. Part of this will be explaining the difference between annotating publications and behavior videos, and part will be an update, not so much on OwlWatcher, but a preliminary discussion of a toy implementation of the EthOntos-Lite alignment module. There is, as of today, running code, but I haven't given it anything more than a couple of trivial examples, which it seems to handle correctly. It would have been nice to have something I could feed OwlWatcher projects into, but that will have to wait - I've still got lots on my plate, despite a productive weekend.
