Monday, October 26, 2015

Behavior Ontologies workshop 2

Over the weekend we held the second workshop (funded by NSF and the Phenotype Ontologies RCN) to begin the process of merging the ABO (Animal Behavior Ontology, which came out of a series of workshops in 2004 and 2005) and the NBO (NeuroBehavior Ontology) which, as we learned at the workshop, had its roots in a phenotype vocabulary that started around 2001.  To be fair, the effort that led to ABO came out of a private discussion between two of the principals held at the 1999 ABS meeting.   The workshop was held at the Smithsonian Museum in Washington.

We aren't quite where I expected to be after the meeting.  We made good progress getting started on a use-case based paper for applications of a behavior ontology.  We also have a real home for the ABO - we deposited the OWL rendering I generated in 2006 as the initial commit here (note that this is the same repository where NBO is maintained). The main use of the ABO was for indexing an ethogram repository called Ethosearch. Ethosearch enhanced the ABO by adding definitions to many of the original terms. We should shortly have those definitions merged into the OWL version of the ABO.

We started the process of merging the ABO and NBO. One of ABO's strengths is a clear division between observable behavior (acts, events) and functional interpretations (for example, running vs. fleeing from a predator). The NBO is organized rather differently and we would like the division in ABO to appear at least somewhere in NBO. NBO does have a sizable number of terms that would apply to neither, but are more mechanism or 'mental function' related. The main challenge to reorganization is making changes in a way that would be acceptable to other NBO stakeholders. One of our tasks in this meeting was to identify at least some of the other stakeholders that might be affected by a merge and reorganization. I would say this goal was partially met, but I'm not sure we have identified everyone.  (Personal note: Arachnolingua does use NBO, but would only benefit from any ABO integration).

We have plans for sharing this work (besides the paper, there will be, at the very least, a poster at SICB-2016, and there were other ideas floated on Sunday).  I will have updates as things progress.

Workshop Organizers: Anne Clark, Sue Margulis, Cynthia Parr, Katja Schultz (Local Host), and myself
Participants: Elissa Chesler, George Gkoutos, David Osumi-Sutherland, and Reid Rumelt
Melissa Haendel participated remotely. 

Friday, July 24, 2015

A little more on morphological data

Apropos of last week's post on models and data, here is a preprint study on the amount of morphological data available for living mammal taxa.  The focus was on coverage in each order, where coverage was defined by the fraction of OTUs with available data, where available seems to be defined as appearing in a character matrix with more than 100 characters (which doesn't necessarily mean that >100 characters were scored for each OTU).  I can't judge whether the three databases and the literature search constitute a sufficient search, but their results aren't implausible.   Using their appearance in a sufficiently character-rich matrix, they counted 1074 mammalian OTUs in 126 matrices.  Without the matrix size filtering, there were 5228 OTUs in 286 matrices.  This is apparently less than the morphology data for fossil mammal taxa.
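To make the coverage definition concrete, here is a minimal sketch of how I read it: an OTU counts as covered if it appears in at least one matrix with more than 100 characters, and coverage for an order is the covered OTUs divided by the living OTUs in that order. The records, the per-order totals, and the function name are all made up for illustration; none of this is the preprint's code or data.

```python
from collections import defaultdict

# Hypothetical toy records: (matrix_id, n_characters, order, otu).
# These are illustrative values, not data from the preprint.
RECORDS = [
    ("m1", 250, "Chiroptera", "Myotis lucifugus"),
    ("m1", 250, "Chiroptera", "Eptesicus fuscus"),
    ("m2", 40, "Chiroptera", "Pteropus vampyrus"),
    ("m3", 120, "Rodentia", "Mus musculus"),
]

# Hypothetical number of living OTUs per order (again, not real figures).
LIVING = {"Chiroptera": 4, "Rodentia": 5}


def coverage_by_order(records, living_counts, min_characters=100):
    """Fraction of living OTUs per order found in a sufficiently large matrix.

    An OTU is covered if it appears in any matrix with MORE than
    min_characters characters; it need not be scored for all of them.
    """
    covered = defaultdict(set)
    for _matrix_id, n_characters, order, otu in records:
        if n_characters > min_characters:
            covered[order].add(otu)
    return {
        order: len(covered[order]) / total
        for order, total in living_counts.items()
    }


coverage = coverage_by_order(RECORDS, LIVING)
unfiltered = coverage_by_order(RECORDS, LIVING, min_characters=0)
```

Dropping the size filter (`min_characters=0`) can only raise the numbers, which is the same direction as the jump from 1074 to 5228 OTUs reported above.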

I expect there are very few character matrices for any group with more than 100 behavioral characters.  This is probably a combination of the difficulty of extracting 100 behavioral characters in a study as well as the rarity of this sort of comparative/phylogenetic analysis in behavior.

Thanks to everyone who responded to last week's post.

Wednesday, July 15, 2015

Models vs. Data - not a choice for behavior

These are some thoughts that have been rattling in my head since the SSB 2015 standalone meeting back in May.  They are somewhat depressing, so I'm open to suggestions of more positive ways to look at the situation in comparative studies of behavior.

The main part of the meeting was bracketed by two 'panel' discussions (I scare quote the word panel because the panel was two people in each case).  The first panel consisted of David Hillis and Antonis Rokas arguing whether models or data would be more important going forward.  Not surprisingly for this meeting, the focus was on molecular methods, particularly genomic comparative analysis. Interesting question, but of course both are viable research paths at this point - gathering and managing rapidly growing data sets and refining the models of molecular evolution to more realistically mimic the actual processes are both worth pursuing.

The question of data vs. models came up several times, including during another panel discussion of putting dates on time trees.  This seems to be a very fertile area for developing improved models and statistical methods at the moment, while the corresponding fossil data is accumulating at a steadier pace.

The second panel was Wayne Maddison and Cécile Ané discussing the limits of comparative methods.  The issues that Wayne raised are ones I know firsthand, mostly from my stint in his UBC lab.   The main one is that most or all comparative methods for discrete trait values suffer from phylogenetic non-independence, despite 25-year-old claims to the contrary.  I spent some time thinking about these issues, but the best discussion at this point is the Maddison and FitzJohn (2014;  doi:10.1093/sysbio/syu070) paper.  The situation is somewhat better for continuous traits, though there are always questions of model adequacy. I'll admit I don't remember a great deal of Ané's discussion of limits of OU methods, though it seemed somewhat more optimistic for the continuous trait cases that OU applies to. Wayne did point out that tip data won't help answer the question of trends in evolution - you'll need fossil data for that.

Thinking about the situation after the meeting wrapped up, I was struck by the differences between different trait domains.  In molecular biology we have lots of data and a selection of models that are at least plausible representations of things that actually happen (not perfect by any means, but GTR is a reasonable stochastic approximation of what happens at a single site).  Lots of data allow a certain freedom to 'run about' the non-independence problems mentioned in the last paragraph - with enough sites, you could, in principle, assume independent changes.  Molecular data also provide a small, but real sampling of fossil data and useful amounts of molecular evolution can occur over experimental time-scales.  Thus, molecules provide lots of data, a solid starting point for models, and some temporal depth.  Morphology has fossils, a reasonable and slowly growing dataset and a mix of continuous and discrete trait models.  Are Brownian motion and OU models good stochastic approximations of morphological change for continuous characters?  Not necessarily, but they are definitely a start.  As noted above, the situation for discrete characters, even using model-based (Likelihood and Bayesian) methods, is rather problematic.  However, you aren't limited in your ancestral reconstructions to using contemporary data.  Fossils won't solve these problems, but they might give you confidence in your analysis.  There is room for optimism here, particularly the hope of better models and statistical methods to make the most of the data available, while new data trickle in.
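For anyone who hasn't met the two continuous-trait models by name, here is a rough sketch of what they assume about a single lineage. Brownian motion is an unbiased random walk; OU adds a pull of strength alpha toward an optimum theta. The function name, parameter values, and the simple Euler discretization are my own illustrative choices, not anything from a comparative methods package, and a real analysis would work on a tree rather than one lineage.

```python
import random


def simulate_trait(n_steps, sigma, alpha=0.0, theta=0.0, x0=0.0,
                   dt=0.01, seed=None):
    """Simulate one lineage's trait value through time.

    Each step applies dx = alpha * (theta - x) * dt + sigma * dW,
    where dW is a normal draw with standard deviation sqrt(dt).
    With alpha = 0 this is plain Brownian motion; with alpha > 0 it is
    an Ornstein-Uhlenbeck (OU) process pulled toward theta.
    """
    rng = random.Random(seed)
    x = x0
    path = [x]
    for _ in range(n_steps):
        pull = alpha * (theta - x) * dt          # zero under Brownian motion
        noise = sigma * rng.gauss(0.0, dt ** 0.5)
        x += pull + noise
        path.append(x)
    return path


# Same noise settings, with and without the pull toward the optimum.
bm_path = simulate_trait(1000, sigma=1.0, seed=1)
ou_path = simulate_trait(1000, sigma=1.0, alpha=2.0, theta=0.0, seed=1)
```

The Brownian path is free to wander arbitrarily far from its starting value, while the OU path keeps getting dragged back toward theta. Neither says anything about why the trait is changing, which is the sense in which they are stochastic approximations rather than process models.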

Then we come to behavior: no fossil data and not a lot of data at all relative to molecular traits.  Behavior also has the problem of much of the data being transient - if not captured on a recording device, it's just an observer's memory.  Apart from the lost opportunities to capture data, the animal behavior community has been slow to embrace the culture of data sharing, as discussed in Caetano and Aisenberg's (2014) Forgotten Treasures paper. Data sharing is, of course, more than just dumping your raw data in a repository - to be useful, the data require annotation, even if that is little more than plain text labels for columns and a glossary of observation codes.  So behavior researchers need to up their game to overcome the challenges of slow data accumulation dribbling into a leaky pipeline.

I became interested in ontologies and knowledge representation for behavior because I thought, rather optimistically as it turned out, that a flood of behavior data would follow the flood of molecular data.  Ontologies have played an important role in making sense of genes and proteins, and are slowly starting to contribute to morphological studies, but behavior, and especially behavioral ecology and ethology, lag behind even other branches of ecology in making use of ontologies.   There is some motion towards an ontology (or sub-ontology) for behavioral ecology; follow me here for updates.

As challenging as the data situation is, I worry that the modeling side is in complete disarray.  Models of the evolution of behavior are frequently descriptive and of little use for inference.  For example, there is a sizable collection of models for the evolution of sexual signals, but I defy anyone to throw these models on branches of a tree and generate likelihood estimates of the history of signaling in any clade.  Note that this isn't the same as applying a Brownian or OU model of change to a particular measurement or set of measurements sampled from a signal and testing for the presence of selection - Brownian motion is not a model of sexual selection and it isn't clear (at least to me) that there is a way to go from a descriptive model of sexual selection to something that would yield up a likelihood estimate.

If there is work being done here, it is either well ahead of its time or not being recognized for what it is.  Please prove me wrong on this.  Meanwhile, I can only hope that the time for this theory to model link is not too far off.

Rather than end on such a pessimistic note, I will suggest two places to start looking for temporal depth.  These won't solve the model problem, but they may yield up data that could support some development and testing.  The first is the fossil traces of behavior - this means behavior inferred from morphological fossils and, secondarily, fossil artifacts and trace fossils.  I have some reservations about the latter, simply because it is frequently hard to identify the organism(s) involved.
The second is the study of cultural change, both human and animal.  There are a lot of interesting questions here, especially at the group/population level, though the link to heritable genetic change still has a long ways to go.

I know not everyone in the comparative methods community is going through the stages of grief that Wayne Maddison discussed in his SSB talk.  The community response to his questionnaire reflected optimism for new methods, though I don't know how many behaviorists were surveyed.  I did speak with Emilia Martins at the ABS meeting a few weeks later and she was more optimistic about the state of comparative methods.  I hope she's right and that the behavior community will find a welcome from the comparative methods community when we manage to shake off the fog of our data amnesia.

Thursday, April 16, 2015

Summer meetings 2015

In recent years, the main things I've been using this blog for are letting people know about meetings I'm attending and, from time to time, posting meeting reports of varying quality.  So here are the meetings I'll be attending during Summer 2015:

Society of Systematic Biologists standalone meeting (Ann Arbor MI May 20-22) Not planning to present here, but will be around for face-to-face time with other OpenTree of Life people and anyone else who wants to discuss or simply catch up.

Animal Behavior Society (Anchorage AK June 10-14).  I'm coming up to Anchorage a couple of days early, and currently have the 9th completely open. No sudden exits this year, I promise. I have signed up to give a talk about Arachnolingua and I promise I'll keep the focus on the spiders (ontologies and reasoning will stay in the background).

American Arachnological Society (Mitchell SD June 19-23).  Not sure whether I'll do a poster here.  The plan is mostly to play sponge and catch up with some people.

I'm currently focussing on contributing to the OpenTree taxonomy effort (with Jonathan Rees) and reworking the Arachnolingua tools.  Nothing to report on the next Behavior Ontologies workshop at this time.