Friday, July 22, 2016

A Belated Summer Update

I’ve been quiet since April, but I’ve also been pretty busy.  I have been to one meeting this summer, the 20th International Congress of Arachnology.  I gave a talk about the progress on the spider behavior ontology I mentioned a few posts back.  The slides are here.   I have finished my initial pass for behavior terms in two recent works: Foelix's Biology of Spiders, a standard textbook, and an edited volume, Herberstein's Spider Behaviour.   Since returning from the meeting, I've nearly finished a first pass of data cleaning.  You can see the raw data here.  In the next week or so,  I'll put up a little landing page with more explanation and links to the raw data and the results of the cleaning pass.  Eventually there should be a CMAP or other visualization and possibly an OWL file.

I say possibly an OWL file because I'm not sure that an ontology will be the final outcome.  I may generate a set of terms and turn them into term requests for the NBO/ABO, the Spider Biology Ontology, and others (GO, ChEBI, PATO).  It has become clear to me (finally) that the database is where the interesting comparative work will come from and the ontology is just a necessary support for this.  It should also be useful as a standardized vocabulary for arachnologists, and there seems to be interest in this, based on the reaction to my talk.


Friday, April 15, 2016

obo-behavior is a thing

Yes, this is an update, not an ontological assertion.  Robert Hoehndorf has set up a new github project (obo-behavior) for the NBO and moved the NBO and the ABO associated material over.  So, the negotiation is over and we progress with the merging.  We hope to have the merge done this summer (at least so we can report back to our several funding supporters).

Tuesday, January 19, 2016

Why Arachnolingua needs a T-Box

It is common in knowledge representation to distinguish between statements about individuals (e.g., "John loves Mary") and statements about classes or universals ("Love is an emotion exhibited by humans").   The set of former statements is sometimes referred to as the "A-box" (A for assertion), the latter set as the "T-box" (T for terminology).  Many ontology languages, such as OWL, allow both types of statements, though ontologies, strictly speaking, are terminology.

It's been nearly five years since I mentioned to Martin Ramirez at a Phenotype Ontology RCN meeting at NESCent that I thought it would be a good idea to test the NeuroBehavior Ontology (NBO) as a vocabulary for describing the behavior of spiders.  The NBO is organized as approximately parallel trees of terms for behavior processes and phenotypes, though with a bias toward the behavior of humans and model organisms.  So I gathered up a collection of papers on the behavior of spiders and some other arachnids and did some annotation of behavior.

Now the annotations in Arachnolingua turn out to be assertions both about (frequently anonymous) individuals and about classes.  For example, statements about the types of webs various families of spider produce, or don't produce, are at the level of terminology - statements relating a type of web to a type of spider.  Others, such as the sequence of acts in building a web, are necessarily about individuals, because it is generally the same individual (or in some cases group of individuals) involved in each step in building a web.  Each step has an active participant, the same spider for each step.  So descriptions of process will consist of assertions about individuals, even if nothing else is known or stated about the individual.
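To make the distinction concrete, here is a minimal sketch in plain Python (not the actual Arachnolingua code or data model).  The relation names, the blank-node style identifiers, and the example family and step labels are all assumptions chosen for illustration.  Class-level statements go in a T-box set, while a process description mints one anonymous individual and reuses it as the active participant of every step.

```python
from itertools import count

# T-box: statements relating a type of spider to a type of web.
# Illustrative examples only, not records from Arachnolingua.
TBOX = {
    ("Araneidae", "produces", "orb web"),
    ("Salticidae", "does_not_produce", "capture web"),
}

_ids = count(1)


def new_anonymous_id():
    """Mint an identifier for an individual we know nothing else about."""
    return f"_:b{next(_ids)}"


def describe_process(acts, actor_class):
    """Return A-box assertions for a sequence of acts by one individual.

    Each act becomes its own event individual; all events share the same
    anonymous actor, and consecutive events are linked by 'precedes'.
    """
    actor = new_anonymous_id()
    abox = [(actor, "instance_of", actor_class)]
    previous = None
    for act in acts:
        event = new_anonymous_id()
        abox.append((event, "instance_of", act))
        abox.append((event, "has_active_participant", actor))
        if previous is not None:
            abox.append((previous, "precedes", event))
        previous = event
    return abox


# Hypothetical step labels for building a web.
abox = describe_process(
    ["frame construction", "radius construction", "sticky spiral construction"],
    "Araneidae",
)
for assertion in abox:
    print(assertion)
```

The point of the sketch is that nothing is asserted about the spider beyond its class, yet the individual is still needed to say that the same animal performed every step.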

Actually I was more successful at gathering papers and building some software infrastructure (python frontend, SQL database, Java backend and public website - more detail here).  However, in 2014, I presented a poster at the Animal Behavior Society describing the quality of annotation possible using the NBO process terms for a small set of papers.  Because I included relevant anatomy as part of the annotation, I was able to do a little better than if I had just used the NBO terms.  Still, a lot of interesting behavior wound up as courtship behavior, or locomotion, or predatory behavior with reference to specific body parts.

My original hope was to build the T-box from the data and descriptions in the curated papers. However, it has become clear that the gap from the terminology in the descriptions to the available terms in NBO really is too great. The annotation process requires terms at an intermediate level to be useful. Thus the project to test NBO coverage using spider behavior is concluded.

Therefore, while I continue to tweak the frontend code, I have retreated to the approach recommended by Arp, Smith, and Spear and have started simply extracting behavior terms from two sources.  The first is the 3rd edition of Foelix's Biology of Spiders.  The second, Herberstein's Spider Behaviour, will be a followup to catch anything that Foelix missed.  The idea is to collect the terms, throw them against whatever results from merging NBO and the Animal Behavior Ontology (ABO) process terms and the Gene Ontology to identify synonyms and subsumption 'parents', and propose at least a taxonomy of terms at the World Arachnology Congress in Colorado this summer.  Ultimately these might wind up as a part of the existing spider biology ontology, which already has a small behavior branch, as well as contributing to whatever final vocabulary results from merging process terms from NBO and ABO.  Arp et al. recommend starting with a small set (~50) of terms; I'll grab every behavior-related term for a while and pull the most general terms from the list.  The remainder can be added once the tree is in place.
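As a rough illustration of what "throwing terms against" an existing vocabulary could look like, here is a small Python sketch.  It is not the procedure I actually use: the term IDs (EX:...), labels, and synonyms are invented, and the heuristic of treating a trailing phrase as a candidate parent is an assumption that only makes sense for head-final English terms.  Anything it suggests would still need review by a curator.

```python
def normalize(label):
    """Lowercase, turn hyphens into spaces, and collapse whitespace."""
    return " ".join(label.lower().replace("-", " ").split())


def build_index(ontology):
    """Map every normalized label and synonym to its term ID."""
    index = {}
    for term_id, entry in ontology.items():
        for name in [entry["label"], *entry.get("synonyms", [])]:
            index[normalize(name)] = term_id
    return index


def classify(term, index):
    """Return ('exact', id), ('parent', id), or ('new', None) for a term."""
    words = normalize(term).split()
    key = " ".join(words)
    if key in index:
        return ("exact", index[key])
    # Drop leading modifiers one at a time and look for a known term:
    # 'vibratory courtship behavior' -> 'courtship behavior' -> 'behavior'.
    for start in range(1, len(words)):
        suffix = " ".join(words[start:])
        if suffix in index:
            return ("parent", index[suffix])
    return ("new", None)


# Hypothetical stand-in for the merged process terms.
ontology = {
    "EX:0001": {"label": "courtship behavior", "synonyms": ["courtship"]},
    "EX:0002": {"label": "locomotion"},
    "EX:0003": {"label": "predatory behavior"},
}
index = build_index(ontology)

results = {
    term: classify(term, index)
    for term in ["Courtship", "vibratory courtship behavior", "ballooning"]
}
for term, outcome in results.items():
    print(term, outcome)
```

Terms that come back as 'new' are the interesting ones here, since they are the candidates for the intermediate-level vocabulary that the annotation process is missing.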

There are deeper issues here, particularly about the relation of behavior phenotypes and their associated processes, but that's a topic for another post.

Monday, October 26, 2015

Behavior Ontologies workshop 2

Over the weekend we held the second workshop (funded by NSF and the Phenotype Ontologies RCN) to begin the process of merging the ABO (Animal Behavior Ontology, which came out of a series of workshops in 2004 and 2005) and the NBO (NeuroBehavior Ontology) which, as we learned at the workshop, had its roots in a phenotype vocabulary that started around 2001.  To be fair, the effort that led to ABO came out of a private discussion between two of the principals held at the 1999 ABS meeting.   The workshop was held at the Smithsonian Museum in Washington.

We aren't quite where I expected to be after the meeting.  We made good progress getting started on a use-case based paper for applications of a behavior ontology.  We also have a real home for the ABO - we deposited the OWL rendering I generated in 2006 as the initial commit here  (note that this is the same repository where NBO is maintained). The main use of the ABO was for indexing an ethogram repository called Ethosearch. Ethosearch enhanced the ABO by adding definitions to many of the original terms. We should shortly have those terms merged into the OWL version of the ABO.

We started the process of merging the ABO and NBO. One of ABO's strengths is a clear division between observable behavior (acts, events) and functional interpretations (for example, running vs. fleeing from a predator). The NBO is organized rather differently and we would like the division in ABO to appear at least somewhere in NBO. NBO does have a sizable number of terms that would apply to neither, but are more mechanism or 'mental function' related. The main challenge to reorganization is making changes in a way that would be acceptable to other NBO stakeholders. One of our tasks in this meeting was to identify at least some of the other stakeholders that might be affected by a merge and reorganization. I would say this goal was partially met, but I'm not sure we have identified everyone.  (Personal note: Arachnolingua does use NBO, but would only benefit from any ABO integration).

We have plans for sharing this work (besides the paper, there will be, at the very least, a poster at SICB-2016, and there were other ideas floated on Sunday).  I will have updates as things progress.

Workshop Organizers: Anne Clark, Sue Margulis, Cynthia Parr, Katja Schultz (Local Host), and myself
Participants: Elissa Chesler, George Gkoutos, David Osumi-Sutherland, and Reid Rumelt
Melissa Haendel participated remotely. 



Friday, July 24, 2015

A little more on morphological data

Apropos of last week's post on models and data, here is a preprint study on the amount of morphological data available for living mammal taxa.  The focus was on coverage in each order, where coverage was defined by the fraction of OTUs with available data, where available seems to be defined as appearing in a character matrix with more than 100 characters (which doesn't necessarily mean that >100 characters were scored for each OTU).  I can't judge whether the three databases and the literature search constitute a sufficient search, but their results aren't implausible.   Using their appearance in a sufficiently character-rich matrix, they counted 1074 mammalian OTUs in 126 matrices.  Without the matrix size filtering, there were 5228 OTUs in 286 matrices.  This is apparently less than the morphology data for fossil mammal taxa.
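The coverage measure, as I read it, can be sketched in a few lines of Python.  This is my reading of the definition rather than the preprint's code, and the order names, OTU labels, and matrix sizes below are made up.  An OTU counts as covered if it appears in at least one matrix with more than 100 characters, regardless of how many characters were actually scored for it.

```python
def coverage_by_order(otus_by_order, matrices, min_characters=100):
    """Fraction of OTUs in each order that appear in a large enough matrix.

    A matrix qualifies when it has more than min_characters characters;
    the filter is on matrix size, not on characters scored per OTU.
    """
    covered = set()
    for matrix in matrices:
        if matrix["n_characters"] > min_characters:
            covered.update(matrix["otus"])
    return {
        order: len(covered & set(otus)) / len(otus)
        for order, otus in otus_by_order.items()
        if otus
    }


# Hypothetical toy data.
otus_by_order = {
    "OrderA": ["a1", "a2", "a3", "a4"],
    "OrderB": ["b1", "b2"],
}
matrices = [
    {"n_characters": 150, "otus": ["a1", "a2", "b1"]},
    {"n_characters": 40, "otus": ["a3", "b2"]},
]

filtered = coverage_by_order(otus_by_order, matrices)
unfiltered = coverage_by_order(otus_by_order, matrices, min_characters=0)
print(filtered)
print(unfiltered)
```

Setting min_characters to 0 corresponds to dropping the matrix size filter, which is why the unfiltered counts in the preprint are so much larger.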

I expect there are very few character matrices for any group with more than 100 behavioral characters.  This is probably a combination of the difficulty of extracting 100 behavioral characters in a study as well as the rarity of this sort of comparative/phylogenetic analysis in behavior.

Thanks to everyone who responded to last week's post.

Wednesday, July 15, 2015

Models vs. Data - not a choice for behavior

These are some thoughts that have been rattling in my head since the SSB 2015 standalone meeting back in May.  They are somewhat depressing, so I'm open to suggestions of more positive ways to look at the situation in comparative studies of behavior.

The main part of the meeting was bracketed by two 'panel' discussions (I scare quote the word panel because the panel was two people in each case).  The first panel consisted of David Hillis and Antonis Rokas arguing whether models or data would be more important going forward.  Not surprisingly for this meeting, the focus was on molecular methods, particularly genomic comparative analysis. Interesting question, but of course both are viable research paths at this point - gathering and managing rapidly growing data sets and refining the models of molecular evolution to more realistically mimic the actual processes are both worth pursuing.

The question of data vs. models came up several times, including during another panel discussion of putting dates on time trees.  This seems to be a very fertile area for developing improved models and statistical methods at the moment, while the corresponding fossil data is accumulating at a steadier pace.

The second panel was Wayne Maddison and Cécile Ané discussing the limits of comparative methods.  The issues that Wayne raised are ones I know firsthand, mostly from my stint in his UBC lab.   The issue is that most or all comparative methods for discrete trait values suffer from phylogenetic non-independence, despite 25-year-old claims to the contrary.  I spent some time thinking about these issues, but the best discussion at this point is the Maddison and FitzJohn (2014;  doi:10.1093/sysbio/syu070) paper.  The situation is somewhat better for continuous traits, though there are always questions of model adequacy. I'll admit I don't remember a great deal of Ané's discussion of limits of OU methods, though it seemed somewhat more optimistic for the continuous trait cases that OU applies to. Wayne did point out that tip data won't help answer the question of trends in evolution - you'll need fossil data for that.

Thinking about the situation after the meeting wrapped up, I was struck by the differences between different trait domains.  In molecular biology we have lots of data and a selection of models that are at least plausible representations of things that actually happen (not perfect by any means, but GTR is a reasonable stochastic approximation of what happens at a single site).  Lots of data allow a certain freedom to 'run about' the non-independence problems mentioned in the last paragraph - with enough sites, you could, in principle, assume independent changes.  Molecular data also provide a small, but real sampling of fossil data and useful amounts of molecular evolution can occur over experimental time-scales.  Thus, molecules provide lots of data, a solid starting point for models, and some temporal depth.  Morphology has fossils, a reasonable and slowly growing dataset and a mix of continuous and discrete trait models.  Are Brownian motion and OU models good stochastic approximations of morphological change for continuous characters?  Not necessarily, but they are definitely a start.  As noted above, the situation for discrete characters, even using model-based (Likelihood and Bayesian) methods, is rather problematic.  However, you aren't limited in your ancestral reconstructions to using contemporary data.  This won't solve these problems, but it might give you confidence in your analysis.  There is room for optimism here, particularly the hope of better models and statistical methods to make the most of the data available, while new data trickle in.

Then we come to behavior: no fossil data and not a lot of data at all relative to molecular traits.  Behavior also has the problem of much of the data being transient - if not captured on a recording device, it's just an observer's memory.  Apart from the lost opportunities to capture data, the animal behavior community has been slow to embrace the culture of data sharing, as discussed in Caetano and Aisenberg's (2014; http://dx.doi.org/10.1016/j.anbehav.2014.09.025) Forgotten Treasures paper. Data sharing is, of course, more than just dumping your raw data in a repository - to be useful, the data require annotation, even if that is little more than plain text labels for columns and a glossary of observation codes.  So behavior researchers need to up their game to overcome the challenges of slow data accumulation dribbling into a leaky pipeline.

I became interested in ontologies and knowledge representation for behavior because I, rather optimistically, it turned out, thought a flood of behavior data would follow the flood of molecular data.  Ontologies have played an important role in making sense of genes and proteins, and are slowly starting to contribute to morphological studies, but behavior and especially behavioral ecology and ethology lag behind even other branches of ecology in making use of ontologies.   There is some motion towards an ontology (or sub-ontology) for behavioral ecology; follow me here for updates.

As challenging as the data situation is, I worry that the modeling side is in complete disarray.  Models of the evolution of behavior are frequently descriptive and of little use for inference.  For example, there is a sizable collection of models for the evolution of sexual signals, but I defy anyone to throw these models on branches of a tree and generate likelihood estimations of the history of signaling in any clade.  Note that this isn't the same as applying a Brownian or OU model of change to a particular measurement or set of measurements sampled from a signal and testing for the presence of selection - Brownian motion is not a model of sexual selection and it isn't clear (at least to me) that there is a way to go from a descriptive model of sexual selection to something that would yield up a likelihood estimate.

If there is work being done here, it is either well ahead of its time or not being recognized for what it is.  Please prove me wrong on this.  Meanwhile, I can only hope that the time for this theory to model link is not too far off.

Rather than end on such a pessimistic note, I will suggest two places to start looking for temporal depth.  These won't solve the model problem, but they may yield up data that could support some development and testing.  The first is the fossil traces of behavior - this means both behavior inferred from morphological fossils as well as, secondarily, fossil artifacts and trace fossils.  I have some reservations about the latter, simply because it is frequently hard to identify the organism(s) involved.  The second is the study of cultural change, both human and animal.  There are a lot of interesting questions here, especially at the group/population level, though the link to heritable genetic change still has a long way to go.

I know not everyone in the comparative methods community is going through the stages of grief that Wayne Maddison discussed in his SSB talk.  The community response to his questionnaire reflected optimism for new methods, though I don't know how many behaviorists were surveyed.  I did speak with Emilia Martins at the ABS meeting a few weeks later and she was more optimistic about the state of comparative methods.  I hope she's right and that the behavior community will find a welcome from the comparative methods community when we manage to shake off the fog of our data amnesia.



Thursday, April 16, 2015

Summer meetings 2015

In recent years, the main thing I've been using this blog for is letting people know about meetings I'm attending and from time to time, meeting reports of varying quality.  So here are the meetings I'll be attending during Summer 2015:

Society of Systematic Biologists standalone meeting (Ann Arbor MI May 20-22).  Not planning to present here, but will be around for face-to-face with other OpenTree of Life people and anyone else who wants to discuss or simply catch up.

Animal Behavior Society (Anchorage AK June 10-14).  I'm coming up to Anchorage a couple of days early, and currently have the 9th completely open. No sudden exits this year, I promise. I have signed up to give a talk about Arachnolingua and I promise I'll keep the focus on the spiders (ontologies and reasoning will stay in the background).

American Arachnological Society (Mitchell SD June 19-23).  Not sure whether I'll do a poster here.  The plan is mostly to play sponge and catch up with some people.

I'm currently focussing on contributing to the OpenTree taxonomy effort (with Jonathan Rees) and reworking the Arachnolingua tools.  Nothing to report on the next Behavior Ontologies workshop at this time.