Friday, July 22, 2016

A Belated Summer Update

I’ve been quiet since April, but I’ve also been pretty busy.  I have been to one meeting this summer, the 20th International Congress of Arachnology.  I gave a talk about the progress on the spider behavior ontology I mentioned a few posts back.  The slides are here.   I have finished my initial pass for behavior terms in two recent works: Foelix's Biology of Spiders, a standard textbook, and an edited volume, Herberstein's Spider Behaviour.   Since returning from the meeting, I've nearly finished a first pass of data cleaning.  You can see the raw data here.  In the next week or so,  I'll put up a little landing page with more explanation and links to the raw data and the results of the cleaning pass.  Eventually there should be a CMAP or other visualization and possibly an OWL file.

I say possibly an OWL file because I'm not sure that an ontology will be the final outcome.  I may generate a set of terms and turn them into term requests for the NBO/ABO, the Spider Biology Ontology, and others (GO, ChEBI, PATO).  It has (finally) become clear to me that the database is where the interesting comparative work will come from and that the ontology is just a necessary support for it.  The ontology should also be useful as a standardized vocabulary for arachnologists, and, judging by the reaction to my talk, there seems to be interest in this.


Friday, April 15, 2016

obo-behavior is a thing

Yes, this is an update, not an ontological assertion.  Robert Hoehndorf has set up a new GitHub project (obo-behavior) for the NBO and moved the NBO and the ABO-associated material over.  So the negotiation is over and we can proceed with the merge.  We hope to have the merge done this summer (at least so we can report back to our several funding supporters).

Tuesday, January 19, 2016

Why Arachnolingua needs a T-Box

It is common in knowledge representation to distinguish between statements about individuals (e.g., "John loves Mary") and statements about classes or universals ("Love is an emotion exhibited by humans").  The former set of statements is sometimes referred to as the "A-box" (A for assertion), the latter as the "T-box" (T for terminology).  Many ontology languages, such as OWL, allow both types of statements, though ontologies, strictly speaking, are terminology.
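To make the distinction concrete, here is a minimal sketch in plain Python (no OWL library involved). The class names SubClassOf and Fact, the extra "MentalProcess" class, and the ancestors helper are all assumptions made up for illustration; the point is only that T-box statements relate classes while A-box statements relate individuals.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SubClassOf:
    """T-box statement: every member of `sub` is also a member of `sup`."""
    sub: str
    sup: str


@dataclass(frozen=True)
class Fact:
    """A-box statement: a relation between two named individuals."""
    subject: str
    relation: str
    target: str


# Terminology: statements about classes (names are illustrative only).
tbox = {SubClassOf("Love", "Emotion"), SubClassOf("Emotion", "MentalProcess")}

# Assertions: statements about individuals.
abox = {Fact("John", "loves", "Mary")}


def ancestors(cls: str, axioms: set[SubClassOf]) -> set[str]:
    """Collect every superclass reachable from `cls` via SubClassOf axioms."""
    found: set[str] = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for axiom in axioms:
            if axiom.sub == current and axiom.sup not in found:
                found.add(axiom.sup)
                frontier.append(axiom.sup)
    return found


print(sorted(ancestors("Love", tbox)))
```

Note that the subsumption query only ever consults the T-box; nothing about John or Mary is needed to answer it.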

It's been nearly five years since I mentioned to Martin Ramirez at a Phenotype Ontology RCN meeting at NESCent that I thought it would be a good idea to test the NeuroBehavior Ontology (NBO) as a vocabulary for describing the behavior of spiders.  The NBO is organized as approximately parallel trees of terms for behavior processes and phenotypes, though with a bias toward the behavior of humans and model organisms.  So I gathered up a collection of papers on the behavior of spiders and some other arachnids and did some annotation of behavior.

Now the annotations in Arachnolingua turn out to be assertions about both (frequently anonymous) individuals and classes.  For example, statements about the types of webs various families of spider produce, or don't produce, are at the level of terminology: statements relating a type of web to a type of spider.  Others, such as the sequence of acts in building a web, are necessarily about individuals, because it is generally the same individual (or in some cases group of individuals) involved in each step in building a web.  Each step has an active participant, the same spider for each step.  So descriptions of process will consist of assertions about individuals, even if nothing else is known or stated about the individual.
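As a rough sketch of what "the same anonymous spider in every step" looks like as data, the following Python fragment mints one blank-node-style identifier and reuses it as the participant of each act. The Step class, the helper names, and the three act labels are hypothetical and chosen only for illustration; this is not how Arachnolingua actually stores its annotations.

```python
from dataclasses import dataclass
from itertools import count

_ids = count(1)


def anonymous_individual(class_name):
    """Mint an identifier for an individual known only by its class."""
    return f"_:{class_name}{next(_ids)}"


@dataclass(frozen=True)
class Step:
    """A-box assertion: one act and the individual that performed it."""
    act: str
    participant: str


def describe_process(acts, participant_class):
    """Describe a sequence of acts that all share one anonymous participant."""
    individual = anonymous_individual(participant_class)
    return [Step(act, individual) for act in acts]


# Illustrative act labels, not terms from any published ontology.
steps = describe_process(
    ["frame construction", "radius construction", "sticky spiral construction"],
    "spider",
)

for step in steps:
    print(step.participant, "participates in", step.act)
```

Calling describe_process a second time yields a different identifier, so two descriptions of web building are not accidentally attributed to one spider.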

Actually I was more successful at gathering papers and building some software infrastructure (Python frontend, SQL database, Java backend and public website - more detail here).  However, in 2014, I presented a poster at the Animal Behavior Society describing the quality of annotation possible using the NBO process terms for a small set of papers.  Because I included relevant anatomy as part of the annotation, I was able to do a little better than if I had used the NBO terms alone.  Even so, a lot of interesting behavior wound up annotated simply as courtship behavior, or locomotion, or predatory behavior with reference to specific body parts.

My original hope was to build the T-box from the data and descriptions in the curated papers. However, it has become clear that the gap from the terminology in the descriptions to the available terms in NBO really is too great. The annotation process requires terms at an intermediate level to be useful. Thus the project to test NBO coverage using spider behavior is concluded.

Therefore, while I continue to tweak the frontend code, I have retreated to the approach recommended by Arp, Smith, and Spear and have started simply extracting behavior terms from two sources.  The first is the 3rd edition of Foelix's Biology of Spiders.  The second, Herberstein's Spider Behaviour, will be a followup to catch anything that Foelix missed.  The idea is to collect the terms, throw them against whatever results from merging NBO and the Animal Behavior Ontology (ABO) process terms and the Gene Ontology to identify synonyms and subsumption 'parents', and propose at least a taxonomy of terms at the World Arachnology Congress in Colorado this summer.  Ultimately these might wind up as a part of the existing spider biology ontology, which already has a small behavior branch, as well as contributing to whatever final vocabulary results from merging process terms from NBO and ABO.  Arp et al. recommend starting with a small set (~50) of terms.  I'll grab every behavior-related term for a while and then pull the most general terms from the list.  The remainder can be added once the tree is in place.
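The "throw them against the merged ontology" step could be sketched roughly as follows. This is a naive string-matching heuristic under stated assumptions, not the actual workflow: the EX: identifiers, labels, and synonyms are placeholders rather than real NBO, ABO, or GO entries, and a candidate parent is guessed only by checking whether an existing label is the trailing phrase of the extracted term.

```python
def normalize(text):
    """Lowercase and collapse whitespace so comparisons are forgiving."""
    return " ".join(text.lower().split())


def classify_term(term, ontology):
    """Return (kind, term_id) for an extracted term.

    kind is one of "exact", "synonym", "parent-candidate", "unmatched".
    """
    wanted = normalize(term)

    # 1. The term already exists under the same label.
    for term_id, entry in ontology.items():
        if wanted == normalize(entry["label"]):
            return ("exact", term_id)

    # 2. The term is a recorded synonym of an existing term.
    for term_id, entry in ontology.items():
        if wanted in (normalize(s) for s in entry["synonyms"]):
            return ("synonym", term_id)

    # 3. Guess a subsumption parent: the longest label that ends the term.
    best = None
    for term_id, entry in ontology.items():
        label = normalize(entry["label"])
        if wanted.endswith(" " + label):
            if best is None or len(label) > len(best[1]):
                best = (term_id, label)
    if best is not None:
        return ("parent-candidate", best[0])

    return ("unmatched", None)


# Placeholder ontology: identifiers and entries are hypothetical.
ontology = {
    "EX:0001": {"label": "behavior", "synonyms": []},
    "EX:0002": {"label": "courtship behavior", "synonyms": ["mating display"]},
    "EX:0003": {"label": "locomotion", "synonyms": ["locomotory behavior"]},
}

for extracted in ["web building behavior", "mating display", "ballooning"]:
    print(extracted, "->", classify_term(extracted, ontology))
```

Anything that comes back as "unmatched" or "parent-candidate" is a potential new term request; the suggested parents would still need review by hand, since suffix matching knows nothing about meaning.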

There are deeper issues here, particularly about the relation of behavior phenotypes and their associated processes, but that's a topic for another post.