Tuesday, January 19, 2016

Why Arachnolingua needs a T-Box

It is common in knowledge representation to distinguish between statements about individuals (e.g., "John loves Mary") and statements about classes or universals ("Love is an emotion exhibited by humans").  The former set of statements is sometimes referred to as the "A-box" (A for assertion), the latter as the "T-box" (T for terminology).  Many ontology languages, such as OWL, allow both types of statements, though ontologies, strictly speaking, are terminology.
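To make the distinction concrete, here is a minimal sketch in plain Python that treats statements as (subject, predicate, object) triples and sorts them into the two boxes.  The class names, predicate names, and the `is_tbox` rule are all assumptions made up for illustration; a real OWL toolkit draws the line by axiom type rather than by a lookup like this.

```python
# Toy triples (subject, predicate, object). Every name here is illustrative.
CLASSES = {"Human", "Emotion", "Love"}

statements = [
    ("John", "loves", "Mary"),           # relates two individuals
    ("John", "instance_of", "Human"),    # types an individual with a class
    ("Love", "subclass_of", "Emotion"),  # relates two classes
]


def is_tbox(triple):
    """Treat a statement as terminology when subject and object are both classes."""
    subject, _predicate, obj = triple
    return subject in CLASSES and obj in CLASSES


tbox = [t for t in statements if is_tbox(t)]
abox = [t for t in statements if not is_tbox(t)]
```

Note that typing John as a Human still lands in the A-box: it mentions a class, but it is an assertion about an individual.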

It's been nearly five years since I mentioned to Martin Ramirez at a Phenotype Ontology RCN meeting at NESCent that I thought it would be a good idea to test the NeuroBehavior Ontology (NBO) as a vocabulary for describing the behavior of spiders.  The NBO is organized as approximately parallel trees of terms for behavior processes and phenotypes, though with a bias toward the behavior of humans and model organisms.  So I gathered up a collection of papers on the behavior of spiders and some other arachnids and did some annotation of behavior.

Now the annotations in Arachnolingua turn out to be assertions both about (frequently anonymous) individuals and about classes.  For example, statements about the types of webs various families of spiders produce, or don't produce, are at the level of terminology: they relate a type of web to a type of spider.  Others, such as the sequence of acts in building a web, are necessarily about individuals, because it is generally the same individual (or in some cases group of individuals) involved in each step.  Each step has an active participant, the same spider for each step.  So descriptions of process will consist of assertions about individuals, even if nothing else is known or stated about the individual.
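A rough sketch of what such a process description might look like as assertions, again in plain Python.  The identifier scheme, the predicate names (`instance_of`, `has_active_participant`, `precedes`), and the step names are hypothetical and are not taken from Arachnolingua's actual data model.

```python
import itertools

_ids = itertools.count(1)


def fresh(prefix):
    """Return a new blank-node-style identifier such as '_:event2'."""
    return f"_:{prefix}{next(_ids)}"


def describe_sequence(actor_class, step_classes):
    """Build assertions for a sequence of acts performed by one anonymous individual."""
    actor = fresh("individual")
    assertions = [(actor, "instance_of", actor_class)]
    previous = None
    for step_class in step_classes:
        event = fresh("event")
        assertions.append((event, "instance_of", step_class))
        # The same anonymous individual participates in every step.
        assertions.append((event, "has_active_participant", actor))
        if previous is not None:
            assertions.append((previous, "precedes", event))
        previous = event
    return assertions


# Illustrative step names only.
web_building = describe_sequence(
    "Spider",
    ["frame construction", "radius construction", "sticky spiral construction"],
)
```

Nothing is said about the spider beyond its type, yet every assertion after the first depends on it being one and the same individual.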

Actually, I was more successful at gathering papers and building some software infrastructure (Python frontend, SQL database, Java backend and public website - more detail here) than at annotating.  However, in 2014, I presented a poster at the Animal Behavior Society describing the quality of annotation possible using the NBO process terms for a small set of papers.  Because I included relevant anatomy as part of the annotation, I was able to do a little better than if I had just used the NBO terms.  Even so, a lot of interesting behavior wound up annotated simply as courtship behavior, or locomotion, or predatory behavior with reference to specific body parts.

My original hope was to build the T-box from the data and descriptions in the curated papers. However, it has become clear that the gap from the terminology in the descriptions to the available terms in NBO really is too great. The annotation process requires terms at an intermediate level to be useful. Thus the project to test NBO coverage using spider behavior is concluded.

Therefore, while I continue to tweak the frontend code, I have retreated to the approach recommended by Arp, Smith, and Spear and have started simply extracting behavior terms from two sources.  The first is the 3rd edition of Foelix's Biology of Spiders.  The second, Herberstein's Spider Behaviour, will be a follow-up to catch anything that Foelix missed.  The idea is to collect the terms, throw them against whatever results from merging NBO and the Animal Behavior Ontology (ABO) process terms and the Gene Ontology to identify synonyms and subsumption 'parents', and propose at least a taxonomy of terms at the World Arachnology Congress in Colorado this summer.  Ultimately these might wind up as a part of the existing spider biology ontology, which already has a small behavior branch, as well as contributing to whatever final vocabulary results from merging process terms from NBO and ABO.  Arp et al. recommend starting with a small set (~50) of terms; I'll grab every behavior-related term for a while and pull the most general terms from the list.  The remainder can be added once the tree is in place.
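As a sketch of the "throw them against the merged vocabulary" step, the following plain Python matches harvested terms against labels and synonyms and makes a crude guess at a parent for anything unmatched.  The `merged` dictionary, its `X:` identifiers, and the trailing-words heuristic are all assumptions for illustration; they stand in for whatever the merged NBO/ABO/GO terms actually look like.

```python
def normalize(term):
    """Lowercase, treat hyphens as spaces, and collapse whitespace."""
    return " ".join(term.lower().replace("-", " ").split())


# Hypothetical merged vocabulary: id -> (label, synonyms, parent id).
merged = {
    "X:1": ("behavior", [], None),
    "X:2": ("locomotory behavior", ["locomotion"], "X:1"),
    "X:3": ("courtship behavior", ["courtship"], "X:1"),
}


def build_index(vocab):
    """Map every normalized label and synonym to its term id."""
    index = {}
    for term_id, (label, synonyms, _parent) in vocab.items():
        for name in [label, *synonyms]:
            index[normalize(name)] = term_id
    return index


def match_terms(harvested, index):
    """Split harvested terms into exact label/synonym matches and leftovers."""
    matched, unmatched = {}, []
    for term in harvested:
        key = normalize(term)
        if key in index:
            matched[term] = index[key]
        else:
            unmatched.append(term)
    return matched, unmatched


def suggest_parent(term, index):
    """Guess a parent: the longest run of trailing words that names a known term."""
    words = normalize(term).split()
    for start in range(1, len(words)):
        candidate = " ".join(words[start:])
        if candidate in index:
            return index[candidate]
    return None


index = build_index(merged)
harvested = ["Courtship", "Web-building behavior", "ballooning"]
matched, unmatched = match_terms(harvested, index)
parents = {term: suggest_parent(term, index) for term in unmatched}
```

Terms with no match and no suggested parent, like "ballooning" here, are the ones that need a human decision about where they belong in the tree.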

There are deeper issues here, particularly about the relation of behavior phenotypes and their associated processes, but that's a topic for another post.