Sunday, December 13, 2009

Ontology matching and Phylogenies

As many will know, I've been spending the autumn at NESCent, working on two projects: a continuing effort in Phenoscape, and a new project to develop and implement an algorithm to align multiple taxon-specific ontologies using a tree. The resulting tool, Phylontal, is still a ways from even an initial release, but I gave a brown-bag talk on Friday that covered ontology matching as it relates to evolutionary biology, particularly comparative methods. While there is ongoing interest in the general topic of ontology matching (e.g., the OntologyMatching site), there has been relatively little work on it in either the model organism or evolutionary biology communities. This is starting to change: several approaches are being tried by model organism projects (most notably Uberon, and the Homontol tool and Homology ontology of the BGEE project).

Although Uberon and Homontol may represent viable approaches for linking model organism ontologies, I've been dubious from the start that any approach that ignores or minimizes the role of phylogeny would be appropriate for studies that combine ontologies to ask comparative questions. Phylontal extends some of the ideas introduced by Homontol and its Homologous Organ Groups (HOGs) by attaching alignments (the results of matching operations) to specific nodes in a tree and by explicitly distinguishing homologous and non-homologous alignments. Homontol could move in a similar direction, and their homology ontology suggests they have been thinking about other types of correspondences between anatomical terms, but their multispecies gene expression database is plenty to fill their plate, I think. If nothing else, introducing phylogeneticists to these issues will get people thinking about this.
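To make the idea of attaching alignments to tree nodes a little more concrete, here is a minimal Python sketch. It is not Phylontal's actual data model: the class names (`Kind`, `Alignment`, `TreeNode`), the helper `homologous_at`, and the ontology and term labels are all hypothetical, chosen only to show an alignment carrying an explicit homologous/non-homologous flag and living on a particular node.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Tuple


class Kind(Enum):
    HOMOLOGOUS = "homologous"
    NON_HOMOLOGOUS = "non-homologous"


@dataclass(frozen=True)
class Alignment:
    # (ontology name, term label) pairs that a matching operation linked together
    terms: Tuple[Tuple[str, str], ...]
    kind: Kind


@dataclass
class TreeNode:
    name: str
    children: List["TreeNode"] = field(default_factory=list)
    # Alignments asserted at this node, i.e. for the clade it subtends
    alignments: List[Alignment] = field(default_factory=list)


def homologous_at(node: TreeNode) -> List[Alignment]:
    """Return only the alignments at this node that are flagged as homologous."""
    return [a for a in node.alignments if a.kind is Kind.HOMOLOGOUS]


# Hypothetical example: two tips, one internal node carrying two alignments.
tip_a = TreeNode("taxon_A")
tip_b = TreeNode("taxon_B")
ancestor = TreeNode("ancestor_AB", children=[tip_a, tip_b])
ancestor.alignments.append(
    Alignment((("ontology_A", "term_1"), ("ontology_B", "term_2")), Kind.HOMOLOGOUS)
)
ancestor.alignments.append(
    Alignment((("ontology_A", "term_3"), ("ontology_B", "term_4")), Kind.NON_HOMOLOGOUS)
)
```

Keeping the flag on the alignment rather than on the terms means the same pair of terms could, in principle, be treated differently at different nodes.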

In the talk, the question of missing versus absent terms came up, especially when I discussed how Phylontal could deal with a term that is missing in an ingroup but shared with an outgroup. I'm beginning to think that the OwlWatcher approach of reasoning up from a series of instances, each of which is a graph, might allow the distinction between absent and missing terms to appear. This is particularly true in behavior sequences: if in one clade the sequence A->B->C is observed, and in another C immediately follows A in all the observed instances, then B is absent where it would be expected to be observed. Likewise, if all the observations show no successor to A, and no predecessor to C, then B may just not have been observed. It's the combination of the use of sequences (complete orderings) and the ability to refer back to observed instances that makes the difference. In principle, you could do the same sort of thing with anatomy by building chains of connections, but these are not the sort of details that make it into character matrices, so it would require going back to drawings/photos/free text and probably putting it into the taxon-level phenotype statements rather than the multispecies ontologies.
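The absent-versus-unobserved reasoning for behavior sequences can be sketched in a few lines of Python. This is only an illustration of the logic in the paragraph above, not anything OwlWatcher or Phylontal does today; the function names and the four result labels are my own assumptions, and each observed instance is simplified from a graph to a plain ordered list of term names.

```python
from typing import Sequence, Set


def successors(instances: Sequence[Sequence[str]], term: str) -> Set[str]:
    """Terms seen immediately after `term` in any observed instance."""
    found = set()
    for seq in instances:
        for i in range(len(seq) - 1):
            if seq[i] == term:
                found.add(seq[i + 1])
    return found


def predecessors(instances: Sequence[Sequence[str]], term: str) -> Set[str]:
    """Terms seen immediately before `term` in any observed instance."""
    found = set()
    for seq in instances:
        for i in range(1, len(seq)):
            if seq[i] == term:
                found.add(seq[i - 1])
    return found


def classify_middle_term(instances, a: str, b: str, c: str) -> str:
    """Classify b for one clade, given that a->b->c is known from another clade."""
    if any(b in seq for seq in instances):
        return "present"
    after_a = successors(instances, a)
    before_c = predecessors(instances, c)
    if after_a == {c}:
        # c always follows a directly: b is absent where it would be expected
        return "absent"
    if not after_a and not before_c:
        # nothing recorded after a or before c: b may simply be unobserved
        return "unobserved"
    return "undetermined"
```

For example, `classify_middle_term([["A", "C"], ["A", "C", "D"]], "A", "B", "C")` gives `"absent"`, while `classify_middle_term([["X", "A"], ["C", "D"]], "A", "B", "C")` gives `"unobserved"`, because no instance records what follows A or what precedes C.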

Aligning phenotypes might be a new frontier for Phenoscape and similar projects.

There was also some discussion about whether a down-pass from tips to root was sufficient to match terms. If so, then Phylontal can avoid some work when phylogenies change by getting the tree nodes aligned first. If not, as Dave Swofford pointed out, it may be necessary to align from scratch with a new tree. This is an open and potentially important question.
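Here is a toy Python sketch of why a sufficient down-pass would save work when the tree changes. It is not Phylontal's algorithm. The "matcher" is a deliberately naive stand-in (terms match only if their names are identical, so a node's result is the intersection of its children's term sets), and `Node`, `tip_names`, `down_pass`, and the cache keyed by clade membership are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Set


@dataclass
class Node:
    name: str
    terms: Set[str] = field(default_factory=set)      # used only at tips
    children: List["Node"] = field(default_factory=list)


Cache = Dict[FrozenSet[str], Set[str]]


def tip_names(node: Node) -> FrozenSet[str]:
    """Names of all tips below (or at) this node; identifies the clade."""
    if not node.children:
        return frozenset([node.name])
    return frozenset().union(*(tip_names(child) for child in node.children))


def down_pass(node: Node, cache: Cache) -> Set[str]:
    """Post-order pass: terms matched across every tip in this clade."""
    clade = tip_names(node)
    if clade in cache:
        # Same clade as in an earlier tree: reuse the stored result
        return cache[clade]
    if not node.children:
        matched = set(node.terms)
    else:
        matched = set.intersection(*(down_pass(child, cache) for child in node.children))
    cache[clade] = matched
    return matched


# Hypothetical tips and two trees that share the clade (a, b)
a = Node("a", {"x", "y", "z"})
b = Node("b", {"x", "y"})
c = Node("c", {"x", "w"})
d = Node("d", {"y"})
cache: Cache = {}
tree1_root = Node("root1", children=[Node("ab", children=[a, b]), c])
tree2_root = Node("root2", children=[Node("ab", children=[a, b]), d])
result1 = down_pass(tree1_root, cache)   # fills the cache, including clade {a, b}
result2 = down_pass(tree2_root, cache)   # reuses the cached result for clade {a, b}
```

Caching by clade is only safe here because the toy matcher's result depends on nothing but the tips below a node. Whether a realistic matcher has that property is exactly the open question: if a node's alignment also depends on information from elsewhere in the tree, a changed tree would invalidate the cache and require aligning from scratch.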



Thursday, October 1, 2009

Ethosource meeting

Last weekend I attended a meeting of the advisory board for the Ethosource project. This project was launched originally by Emilia Martins and Anne Clark and initially focused on repositories for behavior data and making it accessible in a controlled manner. Repositories led to metadata, which eventually led to ontologies for behavior. Although my interest in behavior ontologies started from more of an analytic angle, behavior ontologies are part of the solution to both repository metadata and comparative analysis of descriptive data, so they are worthy of pursuit regardless.

Ethosource was funded early in the decade (not sure of the exact interval) and held meetings on both repository issues and ontology development. Most notable from my perspective were the two Cornell workshops that led to the ABO. More recently, Anne Clark and Sue Margulis obtained a three-year grant from the Institute of Museum and Library Services to support development of an ethogram database, indexed by the ABO, called Ethosearch. Although the web interface is not yet publicly available (though close), the database is a substantial effort, which includes over 1,000 ethograms (if I remember correctly).

The board meeting was held at the University at Binghamton over 1-1/2 days. The first morning was overviews and presentations from attendees. Anne Clark started with a history of Ethosource, followed by Leah Melber discussing a K-12 outreach program that used behavior and ethograms at the Lincoln Park Zoo. Mike Webster, the new head of the Macaulay Library, discussed depositing data and how scientific repositories, as opposed to YouTube, need to be selective. Mike just started a few weeks ago, replacing Jack Bradbury, who co-chaired the Cornell workshops, and passed some of the questions to Ed Scholes, the Macaulay video curator and another fan of using ontologies as ethograms. Anne commented on a Japanese researcher who stopped by the Ethosource table at last year's ISBE meeting. He has established a sort of YouTube for animal behavior, a site with annotated clips, but is relatively unselective in what he accepts as long as it is relevant to animal behavior.

Cyndy Parr, who has a history with corvid behavior and the Cornell workshops, has been director of species pages at the Encyclopedia of Life for about a year. She discussed the role of EoL as a place where integration doesn't necessarily happen, but which helps people find others working on similar projects. She reviewed how EoL worked, including how images were harvested from Flickr and what the opportunities for behavioral contributions to species pages were. She gave an overview of EoL funding opportunities, which we went over in more detail on Sunday.

After a brief lunch break, I followed with a presentation of Phenoscape, starting with an overview of the curation process and how we coded character matrices into EQ statements. After showing a few screenshots from Phenex and figures from my ICBO poster, I finished with a list of challenges for integrating behavior into OBO and showed a slide of the Biological Process Tree from GO. After I described the curation process, Emilia Martins, who I was very happy to see attend, asked about getting the process of curating publications for Phenoscape out to individual authors. Emilia is concerned that not distributing the work might be the eventual kiss of death for any annotation process. Although that doesn't seem to be necessarily the case in other annotation projects, I have heard that ZFIN has recently had some difficulty keeping up with the rising flow of zebrafish papers they selected to curate.

I've been trying to get OBO to acknowledge the existence of Ethosource and the ABO ontology for a while now. MGI sent Sue Bello, a mouse phenotype curator, to the meeting. This was an important step and her presentation gave everyone a sense of how ontologies are actually used in at least some model organism projects. I say some because, unlike Phenoscape and ZFIN, which use the PATO phenotype ontology, MGI and (I believe) RGD use a precomposed trait ontology (Mammalian Phenotype Ontology), rather than building postcompositions of ontology terms.

Emilia discussed Ethobank's status; it was built from a collection of media material from an archive of lizard displays. It is no longer online due to some security issues. Although Indiana University has offered unlimited raw storage space, ongoing funding would still be required for database maintenance and curation. Emilia also observed that the behavior community lacks a tradition of combining data across investigations.

We then started discussing issues that could come out of the meeting. One of the first was the status of the ABO core ontology. There are currently at least four versions floating around: the one posted at ethodata.org, which is the official product of the Cornell workshops; the one edited by Anne and Sue for Ethosearch; the OWL conversion (but no additions) I have distributed with OwlWatcher and dropped in OwlWatcher's SourceForge site; and the version developed by David Shotton and his student, which includes independent revisions and term definitions. We discussed putting ABO on SourceForge and adopting a term tracking and curation approach like that adopted by several OBO projects.

We also thought of identifying several key projects that would serve as exemplars (case studies) for Ethobank, and felt that these should include zoo projects, as zoo data may be more easily comparable than data across multiple wild populations.

The role of educational outreach, particularly at the K-12 level, was discussed, covering both collection and analysis (e.g., bringing data together from multiple student observers). Traditional citizen science, as run by the Cornell Lab of Ornithology, is more focused on collection than analysis, though the data is generally available, if members of the public know what they want to do with it.

We discussed goals at various time scales, which were elaborated on Sunday morning.

Although the current entry tool does not support multiple parents from terms in the ontology, there was a consensus that many descriptions will require multiple parents to capture correctly. Although it might have been fun to introduce the notions of intersection and restriction from OWL at this point, Ethosearch and ABO have a ways to go before these would be meaningful.

We also saw demos of two search tools for the Ethosource database that were developed by Weiyi Meng and his student Jiang Yu. These are used both to assist contributors in categorizing their submissions and for clustering and mining in a way that suggests they would be applicable to comparative analysis, at least at the level of bringing relevant paragraphs from different ethograms together.


On Sunday, we went over our goals and what we would be doing. This involved continuing to serve as conduits to our respective projects and getting the word out to wider communities (e.g., ABS, ISBE, and the 2011 IEC). Likewise, there was discussion of K-12 outreach and stepping up the collaboration with Lincoln Park Zoo and the wider zoo community. The need for more work on the editorial review process was also discussed, both in terms of mechanics and in terms of whether there was a way to make it sustainable. Cyndy Parr gave us more details on EoL funding that might be relevant to supporting curation of material from existing collections. It was clear that continuing to build the bridge with Macaulay Library would be a near-term priority.

After the formal breakup of the meeting, Sue Bello gave Anne a brief walk-through of OBO-Edit and the Mammalian Trait Ontology. There was interest on both sides in having Anne and her students review the behavior portion of this. I also passed on some material relating to the CARO anatomy ontology as an example of what a common phenotype ontology (e.g., for behavior) might look like in OBO.

OwlWatcher is broken, and what will happen to fix it

Summary

The video support in OwlWatcher for OSX is broken, in both Leopard and Snow Leopard. As far as I can tell, it still works in Windows. I am working on a fix for the problem which will require use of a different video player (Quicktime support in Java is gone and Apple seems to have no interest in bringing it back). I am investigating alternative player frameworks, some based on JMF, some on wrappers for FFMPEG, and some which are both. I expect the result of this will be a more complex installation process, at least for OSX, but it will also open up the possibility of using OwlWatcher on Linux (something I've been wanting for a while now).

Rant
Apple's support for Quicktime in Java has always been rather spotty, and developers have been burned before by Java upgrades which broke Quicktime. I investigated alternative Quicktime bindings this week (Rococoa), but those seem to be broken in Snow Leopard as well. Although I expect the Rococoa developer(s) will eventually try to address this, there are alternatives to Quicktime, especially for this application, so it is time to finally make OwlWatcher independent of Quicktime.

I should have done this a while ago, and I apologize to anyone who has been inconvenienced by this. I do not blame Apple for OwlWatcher breaking; I was expecting something like this to happen. However, I should point out that there was apparently a Java update from Apple that also broke Quicktime support in 10.5 (Leopard), and I'm rather disappointed that things broke even without an upgrade to Snow Leopard.

It was suggested that I consider an alternative to Java. Unfortunately, I don't think reimplementing this in Python (which seems more to my taste than Perl, though I've written a bit more of the latter) would necessarily avoid this sort of breakage. Apple seemed to be promoting Python and Perl as having Cocoa support in 10.5, but, looking at the new Xcode IDE for Snow Leopard, their enthusiasm for these languages has waned. Unfortunately, it seems at the moment that some of the most interesting work in scripting languages is going on with languages like Scala and Clojure, which are also JVM-based. So much the worse for Apple, though I'm sure it won't affect the sale of iPhones or the development of applications, which seems to be where Apple's focus increasingly lies.

Friday, June 19, 2009

My Evolution 2009 Highlights Saturday

Well, time to write this up before it gets too stale. First, what I missed: I heard about the talk on snakes that are specialized snail predators, but not until the next day. It has been covered elsewhere (e.g., Denim and Tweed), so enough said.

I spent Saturday morning jumping around: I caught David Wilson's opening talk and part of Peter Richerson's talk on gene-culture evolution in the EvoS symposium. When I was doing a thesis project on social learning and culture in the 1990s, the Boyd and Richerson book was required reading, so I wanted to hear what Richerson had to say. Sadly, he seemed to still equate culture with information passed by social learning, a view that the mainstream passed by a while ago. Social learning is an important component of culture, but is certainly not sufficient for either chimpanzees or scrub-jays to be considered cultural creatures. Speaking of scrub-jays, I left the Richerson talk to catch the Aphelocoma divergence and speciation talk by John McCormack, a post-doc in Lacey Knowles's lab. The only talk I remember from the later Phylogenetic methods section was the SATe talk, maybe because Jiaye Yu, in the Holder lab, has spent some time recently with it, even though he wasn't involved in this particular talk.

Saturday afternoon I spent in the Diversification symposium. Since I have been involved with BiSSE, I figured I should catch up a bit on the field. Of course, the Rich FitzJohn talk at the end (not listed in the program) was the most relevant. Rich has developed an implementation of BiSSE in his R package Diversitree, which, besides the likelihood approach that we implemented in Mesquite, also includes an MCMC estimator, as well as his forward simulation method for dealing with missing tree structure (in press in Syst. Biol.). Rich also has a BiSSE-like method for continuous traits, from which he showed some preliminary results in his talk. Very cool stuff.

All the talks in the Diversification symposium were good, and seeing the range of approaches was useful for me. I knew something of what Dan Rabosky and Mike Alfaro were up to, since I had met them at the NESCent R-hackathon in December 2007. I am gradually getting more comfortable with R. I just keep telling myself that behind all those arrays and vectors there's a Scheme dialect, but it hasn't gelled just yet.

I talked with Rich FitzJohn after the session, mostly about optimization issues and his continuous method.

Thursday, May 28, 2009

This summer

I'll be attending two meetings this summer: Evolution 2009 (Moscow, ID), 12-16 June, and ICBO (International Conference on Biomedical Ontologies, Buffalo, NY), 24-26 July. I'll be presenting some of my recent work with BiSSE at the Evolution meeting (my first non-ontology talk for a while) and representing Phenoscape at ICBO with a poster. Of course the rest of Phenoscape will be at ASIH in Portland while I'm in Buffalo, but it made sense to have Phenoscape represented in both places. I'll miss Portland, but Evolution is there next year.

I'm mentoring another Google Summer of Code project: my student will be developing a Mesquite package that will read and display Phenex annotations to character matrices. Getting Phenex to talk to Mesquite is an important, relatively low-hanging fruit for NeXML to enable, and just the sort of thing I've been trying to do with NeXML for a while now.

I will be leaving Kansas at the end of August and heading (indirectly) to NESCent to start an ontology alignment project. I'm hoping to develop something that might be useful both as a prototype for Phenoscape and as a core component of EthoOntos, the comparative method backend to OwlWatcher.

Monday, May 25, 2009

Minor Ethotools updates

I've put up in-progress versions of the updated OwlWatcher manuals (both pdf and html). There are also some minor site updates. Nothing profound, but perhaps useful if you're trying the release candidate.

Wednesday, May 20, 2009

It is (sort of) done

I've posted Windows and OSX versions of a release candidate for OwlWatcher 0.040. Despite the small version number bump, this release does represent substantial changes and, I hope, improvements. In addition to switching over to the Manchester OWLAPI, there are improvements to project management and video playback. I made this a release candidate because there seem to be people using OwlWatcher and I'm dubious about backward compatibility, so by making this a release candidate I hope people will approach this with more caution and make backups of their work before trying this.


New Project Tab View

New Watch Tab View

Wednesday, January 14, 2009

OwlWatcher and Mesquite releases coming soon

Wayne Maddison has released a beta of Mesquite 2.6 and I think I have finished the transition of OwlWatcher to the OWLAPI. I will be adding a bit of functionality to OwlWatcher before I release, so expect the official releases of Mesquite 2.6 and OwlWatcher 0.04 at about the same time.

I'll have some comments on Mesquite 2.6 later this week. The existing PDAP:PDTREE seems to work fine with the beta of Mesquite 2.6 so there won't be any immediate PDTREE update.