Tuesday, August 3, 2010

PDAP and OwlWatcher

I put up a new version of the PDAP:PDTREE Mesquite package last night. Nothing big in this one, mostly a couple of messages regarding the resolution of polytomies (it's arbitrary so the details of individual contrasts will differ). The most important change is that PDAP now supports Mesquite's new install system so you should no longer need to manually download archives and drag folders of class files around. It also gives you control of where the PDAP examples (misc PDI files and the guided tour) wind up. Previously you were told where to put the class files, but I'd imagine the examples would languish in the archive folder until forgotten or deleted.

I have a request pending that might yield another PDAP release in the next week or so.

OwlWatcher is making progress. The player is more or less done - there are two small issues that I know of (specifically related to audio buffering and 'rocking' single frames back and forth), but happily I can go back to more interesting things such as finishing the integration with version 3 of the OWLAPI. When the release finally comes, it will be a two-step process: first installing Xuggler, followed by OwlWatcher itself. If there are other Java APIs for video that you would like to see supported in OwlWatcher, feel free to request them in the comments. I don't have much time to devote to OwlWatcher these days, but now that I've gone through the process of building a player up from a decoder library, it should be easier the second time (Quicktime provided a player that was adequate, but I think this solution will be more flexible as OwlWatcher continues to develop).

Wednesday, July 7, 2010

Back from Portland

Several interesting talks from the Evolution and iEvoBio meetings in Portland last week. Probably most relevant here were several comparative methods talks, and several ontology-related talks at iEvoBio. Among the latter, I'll mention Nico Franz's talk on taxonomic ontologies, as well as a lightning talk by Suzi Lewis on a phylogenetically based ontology annotation tool (PAINT). The latter is focused on protein orthologs, so it isn't directly relevant to Phylontal, but was nonetheless interesting. Franz's talk was a broad-ranging overview of some important issues in taxonomic ontologies, including the proposal of Schulz et al. (2008).

I presented a poster on Phylontal, as well as a lightning talk on the taxonomic ontology I developed and maintain for Phenoscape. The latter was a brief summary and update (NCBI xrefs, common names, and updates to the collections vocabulary). The Phylontal poster is mostly a cleaned-up version of the second half of my NESCent talk, but should give an idea of what it is about.

Recently I've been working on adding 'matrix' support to the Phylontal library, to eventually add alignment of character states as well as the possibility of 'unpacking' the homologies underlying a matrix (e.g., using data when you disagree with the homology judgments of the author).

Of course, if people archived their raw data or observations, there would be less need for unpacking matrices.

Saturday, March 27, 2010

Coming this summer plus Phenoscape in Chicago last week

I've registered for the Evolution and iEvoBio meetings this summer. I agree with the desire of the iEvoBio organizers to make a place for informatics approaches at the Evolution meetings - I've certainly given both talks and posters where I got the strong sense that I wasn't speaking to the right audience. There are several presentation plans I could have followed. What I have chosen to do is to present Phylontal as a poster at Evolution, explaining the need and approach in detail, something like the brown bag I gave at NESCent in December. Although it may not be the best venue, a poster seems the best format for Phylontal at this time and it helps justify my going to Evolution as well as iEvoBio. At iEvoBio, I'll give a lightning talk on pipelining OwlWatcher and Phylontal. The third player in this chain - a character constructor - I'll leave for later.

I have made an 'in-progress' release of Phylontal at Google Code, but there's no rush. It loads NeXML and OWL files, lets the user assign ontologies to tips, then lets the user lexically match terms from two sister taxa. It's as much a proof of concept for the frontend, operating as OWL + per-taxon ontologies. There is another potential use case for Phylontal: starting with annotated matrices, extract taxon+anatomy term pairs from annotated NeXML matrices with OBO ontology support and a separate NeXML file for the tree. This would approximate the usage in Phenoscape, and the desired output would be Phenoscape's multi-column homology table.
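To give a feel for what lexically matching terms between two sister taxa could look like, here is a minimal Python sketch. It is not Phylontal's code: the function names (`normalize`, `lexical_matches`) and the normalization rule (lowercase, with underscores, hyphens, and runs of whitespace treated as a single space) are assumptions made purely for illustration, and the term labels are invented.

```python
import re


def normalize(label):
    """Lowercase a label and treat underscores, hyphens, and whitespace alike.

    This normalization rule is an assumption for illustration only.
    """
    parts = re.split(r"[\s_\-]+", label.strip().lower())
    return " ".join(p for p in parts if p)


def lexical_matches(terms_a, terms_b):
    """Return sorted (label_a, label_b) pairs whose normalized labels are equal."""
    index = {}
    for label in terms_b:
        index.setdefault(normalize(label), []).append(label)
    return sorted(
        (a, b) for a in terms_a for b in index.get(normalize(a), [])
    )


# Hypothetical term labels from the ontologies of two sister taxa.
taxon_a = ["Dorsal fin", "pectoral_fin", "maxilla"]
taxon_b = ["dorsal-fin", "Pectoral Fin", "opercle"]
pairs = lexical_matches(taxon_a, taxon_b)
```

A match of this kind only proposes candidates; whether a matched pair is actually homologous is still a judgment for the user.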

Chris Mungall gave two interesting presentations on reasoning with homologies at a Phenoscape meeting at the Field Museum in Chicago last week. He brought up some interesting cases involving '2 taxon' versus '3 taxon' homology relations, and an important discussion of the interaction of homology statements and is_a hierarchies. People were most impressed with Chris' approach being extensible to serial homologies, the most important point being that serial homologies strictly exist only within an individual. I think that is an important insight and it led to a simple taxonomy of homology relationships that actually make sense for anatomical reasoning. The corresponding treatment for behavior classes is worth looking into (e.g., distinguishing similar actions by an individual animal vs. homologous behavior patterns).

There was also some discussion of individuals and inferring classes by abduction or generalization over dinner at some point. I'm beginning to think this may be an important new growth area for biological ontologies, and it was good to hear that several people were thinking in this direction.

Finally, after several experiments, I think the OwlWatcher player is settling down towards a usable configuration using only two threads, rather than the initial four. Still need to deal with the packaging issue, and I fear the first post-Quicktime release won't be an easy install.

I didn't submit a proposal for a phyloinformatics summer of code project this year; I'll help out if it's appropriate for someone else's project, but I didn't have any brainstorms this year. Nonetheless, if you're a student reading this, you will probably find one or more projects of interest there.

Friday, January 29, 2010

OwlWatcher update

Since the beginning of the year, I've been poking away at the new video support that uses Xuggler, a Java wrapper for FFmpeg. When I get this worked out, OwlWatcher will not only potentially work under Linux, but be open source compliant as well (LGPL).

At the moment, playing at normal speed and stop works, and stepping forward is sort of working, but I haven't taken a crack at seeking arbitrary frames. The biggest issue so far has been getting audio and video streams to play together. Admittedly audio isn't the highest priority, but it would be nice to get it right so OwlWatcher could support multi-modal behaviors, at least in principle.

The rest of OwlWatcher will be similar to the 0.040 release candidate I posted last May. I'll worry about forward compatibility from 0.035 closer to the release, along with coming up with a reasonable installation process, though I'm sure it won't be as simple as the previous release, at least for a few iterations.

Sunday, December 13, 2009

Ontology matching and Phylogenies

As many will know, I've been spending the autumn at NESCent, working on two projects: a continuing effort in Phenoscape, and a new project to develop and implement an algorithm to align multiple taxon-specific ontologies using a tree. The resulting tool, Phylontal, is still a ways from even an initial release, but I still gave a brown-bag talk on Friday that covered ontology matching as it relates to evolutionary biology, particularly comparative methods. While there is ongoing interest in the general topic of ontology matching (e.g., the OntologyMatching site), there has been relatively little in either the model organism or evolutionary biology communities. This is starting to change; there are several approaches being tried by model organism projects (most notably Uberon and the Homontol tool and Homology ontology of the BGEE project).

Although Uberon and Homontol may represent viable approaches for linking model organism ontologies, I've been dubious from the start that any approach that ignores or minimizes the role of phylogeny would be appropriate for studies that combine ontologies to ask comparative questions. Phylontal extends some of the ideas introduced by Homontol and its Homologous Organ Groups (HOGs) by attaching alignments (the results of matching operations) to specific nodes in a tree and by explicitly distinguishing homologous and non-homologous alignments. Homontol could move in a similar direction, and their homology ontology suggests they have been thinking about other types of correspondences between anatomical terms, but their multispecies gene expression database is plenty to fill their plate I think. If nothing else, introducing phylogeneticists to these issues will get people thinking about this.

In the talk, the question of missing versus absent terms came up, especially when I discussed how Phylontal could deal with a missing term in an ingroup that was shared with an outgroup. I'm beginning to think that the OwlWatcher approach of reasoning up from a series of instances, each of which is a graph, might allow the distinction between absent and missing terms to appear. This is particularly true in behavior sequences: if in one clade the sequence A->B->C is observed, and in another C immediately follows A in all the observed instances, then B is absent where it would be expected to be observed. Likewise, if all the observations show no successor to A, and no predecessor to C, then B may just not have been observed. It's the combination of the use of sequences (complete orderings) and the ability to refer back to observed instances that makes the difference. In principle, you could do the same sort of thing with anatomy by building chains of connections, but these are not the sort of details that make it into character matrices, so it would require going back to drawings/photos/free text and probably putting it into the taxon-level phenotype statements rather than the multispecies ontologies.
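The absent-versus-unobserved reasoning for behavior sequences can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each observed instance is simplified to a plain list of behavior labels (OwlWatcher's instances are graphs), and the function name `classify_term` and its four result labels are hypothetical, not part of OwlWatcher or Phylontal.

```python
def classify_term(observations, before, term, after):
    """Classify `term` for one clade from its observed behavior sequences.

    `observations` is a list of sequences (lists of labels); `before` and
    `after` are the steps that flank `term` in the reference clade.
    Returns 'present', 'absent', 'unobserved', or 'undetermined'.
    """
    seqs = [list(s) for s in observations]
    if any(term in s for s in seqs):
        return "present"

    successors = []    # what follows each occurrence of `before` (None at the end)
    predecessors = []  # what precedes each occurrence of `after` (None at the start)
    for s in seqs:
        for i, step in enumerate(s):
            if step == before:
                successors.append(s[i + 1] if i + 1 < len(s) else None)
            if step == after:
                predecessors.append(s[i - 1] if i > 0 else None)

    # `after` immediately follows `before` every time: the term is absent
    # exactly where it would be expected.
    if successors and all(x == after for x in successors):
        return "absent"
    # Nothing ever follows `before` and nothing ever precedes `after`:
    # the term may simply not have been observed.
    if all(x is None for x in successors) and all(x is None for x in predecessors):
        return "unobserved"
    return "undetermined"


# Hypothetical observations for three clades, using the A->B->C example.
clade_1 = [["A", "B", "C"]]
clade_2 = [["A", "C"], ["A", "C"]]
clade_3 = [["A"], ["C"]]
results = [classify_term(c, "A", "B", "C") for c in (clade_1, clade_2, clade_3)]
```

The point of the sketch is that the distinction depends on keeping the individual observed sequences around; once they are summarized into a character matrix, the second and third clades look the same.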

Aligning phenotypes might be a new frontier for Phenoscape and similar projects.

There was also some discussion about whether a down-pass from tips to root was sufficient to match terms. If so, then Phylontal can avoid some work when phylogenies change by getting the tree nodes aligned first. If not, as Dave Swofford pointed out, it may be necessary to align from scratch with a new tree. This is an open and potentially important question.



Thursday, October 1, 2009

Ethosource meeting

Last weekend I attended a meeting of the advisory board for the Ethosource project. This project was launched originally by Emilia Martins and Anne Clark and initially focused on repositories for behavior data and making it accessible in a controlled manner. Repositories led to metadata, which eventually led to ontologies for behavior. Although my interest in behavior ontologies started from more of an analytic angle, behavior ontologies are part of the solution to both repository metadata and comparative analysis of descriptive data, so worthy of pursuit regardless.

Ethosource was funded early in the decade (not sure of the exact interval) and held meetings on both repository issues and ontology development. Most notable from my perspective were the two Cornell workshops that led to the ABO. More recently, Anne Clark and Sue Margulis obtained a three-year grant from the Institute of Museum and Library Services to support development of an ethogram database, indexed by the ABO, called Ethosearch. Although the web interface is not yet publicly available (though close), the database is a substantial effort, which includes over 1,000 ethograms (if I remember correctly).

The board meeting was held at the University at Binghamton over 1-1/2 days. The first morning was overviews and presentations from attendees. Anne Clark started with a history of Ethosource, followed by Leah Melber discussing a K-12 outreach program that used behavior and ethograms at the Lincoln Park Zoo. Mike Webster, the new head of the Macaulay Library, discussed depositing data and how scientific repositories, as opposed to YouTube, need to be selective. Mike just started a few weeks ago, replacing Jack Bradbury (who co-chaired the Cornell workshops), and passed some of the questions to Ed Scholes, the Macaulay video curator and another fan of using ontologies as ethograms. Anne commented on a Japanese researcher who stopped by the Ethosource table at last year's ISBE meeting. He has established a sort of YouTube for animal behavior, with annotated clips, but is relatively unselective in what he accepts as long as it is relevant to animal behavior.

Cyndy Parr, who has a history with corvid behavior and the Cornell workshops, has been director of species pages at the Encyclopedia of Life for about a year. She discussed the role of EoL as a place where integration doesn't necessarily happen, but which helps people find others working on similar projects. She reviewed how EoL worked, including how images were harvested from Flickr and what the opportunities for behavioral contributions to species pages were. She gave an overview of EoL funding opportunities, which we went over in more detail on Sunday.

After a brief lunch break, I followed with a presentation of Phenoscape, starting with an overview of the curation process and how we coded character matrices into EQ statements. After showing a few screen shots from Phenex and figures from my ICBO poster, I finished with a list of challenges for integrating behavior into OBO and showed a slide of the Biological Process Tree from GO. After I described the curation process, Emilia Martins, who I was very happy to see attend, asked about getting the process of curating publications for Phenoscape out to individual authors. Emilia is concerned that not distributing the work might be the eventual kiss of death to any annotation process. Although that doesn't seem to be necessarily the case in other annotation projects, I have heard that ZFIN has recently had some difficulty keeping up with the rising flow of Zebrafish papers they selected to curate.

I've been trying to get OBO to acknowledge the existence of Ethosource and the ABO ontology for a while now. MGI sent Sue Bello, a mouse phenotype curator, to the meeting. This was an important step and her presentation gave everyone a sense of how ontologies are actually used in at least some model organism projects. I say some because, unlike the PATO phenotype ontology used by Phenoscape and ZFIN, MGI and (I believe) RGD use a precomposed trait ontology (Mammalian Phenotype Ontology), rather than building postcompositions of ontology terms.

Emilia discussed Ethobank's status; it was built from a collection of media material from an archive of lizard displays. It is no longer online due to some security issues. Although Indiana University has offered unlimited raw storage space, ongoing funding would still be required for database maintenance and curation. Emilia also observed that the behavior community lacks a tradition of combining data across investigations.

We then started discussing issues that could come out of the meeting. One of the first was the status of the ABO core ontology. There are currently at least four versions floating around: the one posted at ethodata.org, which is the official product of the Cornell workshops, the one edited by Anne and Sue for Ethosearch, the OWL conversion (but no additions) I have distributed with OwlWatcher and dropped in OwlWatcher's sourceforge site, and the version developed by David Shotton and his student, which includes independent revisions and term definitions. We discussed putting ABO on sourceforge and adopting a term tracking and curation approach like that adopted by several OBO projects.

We also thought of identifying several key projects that would serve as exemplars (case studies) for Ethobank, and felt that these should include zoo projects, as zoo data may be more easily comparable than data across multiple wild populations.

The role of educational outreach, particularly at the K-12 level, was discussed, covering both collection and analysis (e.g., bringing data together from multiple student observers). Traditional citizen science, as run by the Cornell Lab of Ornithology, is more focused on collection than analysis, though the data is generally available, if members of the public knew what they wanted to do with it.

We discussed goals at various time scales, which were elaborated on Sunday morning.

Although the current entry tool does not support multiple parents from terms in the ontology, there was a consensus that many descriptions will require multiple parents to capture correctly. Although it might have been fun to introduce the notions of intersection and restriction from OWL at this point, Ethosearch and ABO have a ways to go before these would be meaningful.

We also saw demos of two search tools for the Ethosource database that were developed by Weiyi Meng and his student Jiang Yu. These are used both to assist contributors in categorizing their submissions and for clustering and mining in a way that suggests they would be applicable to comparative analysis, at least at the level of bringing relevant paragraphs from different ethograms together.


On Sunday, we went over our goals and what we would be doing. This involved continuing to serve as conduits to our respective projects and getting the word out to wider communities (e.g., ABS, ISBE, and the 2011 IEC). Likewise, there was discussion of K-12 outreach and stepping up the collaboration with Lincoln Park Zoo and the wider zoo community. The need for more work on the editorial review process was also discussed, both in terms of mechanics and whether there was a way to make it sustainable. Cyndy Parr gave us more details on EoL funding that might be relevant to supporting curation of material from existing collections. It was clear that continuing to build the bridge with Macaulay Library would be a near-term priority.

After the formal breakup of the meeting, Sue Bello gave Anne a brief walk-through of OBO-Edit and the Mammalian Phenotype Ontology. There was interest on both sides in having Anne and her students review the behavior portion of this. I also passed on some material relating to the CARO anatomy ontology as an example of what a common phenotype (e.g., for behavior) might look like in OBO.

OwlWatcher is broken, and what will happen to fix it

Summary

The video support in OwlWatcher for OSX is broken, in both Leopard and Snow Leopard. As far as I can tell, it still works in Windows. I am working on a fix for the problem which will require use of a different video player (Quicktime support in Java is gone and Apple seems to have no interest in bringing it back). I am investigating alternative player frameworks, some based on JMF, some on wrappers for FFmpeg, and some which are both. I expect the result of this will be a more complex installation process, at least for OSX, but it will also open up the possibility of using OwlWatcher on Linux (something I've been wanting for a while now).

Rant
Apple's support for Quicktime in Java has always been rather spotty, and developers have been burned by Java upgrades which broke Quicktime before. I investigated alternative Quicktime bindings this week (Rococoa), but those seem to be broken in Snow Leopard as well. Although I expect the Rococoa developer(s) will eventually try to address this, there are alternatives to Quicktime, especially for this application, so it is time to finally make OwlWatcher independent of Quicktime.

I should have done this a while ago, and I apologize to anyone who has been inconvenienced by this. I do not blame Apple for OwlWatcher breaking; I was expecting something like this to happen. However, I should point out that there was apparently a Java update from Apple that also broke Quicktime support in 10.5 (Leopard) and I'm rather disappointed that things broke even without an upgrade to Snow Leopard.

It was suggested that I consider an alternative to Java. Unfortunately, I don't think reimplementing this in Python (which seems more to my taste than Perl, though I've written a bit more of the latter) would necessarily avoid this sort of breakage. Apple seemed to be promoting Python and Perl as having Cocoa support in 10.5, but, looking at the new Xcode IDE for Snow Leopard, their enthusiasm for these languages has waned. Unfortunately, it seems at the moment that some of the most interesting work in scripting languages is going on with languages like Scala and Clojure, which are also JVM-based. So much the worse for Apple, though I'm sure it won't affect the sale of iPhones or the development of applications, which seems to be where Apple's focus increasingly lies.