Gene Ontology Annotation Tool (GOAT)



Approach

GOAT is closely related to another project at the University of Manchester named GONG (Gene Ontology Next Generation). The goal of GONG is to convert the present GO into a description-logic-based ontology (specifically, in DAML+OIL) and then to further enrich it with formally represented biological knowledge. Our DAML+OIL version of GO is loaded into FaCT, a classifier of description-logic-based ontologies, allowing us to reason easily with its component terms.

We also use two types of associations involving GO terms. The first type, associations between GO terms, is held in an instance store. While each concept (i.e., class) of our DAML+OIL GO ontology represents a GO term, each instance in the instance store is an association record for the corresponding GO-term concept in the ontology. Each association record refers to its corresponding GO term and to the set of other GO terms with which that term is associated. The GO-term-to-GO-term associations were mined from the complete version of GOA (Gene Ontology Annotation), a database holding all GO-term annotations of entries in the databases of UniProt (a comprehensive resource for information about proteins) and Ensembl (a project that maintains information about large genomes). We considered only those GO-term annotations in GOA that use neither an unknown term (e.g., “unknown biological process”) nor an obsolete term and that carry an evidence code that we deem reliable. Two GO terms are treated as associated (i.e., as a given GO term and one of its associated GO terms) if they have been used together as annotating terms in at least one UniProt entry of GOA.
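The mining step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for GOAT: the annotations are assumed to have already been reduced to (entry ID, GO term, evidence code) triples, the function name mine_associations is hypothetical, and both the set of evidence codes treated as reliable and the GO identifiers are toy placeholders, since the text does not list them.

```python
from collections import defaultdict
from itertools import combinations


def mine_associations(annotations, reliable_codes, excluded_terms):
    """Return {go_term: set of GO terms co-annotated with it}.

    annotations    -- iterable of (entry_id, go_id, evidence_code) triples
    reliable_codes -- evidence codes accepted as reliable (an assumption here)
    excluded_terms -- "unknown" and obsolete GO terms to ignore
    """
    # Step 1: collect the accepted GO terms of each annotated entry.
    terms_by_entry = defaultdict(set)
    for entry_id, go_id, evidence in annotations:
        if evidence in reliable_codes and go_id not in excluded_terms:
            terms_by_entry[entry_id].add(go_id)

    # Step 2: two terms are associated if they share at least one entry.
    associations = defaultdict(set)
    for terms in terms_by_entry.values():
        for a, b in combinations(sorted(terms), 2):
            associations[a].add(b)
            associations[b].add(a)
    return dict(associations)


# Toy data: identifiers and evidence codes are placeholders, not GOA content.
annotations = [
    ("P1", "GO:A", "IDA"),
    ("P1", "GO:B", "TAS"),
    ("P1", "GO:UNKNOWN", "ND"),   # dropped: unknown term
    ("P2", "GO:A", "IEA"),        # dropped: code not in the reliable set
    ("P2", "GO:C", "IDA"),
    ("P3", "GO:B", "IDA"),
    ("P3", "GO:C", "IMP"),
]
associations = mine_associations(
    annotations,
    reliable_codes={"IDA", "IMP", "TAS"},
    excluded_terms={"GO:UNKNOWN"},
)
```

Each key of the resulting dictionary plays the role of one association record: a GO term together with the set of other GO terms with which it has been used.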

The second type of association is that between GO terms and gene-product types (i.e., types of biological molecules). These associations were obtained from the prominent organism-specific databases that use GO terms to annotate their gene-product entries (e.g., the Saccharomyces Genome Database (SGD), which concentrates on the yeast Saccharomyces cerevisiae). The entries of most of these databases do not have structured fields that classify them into gene-product types, so there is no easy way to mine this type of association automatically. Instead, we searched and examined the databases manually, which resulted in a small set of existential restrictions (added directly to our DAML+OIL version of GO) for the most general terms with which each gene-product type was found to be associated. We assumed that proteins can be annotated with almost any GO term and therefore concentrated on finding terms associated with other types of molecules (e.g., tRNAs). These types of macromolecules have more restricted functions (and processes and cellular locations), which can be used to pare a given GO subontology down to a more manageable size for presentation to the user.
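To illustrate how such restrictions can pare down a subontology, the following Python sketch keeps only the most general terms allowed for a gene-product type together with their descendants. It is a simplification under stated assumptions: GOAT expresses the restrictions in DAML+OIL and relies on a description-logic classifier, whereas this sketch substitutes a plain traversal of a parent-to-children map. The function names, the term names, and the restriction shown for tRNA are all hypothetical toy values, not the restrictions actually added to GO.

```python
def descendants_or_self(roots, children):
    """Return the given terms plus everything below them in the hierarchy."""
    seen = set()
    stack = list(roots)
    while stack:
        term = stack.pop()
        if term in seen:
            continue
        seen.add(term)
        stack.extend(children.get(term, ()))
    return seen


def pare_subontology(product_type, allowed_roots, children, all_terms):
    """Return the terms to present for a gene-product type.

    Types without a recorded restriction (here, proteins) keep every term,
    mirroring the assumption that proteins can take almost any GO term.
    """
    if product_type not in allowed_roots:
        return set(all_terms)
    return descendants_or_self(allowed_roots[product_type], children)


# Toy hierarchy (parent -> children); names are illustrative only.
children = {
    "function": ["binding", "catalysis"],
    "binding": ["amino acid binding"],
    "catalysis": ["kinase activity"],
}
all_terms = {"function", "binding", "catalysis",
             "amino acid binding", "kinase activity"}

# Hypothetical restriction: the most general terms allowed for tRNAs.
allowed_roots = {"tRNA": {"binding"}}

trna_terms = pare_subontology("tRNA", allowed_roots, children, all_terms)
protein_terms = pare_subontology("protein", allowed_roots, children, all_terms)
```

In this toy case the tRNA view shrinks to two terms while the protein view keeps all five, which is the kind of reduction the restrictions are meant to provide.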

Contact Mike Bada or Robert Stevens for more information.

Last modified 4 April 2004 by Mike Bada.