Gene Ontology Annotation Tool (GOAT)
Approach
GOAT is closely related to another project at the University of Manchester named
GONG (Gene Ontology Next Generation). The goal of GONG is to
convert the present GO into a description-logic-based ontology (specifically, in
DAML+OIL) and then to further enrich
it with formally represented biological knowledge. Our DAML+OIL version of GO is loaded into
FaCT, a classifier of
description-logic-based ontologies, allowing us to reason easily with its component terms.
We also use an instance store
to hold associations between GO terms. While each concept (i.e., class) of
our DAML+OIL GO ontology represents a GO term, each instance of the instance store is an
association record for a corresponding GO-term concept in the ontology. Each association record
refers to its corresponding GO term and to the set of other GO terms with which that term is
associated. The GO-term-to-GO-term associations were mined from the complete version of
GOA (Gene Ontology Annotation), a database holding all
GO-term annotations of entries in the databases of
UniProt (a comprehensive resource for
information about proteins) and Ensembl (a project that
maintains information about large genomes). We examined each GO-term annotation in GOA that
represents neither an unknown term (e.g., “unknown biological process”) nor an obsolete
term and that has an evidence code that we deem reliable. From these annotations we compiled
pairs of associated GO terms: two terms (i.e., a given GO term and one of its associated GO
terms) are considered associated if they have been used together as annotating terms
in at least one UniProt entry of GOA.
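The mining step described above can be sketched in Python as follows. This is a minimal illustration, not the code used for GOAT: the record layout (entry identifier, GO term, evidence code), the function name mine_associations, the placeholder term identifiers, and the choice of which evidence codes count as reliable are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations


def mine_associations(annotations, reliable_codes, excluded_terms):
    """Compile GO-term-to-GO-term associations from annotation records.

    annotations    -- iterable of (entry_id, go_term, evidence_code) tuples
                      (hypothetical layout, not the actual GOA file format)
    reliable_codes -- evidence codes accepted as reliable (an assumption;
                      the text does not list them)
    excluded_terms -- "unknown" and obsolete terms to skip
    """
    # Step 1: collect the acceptable annotating terms of each entry.
    terms_by_entry = defaultdict(set)
    for entry_id, go_term, evidence_code in annotations:
        if go_term in excluded_terms or evidence_code not in reliable_codes:
            continue
        terms_by_entry[entry_id].add(go_term)

    # Step 2: two terms are associated if they annotate the same entry.
    associations = defaultdict(set)
    for terms in terms_by_entry.values():
        for term_a, term_b in combinations(sorted(terms), 2):
            associations[term_a].add(term_b)
            associations[term_b].add(term_a)
    return dict(associations)


# Illustrative records with placeholder identifiers.
example_annotations = [
    ("P1", "GO:A", "IDA"),
    ("P1", "GO:B", "IDA"),
    ("P1", "GO:C", "IEA"),  # dropped: evidence code not in the reliable set
    ("P2", "GO:A", "TAS"),
    ("P2", "GO:U", "IDA"),  # dropped: stands in for an "unknown" term
]
example_result = mine_associations(
    example_annotations,
    reliable_codes={"IDA", "TAS"},
    excluded_terms={"GO:U"},
)
```

Each key of the returned dictionary corresponds to one association record of the kind held in the instance store: a GO term together with the set of other GO terms with which it is associated.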
A second type of association, that between GO terms and gene-product
types (i.e., types of biological molecules), was obtained from the
various prominent organism-specific databases that use GO terms to annotate their gene-product
entries (e.g., the
Saccharomyces Genome Database (SGD), which
concentrates on the yeast Saccharomyces cerevisiae). The entries of most of these
databases do not have structured fields that classify them into gene-product types, and thus,
there is no easy way to automatically mine for this type of association. Instead, the databases
were manually searched and examined, resulting in a small set of existential restrictions (added
directly to our DAML+OIL version of GO) for the most general terms with which each gene-product type
was found to be associated. We assumed that proteins can be annotated with almost any GO term
and instead concentrated on finding terms associated with other types of molecules
(e.g., tRNAs). These types of macromolecules have more restricted functions
(and processes and cellular locations) that can be used to pare a given GO subontology down to a
more manageable size for presentation to the user.
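The paring-down described above can be illustrated with a small Python sketch. In GOAT this kind of inference is drawn from the DAML+OIL ontology with a description-logic classifier; the sketch below substitutes a plain graph traversal, and the function name pare_subontology, the parent-map representation, and the term labels are all hypothetical.

```python
from collections import defaultdict, deque


def pare_subontology(parents, general_terms):
    """Return the general terms plus all of their descendants.

    parents       -- dict mapping each term to the set of its parent terms
                     (a simplified stand-in for the ontology's hierarchy)
    general_terms -- the most general terms found to be associated with a
                     given gene-product type
    """
    # Invert the parent map so the hierarchy can be walked downwards.
    children = defaultdict(set)
    for term, term_parents in parents.items():
        for parent in term_parents:
            children[parent].add(term)

    # Breadth-first walk from each general term to all of its descendants.
    kept = set(general_terms)
    queue = deque(general_terms)
    while queue:
        term = queue.popleft()
        for child in children[term]:
            if child not in kept:
                kept.add(child)
                queue.append(child)
    return kept


# Placeholder hierarchy: "root" has two branches, only one of which is
# associated with the (hypothetical) gene-product type.
example_parents = {
    "branch_1": {"root"},
    "branch_2": {"root"},
    "leaf_1a": {"branch_1"},
    "leaf_1b": {"branch_1"},
    "leaf_2a": {"branch_2"},
}
example_kept = pare_subontology(example_parents, {"branch_1"})
```

Only the terms in the returned set would be presented to a user annotating a gene product of that type; everything outside the associated branches is omitted.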
Contact Mike Bada or
Robert Stevens for more information.
Last modified 4 April 2004 by Mike Bada.