TGD General Tutorial
Welcome to the Tetrahymena Genome Database online tutorial. Here we show
how to use many of the tools available at our website by following a
typical path of inquiry about a Tetrahymena gene. More in-depth help
documents will be available throughout the site to aid with specific
questions about methods, displays, and references associated with each
of the tools described here.
Where to begin?
What gene strikes your fancy? There are over 27,000 protein-coding
genes predicted in the T. thermophila genome, and only about
250 of them have information published specifically about them.
Tetrahymena's powerful genetics, fascinating biology, and distinct
evolutionary position ensure an interesting story exists for
almost any gene. To give this tutorial a more practical feel, I'll
focus on a set of genes that sparked my curiosity while we were putting TGD's automatic annotations together.
To begin, open a new browser window to www.ciliate.org (so you can follow the tutorial
in this window) and head to the Locus Page for the myosin gene MYO1. To do this,
simply enter MYO1 into the Quick Search box at the left and click 'Submit'. Since MYO1
is a unique gene name in TGD, the Gene Names field on the Quick
Search Results page links directly to the MYO1 Locus Page:
The Locus Page is the portal to all information in
TGD related to a particular gene: literature, sequences, protein
annotations, its genomic location, and more. If information has been
published about a gene, the page will be split into two halves. This
is true for MYO1, although the page is too large to show in full in the
picture above. The upper half contains annotation TGD curators have attributed to the
gene as it has been described in the literature. The lower half
contains TGD's automatic annotation of the preliminary TIGR gene model that we
have determined corresponds to the published gene. Since conflicts
may exist between the sequence described for the published gene and
the TIGR gene model, TGD is not integrating these two halves of the
page at this time. When the TIGR gene models have been updated, TGD will begin to reconcile
differences between the published and predicted sequences, so that we
can integrate the information on these pages.
GBrowse: TGD's genome browser
One of the first things to notice in the upper half of the MYO1 Locus Page is a large map of a region of the Tetrahymena
genome. The map is centered on the TIGR gene
model that TGD curators have determined corresponds to the MYO1
gene. Click anywhere on the map to enter GBrowse, a genome
browser utility provided by the Generic Model Organism Database project
(GMOD):
The GBrowse Page shows an expanded map of the region containing MYO1's
corresponding gene model, 8.m00317 (PreTt27057, see note below). TIGR assigned sequential numbers to each
of the May 4, 2004 preliminary gene models, from 1001 to 30400. TGD has
created Locus Pages for each of the 27,400 gene models that contain one or
more exons. To reinforce the preliminary nature of these gene models,
we have named them with the "Pre(liminary)Tt" prefix when presented at our site.
Please do not refer to these names in any publications, since they
will be removed from TGD when TIGR releases its final genome
sequence and gene models.
(Note: On June 9, 2005, we updated the names of
the preliminary gene models to correspond with the names shown by
TIGR at their website. For example, "PreTt27057" is now identified as
"8.m00317", and "PreTt27120" is known as "8.m00380". This tutorial
was written prior to this change. We have updated the names in the
rest of this document. The screenshots shown have not been updated,
however, and the exact information shown in them is slightly out-of-date.)
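If you are matching the older screenshots to the current names, the renaming described in the note can be treated as a simple lookup. The sketch below is illustrative only: it holds just the two pairs given in the note, and the names RENAMED and current_name are made up for this tutorial, not part of TGD.

```python
# Old "PreTt" names and their current TIGR-style names.
# Only the two pairs quoted in the note above are listed here.
RENAMED = {
    "PreTt27057": "8.m00317",
    "PreTt27120": "8.m00380",
}


def current_name(name):
    """Return the current name for an old PreTt name.

    Names that are not in the table (including names that are
    already current) are returned unchanged.
    """
    return RENAMED.get(name, name)


print(current_name("PreTt27057"))
```

A name that appears in a screenshot can be passed through current_name before you type it into the Quick Search box.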
The large GBrowse display shows features present in the region of
the genome highlighted in the Overview pane, located at the top of the main window. You can select the kinds of information shown in
the window by selecting or de-selecting dataset "tracks" below the
display. If they are not already on by default, select the "Tetrahymena GenBank Multiple Hits to Genome
(BLAT)" and "Tetrahymena GenBank Unique Hits to Genome (BLAT)" tracks,
then click "Update Image" to see what published genes have been found in this region. You should
see a red bar corresponding to the aligned region between the genome
sequence and GenBank ID U87268 (MYO1). The bars in these tracks are color-coded
according to the P-value score returned by a BLAT search performed by
TGD. Red indicates the strongest hit, as seen in the key below the
display.
Above the main window, you can see that the currently viewed region is
present on genome scaffold CH445735. The entire sequence of this
1.2 Mb scaffold is available at NCBI, and you can use this ID# to search the
NCBI Nucleotide database if
you would like to view it. The currently viewed GBrowse region is a 7.53 kb
stretch near the beginning of the scaffold, as shown in the Scroll/Zoom menu and the Overview track. To browse the genomic
surroundings of 8.m00317, select "100 kbp" from the Scroll/Zoom
pulldown. The browser will automatically zoom out, remaining centered
on 8.m00317.
While the views returned are browser specific, many of
you will see automatic annotations for the many gene models in the
vicinity of MYO1. Also notice that the gene models are numbered
sequentially, left to right in this view, regardless of whether the
gene is encoded on the top strand (arrow on gene model pointing right) or
bottom strand (pointing left).
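The scaffold ID mentioned earlier (CH445735) can also be looked up at NCBI from a script rather than through the web form. The following is a minimal sketch, assuming NCBI's E-utilities efetch service and the Python standard library. The function names are hypothetical, and I have not checked how NCBI currently serves this particular record, so treat it as a starting point rather than a tested recipe.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base address of the NCBI E-utilities "efetch" service.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def scaffold_fasta_url(accession):
    """Build an efetch URL asking for a nucleotide record as FASTA text."""
    params = {
        "db": "nuccore",      # the NCBI Nucleotide database
        "id": accession,      # for example, the scaffold ID shown in GBrowse
        "rettype": "fasta",
        "retmode": "text",
    }
    return EFETCH + "?" + urlencode(params)


def fetch_scaffold(accession):
    """Download the record (needs network access); returns FASTA text."""
    with urlopen(scaffold_fasta_url(accession)) as response:
        return response.read().decode("utf-8")


print(scaffold_fasta_url("CH445735"))
```

Printing the URL first lets you paste it into a browser to confirm that the record is what you expect before downloading the whole 1.2 Mb scaffold.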
To search to the right of the MYO1 gene, click the single right
arrow in the Scroll/Zoom tool. This will move you only 50 kbp, centering the next view on the
region currently at the right-most edge of the screen. Notice that
another published gene now shows in the window - telomerase reverse
transcriptase (TERT), gene model 8.m00330. Neat - I wonder if
anything else interesting lies in that direction? Let's see. Click
the double arrow to scroll a full 100 kbp to the right. My
browser no longer shows the annotations, but yours might. I can,
however, mouse-over the gene model icons to see what the gene model number
is, along with its coordinates. Continue to scroll to the right until
you reach 8.m00380. Resist the urge to click directly on the
8.m00380 gene model right now, as this will take you directly to the
8.m00380 Locus Page. After scrolling into the region containing
this gene, center your view on 8.m00380 by clicking the ruler at the top of the
window, underneath the Overview window. 8.m00380 is found at the
350 kb marker.
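The scrolling steps above follow simple arithmetic: in a 100 kbp view, the single arrow shifts the window by half its width and the double arrow by its full width. The sketch below illustrates that arithmetic only. The coordinates are hypothetical, chosen to make the numbers easy to follow, and the function is not part of GBrowse.

```python
def scroll(start, stop, direction, full=False):
    """Shift a viewing window along the scaffold.

    start, stop -- window coordinates in base pairs
    direction   -- "left" or "right"
    full        -- False for the single arrow (half a window),
                   True for the double arrow (a whole window)
    """
    if direction not in ("left", "right"):
        raise ValueError("direction must be 'left' or 'right'")
    width = stop - start
    step = width if full else width // 2
    shift = step if direction == "right" else -step
    return start + shift, stop + shift


# A hypothetical 100 kbp window starting at the left end of the scaffold.
window = (0, 100_000)
window = scroll(*window, "right")             # single arrow: 50 kbp
window = scroll(*window, "right", full=True)  # double arrow: 100 kbp
print(window)
```

Because the single arrow moves half a window, whatever sat at the right-most edge of the old view ends up in the center of the new one, as described above.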
Now it's time to see why I brought you all the way out here. Zoom in to "Show 40 kbp"
using the Scroll/Zoom tool.
The annotations for 8.m00380 and its
neighbors should now be showing. This region of the genome appears to
contain five 'chitin synthase'-related genes. Four of these are adjacent to one
another: 8.m00377, 8.m00378, 8.m00379, and 8.m00380. The last
one, 8.m00383, is only two genes away from this cluster. All five
of the putative chitin synthase genes are encoded on the lower strand and are roughly the
same size, about 2 kbp.
Drawing on my New Mexico Tech undergraduate education, I remember that chitin is a
rigid polymer used to make two things: the cell walls in fungi, and the
exoskeleton of arthropods. I'm not sure what use ciliates might have
for chitin - they don't typically have cell walls or exoskeletons, as
far as I know - and during
the few years I spent working on Oxytricha I don't recall
any discussions about chitin. Does that mean chitin has never been studied in
ciliates? Not necessarily, and in a moment I'll show you how to use TGD to find
out. First, though, let's see if we can find out a little more
about these genes before we get too excited.
Returning to GBrowse, scroll your browser down to the "Tracks" section
of the page. TGD has mapped all available Tetrahymena EST sequences to the genome, and
you can choose to view these matches in the GBrowse window by
selecting the "Tetrahymena EST Multiple Hits to Genome (BLAT)" and
"Tetrahymena EST Unique Hits to Genome (BLAT)" tracks, then clicking
"Update Image". At the moment Tetrahymena has a modest set of ESTs,
but as luck would have it, two of the putative chitin synthase genes, 8.m00380 and
8.m00383, have hits in this dataset. Though these predicted gene models may or may not be perfect representations of the coding
sequences expressed in this area, it appears that at least parts of 8.m00380 and
8.m00383 are transcribed. Hooray!
Locus Page
Now that we've identified 8.m00380 as an interesting, expressed gene using GBrowse, let's check to
see if the function predicted for this gene model is likely to be accurate.
The descriptions presented for the gene models in GBrowse were
assigned based on the gene's predicted domain composition. If no
domains were identified for that gene in our analyses, the gene's
highest BLAST hit to the UniRef90 protein database is shown. Our threshold score for
showing a BLAST annotation was very inclusive, so make sure
to examine any gene with an enticing description in GBrowse more closely. To
start our examination of the putative chitin synthase gene
8.m00380, visit its Locus Page by clicking on the 8.m00380 gene
model in the "Gene Predictions" track of GBrowse. (Or access the 8.m00380
Locus Page directly.)
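The rule for choosing a GBrowse description, as described above, is a two-step fallback: use the predicted domains if there are any, otherwise use the top BLAST hit. Here is a minimal sketch of that rule. How TGD actually formats a list of domains, and what it shows when neither source is available, are assumptions made for this example.

```python
from typing import Optional, Sequence


def describe_gene_model(domains: Sequence[str],
                        top_blast_hit: Optional[str]) -> str:
    """Pick a short description for a gene model.

    domains       -- names of predicted domains (may be empty)
    top_blast_hit -- description of the best UniRef90 hit, or None
    """
    if domains:
        # Assumption: multiple domains are simply joined with commas.
        return ", ".join(domains)
    if top_blast_hit:
        return top_blast_hit
    # Assumption: placeholder text when nothing was found.
    return "no annotation"


print(describe_gene_model([], "chitin synthase (hypothetical hit text)"))
```

A description that comes from the second branch rests on a single, inclusively scored BLAST hit, which is why it deserves the closer look recommended above.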
Notice first that, unlike the MYO1 Locus Page we visited earlier, the
8.m00380 Locus Page begins at the dark blue "Provisional Annotation"
section. No papers or non-EST GenBank entries have been published on
this gene, so TGD has not created a light blue "Curated Information" section for
it yet.
Near the top of the page, the Homolog section shows 8.m00380's top three
BLAST hits against the UniRef90 database. UniRef90 is a dataset
created by clustering all protein sequences at UniProt with greater than 90%
identity into a single entry. BLASTing against this database takes a
fraction of the time it would take to compare all 27,400 gene models
to the full set of proteins in the UniProt database. It took two
weeks to complete the BLAST searches used to generate the list of
homologs shown at TGD.
If you're familiar with BLAST scores, you'll notice that the top hit
for 8.m00380 isn't too bad: an E-value of 3.9e-38 to a cluster of chitin
synthase genes. Click through to the UniRef90_Q6BT33 entry to read more about
this cluster. Looks like it's a "cluster" comprised of a single
chitin synthase gene, from the species Debaryomyces hansenii.
It also appears that this description was assigned based on similarity
to a gene in Candida rather than any work done in
Debaryomyces. That's okay - if the match between the
Tetrahymena and Debaryomyces genes has an E-value of 3.9e-38, just think how strong it
must be between Debaryomyces and Candida.
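If E-values are new to you: an E-value is the number of hits with a score at least this good that you would expect to see by chance in a search of this size. The standard BLAST relation P = 1 - exp(-E) converts it to the probability of seeing at least one such chance hit. The short sketch below applies that relation; the function name is my own.

```python
import math


def evalue_to_pvalue(evalue):
    """Probability of at least one chance hit, given a BLAST E-value.

    Uses P = 1 - exp(-E). math.expm1 keeps the result accurate
    when E is extremely small.
    """
    return -math.expm1(-evalue)


for e in (3.9e-38, 1.0, 10.0):
    print(e, evalue_to_pvalue(e))
```

For very small E-values the two numbers are practically identical, so a hit at 3.9e-38 is essentially never a chance match. The two only diverge noticeably once E approaches 1.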
Returning to the 8.m00380 Locus Page, let's take a closer look at
the Homolog table. In addition to the E-value, we can see that only
about 18% of the putative protein shares identity with the chitin
synthase genes in the list. More specifically, the middle of the protein
(from about amino acids 400-600) hits a similar region in its
chitin synthase homologs. The fact that only a part of this rather large protein shows identity
to the chitin synthases is a little disappointing. But we won't let
that get us down. Scroll to the bottom of the page and select ORF
Translation from the Retrieve Sequences section. We'll do our own
BLAST search. Select the sequence and run a BLASTP with it at the NCBI
BLAST server. Here are the results from when I did this:
Score! Looks like there are regions of similarity to other chitin synthases
throughout the protein. No more worries - this is looking like a
solid homolog of these other proteins. And better yet, the third-best hit is to a gene in
Saccharomyces cerevisiae, CHS1. The yeast community has
a terrific website where you can access information about all the
genes in their organism, called the Saccharomyces Genome Database
(SGD). Visit the CHS1
Locus Page at SGD to read about studies done on this gene in
yeast, and to get an idea of the kinds of
information you can access from a mature, popular model organism
database. Right on the CHS1 Locus Page you can read interesting facts
that may also pertain to our 8.m00380 gene in Tetrahymena. Chs1p is
expressed as a zymogen that has to be cleaved prior to activation, and
it's found in a structure called the
"chitosome". Interesting leads, for sure.
One word of warning: even though Chs1p is involved in producing
chitin in Saccharomyces and other fungi, a Tetrahymena homolog of this
protein may serve a slightly different purpose. Always look
carefully to see if homologs of your best hits play roles in multiple
pathways: the best studied, most obvious functions in one organism may
be less significant, or even absent, in another. The core activity of
chitin synthase enzymes, for example, is to attach N-acetyl-D-glucosamine to things.
Perhaps ciliates have a different reason to do this, rather than to
make a chitin polymer.