TGD Home Tetrahymena Genome Database



TGD General Tutorial

Welcome to the Tetrahymena Genome Database online tutorial. Here we show how to use many of the tools available at our website by following a typical path of inquiry about a Tetrahymena gene. More in-depth help documents will be available throughout the site to aid with specific questions about methods, displays, and references associated with each of the tools described here.

Where to begin?

What gene strikes your fancy? There are over 27,000 protein-coding genes predicted in the T. thermophila genome, and only about 250 of them have information published specifically about them. Tetrahymena's powerful genetics, fascinating biology, and distinct evolutionary position ensure an interesting story exists for almost any gene. To give this tutorial a more practical feel, I'll focus on a set of genes that sparked my curiosity while we were putting TGD's automatic annotations together.

To begin, open a new browser window to www.ciliate.org (so you can follow the tutorial in this window) and head to the Locus Page for the myosin gene MYO1. To do this, simply enter MYO1 into the Quick Search box at the left and click 'Submit'. Since MYO1 is a unique gene name in TGD, the Gene Names field on the Quick Search Results page links directly to the MYO1 Locus Page:

The Locus Page is the portal to all information in TGD related to a particular gene: literature, sequences, protein annotations, its genomic location, and more. If information has been published about a gene, the page will be split into two halves. This is true for MYO1, although the image is too large to show in the picture above. The upper half contains annotation TGD curators have attributed to the gene as it has been described in the literature. The lower half contains TGD's automatic annotation of the preliminary TIGR gene model that we have determined corresponds to the published gene. Since conflicts may exist between the sequence described for the published gene and the TIGR gene model, TGD is not integrating these two halves of the page at this time. When the TIGR gene models have been updated, TGD will begin to reconcile differences between the published and predicted sequences, so that we can integrate the information on these pages.

GBrowse: TGD's genome browser

One of the first things to notice in the upper half of the MYO1 Locus Page is a large map of a region of the Tetrahymena genome. The map is centered on the TIGR gene model that TGD curators have determined corresponds to the MYO1 gene. Click anywhere on the map to enter GBrowse, a genome browser utility provided by the Generic Model Organism Database project (GMOD):

The GBrowse Page shows an expanded map of the region containing MYO1's corresponding gene model, 8.m00317 (PreTt27057, see note below). TIGR assigned sequential numbers to each of the May 4, 2004 preliminary gene models, from 1001 to 30400. TGD has created Locus Pages for each of the 27,400 gene models that contain one or more exons. To reinforce the preliminary nature of these gene models, we have named them with the "Pre(liminary)Tt" prefix when presented at our site. Please do not refer to these names in any publications, since they will be removed from TGD when TIGR releases its final genome sequence and gene models.

(Note: On June 9, 2005, we updated the names of the preliminary gene models to correspond with the names shown by TIGR at their website. For example, "PreTt27057" is now identified as "8.m00317", and "PreTt27120" is known as "8.m00380". This tutorial was written prior to this change. We have updated the names in the rest of this document. The screenshots shown have not been updated, however, and the exact information shown in them is slightly out-of-date.)

The large GBrowse display shows features present in the region of the genome highlighted in the Overview pane, located at the top of the main window. You can select the kinds of information shown in the window by selecting or de-selecting dataset "tracks" below the display. If they are not already on by default, select the "Tetrahymena GenBank Multiple Hits to Genome (BLAT)" and "Tetrahymena GenBank Unique Hits to Genome (BLAT)" tracks, then click "Update Image" to see what published genes have been found in this region. You should see a red bar corresponding to the aligned region between the genome sequence and GenBank ID U87268 (MYO1). The bars in these tracks are color-coded according to the P-value score returned by a BLAT search performed by TGD. Red indicates the strongest hit, as seen in the key below the display.

Above the main window, you can see that the currently viewed region is present on genome scaffold CH445735. The entire sequence of this 1.2 Mb scaffold is available at NCBI, and you can use this ID# to search the NCBI Nucleotide database if you would like to view it. The currently viewed GBrowse region is a 7.53 kb stretch near the beginning of the scaffold, as shown in the Scroll/Zoom menu and the Overview track. To browse the genomic surroundings of 8.m00317, select "100 kbp" from the Scroll/Zoom pulldown. The browser will automatically zoom out, remaining centered on 8.m00317.

While the views returned are browser-specific, many of you will see automatic annotations for the many gene models in the vicinity of MYO1. Also notice that the gene models are numbered sequentially, left to right in this view, regardless of whether the gene is encoded on the top strand (arrow on gene model pointing right) or bottom strand (pointing left).

To search to the right of the MYO1 gene, click the single right arrow in the Scroll/Zoom tool. This will move you only 50 kbp, centering the next view on the region currently at the right-most edge of the screen. Notice that another published gene now shows in the window - telomerase reverse transcriptase (TERT), gene model 8.m00330. Neat - I wonder if anything else interesting lies in that direction? Let's see. Click the double arrow to scroll a full 100 kbp to the right. My browser no longer shows the annotations, but yours might. I can, however, mouse-over the gene model icons to see what the gene model number is, along with its coordinates. Continue to scroll to the right until you reach 8.m00380. Resist the urge to click directly on the 8.m00380 gene model right now, as this will take you directly to the 8.m00380 Locus Page. After scrolling into the region containing this gene, center your view on 8.m00380 by clicking the ruler at the top of the window, underneath the Overview window. 8.m00380 is found at the 350 kb marker.
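If you like to see the arithmetic behind the Scroll/Zoom tool, here is a minimal sketch of it in Python. It only models what is described above: zooming keeps the view centered, the single arrow shifts by half a window, and the double arrow shifts by a full window. The `View` class and the starting coordinates are made up for illustration; this is not how GBrowse itself is implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class View:
    """A window on a scaffold, in base pairs (1-based, inclusive)."""
    start: int
    stop: int

    @property
    def width(self) -> int:
        return self.stop - self.start + 1

    @property
    def center(self) -> int:
        return (self.start + self.stop) // 2

    def zoom(self, new_width: int) -> "View":
        """Change the window width, staying centered where possible."""
        # Assumption: the view is clamped so it never starts before base 1.
        start = max(1, self.center - new_width // 2)
        return View(start, start + new_width - 1)

    def scroll(self, fraction: float) -> "View":
        """Shift the window by a fraction of its width (negative = left)."""
        shift = int(self.width * fraction)
        return View(self.start + shift, self.stop + shift)


# Hypothetical coordinates for a 7.53 kb starting view.
view = View(60_001, 67_530)
wide = view.zoom(100_000)      # select "100 kbp"
half = wide.scroll(0.5)        # single right arrow: moves 50 kbp
full = half.scroll(1.0)        # double right arrow: moves 100 kbp

print(wide, half, full, sep="\n")
```

After the single-arrow step, the center of the new view is the base that was at the right-most edge of the previous one, which matches what you see on screen.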

Now it's time to see why I brought you all the way out here. Zoom in to "Show 40 kbp" using the Scroll/Zoom tool.

The annotations for 8.m00380 and its neighbors should now be showing. This region of the genome appears to contain five 'chitin synthase'-related genes. Four of these are adjacent to one another: 8.m00377, 8.m00378, 8.m00379, and 8.m00380. The last one, 8.m00383, is only two genes away from this cluster. All five of the putative chitin synthase genes are encoded on the lower strand and are roughly the same size, about 2 kbp.
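Because the gene models are numbered sequentially from left to right, you can spot a cluster like this from the IDs alone. The sketch below groups the five putative chitin synthase models into runs of consecutive numbers. The helper names are my own, and it assumes that all IDs share the same "8.m" prefix and that consecutive numbers really do mean neighboring genes.

```python
def model_number(model_id: str) -> int:
    """Return the numeric suffix of an ID, e.g. '8.m00380' -> 380."""
    return int(model_id.split(".m")[1])


def adjacent_runs(model_ids):
    """Group gene model IDs into runs of consecutive model numbers."""
    runs = []
    for model_id in sorted(model_ids, key=model_number):
        # Extend the current run if this model directly follows the last one.
        if runs and model_number(model_id) == model_number(runs[-1][-1]) + 1:
            runs[-1].append(model_id)
        else:
            runs.append([model_id])
    return runs


chitin_synthase_like = ["8.m00377", "8.m00378", "8.m00379", "8.m00380", "8.m00383"]
for run in adjacent_runs(chitin_synthase_like):
    print(run)
```

This gives one run of four adjacent models (8.m00377 through 8.m00380) and 8.m00383 on its own, separated from the cluster by 8.m00381 and 8.m00382.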

Drawing on my New Mexico Tech undergraduate education, I remember that chitin is a rigid polymer used to make two things: the cell walls in fungi, and the exoskeleton of arthropods. I'm not sure what use ciliates might have for chitin - they don't typically have cell walls or exoskeletons, as far as I know - and during the few years I spent working on Oxytricha I don't recall any discussions about chitin. Does that mean chitin has never been studied in ciliates? Not necessarily, and I'll show you how to use TGD to find out if it has been in a moment. First though, let's see if we can find out a little more about these genes before we get too excited.

Returning to GBrowse, scroll your browser down to the "Tracks" section of the page. TGD has mapped all available Tetrahymena EST sequences to the genome, and you can choose to view these matches in the GBrowse window by selecting the "Tetrahymena EST Multiple Hits to Genome (BLAT)" and "Tetrahymena EST Unique Hits to Genome (BLAT)" tracks, then clicking "Update Image". At the moment Tetrahymena has a modest set of ESTs, but as luck would have it, two of the putative chitin synthase genes, 8.m00380 and 8.m00383, have hits in this dataset. Though these predicted gene models may or may not be perfect representations of the coding sequences expressed in this area, it appears that at least parts of 8.m00380 and 8.m00383 are transcribed. Hooray!
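What you are doing by eye in the EST tracks is an interval overlap check: does any EST alignment share bases with a gene model? Here is a rough sketch of that idea. All of the coordinates below are invented for illustration (the real ones are shown when you mouse over the features in GBrowse), and the check ignores strand and exon structure.

```python
def overlaps(a_start, a_stop, b_start, b_stop):
    """True if two inclusive intervals share at least one base."""
    return a_start <= b_stop and b_start <= a_stop


def models_with_est_support(gene_models, est_hits):
    """Return IDs of gene models overlapped by at least one EST alignment."""
    return [
        model_id
        for model_id, (g_start, g_stop) in gene_models.items()
        if any(overlaps(g_start, g_stop, e_start, e_stop)
               for e_start, e_stop in est_hits)
    ]


# Hypothetical coordinates: NOT the real positions of these gene models.
gene_models = {
    "8.m00379": (345_000, 347_000),
    "8.m00380": (349_000, 351_000),
    "8.m00383": (358_000, 360_000),
}
# Hypothetical EST alignments (start, stop) on the same scaffold.
est_hits = [(350_200, 350_900), (358_100, 358_600)]

print(models_with_est_support(gene_models, est_hits))
```

An overlap like this only says that something in the region is transcribed. It does not confirm that the predicted exons are correct.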

Locus Page

Now that we've identified 8.m00380 as an interesting, expressed gene using GBrowse, let's check to see if the function predicted for this gene model is likely to be accurate. The descriptions presented for the gene models in GBrowse were assigned based on the gene's predicted domain composition. If no domains were identified for that gene in our analyses, the gene's highest BLAST hit to the UniRef90 protein database is shown. Our threshold score for showing a BLAST annotation was very inclusive, so make sure to examine any gene with an enticing description in GBrowse more closely. To start our examination of the putative chitin synthase gene 8.m00380, visit its Locus Page by clicking on the 8.m00380 gene model in the "Gene Predictions" track of GBrowse. (Or access the 8.m00380 Locus Page directly.)
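The rule for choosing a GBrowse description can be sketched in a few lines: use the predicted domains if there are any, otherwise fall back to the best BLAST hit that passes a threshold. The function name, the placeholder domain names, and the `max_evalue` cutoff below are all assumptions for illustration. The actual threshold TGD used is not given here, only that it was very inclusive.

```python
def describe(domains, blast_hits, max_evalue=1e-3):
    """Pick a short description for a gene model.

    domains:    list of predicted domain names (may be empty)
    blast_hits: list of (description, evalue) tuples (may be empty)
    max_evalue: hypothetical cutoff; hits with a larger E-value are ignored
    """
    # Domain composition takes priority over any BLAST hit.
    if domains:
        return ", ".join(domains)
    passing = [hit for hit in blast_hits if hit[1] <= max_evalue]
    if passing:
        # The best hit is the one with the smallest E-value.
        return min(passing, key=lambda hit: hit[1])[0]
    return "no description"


print(describe(["domain A", "domain B"], [("some protein", 1e-50)]))
print(describe([], [("weak hit", 0.5), ("chitin synthase", 3.9e-38)]))
print(describe([], [("weak hit", 0.5)]))
```

The more inclusive the cutoff, the more often a weak hit ends up as the description, which is why an enticing label in GBrowse deserves a closer look.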

Notice first that, unlike the MYO1 Locus Page we visited earlier, the 8.m00380 Locus Page begins at the dark blue "Provisional Annotation" section. No papers or non-EST GenBank entries have been published on this gene, so TGD has not created a light blue "Curated Information" section for it yet.

Near the top of the page, the Homolog section shows 8.m00380's top three BLAST hits against the UniRef90 database. UniRef90 is a dataset created by clustering all protein sequences at UniProt with greater than 90% identity into a single entry. BLASTing against this database takes a fraction of the time it would take to compare all 27,400 gene models to the full set of proteins in the UniProt database. It took two weeks to complete the BLAST searches used to generate the list of homologs shown at TGD.

If you're familiar with BLAST scores, you'll notice that the top hit for 8.m00380 isn't too bad: an E-value of 3.9e-38 to a cluster of chitin synthase genes. Click through to the UniRef90_Q6BT33 entry to read more about this cluster. Looks like it's a "cluster" made up of a single chitin synthase gene, from the species Debaryomyces hansenii. It also appears that this description was assigned based on similarity to a gene in Candida rather than any work done in Debaryomyces. That's okay - if the Tetrahymena and Debaryomyces genes match with an E-value of 3.9e-38, just think how strong the match must be between Debaryomyces and Candida.
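If you're less familiar with BLAST scores, it helps to remember that an E-value is the number of hits of that quality you would expect by chance, so smaller is better. For a bit score S, the standard relationship is E = m * n * 2^(-S), where m and n are the query and database lengths. The sketch below applies that formula. The lengths are made-up round numbers, and real BLAST uses corrected "effective" lengths, so treat the output as a ballpark only.

```python
import math


def expected_hits(bit_score, query_length, database_length):
    """E-value for a bit score: E = m * n * 2**(-S)."""
    return query_length * database_length * 2.0 ** (-bit_score)


def bit_score_for(evalue, query_length, database_length):
    """Invert the formula: the bit score that would give this E-value."""
    return math.log2(query_length * database_length / evalue)


# Hypothetical sizes: a 1,100-residue query against 500 million residues.
QUERY_LENGTH = 1_100
DATABASE_LENGTH = 500_000_000

score = bit_score_for(3.9e-38, QUERY_LENGTH, DATABASE_LENGTH)
print(f"approximate bit score under these assumptions: {score:.1f}")

# Each additional bit of score halves the expected number of chance hits.
print(expected_hits(score + 1, QUERY_LENGTH, DATABASE_LENGTH))
```

Because the E-value depends on the size of the database searched, the same alignment will get a different E-value at NCBI than it does against UniRef90.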

Returning to the 8.m00380 Locus Page, let's take a closer look at the Homolog table. In addition to the E-value, we can see that only about 18% of the putative protein shares identity with the chitin synthase genes in the list. More specifically, the middle of the protein (from about amino acids 400-600) hits a similar region in its chitin synthase homologs. The fact that only a part of this rather large protein shows identity to the chitin synthases is a little disappointing. But we won't let that get us down. Scroll to the bottom of the page and select ORF Translation from the Retrieve Sequences section. We'll do our own BLAST search. Select the sequence and run a BLASTP with it at the NCBI BLAST server. Here are the results from when I did this:

Score! Looks like there are regions of similarity to other chitin synthases throughout the protein. No more worries - this is looking like a solid homolog of these other proteins. And better yet, the third-best hit is to a gene in Saccharomyces cerevisiae, CHS1. The yeast community has a terrific website where you can access information about all the genes in their organism, called the Saccharomyces Genome Database (SGD). Visit the CHS1 Locus Page at SGD to read about studies done on this gene in yeast, and to get an idea of the kinds of information you can access from a mature, popular model organism database. Right on the CHS1 Locus Page you can read interesting facts that may also pertain to our 8.m00380 gene in Tetrahymena. Chs1p is expressed as a zymogen that has to be cleaved prior to activation, and it's found in a structure called the "chitosome". Interesting leads, for sure.

One word of warning: even though Chs1p is involved in producing chitin in Saccharomyces and other fungi, a Tetrahymena homolog of this protein may serve a slightly different purpose. Always look carefully to see if homologs of your best hits play roles in multiple pathways: the best studied, most obvious functions in one organism may be less significant, or even absent, in another. The core activity of chitin synthase enzymes, for example, is to attach N-acetyl-D-glucosamine to things. Perhaps ciliates have a different reason to do this, rather than to make a chitin polymer.



To contact TGD: Send email to ciliate-curator@genome.stanford.edu.