TGD Home Tetrahymena Genome Database



TGD General Tutorial

Welcome to the Tetrahymena Genome Database online tutorial. Here we show how to use many of the tools available at our website by following a typical path of inquiry about a Tetrahymena gene. More in-depth help documents will be available throughout the site to aid with specific questions about methods, displays, and references associated with each of the tools described here.

Where to begin?

What gene strikes your fancy? There are over 27,000 protein-coding genes predicted in the T. thermophila genome, and only about 250 of them have information published specifically about them. Tetrahymena's powerful genetics, fascinating biology, and distinct evolutionary position ensure an interesting story exists for almost any gene. To give this tutorial a more practical feel, I'll focus on a set of genes that sparked my curiosity while we were putting TGD's automatic annotations together.

To begin, open a new browser window to www.ciliate.org (so you can follow the tutorial in this window) and head to the Locus Page for the myosin gene MYO1. To do this, simply enter MYO1 into the Quick Search box at the left and click 'Submit'. Since MYO1 is a unique gene name in TGD, the Gene Names field on the Quick Search Results page links directly to the MYO1 Locus Page:

The Locus Page is the portal to all information in TGD related to a particular gene: literature, sequences, protein annotations, its genomic location, and more. If information has been published about a gene, the page will be split into two halves. This is true for MYO1, although the image is too large to show in the picture above. The upper half contains annotation TGD curators have attributed to the gene as it has been described in the literature. The lower half contains TGD's automatic annotation of the preliminary TIGR gene model that we have determined corresponds to the published gene. Since conflicts may exist between the sequence described for the published gene and the TIGR gene model, TGD is not integrating these two halves of the page at this time. When the TIGR gene models have been updated, TGD will begin to reconcile differences between the published and predicted sequences, so that we can integrate the information on these pages.

GBrowse: TGD's genome browser

One of the first things to notice in the upper half of the MYO1 Locus Page is a large map of a region of the Tetrahymena genome. The map is centered on the TIGR gene model that TGD curators have determined corresponds to the MYO1 gene. Click anywhere on the map to enter GBrowse, a genome browser utility provided by the Generic Model Organism Database project (GMOD):

The GBrowse Page shows an expanded map of the region containing MYO1's corresponding gene model, 8.m00317 (PreTt27057, see note below). TIGR assigned sequential numbers, from 1001 to 30400, to each of the May 4, 2004 preliminary gene models. TGD has created Locus Pages for each of the 27,400 gene models that contain one or more exons. To reinforce the preliminary nature of these gene models, we have named them with the "Pre(liminary)Tt" prefix when presented at our site. Please do not refer to these names in any publications, since they will be removed from TGD when TIGR releases its final genome sequence and gene models.

(Note: On June 9, 2005, we updated the names of the preliminary gene models to correspond with the names shown by TIGR at their website. For example, "PreTt27057" is now identified as "8.m00317", and "PreTt27120" is known as "8.m00380". This tutorial was written prior to this change. We have updated the names in the rest of this document. The screenshots shown have not been updated, however, and the exact information shown in them is slightly out-of-date.)

The large GBrowse display shows features present in the region of the genome highlighted in the Overview pane, located at the top of the main window. You can select the kinds of information shown in the window by selecting or de-selecting dataset "tracks" below the display. If they are not already on by default, select the "Tetrahymena GenBank Multiple Hits to Genome (BLAT)" and "Tetrahymena GenBank Unique Hits to Genome (BLAT)" tracks, then click "Update Image" to see what published genes have been found in this region. You should see a red bar corresponding to the aligned region between the genome sequence and GenBank ID U87268 (MYO1). The bars in these tracks are color-coded according to the P-value score returned by a BLAT search performed by TGD. Red indicates the strongest hit, as seen in the key below the display.

Above the main window, you can see that the currently viewed region is present on genome scaffold CH445735. The entire sequence of this 1.2 Mb scaffold is available at NCBI, and you can use this ID to search the NCBI Nucleotide database if you would like to view it. The currently viewed GBrowse region is a 7.53 kb stretch near the beginning of the scaffold, as shown in the Scroll/Zoom menu and the Overview track. To browse the genomic surroundings of 8.m00317, select "100 kbp" from the Scroll/Zoom pulldown. The browser will automatically zoom out, remaining centered on 8.m00317.
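If you would rather script the NCBI lookup than use the web form, the scaffold can also be requested through NCBI's E-utilities. The sketch below only builds the request URL and does not contact NCBI. It assumes the efetch service with db=nuccore (the Nucleotide database) and FASTA output; the function name scaffold_fasta_url is my own and is not part of TGD or NCBI.

```python
from urllib.parse import urlencode

# Base address of NCBI's E-utilities efetch service.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def scaffold_fasta_url(accession):
    """Build an efetch URL that requests a nucleotide record as FASTA text."""
    params = {
        "db": "nuccore",      # NCBI Nucleotide database
        "id": accession,      # the scaffold ID shown above the GBrowse window
        "rettype": "fasta",
        "retmode": "text",
    }
    return EFETCH_URL + "?" + urlencode(params)


url = scaffold_fasta_url("CH445735")
print(url)
```

Pasting the printed address into a browser should return the scaffold sequence. Keep in mind that this is a 1.2 Mb record, so the download is sizable.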

While the views returned are browser-specific, many of you will see automatic annotations for the many gene models in the vicinity of MYO1. Also notice that the gene models are numbered sequentially, left to right in this view, regardless of whether the gene is encoded on the top strand (arrow on gene model pointing right) or bottom strand (pointing left).

To search to the right of the MYO1 gene, click the single right arrow in the Scroll/Zoom tool. This will move you only 50 kbp, centering the next view on the region currently at the right-most edge of the screen. Notice that another published gene now shows in the window - telomerase reverse transcriptase (TERT), gene model 8.m00330. Neat - I wonder if anything else interesting lies in that direction? Let's see. Click the double arrow to scroll a full 100 kbp to the right. My browser no longer shows the annotations, but yours might. I can, however, mouse-over the gene model icons to see what the gene model number is, along with its coordinates. Continue to scroll to the right until you reach 8.m00380. Resist the urge to click directly on the 8.m00380 gene model right now, as this will take you directly to the 8.m00380 Locus Page. After scrolling into the region containing this gene, center your view on 8.m00380 by clicking the ruler at the top of the window, underneath the Overview window. 8.m00380 is found at the 350 kb marker.
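The arithmetic behind the two scroll arrows is simple enough to sketch. The code below is only an illustration of the behavior described above, not GBrowse's own code, and the window coordinates are made up: a single arrow shifts the view by half its width, so the old right edge becomes the new center, while a double arrow shifts it by a full width.

```python
def scroll_right(start, end, fraction):
    """Shift a (start, end) window to the right by a fraction of its width."""
    shift = int((end - start) * fraction)
    return start + shift, end + shift


# Hypothetical 100 kbp window, in base pairs.
window = (0, 100_000)

half = scroll_right(*window, 0.5)   # single arrow: move half a window
full = scroll_right(*window, 1.0)   # double arrow: move a full window

print(half, full)
```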

Now it's time to see why I brought you all the way out here. Zoom in to "Show 40 kbp" using the Scroll/Zoom tool.

The annotations for 8.m00380 and its neighbors should now be showing. This region of the genome appears to contain five 'chitin synthase'-related genes. Four of these are adjacent to one another: 8.m00377, 8.m00378, 8.m00379, and 8.m00380. The last one, 8.m00383, is only two genes away from this cluster. All five of the putative chitin synthase genes are encoded on the lower strand and are roughly the same size, about 2 kbp.
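Because the gene models are numbered sequentially along the scaffold, the clustering described above can be read straight off the names. Here is a small sketch that groups the five names into runs of consecutive numbers. It assumes that the digits after ".m" are the sequential model number; the helper names are mine.

```python
def model_number(name):
    """Return the integer after '.m' in a name such as '8.m00380'."""
    return int(name.split(".m")[1])


def consecutive_runs(names):
    """Group gene model names into runs of consecutive model numbers."""
    runs = []
    for name in sorted(names, key=model_number):
        if runs and model_number(name) == model_number(runs[-1][-1]) + 1:
            runs[-1].append(name)   # extends the current run
        else:
            runs.append([name])     # starts a new run
    return runs


chitin_synthase_models = ["8.m00377", "8.m00378", "8.m00379", "8.m00380", "8.m00383"]
runs = consecutive_runs(chitin_synthase_models)

# Number of gene models sitting between the cluster and the outlier.
gap = model_number("8.m00383") - model_number("8.m00380") - 1

print(runs, gap)
```

The four adjacent genes fall into one run, 8.m00383 forms a run of its own, and the gap counts the two intervening gene models.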

Drawing on my New Mexico Tech undergraduate education, I remember that chitin is a rigid polymer used to make two things: the cell walls in fungi, and the exoskeleton of arthropods. I'm not sure what use ciliates might have for chitin - they don't typically have cell walls or exoskeletons, as far as I know - and during the few years I spent working on Oxytricha I don't recall any discussions about chitin. Does that mean chitin has never been studied in ciliates? Not necessarily, and I'll show you how to use TGD to find out if it has been in a moment. First though, let's see if we can find out a little more about these genes before we get too excited.

Returning to GBrowse, scroll your browser down to the "Tracks" section of the page. TGD has mapped all available Tetrahymena EST sequences to the genome, and you can choose to view these matches in the GBrowse window by selecting the "Tetrahymena EST Multiple Hits to Genome (BLAT)" and "Tetrahymena EST Unique Hits to Genome (BLAT)" tracks, then clicking "Update Image". At the moment Tetrahymena has a modest set of ESTs, but as luck would have it, two of the putative chitin synthase genes, 8.m00380 and 8.m00383, have hits in this dataset. Though these predicted gene models may or may not be perfect representations of the coding sequences expressed in this area, it appears that at least parts of 8.m00380 and 8.m00383 are transcribed. Hooray!
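What the EST tracks show visually is an interval overlap: an EST alignment that lands inside the coordinates of a gene model. The sketch below illustrates that idea only. The names and coordinates are invented for the example and are not real TGD positions, and it is not how TGD or GBrowse computes its tracks.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two closed intervals on the same scaffold share at least one base."""
    return a_start <= b_end and b_start <= a_end


def models_with_est_support(models, est_hits):
    """Return the names of gene models overlapped by at least one EST alignment."""
    return [
        name
        for name, (start, end) in models.items()
        if any(overlaps(start, end, s, e) for s, e in est_hits)
    ]


# Made-up coordinates for illustration only.
models = {
    "model_A": (1_000, 3_000),
    "model_B": (4_000, 6_000),
    "model_C": (7_000, 9_000),
}
est_hits = [(2_500, 2_900), (8_800, 9_400)]

supported = models_with_est_support(models, est_hits)
print(supported)
```

Note that an overlap only says that part of the region is transcribed. It does not confirm that the predicted exon structure is right.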

Locus Page

Now that we've identified 8.m00380 as an interesting, expressed gene using GBrowse, let's check to see if the function predicted for this gene model is likely to be accurate. The descriptions presented for the gene models in GBrowse were assigned based on the gene's predicted domain composition. If no domains were identified for that gene in our analyses, the gene's highest BLAST hit to the UniRef90 protein database is shown. Our threshold score for showing a BLAST annotation was very inclusive, so make sure to examine any gene with an enticing description in GBrowse more closely. To start our examination of the putative chitin synthase gene 8.m00380, visit its Locus Page by clicking on the 8.m00380 gene model in the "Gene Predictions" track of GBrowse. (Or access the 8.m00380 Locus Page directly.)
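The rule used to pick a GBrowse description can be summarized as a simple fallback: domains first, otherwise the best BLAST hit. The sketch below is my paraphrase of that rule, not the actual annotation pipeline. The max_evalue cutoff is a hypothetical placeholder (the real threshold is only described as very inclusive), and joining domain names with commas is also an assumption.

```python
def describe_gene_model(domains, top_blast_hit, max_evalue=1.0):
    """Pick a description: predicted domains first, then the best BLAST hit.

    domains: list of domain names predicted for the gene model.
    top_blast_hit: (description, evalue) tuple, or None if there was no hit.
    max_evalue: hypothetical cutoff standing in for the real, inclusive one.
    """
    if domains:
        return ", ".join(domains)
    if top_blast_hit is not None and top_blast_hit[1] <= max_evalue:
        return top_blast_hit[0]
    return None  # nothing to show for this gene model


print(describe_gene_model([], ("chitin synthase", 3.9e-38)))
```

An inclusive cutoff means that a description can come from a weak hit, which is why a closer look at the Locus Page is worthwhile.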

Notice first that, unlike the MYO1 Locus Page we visited earlier, the 8.m00380 Locus Page begins at the dark blue "Provisional Annotation" section. No papers or non-EST GenBank entries have been published on this gene, so TGD has not created a light blue "Curated Information" section for it yet.

Near the top of the page, the Homolog section shows 8.m00380's top three BLAST hits against the UniRef90 database. UniRef90 is a dataset created by clustering all protein sequences at UniProt with greater than 90% identity into a single entry. BLASTing against this database takes a fraction of the time it would take to compare all 27,400 gene models to the full set of proteins in the UniProt database. It took two weeks to complete the BLAST searches used to generate the list of homologs shown at TGD.

If you're familiar with BLAST scores, you'll notice that the top hit for 8.m00380 isn't too bad: an E-value of 3.9e-38 to a cluster of chitin synthase genes. Click through to the UniRef90_Q6BT33 entry to read more about this cluster. Looks like it's a "cluster" consisting of a single chitin synthase gene, from the species Debaryomyces hansenii. It also appears that this description was assigned based on similarity to a gene in Candida rather than any work done in Debaryomyces. That's okay - if the Tetrahymena and Debaryomyces genes match with an E-value of 3.9e-38, just think how strong the match must be between Debaryomyces and Candida.

Returning to the 8.m00380 Locus Page, let's take a closer look at the Homolog table. In addition to the E-value, we can see that only about 18% of the putative protein shares identity with the chitin synthase genes in the list. More specifically, the middle of the protein (from about amino acids 400-600) hits a similar region in its chitin synthase homologs. The fact that only a part of this rather large protein shows identity to the chitin synthase is a little disappointing. But we won't let that get us down. Scroll to the bottom of the page and select ORF Translation from the Retrieve Sequences section. We'll do our own BLAST search. Select the sequence and run a BLASTP with it at the NCBI BLAST server. Here are the results from when I did this:

Score! Looks like there are regions of similarity to other chitin synthases throughout the protein. No more worries - this is looking like a solid homolog of these other proteins. And better yet, the third-best hit is to a gene in Saccharomyces cerevisiae, CHS1. The yeast community has a terrific website where you can access information about all the genes in their organism, called the Saccharomyces Genome Database (SGD). Visit the CHS1 Locus Page at SGD to read about studies done on this gene in yeast, and to get an idea of the kinds of information you can access from a mature, popular model organism database. Right on the CHS1 Locus Page you can read interesting facts that may also pertain to our 8.m00380 gene in Tetrahymena. Chs1p is expressed as a zymogen that has to be cleaved prior to activation, and it's found in a structure called the "chitosome". Interesting leads, for sure.

One word of warning: even though Chs1p is involved in producing chitin in Saccharomyces and other fungi, a Tetrahymena homolog of this protein may serve a slightly different purpose. Always look carefully to see if homologs of your best hits play roles in multiple pathways: the best studied, most obvious functions in one organism may be less significant, or even absent, in another. The core activity of chitin synthase enzymes, for example, is to attach N-acetyl-D-glucosamine to things. Perhaps ciliates have a different reason to do this, rather than to make a chitin polymer.

Full-text Literature Search: Textpresso

From the BLAST searches we just did, it's pretty convincing that the 8.m00380 gene encodes a chitin synthase. Furthermore, the top hits were all to fungal genes, suggesting that a chitin synthase has never been sequenced from a ciliate relative of Tetrahymena. While this may be true, chitin itself may have already been discovered in ciliates, in the absence of any gene cloning. Let's now check to see what, if anything, has been published about chitin in Tetrahymena.

TGD has a powerful, convenient tool for searching full-text articles, called Textpresso. Textpresso was originally written by Eimear Kenny and Hans-Michael Muller for WormBase (Caltech) before being distributed to the GMOD Project. TGD has downloaded 800 full-text Tetrahymena research articles, 3200 abstracts, and 5000 titles from the web, all of which can be searched from the Textpresso interface, and we're working to find or gain access to more text in the future.

To check if chitin has been mentioned in the Tetrahymena literature, go to TGD Textpresso and start by entering "chitin" into the search box. Click "Search!" to find all instances of the word "chitin" in the texts contained in TGD.

The results page shows that 7 publications stored in TGD contain information about chitin. Explore the first result (Harold FM, 1990) by clicking its [view sentences] link in the table. Notice that each instance of the word "chitin" is conveniently highlighted in a different color from the surrounding text. Also notice that the first "sentence" seems to be a bit of a jumble - parsing online documents is sometimes tricky.

A careful reading of the results returned for this article shows that most of the instances of the word "chitin" pertain to fungi of some sort. Perhaps it would be more informative to see if the words "chitin" and "Tetrahymena" are mentioned in the same sentence. To perform this search, return to the Textpresso search box and enter "chitin Tetrahymena". For Textpresso, a white space between words is treated as an "AND" expression: this will search for "chitin" and "Tetrahymena" in the same sentence. Since Textpresso searches all the literature stored at TGD, this may return other papers in addition to Harold FM, 1990. That's okay, we can skip to that one if we need to.
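The "white space means AND, within one sentence" rule is easy to picture with a toy version. The sketch below is not Textpresso's code: it uses a naive sentence splitter and plain substring matching (so "chitin" would also match "chitinase"), and the sample text is made up.

```python
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def sentences_with_all(text, query):
    """Return the sentences that contain every whitespace-separated query word."""
    words = query.lower().split()
    return [s for s in split_sentences(text) if all(w in s.lower() for w in words)]


# Made-up sample text for illustration.
sample = (
    "Chitin is found in fungal cell walls. "
    "Tetrahymena is a ciliate. "
    "This made-up sentence mentions chitin and Tetrahymena together."
)

one_word = sentences_with_all(sample, "chitin")
two_words = sentences_with_all(sample, "chitin Tetrahymena")
print(len(one_word), len(two_words))
```

Adding a second word can only shrink the result list, since each sentence has to satisfy both words at once.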

Did you get "NO MATCHES FOUND" when you performed the search? I did. Looks like that paper may not contain information on chitin in Tetrahymena. Maybe it contains information on chitin in other ciliates? We know the names of lots of ciliates, but we don't want to type all of them into the search box with chitin. Textpresso has a handy feature that can help with this. Return to the Textpresso search box and remove the word "Tetrahymena", so that only the word "chitin" is left. Then go down to the "Categories to Search" pulldown, select "organism", and hit "Search!". TGD has loaded in the names of all ciliates listed in the Taxonomy Browser at NCBI, plus the names of popular model organisms, into the Organism category. To be exact, the list of species entered into the Organism category at TGD includes all eukaryotic species for which there are at least 2,000 ESTs entered in the dbEST at GenBank, and every prokaryote whose genome has been sequenced. The category also contains a number of common, non-scientific names. Try out the other categories at TGD sometime when you have a chance.
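A category search can be thought of as the same sentence-level AND, except that one side of the AND is "any term from a list". The sketch below illustrates that reading and is not Textpresso's implementation. The four-item organism list is a tiny stand-in for the real Organism category, and the sentences are invented.

```python
def sentences_with_keyword_and_category(sentences, keyword, category_terms):
    """Keep sentences that contain the keyword and at least one category term."""
    keyword = keyword.lower()
    terms = [t.lower() for t in category_terms]
    hits = []
    for sentence in sentences:
        lowered = sentence.lower()
        if keyword in lowered and any(t in lowered for t in terms):
            hits.append(sentence)
    return hits


# Tiny stand-in for the Organism category; the real list is far longer.
organism_terms = ["Tetrahymena", "Oxytricha", "Saccharomyces", "yeast"]

# Made-up sentences for illustration.
sentences = [
    "Chitin synthesis has been studied in Saccharomyces.",
    "Chitin is a rigid polymer.",
    "Oxytricha is a ciliate.",
]

hits = sentences_with_keyword_and_category(sentences, "chitin", organism_terms)
print(hits)
```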

View the 11 sentences returned for the Harold FM, 1990 paper. Each time chitin is mentioned in this paper with an associated organism, it turns out to be a fungal species. Looks like that paper probably doesn't contain any information on chitin in ciliates. A quick check of the sentences returned for other papers in this search (and the previous ones) shows that none of the papers in TGD Textpresso say anything about chitin in ciliates. To be thorough, though, we should probably check other websites to see if chitin has been found in a ciliate. PubMed at NCBI is a very helpful site, as is Google Scholar, a great search engine for finding scientific information on the web. Head to Google Scholar and search for "chitin +ciliate".

There it is. The first article returned (today, anyway) is entitled "Ultrastructure, Encystment and Cyst Wall Composition of the Resting Cyst of the Peritrich Ciliate Opisthonecta henneguyi", by Calvo et al., 2003. If you have access to the Journal of Eukaryotic Microbiology online, a quick glance at this article shows a wealth of information about the cyst walls in a number of ciliates. This list of species doesn't specifically include Tetrahymena, which explains why this paper is not in TGD Textpresso at the moment; however, it does state that chitin is a major component of the cyst wall in these other species.

So I now know chitin is found in ciliate cyst walls. My apologies to everyone who knew that already - when I first saw the annotation in the genome, I'd forgotten about that stage of the life cycle. I'm still left with a number of unanswered questions though: What conditions induce encystment in T. thermophila? Why are there 5 tandem chitin synthase genes in the Tetrahymena genome? Do they share a conserved promoter element? If they do, what other genes share this promoter element?

I hope this tutorial has shown you that tools to help answer all these questions can be found at TGD. Thank you for following along as I wandered around our site - I hope you have as much fun exploring here as I've had. Best of luck in all your studies. I'll be working on setting up new features for this site. And when I get a chance, I'll be looking into another gene I ran across, a homolog of something called "fizzy"...

N.A.S.
5/05



To contact TGD: Send email to ciliate-curator@genome.stanford.edu.