Learning Exercise

Intron/Exon Splice Sites

Genome projects generate tremendous amounts of genomic DNA sequence every year. Scientists wish to identify the individual genes hidden within this genomic sequence. Once a gene has been identified, they would like to piece together its exons to obtain just the coding sequence. This makes it possible to predict the amino acid sequence and possibly infer a function for the gene. In this assignment students will use software that recognizes intron/exon splice sites and other markers of a gene to identify genes in genomic DNA sequences.
Course: Molecular Biology, Human Genetics, Bioinformatics


Exercise


 

When sequencing genomic DNA it is often difficult to identify a gene,
or to determine which regions of a gene encode protein.
With the many genome projects underway this is a common issue, and researchers
use computer programs to look for several components of a gene.  These
can include long open reading frames, intron/exon splice sites,
polyadenylation signals and CpG islands.
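To make these components concrete, here is a minimal Python sketch of how two of them can be looked for by simple string matching. It is only an illustration: GENSCAN itself relies on a probabilistic model of gene structure rather than plain motif searches. The function names are hypothetical. The sketch assumes the canonical polyadenylation signal AATAAA, the GT and AG dinucleotides that typically begin and end an intron, and the observed/expected CpG ratio often used when looking for CpG islands.

```python
def find_motif(seq, motif):
    """Return the 0-based start of every (possibly overlapping) motif occurrence."""
    seq, motif = seq.upper(), motif.upper()
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        start = seq.find(motif, start + 1)
    return positions


def cpg_observed_expected(seq):
    """Observed/expected CpG ratio: (#CG * length) / (#C * #G)."""
    seq = seq.upper()
    c_count, g_count = seq.count("C"), seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c_count * g_count)


def scan_signals(seq):
    """Collect candidate signals; most hits will be false positives."""
    return {
        "polyA_signal": find_motif(seq, "AATAAA"),
        "donor_GT": find_motif(seq, "GT"),
        "acceptor_AG": find_motif(seq, "AG"),
        "cpg_obs_exp": cpg_observed_expected(seq),
    }
```

Because GT and AG occur by chance every few bases, a scan like this returns far more candidates than real splice sites, which is why gene-finding programs combine several kinds of evidence.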

The program GENSCAN
allows you to look for intron/exon splice sites, promoters and polyadenylation
sequences.  It will also link the various exons together to produce
a long translated protein that you can use to search databases.
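The idea of linking exons into one translated protein can be sketched in a few lines of Python. This is an illustration under stated assumptions, not GENSCAN's own code: exon coordinates are taken to be 1-based and inclusive, the gene is assumed to lie on the forward strand, and the standard genetic code is used. The names `splice_exons` and `translate` are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def splice_exons(genomic, exons):
    """Join exon segments given as (start, end) pairs, 1-based and inclusive."""
    genomic = genomic.upper()
    return "".join(genomic[start - 1:end] for start, end in sorted(exons))


def translate(coding):
    """Translate a coding sequence, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(coding) - 2, 3):
        amino_acid = CODON_TABLE.get(coding[i:i + 3], "X")  # X = unknown codon
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)
```

For example, with a toy 24-base sequence whose exons are at 1-6 and 19-24, `translate(splice_exons("ATGGCCGTAAGTTTTCAGAAATAA", [(1, 6), (19, 24)]))` removes the intron and yields the peptide `MAK`.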

EXERCISE

There are three parts to this exercise.

1.  Use the program GENSCAN to identify a possible coding region
within a genomic sequence

2.  Use the program BLASTP to see how well your prediction matches
with the actual coding region

3.  Use the site UniGene to examine the location of this gene
on a chromosomal map.

The following four accession numbers identify genes that are relatively
small (<10,000 bp) and contain just a few exons.  You could use
other gene sequences or random segments of genomic DNA sequence as well.

                         
J00265 Insulin

J00120 Myc

J00148 Human growth hormone

V00499 Beta globin

PART 1.

Use Entrez 
to obtain the sequences of these genes.

Select FASTA under Display to get just the DNA sequence.
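The same FASTA record can also be retrieved by script instead of through the web page. The sketch below assumes the NCBI E-utilities `efetch` service with the `nuccore` database. The helper names are hypothetical, and `fetch_fasta` needs a network connection, so treat it as a starting point rather than a tested pipeline.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession):
    """Build an efetch URL asking for a nucleotide record as plain-text FASTA."""
    params = {"db": "nuccore", "id": accession,
              "rettype": "fasta", "retmode": "text"}
    return EFETCH + "?" + urlencode(params)


def parse_fasta(text):
    """Return (header, sequence) for the first record in a FASTA string."""
    header, chunks = "", []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header:          # a second record starts here; stop
                break
            header = line[1:]
        else:
            chunks.append(line)
    return header, "".join(chunks).upper()


def fetch_fasta(accession):
    """Download and parse one record (requires network access)."""
    with urlopen(efetch_url(accession)) as response:
        return parse_fasta(response.read().decode("utf-8"))
```

Calling `fetch_fasta("J00265")` should return the header line and the genomic sequence as one string, ready to paste into GENSCAN.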

Copy the sequence and open GENSCAN.

Paste your genomic sequence into the box and select Run GENSCAN.

You will get results which predict the location of intron and exon splice
sites and the predicted coding sequence.

    How many exons were present in the gene you selected?

    How many introns?

    What was the length of the genomic sequence you
submitted?

    What was the length of the regions that encoded
protein?

    What other regions were identified by the program
(polyadenylation sites, promoters, repeated elements, CpG islands)?

    What are the functions of these other regions?
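The counting questions above reduce to simple arithmetic once the exon coordinates are written down. The Python sketch below assumes you have copied the exon start and end positions from the prediction by hand as 1-based, inclusive pairs. The coordinates in the usage note are made up for illustration and are not real GENSCAN output.

```python
def summarize_gene(exons, genomic_length):
    """Summarize exon/intron counts and coding length from exon coordinates.

    exons: list of (start, end) pairs, 1-based and inclusive.
    genomic_length: length of the submitted genomic sequence in bp.
    """
    exons = sorted(exons)
    coding_length = sum(end - start + 1 for start, end in exons)
    # An intron is the gap between two consecutive exons.
    introns = [
        (exons[i][1] + 1, exons[i + 1][0] - 1)
        for i in range(len(exons) - 1)
    ]
    return {
        "exons": len(exons),
        "introns": len(introns),
        "intron_coordinates": introns,
        "coding_length": coding_length,
        "coding_fraction": coding_length / genomic_length,
    }
```

With hypothetical exons at 101-250, 401-520 and 901-1000 in a 1500 bp sequence, the summary reports 3 exons, 2 introns and 370 coding bases, so roughly a quarter of the submitted sequence encodes protein.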

 

PART 2.

Once you think you have identified the coding sequence of the gene,
copy that amino acid sequence.

Use the predicted amino acid sequence to perform a BLASTP
alignment with other proteins to see how well your prediction matches
the protein encoded by the correct cDNA sequence. Instructions on the use of BLAST
are provided.

In the BLAST program, paste the amino acid sequence into the box,
select blastp under program, and select search.

You will get back an alignment of known sequences.   Compare
these amino acid sequences with the sequence you predicted.

    Did the predicted sequence you submitted match the known
protein?

    Did the entire sequence match, or were there regions
that did not match?

    Explain any unexpected results.

    If your prediction was not accurate go back to GENSCAN
to see if you can figure out where the error occurred.
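When comparing your prediction with a known protein, it can help to put a number on the match. The sketch below assumes you have copied one pair of aligned sequences from the BLAST output, with gaps written as "-", and it defines percent identity as identical columns divided by alignment length. Other definitions exist, and the function name is hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Return (percent identity, mismatched columns) for two aligned sequences.

    Both strings must have the same length; "-" marks a gap.
    Column indices in the result are 0-based.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        return 0.0, []
    mismatches = [
        i for i, (a, b) in enumerate(zip(aligned_a, aligned_b))
        if a != b or a == "-"
    ]
    identical = len(aligned_a) - len(mismatches)
    return 100.0 * identical / len(aligned_a), mismatches
```

A run of consecutive mismatched columns often points to a missed or mispredicted exon, which gives you a place to start when you go back to the GENSCAN output.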

 

 

PART 3.

Go to the site UniGene
to learn more about your gene.

Paste the accession number into the search box and hit GO.

    Has a homologous gene been identified in other organisms?

    If so what percent homology does the human protein
have with each of the other species indicated?

    Which chromosome is your gene located on?

    Does it contain any STS?  If so, which ones?

    Which organs express this gene (EST)?

 

Audience

Technical Notes

Students will use a program which searches large genomic DNA sequences for specific patterns.

Requirements

Knowledge of the organization of eukaryotic chromosomes and genes. Processing of eukaryotic mRNA.

Topics

Molecular Biology, Bioinformatics, Genomics, Introns, Exons, CpG islands, promoters

Learning Objectives

To be able to find and translate a gene from a genomic DNA sequence.