SH Biotechnology 2007, bioinformatics part

Back to Johan Henriksson's page

Lectures

MA309 9:30-12:00 lecture notes
MA309 9:00-12:00 lecture notes

Lab instructions

Lab 1

Literature search

What is Opitz syndrome? Which genes are related to it? Which database should you search?

You want information about the p53 gene. Since it is at the heart of cancer research, the community is rather productive. How many articles have been published in just the past year? Which journals seem to be important in this field?

Gene ontology

What is the GO number of the osmosensory signaling pathway? Does it have any other names? What is its definition? Where is it located in the GO graph?

Genome browsers

Go to the UCSC genome browser. Bring up the map browser for mouse. Zoom in on chromosome 6. Can you find anything interesting? Introns and exons? Enable the GC-content graph. Are there other useful tracks you can add? There is a suitable description of the tracks under "help" in the upper right.

Genome-wide search

The SRS has some nice tools. Albeit slow for everyday use, it has handy tools for searching the genomes for similar sequences. What is the following?
MAKRRGSVPGRVREYWLPSPCWKCHMLHQGKWWGRRSQGMGGAEGFMEHGSTTLQRKPGA
SSELGILQVRDLSWLVQPQAQTCCGSFVPLSAGLRASAK
Save it as a FASTA file before you start working (you need to add a header line; it can be anything). Also try BBEdit, installed on these systems, to save it in Windows format (additional settings when you Save as).
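
If you would rather script this step than type it by hand, the following is a minimal Python sketch of what a FASTA record looks like: one header line starting with ">", then the sequence. The header text unknown_protein, the 60-character line width and the helper names to_fasta and save are assumptions made for illustration, not something the course prescribes.

```python
def to_fasta(header, sequence, width=60, newline="\n"):
    """Return one FASTA record: a '>' header line, then wrapped sequence lines."""
    # Remove spaces and line breaks that came along when the sequence was pasted.
    cleaned = "".join(sequence.split())
    chunks = [cleaned[i:i + width] for i in range(0, len(cleaned), width)]
    return newline.join([">" + header] + chunks) + newline


def save(path, text):
    """Write text to path without letting Python change the line endings."""
    with open(path, "w", newline="") as handle:
        handle.write(text)


protein = (
    "MAKRRGSVPGRVREYWLPSPCWKCHMLHQGKWWGRRSQGMGGAEGFMEHGSTTLQRKPGA"
    "SSELGILQVRDLSWLVQPQAQTCCGSFVPLSAGLRASAK"
)

# "unknown_protein" is a placeholder header; any text will do.
record = to_fasta("unknown_protein", protein)

# Windows-format text files end each line with CRLF ("\r\n") instead of LF ("\n").
windows_record = to_fasta("unknown_protein", protein, newline="\r\n")
```

Calling save("protein.fasta", record) would then write the file; the file name is again just an example.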

Are there other similar sequences, and how much do they differ in E-value? Take one of the other sequences and make a dot plot with the available tools.

Simple operations

What is the weight of the protein above? ExPASy might have something that helps. Can you also go back to the DNA sequence? What would it look like if you took the reverse of it and then made protein again?
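
To see roughly what a weight calculator does behind the scenes, here is a small Python sketch that sums average residue masses and adds one water molecule for the chain ends. The mass table holds commonly tabulated average values and the function name protein_weight is made up for this example; expect small differences from what an online tool reports, and note that modifications such as disulfide bonds are ignored.

```python
# Average residue masses in daltons (free amino acid minus one water).
# Commonly tabulated values; treat them as approximate.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167,
    "V": 99.1326, "T": 101.1051, "C": 103.1388, "L": 113.1594,
    "I": 113.1594, "N": 114.1038, "D": 115.0886, "Q": 128.1307,
    "K": 128.1741, "E": 129.1155, "M": 131.1926, "H": 137.1411,
    "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153


def protein_weight(sequence):
    """Approximate average molecular weight of an unmodified peptide chain."""
    residues = "".join(sequence.split()).upper()
    # Sum the residue masses, then add one water for the free N- and C-termini.
    # A letter outside the 20 standard amino acids raises KeyError.
    return sum(RESIDUE_MASS[aa] for aa in residues) + WATER
```

For example, protein_weight("G") comes out at about 75.07, the weight of free glycine, which is a quick sanity check before trying the full sequence.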

Pairwise alignment

Launch SRS. Make a dot plot using word match of uniprot|P08697 and uniprot|P28800, or two related proteins of your choice. What is the effect of word length?
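
To get a feeling for the word length question, the idea behind a word-match dot plot can be sketched in a few lines of Python: put a dot at (i, j) whenever the word starting at position i in the first sequence is identical to the word starting at position j in the second. This is only an illustration of the principle, not the implementation used by SRS; the function names dot_plot and render are invented here.

```python
def dot_plot(seq_a, seq_b, word_length):
    """Return the set of (i, j) where the words of the given length that start
    at seq_a[i] and seq_b[j] are identical."""
    # Index every word in seq_b by the positions where it starts.
    positions = {}
    for j in range(len(seq_b) - word_length + 1):
        positions.setdefault(seq_b[j:j + word_length], []).append(j)

    dots = set()
    for i in range(len(seq_a) - word_length + 1):
        for j in positions.get(seq_a[i:i + word_length], []):
            dots.add((i, j))
    return dots


def render(seq_a, seq_b, word_length):
    """Draw the plot as text: one row per position in seq_a, '*' for a dot."""
    dots = dot_plot(seq_a, seq_b, word_length)
    rows = []
    for i in range(len(seq_a)):
        rows.append("".join("*" if (i, j) in dots else "."
                            for j in range(len(seq_b))))
    return "\n".join(rows)
```

Try print(render(a, b, 1)) and then print(render(a, b, 3)) on two short strings of your own: longer words remove most of the scattered chance matches and leave mainly the diagonals.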

Now do a local and a global alignment of these. Is there any difference in the result?
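
The difference between the two modes is easiest to see in the scoring recursion. Below is a simplified Python sketch, assuming a plain match/mismatch score and a linear gap penalty instead of the substitution tables and gap open/extension penalties that real tools use. Global alignment (Needleman-Wunsch style) must pay for every unaligned end, while local alignment (Smith-Waterman style) may restart from zero anywhere and reports the best-scoring stretch. The function name and the default scores are made up for this example.

```python
def alignment_score(a, b, local, match=1, mismatch=-1, gap=-2):
    """Best alignment score of a and b by dynamic programming.

    local=False: global alignment, both sequences aligned end to end.
    local=True:  local alignment, best-scoring pair of substrings.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    if not local:
        # In a global alignment, leading gaps cost the gap penalty.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap

    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            cell = max(score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                       score[i - 1][j] + gap,       # gap in b
                       score[i][j - 1] + gap)       # gap in a
            if local:
                cell = max(cell, 0)                 # a local alignment may restart
            score[i][j] = cell
            best = max(best, cell)

    return best if local else score[-1][-1]
```

With the toy strings "ACGT" and "GGGGACGTGGGG" the local score is 4 (the shared ACGT), while the global score is -12, because eight positions have to be aligned against gaps.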

Trees & taxonomy

Phylogenetic trees can be either rooted or unrooted. Fill in the 4 missing cases in this file.

NCBI has a nice taxonomy resource. What is the most recent common ancestral group of C. elegans and the house mouse?

Multiple alignment

Collect some histones with SRS (at least 4). Now you can either try a multiple alignment at SRS or try the T-Coffee link. Generate a phylogenetic tree.

TreeView and TreeExplorer can read the tree files produced by PHYLIP and MEGA. TreeView works best with PHYLIP and TreeExplorer with MEGA. Use either of these if you wish.

Structure browsing

Go to the protein database. How many structures are there related to haemoglobin? Take one with a good resolution. Which PDB viewers are available? Try them and see if there is any that you like. Download the PDB 4CHA and save it to disk.

Open the PDB file in a text editor. Which seem to be the more important data fields and what do they do?
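
One thing you will notice in the text editor is that the ATOM lines are laid out in fixed columns rather than separated by tabs or commas. As a rough illustration, this Python sketch pulls out a few of those fields using the column positions of the standard PDB format (atom name in columns 13-16, residue name in 18-20, chain in 22, residue number in 23-26, x/y/z in 31-54). It only reads ATOM records and skips everything else; the function names are invented for this example.

```python
import math


def parse_atoms(pdb_text):
    """Extract a few fixed-column fields from every ATOM record."""
    atoms = []
    for line in pdb_text.splitlines():
        if line[:6] != "ATOM  ":
            continue  # skip HEADER, REMARK, HETATM, ... records
        atoms.append({
            "serial": int(line[6:11]),        # columns 7-11
            "name": line[12:16].strip(),      # columns 13-16
            "res_name": line[17:20].strip(),  # columns 18-20
            "chain": line[21],                # column 22
            "res_seq": int(line[22:26]),      # columns 23-26
            "x": float(line[30:38]),          # columns 31-38
            "y": float(line[38:46]),          # columns 39-46
            "z": float(line[46:54]),          # columns 47-54
        })
    return atoms


def chain_ids(atoms):
    """Sorted list of the distinct chain identifiers."""
    return sorted({atom["chain"] for atom in atoms})


def distance(atom_a, atom_b):
    """Euclidean distance between two atoms, in the file's units (angstrom)."""
    return math.dist((atom_a["x"], atom_a["y"], atom_a["z"]),
                     (atom_b["x"], atom_b["y"], atom_b["z"]))
```

Assuming you saved the structure as 4cha.pdb (a hypothetical file name), chain_ids(parse_atoms(open("4cha.pdb").read())) lists the chains in the file, which you can compare with what the viewer shows.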

Download and run PyMOL. Open your PDB file. Try to display backbone, stick, spacefill, cartoon and surface. How many units are there in this PDB? Is this the native composition? Remove all but one unit by double-clicking and then chain - remove.

How many beta sheets and alpha helices are there in the single unit? Color each with a different color.

Measure the size of the protein: double-right-click to create some suitable points. The distance command gives the distance between them. Look up the command with "help" if you are unsure how to use it.

The view can be improved by coloring residues and structures in different ways. Try to make the image easier to see by playing around with these settings.

The active site consists mainly of the catalytic triad, residues 57, 102 and 195. Assume you want to make a nice illustration of this to include in a report. Set up the right display modes for the residues; for example, you probably want the "stick" layout on these 3 residues, with their labels shown. On the rest you can use spacefill to get an idea of what the pocket looks like. Set up the camera properly and store the view. Save the image as a PNG file.

Looking further, what are the nearby residues? Enable the sequence view and highlight some additional residues you think could be involved.

Likewise, there is a pocket called the oxyanion hole, which reduces the energy barrier by being properly charged. Does this show up in the surface view? Look at residues 193 & 195.

When looking at the backbone, the slab mode can be used. Play around with it and see if you can improve the display.

Protein classification

Take some of your favourite proteins and look them up on CATH and SCOP. What are the similarities and differences in the classification? For each protein, find a related protein. Open them in the PDB-program of your choice and get a closer look. Can you figure out the active sites?

(Extra) PSI-BLAST

Follow the PSI-BLAST tutorial.

(Extra) Special purpose database: BRENDA

Alcohol dehydrogenase breaks down ethanol. But is this the only reaction? Check it out on BRENDA.

(Extra) Special purpose database: Wormbase

Explore the cell divisions (lineage) in C. elegans. What kind of time information can you find?

Lab 2

Emboss & the console:

Using the console, create a new directory. Enter it and make 2 random .fasta-files using the command pico. Use the Emboss command needle to align these. Analyze the output file.

Java: Alignment

1. Download this Java program. Unpack it.

2. You can run the program by executing the script jaligner.sh. How does it work? Open the shell script in an editor and check how it is written.

3. Download two histone sequences from a suitable database and align them. How similar are they? Try different substitution tables, gap-open penalties and gap-extension penalties. By changing the table especially, you can get an idea of how sensitive the alignment is to uncertainty about the evolutionary process. Some parts of the alignment will probably be more sensitive than others.

4. One way of speeding up the usage if you have many sequences is to use jaligner from the command line. Example:

java -jar jaligner.jar 1.fasta 2.fasta BLOSUM62 10 0.5
This will align the sequences in 1.fasta & 2.fasta. Create these FASTA-files and try this out.

5. The best way to interface with jaligner for high volumes is to write a Java program. Download this code into jaligner-1.0/src/. Open the file in a text editor. What does it do?

6. Compile the file. Remember that the current directory must be jaligner-1.0/src/. Run it.
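
As a side note to step 4: once you have more than a handful of FASTA files, the command-line form can be driven from a small script. The Python sketch below only builds the argument lists, one per pair of files, in the same order as the example command in step 4 (two files, table, open penalty, extension penalty). The function names and the default jar path are assumptions for illustration; nothing is executed until you call run_all yourself.

```python
import subprocess
from itertools import combinations


def jaligner_commands(fasta_files, table="BLOSUM62", gap_open=10,
                      gap_extend=0.5, jar="jaligner.jar"):
    """Build one command line per unordered pair of FASTA files."""
    commands = []
    for first, second in combinations(fasta_files, 2):
        commands.append(["java", "-jar", jar, first, second,
                         table, str(gap_open), str(gap_extend)])
    return commands


def run_all(commands):
    """Run each command in turn; stop with an error if one of them fails."""
    for command in commands:
        subprocess.run(command, check=True)
```

For instance, jaligner_commands(["1.fasta", "2.fasta", "3.fasta"]) gives three commands, one for each pair, and changing the table argument lets you repeat the whole batch with another substitution table.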

C: reverse complement

1. Download this C program. It is an unfinished program; of course you are to fill in the rest :) Compile it and run it as ./nameofprogram AAATTTCCGG or ./nameofprogram 123456789

2. Make it complement C & G as well.

3. Improve the output. Print the original sequence as 5'....3' and the output as 5'....3'. This will make it clear to the user which ordering is considered. Also make it print the unreversed complement as 3'....5'.
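
If you want something to check your C program against, the logic can be sketched in Python (deliberately not C, so it is not a drop-in solution). The labels and the function names are just one possible choice. Characters that are not nucleotides, such as the digits in the second test run above, are left unchanged and only reversed.

```python
# Translation table: A<->T and C<->G, upper and lower case.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def complement(seq):
    """Complement each base, keeping the order (reads 3' to 5')."""
    return seq.translate(COMPLEMENT)


def reverse_complement(seq):
    """Complement each base and reverse, so the result reads 5' to 3'."""
    return complement(seq)[::-1]


def report(seq):
    """Format the three strands with their orientation spelled out."""
    return "\n".join([
        "original:           5'-" + seq + "-3'",
        "complement:         3'-" + complement(seq) + "-5'",
        "reverse complement: 5'-" + reverse_complement(seq) + "-3'",
    ])
```

Running print(report("AAATTTCCGG")) shows TTTAAAGGCC as the complement and CCGGAAATTT as the reverse complement.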

C: Alignment

(This exercise is borrowed from another course given at Chalmers)

1. Download this C-program and place it in a file that has the name alignment.c. Compile the program. Enter some short DNA-string and see what happens. What does the program do? The program can be stopped by CTRL-C.

Try out the program by entering different strings, and provoking gaps in both sequences.

2. Carefully examine the source code of the program and try to understand what it does in every step and how it works. This will not be so easy, but it is a very good exercise to try to do this. First just try to get a rough idea of what is going on in the different parts of the program, then successively try to understand more and more details. If something seems very difficult, skip it and come back to it later. Some of this work may have to be done also after the supervised session, but ask as much as possible while you can. Some explanations of C code syntax that might help:

3. Write a short description of the different parts of the code and what they do. Also explain some tricky details that you managed to understand.

(Extra) Emboss:

Follow the Emboss tutorial. This is an entire package for sequence operations such as alignment and also getting statistics. You might not be able to plot to screen; in that case redirect the plot to a file and show the file in Preview.

Links

Further reading Databases Specific tools Programming Other

Haven't made it into the schedule, might not

PROSITE, Pfam, sequence conversion.
Check: ConSite, REBASE, TRANSFAC.
Missing: multiple seq alignment, DP, complexity, 3D fitting, biostats? restriction digest? finding ORF? read and understand a Perl script? sequencing data. Alternative to 4Peaks?