The amount of DNA sequence information is staggering. The genome sequences of hundreds of organisms are known, and the total amount of sequence information runs to hundreds of billions of nucleotides. All this information would be pretty useless without excellent tools to search through it and put it to use. And we have them.
The purpose of this “in-silico experiment” is for you to become familiar with some of the tools to utilize genomic and proteomic information. And really the only way to become familiar is by practice. For this reason, you will be mostly on your own on this (the TA will be there to explain and assist, but obviously not to answer the questions for you); you can explore for yourself, and see what all is available for interpretation of genomic information. “Help” menus are an awesome resource if you are using web pages and tools for the first time, so please make frequent use of them. If you don’t quite get done with the “experiments” during the lab period, just continue with them elsewhere; the web tools are accessible from any computer connected to the internet.
As with other experiments, there are Prelab questions that are due at the start of each lab. Moreover, as each week of Experiment III pretty much stands on its own, your report updates are due weekly for this experiment. In your report updates you must address all questions asked in the narrative of the Experiment of that week, and from what you include in your lab updates it must be clear that you went through the experiment point by point, and got the right results. This may be best done by addressing each sub-question/part of the experiment (a, b, c, etc.) in a separate paragraph in the Results. It will be most efficient if you write the skeleton of the Results updates while you are going through the experiment. You may want to bring your memory stick or equivalent to the lab in order to be able to copy the fruits of your labor onto it.
The Discussion section of your report serves to integrate what you learned in this experiment and to discuss general questions that are in the Experiment III narrative. In addition, there are discussion points at the end of each week’s narrative that you need to include in your weekly report update as well.
Prelab question:
What can BLAST be used for? What is the input (DNA or protein sequence, or both)? What is the output? What do e-values mean? If you need a refresher on this, go to http://www.ncbi.nlm.nih.gov/BLAST/ and look at the “help” pages.
The “experiment” for today:
a. Using BLAST (accessed via http://www.ncbi.nlm.nih.gov/) determine what a protein with the following sequence is, and what organism it is coming from:
MVTLLENPFRTGLRQERTPEPLILTIFGASGDLTQRKLVPAIYQMKRERRLPPELTVVGFARRDWSHDHFREQMRKGI
EEFSTGIGSEDLWNEFAQGLFYCSGNMDDPESYLKLKNFLGELDEKRNTRGNRVFYLAVSPNFFPPGIKQLGAAGMLS
DPVKSRIVIEKPFGRDLSSAQSLNRVVQSVCKENQVYRIDHYLGKETVQNLMVFRFANAIFEPLWNRQFVDHVQITVA
ETVGVEERAGYYESAGALRDMVQNHLMQLFCLTAMDPPNAIDADSIRNEKVKVLQATRLADINNLENAGIRGQYKAGW
MGGKPVPGYREEPGVDPSSTTPTFAALKLMVDNWRWQGVPFYLRTGKRMPKKVSEIAIQFRQVPLLIFQSVAHQANPN
VLSLRIQPNEGISLRFEAKMPGSELRTRTVDMDFSYGSSFGVAAADAYHRLLLDCMLGDQTLFTRADEVEEAWRVVTP
VLSAWDAPSDPLSMPLYEAGTWEPAEAEWLINKDGRRWRRL
b. You hopefully figured out successfully what enzyme this protein is, and in what organism it occurs. The next question is where in the metabolic pathway this enzyme functions. A very useful website with information on which enzymes do what is KEGG, the Kyoto Encyclopedia of Genes and Genomes (http://www.genome.ad.jp/kegg/). KEGG Overview and KEGG Databases (see links on left margin) provide you with an overview of what is on the site. What do you think is a good way to find out in which KEGG Pathway Module the enzyme in a. is located?
c. Based on your research in b., which pathway(s) in the KEGG Pathway database (http://www.genome.ad.jp/kegg/pathway.html) is/are the one(s) that include(s) your enzyme? Click on the corresponding pathway in Section 1.1 (Carbohydrate Metabolism). What do you find? Is there more than one way from D-glucose to 6-phospho-D-gluconate (remember that the first step of glycolysis is conversion of glucose to glucose-6-phosphate)? And what do you think those numbers (e.g., 1.1.1.49) mean?
d. What you need to know is what pathways may actually exist in the particular organism that your enzyme came from. Near the top of the screen, select the organism you found in a. that your protein was coming from. You will see some color changes on your screen. What do the green boxes mean, you think? Can you determine from this screen whether there is a potentially full pathway to go from glucose-6-phosphate to glyceraldehyde-3-phosphate?
e. You hopefully remember from earlier classes that glucose-6-phosphate and glyceraldehyde-3-phosphate are in the glycolysis pathway. Can you figure out whether in this organism there are genes coding for all required steps of glycolysis as well?
f. Now look at glycolysis in Geobacter sulfurreducens, a common soil bacterium. Does this organism have all the genes required to do glycolysis? If not, how do you think it survives?
g. The genome sequence provides a wonderfully predictive way of what reactions the organism may be able to perform. Discuss what are some of the caveats in the predictions: do you think it is possible that an organism cannot perform a certain function even though it has the gene for it? Conversely, would it be possible that an organism can catalyze a certain enzymatic conversion even though it does not seem to have the appropriate genes?
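For step a, the query sequence is printed above across several lines. BLAST generally accepts pasted sequence as-is, but it can be handy to keep the sequence as one clean FASTA record for your report and for later tools. The following is a minimal sketch of one way to do that; the function name `to_fasta`, the header text, and the 60-character line width are illustrative assumptions and not part of any NCBI tool.

```python
def to_fasta(header, sequence_lines, width=60):
    """Join wrapped sequence lines into a single FASTA record (illustrative helper)."""
    # Strip spaces and line breaks, and normalize to upper case.
    sequence = "".join("".join(line.split()) for line in sequence_lines).upper()
    # Re-wrap the sequence at a fixed width.
    wrapped = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + header + "\n" + "\n".join(wrapped)


# Only the first two lines of the step a query are used here to keep the example short.
lines = [
    "MVTLLENPFRTGLRQERTPEPLILTIFGASGDLTQRKLVPAIYQMKRERRLPPELTVVGFARRDWSHDHFREQMRKGI",
    "EEFSTGIGSEDLWNEFAQGLFYCSGNMDDPESYLKLKNFLGELDEKRNTRGNRVFYLAVSPNFFPPGIKQLGAAGMLS",
]
print(to_fasta("query_step_a", lines))
```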
In your weekly update (and therefore also in your lab report), not only address all points and questions (a – g) above, but also provide a narrative that integrates the questions. The Discussion section of your report is particularly suitable for this purpose. Examples of materials to include in your Discussion section of your weekly update this week:
1. Discuss the significance of using the alignment properties of the BLAST algorithm.
2. Point out and discuss the major differences shown between the glycolysis/gluconeogenesis pathways for Synechocystis sp PCC6803 and Geobacter sulfurreducens.
Prelab questions:
Continuing with what we did in Experiment III-9, this week we will look some
more at metabolic pathways and their presence in specific types of organisms.
1. Does the presence of a gene in an organism mean that the metabolic reaction step catalyzed by the corresponding enzyme is efficient? Explain.
2. Does the presence of a metabolic reaction step that requires a specific enzyme in an organism imply that the gene for such an enzyme is actually present in the organism? Explain.
3. In prokaryotes the operon arrangement of genes often has functional significance. Describe briefly in your own words what that functional significance is: how would you be able to predict the function of one of the gene products in the operon if you knew the function of one of the other gene products in the operon?
The “experiment” for today:
First we will continue with the information available on KEGG. Last week
you saw that there was a real difference between organisms in how they do even
the most basic reactions such as conversion of glucose to pyruvate. You
probably have gotten the impression from your textbooks that glycolysis is the
only game in town to do this, but many prokaryotes do just fine without it, and
use other pathways. It would be good to be able to find out which organisms
do what, and compare. So we will explore this in some more detail below.
a. Go to http://www.genome.jp/kegg/metabolism.html,
and click “glycolysis/gluconeogenesis”. This will lead you to the
glycolysis pathway and a bunch of related pathways, similar to what you saw last
week. Now click on “Ortholog table” at the top of the screen (orthologs
are related proteins of similar function, but from different organisms). At
the top of the table you will see the enzyme numbers and the enzyme names, and
the first column represents different organisms. You can click on “Select” and
enter specific three-letter codes to see which organisms the codes correspond
to. Do all organisms have a gene for phosphofructokinase (a key enzyme
for glycolysis)? If not, select five organisms that do not have a phosphofructokinase
gene; you will be finding out in a subsequent question in this experiment what
they use instead.
b. First a little more about this ortholog table. By clicking on a link in a table matrix you will get information on the sequence of the gene and protein of a specific function in a specific organism. In the left column of the table, clicking on “P” behind an organism code will lead to a color-coded chart of metabolic steps that the organism appears to have according to its genome (see last week’s experiment). “G” will give you the location of the various genes in this chart on the genome, and “T” gives information regarding the proteins and genes in this table for the selected organism (including their sequence in FASTA format). Get phosphofructokinase protein sequences in FASTA format from ten prokaryotes in the table, and also get the ones from Homo sapiens, Xenopus laevis, Caenorhabditis elegans, Drosophila melanogaster, and Strongylocentrotus purpuratus (sea urchin). Align the 15 different sequences using the ClustalW program at http://www.ebi.ac.uk/Tools/clustalw/ or http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=npsa_clustalw.html. Which ones are closest to each other? Do you think this makes sense? Do you think that the very different N-termini for the eukaryotes (and perhaps also the prokaryotes) are real?
c. Now go back to the five organisms you selected that did not have phosphofructokinase. Find out what they do instead (if anything) to go from glucose-6-phosphate to pyruvate. By now you should know what to “click” in order to get the information needed.
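To put a number on "which ones are closest to each other" in step b, one option is to compute the pairwise percent identity from the finished alignment. The sketch below assumes the alignment has been saved in aligned FASTA format (with `-` for gaps), which may require choosing a non-default output format in the alignment tool. The function names and the toy sequences are made up for illustration, and counting only columns where neither sequence has a gap is one convention among several.

```python
from itertools import combinations


def parse_fasta(text):
    """Parse (aligned) FASTA text into a {name: sequence} dictionary."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def percent_identity(a, b):
    """Percent identical residues over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    matches = sum(1 for x, y in compared if x == y)
    return 100.0 * matches / len(compared)


# Toy alignment with made-up sequences (not real phosphofructokinases).
toy = """>org1
MKT-AILG
>org2
MKTQAILG
>org3
MRT-AVLG
"""
aligned = parse_fasta(toy)
for n1, n2 in combinations(aligned, 2):
    print(n1, n2, round(percent_identity(aligned[n1], aligned[n2]), 1))
```

Percent identity ignores how gaps and conservative substitutions are distributed, so it is a rough companion to the alignment and guide tree, not a replacement for looking at them.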
However, KEGG is not “the only game in town” to provide this type of information. Another useful site is BioCyc, with the related EcoCyc and MetaCyc toolkits. EcoCyc (http://ecocyc.org) essentially is an E. coli "encyclopedia"; E. coli has been chosen for this as there is much information that has accumulated regarding this model organism over the past decades, but similar pages (perhaps less detailed) exist for other organisms. Look over the EcoCyc overview on http://ecocyc.org/background.shtml.
d. On the Project Overview site, click on "metabolic pathways" and then find "glycolysis" and "glycolysis I" (quite a long list of stuff, eh?). The link will lead you to the reaction steps, and by mousing over the steps you will find more detail regarding the reactions, enzyme names, activators and inhibitors, etc. This is more detail than on the KEGG site.
e. Scroll down to "Locations of Mapped Genes": What do you think the circle represents? What happens if you click on the little purple lines on the circle? What do you think those purple lines are?
f. Click on all purple lines on the circle, and for each of them note the chromosomal location and the function of the corresponding gene. Do you find any operons? Reconstruct the glycolysis reactions with the enzymes coded for by the different genes. Why do you think some steps require more than one protein? What do you think is the numerical code under “gene-reaction schematic”?
g. On the “glycolysis I” chart, go to the “Genetic
Regulation Schematic”. Explain in your own words what this chart
is telling you.
If you are interested in a specific metabolic pathway, we have seen that in KEGG
you can find out in which organisms genes for particular enzymes in this pathway
are found. We will now look at another site (MetaCyc, related to EcoCyc)
that provides complementary information.
h. Go to MetaCyc (http://metacyc.org). Click on “Database Search”. On the resulting BioCyc Query Page, select “MetaCyc” as dataset, and then click on “Choose from a list of all pathways”. The list you now get is a lot longer than the list of metabolic pathways in EcoCyc. Why? Find “glycolysis” again. You will see that there is a link to “glycolysis I”, “glycolysis II”, “glycolysis III”, “glycolysis IV”, and “glycolysis V”. What are the differences between the various ones? Do they occur in the same organism?
i. On the glycolysis I page in MetaCyc, click on the arrow leading from fructose-6-phosphate to fructose-1,6-bisphosphate. You will find a list of about eleven phosphofructokinases. Click on each. Looking at the info on these pages, discuss the properties of the various types of phosphofructokinases found in nature.
j. On the same page as where your eleven phosphofructokinases were listed, click on “cross-species comparison” at the top of the page. Then select all species. Do all species have a phosphofructokinase gene according to this list? Specifically look for Synechocystis sp. PCC 6803, and note what is listed there.
k. Now go to CyanoBase, click on Synechocystis sp. PCC 6803, and search for phosphofructokinase. What do you find? Are they “real”? (How would you find out? Hint: do a sequence alignment). The moral of the story?
In your weekly update (and therefore also in your lab report), not only address all points and questions (a – k) above, but also provide a narrative that integrates the questions. The Discussion section of your report is particularly suitable for this purpose. An example of materials to include in your Discussion section of your weekly update this week:
Prelab questions:
Today we will explore what information can be gleaned from comparing complete genome sequences. Just to get into the spirit, please visit listings of prokaryotic strains and species with sequenced genomes at JGI (Joint Genome Institute) (http://genome.jgi-psf.org/mic_home.html) and TIGR (The Institute for Genomic Research; now part of the J. Craig Venter Institute) (http://cmr.tigr.org/tigr-scripts/CMR/shared/Genomes.cgi?crumbs=genomes); also look at some of the eukaryotes with a sequenced genome on http://genome.jgi-psf.org/euk_cur1.html and on the sub-pages of http://www.tigr.org/db.shtml.
1. About how many sequenced prokaryotes did you find? And how many sequenced eukaryotes? Why are these numbers so different, you think?
2. About how many genes total do you think the sequenced prokaryotes represent? And how many genes the eukaryotes? Explain how you got to your numbers.
3. Go to http://www.nature.com/nature/journal/v437/n7055/full/nature04072.html (there’s also a copy of this chimpanzee sequence paper on Blackboard). In the first part of this paper, what can you glean regarding the similarity between humans and chimpanzees?
The “experiment” for today:
The Integrated Microbial Genomes resource on the JGI (Joint Genome Institute) website is at http://genome.jgi-psf.org/mic_home.html and is quite a useful site.
a. Go to the Genome Browser at http://img.jgi.doe.gov/cgi-bin/pub/main.cgi?section=FindGenomes&page=findGenomes, with the aim of comparing genomes from Escherichia coli K12 and Escherichia coli O157:H7 EDL 933. At IMG Home, which tab on top do you think you need to click for this purpose? When you do that, what do you see? Is it just comparing the ones you want? Hint: what about “find genomes” first? Now, try again...
b. What are the “genome statistics” on the comparison page telling you? What do they represent? Just the statistics of one of the two E. coli strains? If you compare genomes from two E. coli strains, how similar do you expect them to be? Why is that?
c. Let’s see whether reality agrees with your intuition. On the bottom of the Genome Statistics page, go to “Breakdown by selected genomes, general statistics” and “Breakdown by selected genomes and COG function categories”. By modifying the information requested on these pages, see how similar these two Escherichia coli strains actually are. Include information on the number of genes, genome length, % of the genome that is coding, % of genes with a functional prediction, and % of genes coding for enzymes.
d. Now go to “VISTA” (can be reached from the Genome Statistics page) for a more visual comparison of the two sequences. When you click on one of the strain names, what do you see? What do you think the “15 kb” on the top of the page means? How much do you actually want to compare in one view? How do you think you can get there? (Hint: try a right-click on the mouse when you see a horizontal double arrow on the bar that indicates the alignment length; the re-alignment will take a while as there’s a ton of data to adjust.) With any of these web-based tools, there’s a bit of trial and error when using the tools for the first time. How similar are the two genomes to each other over as much of the genome as you can find? Is the similarity pretty much identical over the entire genome? What are your thoughts on this?
e. To get a feel for how similar unrelated prokaryotes may be to each other, compare two prokaryotes from different genera with each other in the same way you have compared the two Escherichia coli strains. Discuss your results in your weekly update and report.
f. You will see that eukaryotes are absent from
the VISTA pages. One might think that that’s simply because we are
looking at the Integrated Microbial Genomes page here, but there’s more
behind it: What do you think would happen if you were going to compare
the genomes of a human and a chimpanzee and a mouse using a tool like VISTA? Where
do you think the primary areas of similarity are going to be? There is
a bit on this in the chimpanzee sequence paper listed above, and some more on
this in one of the mouse genome papers (on Blackboard; Waterston et al. (2002)
Initial sequencing and comparative analysis of the mouse genome
Nature 420, 520-562). Discuss this question of prokaryotic vs.
eukaryotic alignments in the Discussion section of your weekly lab update and
report.
In your weekly update (and therefore also in your lab report), not only address all points and questions (a – f) above, but also provide a narrative that integrates the questions. The Discussion section of your report is particularly suitable for this purpose. Examples of materials to include in your Discussion section of your weekly update this week:
1. Keeping in mind the results you obtained from the comparison of the two E. coli strains vs. the two unrelated organisms you chose, what can you say about prokaryotic “species”? How do you think strains can obtain large amounts of new DNA, and what would be evolutionary pressures to retain the new DNA? What does this tell you about how gradual or non-gradual evolution can be?
2. Explain possible reasons why VISTA is a tool solely designed for the comparison of prokaryotic genomes.
Prelab question:
In most organisms, at least a quarter or a third of the open reading frames in genome sequences do not code for known proteins. It is important to try to predict whether one of those open reading frames codes for a “real” protein of as yet unknown function, or the open reading frame most likely is just a “fluke”. What are three experimental or in silico criteria (you may name more if you like) that could help you determine whether a specific open reading frame found in the genome sequence of an organism most likely is coding for a real protein?
The “experiment” for today:
Via the NCBI website you used before, get the protein sequence of Slr0408, annotated as a hypothetical protein from Synechocystis. We will be trying to find out some of its properties and what it is related to, so that we may get some clues about its function.
a. How many amino acid residues does the protein have? What, therefore, is its approximate size (in kDa)? Relative to other proteins you know of, is it large or small? What is the chance of a random sequence of the corresponding DNA being an open reading frame of this length (in other words, what is the chance of having a string of codons of this length that does not contain a stop codon)?
b. Do a BLAST on this protein to see whether there are proteins in Synechocystis or in other organisms that look like it, and that maybe have an assigned function. BLAST on this protein is going to take a while (discuss why?), so continue with the next sections while you keep BLAST running in the background. Copy and paste the BLAST results in a file that you save on your memory stick (or into a new document that you email to yourself) to use in your report. Make sure to discuss the results: do other proteins align along the entire length of Slr0408? What do you think that may mean? What is the organism from which the two best hits come? What do you think that implies? What are some of the assigned functions (protein names) for the best 50 hits? Can you give a quick summary of what those proteins are or do?
c. Visit Expasy (http://us.expasy.org/). This site has a lot of useful bioinformatic protein analysis tools. Focus on “Tools and software packages”, and click on “Proteomics and sequence analysis tools”. First, we want to know whether this is a membrane protein or a soluble protein (why would this be useful information?). To get some information that may help to answer this question, scroll down to the “Topology prediction” section, click on “TMPred”, paste in the Slr0408 sequence, and run the program. What does it tell you? Again, it is good to save the results to your memory stick, or email them to yourself, so you can interpret and discuss them in your weekly update and report.
d. Another useful piece of info may be whether this protein has repeated domains. This can be done using the REPRO tool that is under “Primary structure analysis” on Expasy. There is a lot of info to sort through for a computer, and this by necessity leads to a slow processing time. Note the address of the web page where your results will appear, and access this information later at home. In any case, what do you think is striking in the results of the analysis, and how would you interpret the data? What do you think is the statistical probability that two amino acids in an aligned sequence are identical? How do gaps in the sequence alignment influence this probability?
e. Yet another way to get potentially useful information is a search for functionally relevant motifs in the primary structure of the protein. This is done by InterPro Scan, again accessible from the link on the Expasy website. In this case, the output may be hard to interpret in detail, but it does not hurt to just paste in your sequence and see what InterPro Scan comes up with. As time permits and if you are interested, please read the Help files.
f. PROSITE Scan (again accessible via a link on Expasy) is another good one. Have a look at all the motifs it recognizes. Note that the presence of any motifs (glycosylation, etc.) does not mean that such a post-translational modification must occur, but it gives you something to consider. Also note that many of the modification motifs have quite a bit of degeneracy (how can you see that in your results?), so for a protein the size of Slr0408 there is some chance that hits are random. For this reason, there are “randomized probabilities” listed in the PROSITE Scan results: Interpret your data with these probabilities in mind.
g. It will be important to know whether the protein is ever found “in real life”. There is no published evidence yet that Slr0408 has been found in proteomic studies (where proteins present in an extract or preparation are identified). However, this is not surprising as large proteins are easy to miss in gel-based approaches (why?). An alternative approach is to see whether transcripts of the gene are found. In some microarray studies, transcripts for this gene have been detected, and therefore it is likely to be “real”. How does this relate to your discussion re. point a.?
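The two estimates asked for in point a (approximate size, and the chance of a stop-free stretch of random codons) are simple arithmetic, and can be sketched as follows. The assumptions here are an average residue mass of roughly 110 Da and random DNA in which all 64 codons are equally likely, 3 of them being stop codons in the standard genetic code; real genomes have biased base composition, so treat the result as an order-of-magnitude guide. The function names and the example length of 300 residues are hypothetical; substitute the length you found for Slr0408.

```python
import math

AVERAGE_RESIDUE_MASS_KDA = 0.110  # rule-of-thumb average mass per residue (assumption)
STOP_CODONS = 3                   # TAA, TAG, TGA in the standard genetic code
TOTAL_CODONS = 64


def approximate_mass_kda(n_residues):
    """Rough protein mass from the number of residues."""
    return n_residues * AVERAGE_RESIDUE_MASS_KDA


def log10_random_orf_probability(n_codons):
    """log10 of the chance that n_codons random, equally likely codons contain no stop."""
    p_no_stop = (TOTAL_CODONS - STOP_CODONS) / TOTAL_CODONS
    return n_codons * math.log10(p_no_stop)


n = 300  # hypothetical length, for illustration only
print("approximate mass (kDa):", approximate_mass_kda(n))
print("log10 P(no stop codon):", log10_random_orf_probability(n))
```

Working with the logarithm avoids numerical underflow for very long open reading frames, where the probability itself becomes too small to represent as an ordinary floating-point number.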
When you integrate this information in your weekly update and report, try to think what this protein may do. You most likely will not come up with a detailed hypothesis (after all, the gene is annotated as coding for a “hypothetical protein”), but you should think about and comment on what you can find out about proteins from their sequence, and how some of the tools that are available to you can give you ideas about possible function.
In your weekly update (and therefore also in your lab report), not only address all points and questions (a – g) above, but also provide a narrative that integrates the questions. The Discussion section of your report is particularly suitable for this purpose. An example of materials to include in your Discussion section of your weekly update this week:
An issue that comes up frequently in proteomics is how one identifies an isolated protein (for example, excised after 2D gel electrophoresis). A convenient method is to look at the mass of a protein. Mass spectrometry is very precise and requires only small quantities of material. However, knowing the mass of a complete protein isolated from an organism is not necessarily going to give you an unambiguous identification of the protein because often the mass does not match anything that is in the database of predicted protein masses (calculated from adding up the masses of the residues according to the predicted sequence, and taking into account natural isotope abundance).
Prelab question:
What are some of the reasons (try to think of three different ones) why the experimentally determined mass (determined by mass spectrometry on the isolated protein) may differ from the mass calculated from adding up the masses of the residues according to the predicted sequence, and taking into account natural isotope abundance?
The “experiment” for today:
To counter the issues you hopefully identified in the prelab question, it is usually good to do a trypsin digestion of the protein (or of a protein mixture with limited complexity, i.e., a mixture with not too many different proteins), and then to determine (using a MALDI-TOF mass spectrometer) the masses of tryptic fragments of your protein(s) (tryptic fragments means that these are fragments that resulted after treatment of the protein with trypsin). Once you have these data, you can figure out what are good matches.
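To make concrete what a database search does with tryptic masses, here is a minimal sketch of an in-silico trypsin digest. It assumes the commonly used rule that trypsin cuts after K or R except when the next residue is P, unmodified residues, and singly protonated ions ([M+H]+), which is the usual case in MALDI. The function names and the short test peptide are made up, and this is not the implementation used by MS-Fit or any other specific tool.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
MONOISOTOPIC_RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide (free N- and C-termini)
PROTON = 1.00728   # charge carrier for a singly protonated ion


def trypsin_cleave(sequence):
    """Cut after K or R, except when the next residue is P."""
    fragments, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        fragments.append(sequence[start:])
    return fragments


def peptides_with_missed_cleavages(sequence, max_missed=1):
    """All peptides formed by joining up to max_missed + 1 adjacent fragments."""
    fragments = trypsin_cleave(sequence)
    peptides = []
    for i in range(len(fragments)):
        last = min(i + max_missed, len(fragments) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(fragments[i:j + 1]))
    return peptides


def mh_plus(peptide):
    """Monoisotopic m/z of the singly protonated, unmodified peptide."""
    return sum(MONOISOTOPIC_RESIDUE_MASS[r] for r in peptide) + WATER + PROTON


for pep in peptides_with_missed_cleavages("AAKRPGGRC", max_missed=1):
    print(pep, round(mh_plus(pep), 4))
```

A search tool compares a list like this, computed for every protein in the database, against the measured masses within a chosen mass tolerance.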
Assume you have isolated a protein or a protein complex from blue-green-looking bacteria that may or may not be an axenic (pure) culture. Listed below are the 343 monoisotopic sizes (m/z) of the tryptic fragments that you have found upon MALDI-TOF analysis. You know that trypsin may not necessarily cut at all sites all the time, so it is possible that it misses a cleavage once in a while. Questions for you:
a. Using MS-Fit that is part of ProteinProspector (see the Expasy website for the link), find the protein(s) that matches best, and interpret in your own words what the web tool is telling you.
b. The protein complex came from an environmental sample with lots of different “strains” from a microbial mat. Can you determine which strain this sample most likely came from?
c. To get an idea how many mass fragments actually are needed for an unambiguous identification of the protein complex, take 10, 50 and 175 masses from the list below (randomly selected), and redo your MS-Fit search. What do you find? Interpretation?
d. You may not have all masses in your MALDI-TOF that you expected for the protein, and you may have other masses in your MALDI-TOF results that were not predicted bioinformatically. Where would these discrepancies come from, and are such discrepancies a major problem? Explain.
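For step c, the random subsets can be drawn by hand, but a short script makes the selection unbiased and repeatable. The sketch below assumes you have pasted the m/z list into a text string, one value per line; the function names, the seed value, and the five-value demonstration string are illustrative only.

```python
import random


def parse_masses(text):
    """Read one m/z value per line, tolerating blank lines and a trailing '|'."""
    masses = []
    for line in text.splitlines():
        cleaned = line.strip().rstrip("|").strip()
        if cleaned:
            masses.append(float(cleaned))
    return masses


def random_subsets(masses, sizes=(10, 50, 175), seed=0):
    """Draw one random subset (without replacement) for each requested size."""
    rng = random.Random(seed)  # fixed seed so the same subsets can be regenerated
    return {k: sorted(rng.sample(masses, k)) for k in sizes}


# Demonstration with only the first five values from the list; paste the full list instead.
demo = """347.1925 |
359.2401 |
361.183 |
374.2034 |
393.2245 |"""
demo_masses = parse_masses(demo)
print(random_subsets(demo_masses, sizes=(2, 3)))
```

Recording the seed alongside your results lets you, or the TA, regenerate exactly the same subsets later.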
In your weekly update (and therefore also in your lab report), not only address all points and questions (a – d) above, but also provide a narrative that integrates the questions. The Discussion section of your report is particularly suitable for this purpose. An example of materials to include in your Discussion section of your weekly update this week:
1. Explain the rationale behind doing a trypsin digestion of a protein to be identified, and why this approach is most commonly used to fragment target proteins.
2. What kind of data are inputs and outputs when using ProteinProspector tools? How would one get the input data, and what can the output data tell us?
Monoisotopic m/z values:
347.1925 |
359.2401 |
361.183 |
374.2034 |
393.2245 |
418.2045 |
423.2602 |
446.247 |
450.2195 |
460.2514 |
460.2514 |
521.3194 |
521.3194 |
536.3191 |
549.3144 |
571.3198 |
582.3068 |
598.3017 |
604.3049 |
606.3206 |
645.3566 |
658.3883 |
676.3988 |
718.3883 |
739.341 |
744.4614 |
747.3995 |
747.4182 |
762.3781 |
763.4131 |
763.4825 |
764.3937 |
770.362 |
779.4046 |
801.4577 |
832.4999 |
833.4304 |
846.5043 |
874.4894 |
875.4945 |
889.4738 |
922.4604 |
928.4734 |
935.5343 |
938.4553 |
951.5292 |
964.4523 |
967.536 |
978.5037 |
981.4789 |
987.5218 |
991.532 |
994.4986 |
1030.542 |
1059.562 |
1075.557 |
1155.591 |
1169.559 |
1197.659 |
1205.627 |
1206.659 |
1210.69 |
1227.6 |
1243.679 |
1279.633 |
1295.628 |
1296.659 |
1299.72 |
1302.626 |
1311.622 |
1311.692 |
1312.654 |
1315.715 |
1318.621 |
1325.66 |
1328.649 |
1393.679 |
1407.635 |
1423.63 |
1424.776 |
1444.711 |
1454.796 |
1459.733 |
1480.71 |
1496.705 |
1544.79 |
1547.686 |
1547.69 |
1549.78 |
1651.847 |
1664.782 |
1667.842 |
1700.891 |
1718.88 |
1807.948 |
1823.943 |
1848.868 |
1853.983 |
1864.863 |
1876.992 |
1889.882 |
1945.981 |
1975.007 |
1987.163 |
2002.912 |
2018.085 |
2018.907 |
2021.072 |
2034.08 |
2043.999 |
2044.928 |
2052.923 |
2059.994 |
2060.923 |
2075.989 |
2110.102 |
2126.096 |
2188.009 |
2218.087 |
2229.076 |
2232.054 |
2245.071 |
2248.049 |
2253.107 |
2286.093 |
2296.045 |
2302.088 |
2318.083 |
2352.042 |
2357.062 |
2374.189 |
2390.225 |
2442.194 |
2458.189 |
2472.198 |
2474.184 |
2480.137 |
2488.193 |
2527.287 |
2543.282 |
2574.295 |
2588.234 |
2604.229 |
2651.271 |
2667.266 |
2715.509 |
2731.504 |
2756.273 |
2759.372 |
2769.457 |
2770.409 |
2775.367 |
2793.275 |
2833.516 |
2837.425 |
2849.511 |
2958.631 |
2974.626 |
2981.489 |
2985.464 |
2996.595 |
3001.459 |
3006.457 |
3017.454 |
3022.452 |
3037.443 |
3165.538 |
3227.518 |
3278.659 |
3297.611 |
3303.794 |
3313.606 |
3316.547 |
3319.789 |
3332.542 |
3335.784 |
3354.727 |
3366.642 |
3382.637 |
3459.895 |
3468.644 |
3475.89 |
3491.885 |
3522.743 |
3538.738 |
3570.7 |
3586.695 |
3593.903 |
3609.898 |
3628.724 |
3643.78 |
3691.815 |
3707.81 |
3716.678 |
3723.805 |
3857.903 |
3873.898 |
3971.056 |
3987.051 |
4081.195 |
4097.189 |
4286.16 |
4302.155 |
4318.15 |
4334.145 |
4437.23 |
4453.225 |
4469.22 |
4512.376 |
4528.371 |
4544.366 |
4699.403 |
4715.398 |
4731.392 |
4777.452 |
4791.364 |
4793.447 |
4807.359 |
4809.442 |
4823.354 |
4996.642 |
5012.636 |
5102.632 |
5118.627 |
5157.553 |
5173.548 |
5189.543 |
5258.734 |
5274.729 |
5473.801 |
5489.796 |
5505.791 |
5521.786 |
5563.801 |
5579.796 |
5595.791 |
5611.786 |
5627.781 |
5643.776 |
5740.002 |
5755.997 |
5757.028 |
5773.023 |
5892.876 |
5908.871 |
5924.866 |
6016.025 |
6032.02 |
6048.015 |
6117.081 |
6133.076 |
6232.079 |
6248.074 |
6264.069 |
6393.103 |
6409.098 |
6419.238 |
6425.093 |
6435.233 |
6545.286 |
6549.204 |
6565.199 |
6581.194 |
6792.289 |
6808.284 |
6824.279 |
7070.777 |
7086.772 |
7216.621 |
7232.616 |
7556.81 |
7805.901 |
7821.896 |
7822.928 |
7837.891 |
7838.923 |
7854.917 |
7918.787 |
7934.782 |
7950.776 |
7966.771 |
8179.082 |
8195.077 |
8343.166 |
8359.161 |
8614.294 |
8630.289 |
9237.997 |
9239.731 |
9253.992 |
9255.726 |
9269.987 |
9271.721 |
9486.755 |
9502.75 |
9518.745 |
9534.74 |
9550.735 |
9566.73 |
9582.725 |
9643.974 |
9659.969 |
9675.964 |
9963.44 |
9979.435 |
9995.43 |
10251.25 |
10267.25 |
10283.24 |
10432.19 |
10448.18 |
10449.22 |
10464.18 |
10465.21 |
10480.17 |
10481.21 |
10496.17 |
10497.2 |
10512.16 |
10513.2 |
10528.16 |
10529.19 |
10545.19 |
10870.5 |
10886.5 |
10902.49 |
10918.48 |
11015.42 |
11031.42 |
11047.41 |
11063.41 |
11079.4 |
11095.4 |
11111.39 |
11157.66 |
11173.66 |
11189.65 |
11271.9 |
11278.8 |
11287.9 |
11294.79 |
11303.89 |
11310.79 |
We have explored some useful websites regarding genomes and protein analysis, but just as important is understanding which protein in a cell may interact with which other protein(s), as this provides a next level of insight into regulation of proteins, protein complex formation, protein assembly, etc. However, protein interactions are sometimes hard to detect experimentally (some interactions are loose or temporary), and there is often a host of literature on the subject without much of a consensus. To help get an overview of which protein-protein interactions are feasible and which ones are likely and potentially functionally relevant, today we will be exploring the web tool STRING (http://string.embl.de/) that “predicts” protein-protein interactions for a particular protein from an organism with a fully sequenced genome. The predictions are often less than correct, but usually one can find out what the basis for the predictions was, and in some cases the predictions can provide ideas on what a protein may be doing or what it may be interacting with. So, knowing its limitations, it’s actually quite a useful program as long as the results are viewed as in silico predictions rather than experimental truth.
Prelab question (or experiment, actually):
Write down three proteins from sequenced species that are of interest to you and that you hopefully know a little bit about, and prioritize them. Each student will work on a different protein. Then go to string.embl.de, type in the protein and choose the organism, choose “proteins” for interactors wanted, and verify that entries indeed are available for your top three choices. If not, adjust your choices and repeat the procedure. Make sure to show up at the lab with three choices that you are prepared to go into more detail for and possibly read a bit about.
The “experiment” for today:
First, the group will compile which protein(s) each of you would like to query and compare them to the lists from earlier lab sections, and the TA will assign to each of you your “own” protein and organism based on your choices and on what other students already did or are doing.
Now, go to STRING, type in your protein and organism (again, “proteins” for interactors) to see which 20 proteins your protein is most likely interacting with (20 is not the default setting; it can be changed once you have done the initial search). You will see an interaction matrix (neighborhood, gene fusion, co-occurrence, etc.) with dots (black if clear, grey if in a grey area) representing a “hit” for that type of evidence. You can get more information on these hits by clicking on the corresponding rectangle on the “Views” line.
Evaluate the information that you obtain (make sure you visit most or all of the “Views” as well as the “Summary Network”), and discuss in your weekly update and report at least the following aspects:
1. What is the nature of the interaction of your protein with at least three of the proteins on your list (do they form a complex, etc.)?
2. Are there any functional interactions that you expected and that did not show up, or are there putative interactions that surprised you?
3. What is the summary network telling you?
4. Explain what type of information is provided in all the rectangles on the Views line.
Happy thinking!
Center for Bioenergy & Photosynthesis Arizona State University Box 871604 Room PSD 209 Tempe, AZ 85287-1604
28 August 2007
phone: (480) 965-1963 fax: (480) 965-2747