1. INTRODUCTION TO PROTEOMICS
Proteomics is the large-scale study of proteins, particularly their structures and functions. Proteins are vital parts of living organisms, as they are the main components of the physiological metabolic pathways of cells. The term "proteomics" was coined in 1997 by analogy with genomics, the study of genes. The word "proteome" is a blend of "protein" and "genome"; it was coined by Prof. Marc Wilkins in 1994 while he was working on the concept as a PhD student.
Proteome: All the proteins that can be synthesized by the cell.
The proteome is the entire complement of proteins, including the modifications made to a particular set of proteins, produced by an organism or system. This will vary with time and distinct requirements, or stresses, that a cell or organism undergoes.
A proteome is considerably more complex than the genome because a single gene can give rise to a number of different proteins through:
• alternative splicing of the pre-messenger RNAs;
• RNA editing of the pre-messenger RNAs;
• attachment of carbohydrate residues to form glycoproteins;
• addition of phosphate groups to some of the amino acids in the protein.
While humans may turn out to have only 25 to 30 thousand genes, we probably make at least 10 times that number of different proteins. More than 50% of our genes produce pre-mRNAs that are alternatively spliced.
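The multiplying effect of splicing and modification can be illustrated with a small calculation. This is only a sketch of a theoretical upper bound: it assumes that every modifiable site is independently either modified or unmodified, and the function name and the example numbers are hypothetical rather than measured values.

```python
def max_protein_forms(splice_isoforms: int, modifiable_sites: int) -> int:
    """Upper bound on the distinct protein forms one gene could yield.

    Assumption: each modifiable site (e.g. a phosphorylation site) is
    independently "on" or "off", giving 2 ** modifiable_sites states
    per splice isoform. Real cells populate far fewer states.
    """
    if splice_isoforms < 1 or modifiable_sites < 0:
        raise ValueError("need at least one isoform and a non-negative site count")
    return splice_isoforms * 2 ** modifiable_sites


# Hypothetical gene: 3 splice isoforms, 4 modifiable sites on each.
print(max_protein_forms(3, 4))  # 3 * 2**4 = 48
```

Even these modest numbers give dozens of possible forms from a single gene, which is why the proteome can be so much larger than the gene count.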
2. INTRODUCTION TO GENOMICS
Genomics is the study of the genomes of organisms. The field includes intensive efforts to determine the entire DNA sequence of organisms and fine-scale genetic mapping efforts. The field also includes studies of intragenomic phenomena such as heterosis, epistasis, pleiotropy and other interactions between loci and alleles within the genome. In contrast, the investigation of the roles and functions of single genes is a primary focus of molecular biology and is a common topic of modern medical and biological research. Research of single genes does not fall into the definition of genomics unless the aim of this genetic, pathway, and functional information analysis is to elucidate its effect on, place in, and response to the entire genome's networks.
According to the United States Environmental Protection Agency, "the term 'genomics' encompasses a broader scope of scientific inquiry and associated technologies than when genomics was initially considered. A genome is the sum total of all an individual organism's genes. Thus, genomics is the study of all the genes of a cell, or tissue, at the DNA (genotype), mRNA (transcriptome), or protein (proteome) levels."
3. Why study proteomics?
The study of proteomics is important because proteins are responsible for both the structure and the functions of all living things. Genes are simply the instructions for making proteins. It is proteins that make life.
The key requirement in understanding protein function is to learn to correlate the vast array of potential protein modifications to particular phenotypic settings, and then determine if a particular post-translational modification is required for a function to occur.
Even if one is studying a particular cell type, that cell may make different sets of proteins at different times, or under different conditions. Furthermore, as mentioned, any one protein can undergo a wide range of post-translational modifications.
Therefore a proteomics study can become quite complex very quickly, even if the object of the study is very restricted. In more ambitious settings, such as the search for a tumor biomarker, where the proteomics scientist must study serum samples from multiple cancer patients, the complexity that must be dealt with is as great as in any modern biological project.
3.1 Limitations to genomic study
Scientists are very interested in proteomics because it gives a much better understanding of an organism than genomics alone. First, the level of transcription of a gene gives only a rough estimate of its level of expression into a protein. An mRNA produced in abundance may be degraded rapidly or translated inefficiently, resulting in a small amount of protein. Second, as mentioned above, many proteins undergo post-translational modifications that profoundly affect their activities; for example, some proteins are not active until they become phosphorylated. Methods such as phosphoproteomics and glycoproteomics are used to study post-translational modifications. Third, many transcripts give rise to more than one protein, through alternative splicing or alternative post-translational modifications. Fourth, many proteins form complexes with other proteins or RNA molecules, and only function in the presence of these other molecules. Finally, the rate of protein degradation plays an important role in protein content.
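The first and last points can be made concrete with a simple steady-state calculation. The sketch below assumes a deliberately minimal model that is not taken from the text above: protein is made at a rate proportional to mRNA copy number and lost by first-order degradation, so dP/dt = k_translation * mRNA - k_degradation * P. All names and rate values are hypothetical.

```python
def steady_state_protein(mrna_copies: float,
                         k_translation: float,
                         k_degradation: float) -> float:
    """Steady-state protein level for dP/dt = k_translation*mRNA - k_degradation*P.

    Setting dP/dt = 0 gives P = k_translation * mRNA / k_degradation.
    This ignores mRNA turnover, regulation and modification; it only shows
    why mRNA abundance alone does not fix protein abundance.
    """
    if k_degradation <= 0:
        raise ValueError("k_degradation must be positive")
    return k_translation * mrna_copies / k_degradation


# Abundant mRNA, but translated inefficiently and the protein is short-lived.
abundant_mrna = steady_state_protein(mrna_copies=100, k_translation=0.5, k_degradation=2.0)
# Scarce mRNA, but translated efficiently and the protein is stable.
scarce_mrna = steady_state_protein(mrna_copies=10, k_translation=4.0, k_degradation=0.25)

print(abundant_mrna, scarce_mrna)  # 25.0 160.0
```

Under these assumed rates the gene with ten times less mRNA ends up with more protein, which is the sense in which transcript level is only a rough estimate of protein level.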
4. Methods of studying proteins
4.1 Determining proteins which are post-translationally modified
One way to study a particular post-translational modification is to develop an antibody that is specific to that modification. For example, there are antibodies that recognize certain proteins only when they are tyrosine-phosphorylated; there are also antibodies specific to other modifications. These can be used to determine the set of proteins that have undergone the modification of interest.
For sugar modifications, such as glycosylation of proteins, certain lectins have been discovered which bind sugars. These too can be used.
A more common way to detect a post-translational modification of interest is to subject a complex mixture of proteins to electrophoresis in two dimensions, which simply means that the proteins are electrophoresed first in one direction and then in another. This allows small differences in a protein to be visualized by separating a modified protein from its unmodified form. This methodology is known as "two-dimensional gel electrophoresis".
Recently, another approach called PROTOMAP has been developed, which combines SDS-PAGE with shotgun proteomics to enable detection of changes in gel migration such as those caused by proteolysis or post-translational modification.
4.2 Determining the existence of proteins in complex mixtures
Classically, antibodies to particular proteins or to their modified forms have been used in biochemistry and cell biology studies. These are among the most common tools used by practicing biologists today.
For more quantitative determinations of protein amounts, techniques such as ELISAs can be used.
For proteomic study, more recent techniques such as matrix-assisted laser desorption/ionization (MALDI) mass spectrometry have been employed for rapid identification of proteins in particular mixtures.
4.3 How to study? (A procedure)
1. Isolate a homogeneous population of cells (e.g., yeast cells that have just been switched from glucose to galactose as their energy source).
2. Extract the contents of the cells and separate the mix of proteins from other components.
3. Separate the proteins in the mix by two-dimensional (2D) gel electrophoresis. This separates the proteins:
o in one dimension by their electrical charge;
o in the second dimension by their size.
(The procedure is analogous to that used in paper chromatography.)
4. Stain the gel to visualize the various spots of protein.
5. Punch out a spot.
6. Add a protease (e.g., trypsin) to digest the protein in that spot into a mix of peptides.
7. Run the mix through a mass spectrometer, which will resolve the peptides into sharply defined peaks according to their mass.
8. Compare the resulting peak masses against a database of all known proteins (that have been digested, in the computer, with the same enzyme) to see if you can find a match.
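Steps 6 to 8 can be sketched in code. The example below is a simplified illustration, not a description of any particular search program: it digests each database sequence with the usual trypsin rule (cleave after K or R unless the next residue is P), computes neutral monoisotopic peptide masses, and picks the database entry with the most masses matching the observed peaks. The two-entry database, the protein names, the tolerance and the "observed" masses are all hypothetical, and real spectra report mass-to-charge ratios of ions rather than neutral masses.

```python
import re

# Monoisotopic residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added per free peptide


def tryptic_digest(sequence):
    """Cleave after K or R unless followed by P (no missed cleavages)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def count_matches(observed, sequence, tol=0.01):
    """Number of observed masses within tol Da of a theoretical peptide."""
    theoretical = [peptide_mass(p) for p in tryptic_digest(sequence)]
    return sum(any(abs(m - t) <= tol for t in theoretical) for m in observed)


def best_match(observed, database, tol=0.01):
    """Name of the database entry explaining the most observed masses."""
    return max(database, key=lambda name: count_matches(observed, database[name], tol))


# Hypothetical toy database; a real search covers a whole proteome.
DATABASE = {
    "protein_A": "MKTAYIAKQRPSLVEEK",
    "protein_B": "GASWHEELLKDDFTRMN",
}

# Simulated peaks: two peptides of protein_B with small measurement errors.
observed = [peptide_mass("DDFTR") + 0.003, peptide_mass("GASWHEELLK") - 0.002]

print(tryptic_digest(DATABASE["protein_A"]))
print(best_match(observed, DATABASE))
```

Counting matches is the crudest possible score; it is used here only to keep the sketch short. If no entry explains the observed peaks, the protein is treated as unknown.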
What if there is no match; that is, you have stumbled on an unknown protein?
1. Isolate individual peptides from your mix and run one through a mass spectrometer that has been modified to
o first randomly break the peptide into a mix of fragments containing one, two, etc. amino acids
o then measure the mass of each fragment.
2. Enter the resulting data into a database that matches the mass data with known pairs, triplets, etc. of amino acids.
3. With the aid of overlaps, assemble the fragments to reveal the entire sequence of the peptide.
4. "Back-translate" the amino acid sequence to determine what sequence of nucleotides in DNA could encode that peptide.
5. Search the genome database for an open reading frame (ORF) that contains that sequence.
6. Translate that ORF to get the entire amino acid sequence of your protein.
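Steps 4 to 6 can be sketched as follows. This is a minimal illustration under several assumptions: the standard genetic code, a search of the forward strand only, an uppercase genome containing only A, C, G and T, and a made-up toy "genome". Because most amino acids have several codons, back-translation is expressed here as a pattern that matches every DNA sequence able to encode the peptide; the function names are hypothetical.

```python
import re
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def back_translate(peptide):
    """Regex matching every DNA sequence that could encode the peptide."""
    codons_for = {}
    for codon, aa in CODON_TABLE.items():
        codons_for.setdefault(aa, []).append(codon)
    return "".join("(?:" + "|".join(codons_for[aa]) + ")" for aa in peptide)


def translate(dna):
    """Translate DNA codon by codon; '*' marks a stop codon."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def find_orf_protein(genome, peptide):
    """Protein encoded by the forward-strand ORF containing the peptide, or None."""
    hit = re.search(back_translate(peptide), genome)
    if hit is None:
        return None
    # Walk upstream in frame until the previous stop codon or the sequence start.
    start = hit.start()
    while start >= 3 and CODON_TABLE[genome[start - 3:start]] != "*":
        start -= 3
    # Read downstream in the same frame until the next stop codon.
    protein = translate(genome[start:]).split("*")[0]
    # Begin at the first methionine at or before the peptide, if there is one.
    upstream_codons = (hit.start() - start) // 3
    first_met = protein.find("M", 0, upstream_codons + 1)
    return protein[first_met:] if first_met != -1 else protein


# Hypothetical genome fragment: the ORF is offset by two bases and flanked by stops.
GENOME = "GGTAAGCTATGAAAACCGCGTATATTGCGAAATGACC"

print(find_orf_protein(GENOME, "TAYIAK"))  # MKTAYIAK
```

A real search would also scan the reverse strand and, in eukaryotes, would have to deal with introns, which this sketch ignores.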
4.4 Study of Three-Dimensional (3D) Structure of a Protein
The clearest picture of how different proteins interact with one another to form functional complexes will come from determining the 3D structure of the complex. Two methods are commonly used:
• X-ray crystallography;
• nuclear magnetic resonance (NMR) spectroscopy.
X-ray crystallography requires that you be able to crystallize the protein. This is often a difficult task and especially difficult for complexes of two or more proteins.
Although in both cases the proteins are binding to DNA, they are also binding to each other (as homodimers).
NMR spectroscopy has been especially useful in producing 3D images of proteins that cannot be crystallized.
5. Practical applications of proteomics
One of the most promising developments to come from the study of human genes and proteins has been the identification of potential new drugs for the treatment of disease. This relies on genome and proteome information to identify proteins associated with a disease, which computer software can then use as targets for new drugs. For example, if a certain protein is implicated in a disease, its 3D structure provides the information to design drugs to interfere with the action of the protein. A molecule that fits the active site of an enzyme, but cannot be released by the enzyme, will inactivate the enzyme. This is the basis of new drug-discovery tools, which aim to find new drugs to inactivate proteins involved in disease. As genetic differences among individuals are found, researchers expect to use these techniques to develop personalized drugs that are more effective for the individual.
A computer technique which attempts to fit millions of small molecules to the three-dimensional structure of a protein is called "virtual ligand screening". The computer rates the quality of the fit to various sites in the protein, with the goal of either enhancing or disabling the function of the protein, depending on its function in the cell. A good example of this is the identification of new drugs to target and inactivate the HIV-1 protease. The HIV-1 protease is an enzyme that cleaves a very large HIV protein into smaller, functional proteins. The virus cannot survive without this enzyme; therefore, it is one of the most effective protein targets for killing HIV.
5.1 Biomarkers
Understanding the proteome, the structure and function of each protein and the complexities of protein-protein interactions will be critical for developing the most effective diagnostic techniques and disease treatments in the future.
An interesting use of proteomics is the diagnosis of disease with specific protein biomarkers. A number of techniques make it possible to test for proteins produced during a particular disease, which helps to diagnose the disease quickly. These techniques include western blotting, immunohistochemical staining, enzyme-linked immunosorbent assay (ELISA) and mass spectrometry.