Friday, April 22, 2011

DNA STRUCTURE

DNA Is Composed of Polynucleotide Chains:

Schematic model of the double helix
The most important feature of DNA is that it is usually composed of two polynucleotide chains twisted around each other in the form of a double helix. The upper part of the figure presents the structure of the double helix in schematic form. Note that if inverted 180°, the double helix looks superficially the same, due to the complementary nature of the two DNA strands. The space-filling model of the double helix, in the lower part, shows the components of the DNA molecule and their relative positions in the helical structure. The backbone of each strand of the helix is composed of alternating sugar and phosphate residues; the bases project inward but are accessible through the major and minor grooves.

Whole Genome Shotgun Sequencing (WGS)

In contrast to the hierarchical BAC by BAC approach, which relies on the availability of genetic and physical maps for success, WGS is based on the strategy of sequencing a vast number of random genomic clones followed by intensive computer-based analysis of the DNA sequences which identifies matching sequences in different clones. This permits the assembly of a chromosomal DNA sequence, in principle without other map resources. As with the HS approach, overlapping clones are required, but since the clones are destined for direct sequence analysis, only vectors that contain small to medium inserts are normally used. Hence once the overlaps have been identified, the entire sequence is assembled. WGS was the approach adopted by the privately funded human genome initiative.
Although WGS remains somewhat controversial for sequencing complex genomes of ‘higher’ organisms because of the problems associated with repeat sequences and heterozygosity, it is a widely used approach. The number of complex genomes sequenced by WGS is increasing and includes the fruit fly (Drosophila), mosquito (Anopheles), mouse, puffer fish, dog and grapevine. However, in some cases, such as the silkworm genome project, the WGS method has resulted in many seemingly irresolvable gaps in the genome and so the BAC-based hierarchical ordering of clones was used to close the gaps. Advances in computational analysis of WGS sequences suggest that the problems caused by repeat sequences could be overcome, hence the approach can be expected to gain more ground in future genome projects.
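The computer-based step described above, finding matching sequences in different clones and joining them, can be illustrated with a toy example. The sketch below is only a minimal greedy overlap merger written for illustration; the function names, the `min_len` threshold and the example reads are assumptions, and real assemblers must also cope with sequencing errors, both strands and the repeat sequences mentioned above.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest overlap until no overlaps remain."""
    reads = list(reads)
    while len(reads) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            break  # remaining reads do not overlap: a gap in the assembly
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Three hypothetical overlapping reads from random clones
contigs = greedy_assemble(["ATGGCCT", "GCCTTTA", "TTTAAGC"])
print(contigs)
```

Reads that share no sufficiently long overlap are left as separate contigs, which corresponds to the gaps that sometimes have to be closed by other means.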

Mapping and Sequencing Strategies

The Human Genome Project aimed to produce four types of map: physical, genetic, DNA sequence and gene. Physical and genetic maps provide essential anchor points and frameworks to align DNA sequences and assign genes. A high-resolution physical map based on the analysis of overlapping DNA clones represents the actual distance in DNA base pairs between genetic markers and other landmarks. However, the ultimate physical map is the DNA sequence itself. Low-resolution physical maps are generated from techniques such as somatic cell hybridisation and fluorescence in situ hybridisation; these methods are also applicable for assigning genes to chromosomes.
There are two approaches to genome sequencing: whole genome shotgun sequencing (WGS) and the more labour-intensive hierarchical shotgun sequencing (HS). In simple organisms such as bacteria and viruses, where the chromosomes are haploid and very little repeat sequence occurs, or for sequencing individual human genes, WGS works well. In contrast, for eukaryotic genomes, where repeat sequences often abound, including the human genome (>50% repeats), and there is considerable heterozygosity, it has been argued that HS offers advantages over WGS, and this was the approach adopted by the publicly funded International Human Genome Sequencing Consortium.

Mapping and Identifying Genes

There are two main mapping approaches to identifying genes on chromosomes: genetic mapping and physical mapping. Genetic mapping relies on observing the recombination frequencies between pairs of polymorphic markers segregating within families. A genetic map is constructed by analysing several pairs of polymorphic genetic markers. The genetic distance between the markers is determined by their recombination frequency and measured in centimorgans (cM), where 1 cM equates to 1% recombination, rather than in number of nucleotides. For any pair of marker loci, the smaller the recombination frequency, the shorter is the genetic distance between the markers. The process of constructing a genetic map is called linkage analysis; this aims to determine whether a pair of gene loci tend to be co-inherited or separated by recombination. At one time, linkage analysis in humans was a slow process because of the dearth of polymorphic markers. However, the discovery of variable number tandem repeat polymorphisms (VNTRs) and the production of genome-wide panels of polymorphic microsatellites in the 1990s rendered linkage analysis one of the major tools in the Human Genome Project. Additionally, a number of physical mapping techniques were developed that enabled genes to be identified and localised purely on the basis of their physical positions along the chromosomes; this became known as ‘reverse genetics’ and later ‘positional cloning’.
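The relationship between recombination frequency and genetic distance (1 cM = 1% recombination) can be written as a short calculation. This is a minimal sketch with a hypothetical function name and made-up offspring counts; the direct conversion is only a reasonable approximation for closely linked markers.

```python
def genetic_distance_cm(recombinants, total_offspring):
    """Genetic distance in centimorgans from counts of recombinant and total offspring."""
    if total_offspring <= 0:
        raise ValueError("total_offspring must be positive")
    if not 0 <= recombinants <= total_offspring:
        raise ValueError("recombinants must lie between 0 and total_offspring")
    # 1 cM corresponds to a recombination frequency of 1%
    return 100.0 * recombinants / total_offspring


# Hypothetical family data: 12 recombinants among 200 offspring
print(genetic_distance_cm(12, 200), "cM")
```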

Pyrosequencing

Rapid PCR sequencing has also been made possible by the use of pyrosequencing, which can be regarded as a second-generation sequencing method without the need for cloning in E. coli or any host cell. In one format of pyrosequencing, a PCR template is hybridised to an oligonucleotide and incubated with DNA polymerase, ATP sulfurylase, luciferase and apyrase. During the reaction, the first of the four dNTPs is added and, if incorporated, releases pyrophosphate (PPi), hence the name ‘pyrosequencing’. The ATP sulfurylase converts the PPi to ATP, which drives the luciferase-mediated conversion of luciferin to oxyluciferin to generate light. Apyrase degrades unincorporated dNTPs and excess ATP. This is followed by another round of dNTP addition. The resulting pyrogram provides an output of the sequence. The method provides short reads very quickly and is especially useful for the determination of mutations or discovery of single nucleotide polymorphisms. Another pyrosequencing format involves direct analysis of DNA fragments and this system allows the rapid sequencing of entire genomes by the ‘shotgun’ approach. First, genomic DNA is randomly sheared and ligated to linker sequences that permit individual molecules captured on the surface of a bead to be amplified while isolated within an emulsion droplet. A very large collection of such beads is arrayed in the 1.6 million wells of a fibre-optic slide. The micro-array is presented sequentially with each of the four dNTPs and the amount of incorporation is monitored by luminometric detection as before. The second-generation Roche 454 Genome Sequencer FLX is reportedly able to produce 100 Mb of sequence with 99.5% accuracy for individual reads averaging over 250 bases in length. Once again, the derived sequences can be downloaded automatically to databases and manipulated using a variety of bioinformatics resources.
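The cyclic addition of dNTPs and the resulting pyrogram can be mimicked in a few lines. The sketch below is an idealised model only: the dispensation order "TACG", the function names and the example strand are assumptions, and the light signal is taken to be exactly proportional to the number of bases incorporated at each addition.

```python
from itertools import cycle


def pyrogram(new_strand, order="TACG"):
    """Return (dNTP, signal) pairs; signal = number of bases incorporated at that addition."""
    if set(new_strand) - set(order):
        raise ValueError("strand contains bases that are never dispensed")
    flows = []
    pos = 0
    for nucleotide in cycle(order):
        if pos >= len(new_strand):
            break  # the whole strand has been synthesised
        incorporated = 0
        while pos < len(new_strand) and new_strand[pos] == nucleotide:
            incorporated += 1  # each incorporation releases one PPi, hence light
            pos += 1
        flows.append((nucleotide, incorporated))
    return flows


def read_pyrogram(flows):
    """Reconstruct the synthesised sequence from the pyrogram."""
    return "".join(nucleotide * signal for nucleotide, signal in flows)


flows = pyrogram("TTAGC")
print(flows)
print(read_pyrogram(flows))
```

A signal of 2 for the first T addition reflects the two adjacent T residues; additions that give no light simply indicate that the dispensed dNTP was not the next base.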

Automated DNA Sequencing

Advances in fluorescent labelling chemistry have led to the development of high-throughput automated sequencing techniques. Essentially, most systems involve the use of dideoxynucleotides labelled with different fluorochromes (often referred to as dye terminators). The advantage of this modification is that since a different label is incorporated with each ddNTP, it is unnecessary to perform four separate reactions. Therefore, the four chain-terminated products are run on the same track of a denaturing electrophoresis gel. Each product with its base-specific dye is excited by a laser and the dye then emits light at its characteristic wavelength. A diffraction grating separates the emissions, which are detected by a charge-coupled device (CCD), and the sequence is interpreted by a computer. The advantages of the technique include real-time detection of the sequence. In addition, the lengths of sequence that may be analysed are in excess of 500 bp. Capillary electrophoresis is increasingly being used for the detection of sequencing products. Here liquid polymers in thin capillary tubes are used, obviating the need to pour sequencing gels and requiring little manual operation. This substantially reduces the electrophoresis run times and allows high throughput to be achieved. A number of large-scale sequence facilities are now fully automated, allowing the rapid acquisition of sequence data. Automated sequencing for genome projects is usually based on cycle sequencing using instruments such as the ABI PRISM 3700 DNA Analyzer. This can be formatted to produce simultaneous reads from 384-well cycle sequencing reaction plates. The derived nucleotide sequences are downloaded automatically to databases and manipulated using a variety of bioinformatics resources.

PCR Cycle Sequencing

One of the most useful methods of sequencing PCR amplicons is termed PCR cycle sequencing. This is not strictly a PCR, since it involves linear amplification with a single primer. Approximately 20 cycles of denaturation, annealing and extension take place. Radiolabelled or fluorescently labelled dideoxynucleotides are then introduced in the final stages of the reaction to generate the chain-terminated extension products. Automated direct PCR sequencing is increasingly being refined, allowing greater lengths of DNA to be analysed in a single sequencing run.

Sequencing Double-stranded DNA

It is also possible to undertake direct DNA sequencing from double-stranded molecules such as plasmid cloning vectors and PCR amplicons. The double-stranded DNA must be denatured prior to annealing with primer. In the case of plasmids, an alkaline denaturation step is sufficient. However, for PCR amplicons this is more problematic. Unlike plasmids, amplicons are short and reanneal rapidly. Denaturants such as formamide and dimethyl sulfoxide have been used to prevent the reannealing of PCR strands following their separation. Another strategy is to bias the amplification towards one strand by using a primer ratio of 100:1, which also overcomes this problem to a certain extent.
It is possible physically to separate and retain one PCR product strand by incorporating a molecule such as biotin into one of the primers. Following PCR, the strand that contains the biotinylated primer may be removed by affinity chromatography with streptavidin-coated magnetic beads, leaving the complementary PCR strand. This magnetic affinity purification provides single-stranded DNA derived from the PCR amplicon and, although somewhat time consuming, it does provide high-quality single-stranded DNA for sequencing.

Dideoxynucleotide Chain Terminators

The reaction mixture is then divided into four aliquots, representing the four dNTPs, A, C, G and T. Using the adenine (A) tube as an example, in addition to all of the dNTPs being present in the mix, an analogue of dATP is added [2',3'-dideoxyadenosine triphosphate (ddATP)], which is similar to dATP except that it has no 3'-hydroxyl group. Since a 5' to 3' phosphodiester linkage cannot be formed without a 3'-hydroxyl group, the incorporation of ddATP will terminate the growing chain. The situation for tube C is identical except that ddCTP is added; similarly, the G and T tubes contain ddGTP and ddTTP, respectively.
Since the incorporation of a ddNTP rather than a dNTP is a random event, the reaction will produce new molecules varying widely in length, but all terminating at the same type of base. Thus four sets of DNA sequence are generated, each terminating at a different type of base, but all having a common 5' end (the primer). The four labelled and chain-terminated samples are then denatured by heating and loaded next to each other on a polyacrylamide gel for electrophoresis. Electrophoresis is performed at approximately 70 °C in the presence of urea, to prevent renaturation of the DNA, since even partial renaturation alters the rates of migration of DNA fragments. Very thin, long electrophoresis gels are used for maximum resolution over a wide range of fragment lengths. After electrophoresis, the positions of radioactive DNA bands on the gel are determined by autoradiography. Since every band in the lane from the ddATP sample must contain molecules which terminate at adenine, those in the ddCTP lane molecules which terminate at cytosine, etc., it is possible to read the sequence of the newly synthesised strand from the autoradiogram. Under ideal conditions, sequences of approximately 300 DNA bases can be read from one gel.
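The logic of reading a sequence from four chain-terminated lanes can be sketched as follows. This is an idealised illustration, not a model of a real gel: the function names and the example strand are hypothetical, fragment lengths are counted from the end of the primer, and every possible termination product is assumed to be present and perfectly resolved.

```python
def chain_terminated_fragments(new_strand):
    """For each ddNTP lane, list the lengths of the fragments that end in that base."""
    lanes = {"A": [], "C": [], "G": [], "T": []}
    for length, base in enumerate(new_strand, start=1):
        lanes[base].append(length)  # a ddNTP incorporated here ends the chain at this length
    return lanes


def read_gel(lanes):
    """Read the gel from the shortest band to the longest, noting the lane of each band."""
    bands = sorted((length, base) for base, lengths in lanes.items() for length in lengths)
    return "".join(base for _, base in bands)


lanes = chain_terminated_fragments("GATTACA")
print(lanes)
print(read_gel(lanes))
```

Because shorter fragments migrate further, reading the bands in order of increasing length across the four lanes gives the newly synthesised strand from its 5' end.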

DNA SEQUENCING

The determination of the order or sequence of nucleotide bases along a length of DNA is one of the central techniques in molecular biology and has played the key role in genome mapping and sequencing projects.
Two basic techniques have been developed for efficient DNA sequencing, one based on an enzymatic method frequently termed Sanger sequencing, after its developer, and a chemical method, Maxam and Gilbert sequencing, named for the same reason. For large-scale DNA analysis, Sanger sequencing and its variants are by far the most effective methods and many commercial kits are available for its use. However, there are certain occasions, such as the sequencing of short oligonucleotides, where the Maxam and Gilbert method is still more appropriate.
One absolute requirement for Sanger sequencing is that the DNA to be sequenced is in a single-stranded form. Traditionally this demanded that the DNA fragment of interest be cloned into the specialised bacteriophage vector M13, which is naturally single stranded. Although M13 is still widely used, the advent of the PCR has provided a rapid means to amplify a region of any genome or cDNA for which primer sequences are available and generate the corresponding nucleotide sequence. This has led to an explosion in DNA sequence information and has provided much impetus for polymorphism discovery by resequencing regions of the genome from individuals.
The Sanger method is simple and elegant and in many ways mimics the natural ability of DNA polymerase to extend a growing nucleotide chain based on an existing template. Initially the DNA to be sequenced is allowed to hybridise with an oligonucleotide primer, which is complementary to a sequence adjacent to the 3' side of DNA within a vector such as M13 (or within an amplicon in the case of PCR). The oligonucleotide will then act as a primer for synthesis of a second strand of DNA, catalysed by DNA polymerase. Since the new strand is synthesised from its 5' end, virtually the first DNA to be made will be complementary to the DNA to be sequenced. One of the deoxyribonucleoside triphosphates (dNTPs) which must be provided for DNA synthesis is radioactively labelled with ³²P or ³⁵S and so the newly synthesised strand will be radiolabelled.

Reverse Transcriptase PCR (RT-PCR)

RT-PCR is an extremely useful variation of the standard PCR which permits the amplification of specific mRNA transcripts from very small biological samples without the need for the rigorous extraction procedures associated with mRNA purification for conventional cloning purposes. Conveniently, the dNTPs, buffer, Taq polymerase, oligonucleotide primers, reverse transcriptase (RT) and the RNA template are added together to the reaction tube. The reaction is heated to 37 °C, which allows the RT to work and permits the production of a cDNA copy of the RNA strands that anneal to one of the primers in the mix. Following ‘first strand synthesis’, a normal PCR is carried out to amplify the cDNA product, resulting in ‘second strand synthesis’, and subsequently a dsDNA product is amplified as usual. The choice of primer for the first strand synthesis depends on the experiment. If amplification of all mRNAs in the cell extract is required, then an oligo dT primer that would anneal to all the polyA tails can be used. If a specific cDNA is sought, then a coding region-specific primer can be used with success; otherwise a random primer could be used. The method is fast, accurate and simple to perform. It has many applications, such as the assessment of transcript levels in different cells and tissues (when combined with Q-PCR). When combined with allele-specific primers, it also allows the amplification of cDNA from single chromosomes. RT-PCR is widely used as a diagnostic tool in microbiology and virology.

PCR Primer Design and Bioinformatics

The specificity of the PCR lies in the design of the two oligonucleotide primers. These not only have to be complementary to sequences flanking the target DNA but also must not be self-complementary or bind each other to form dimers, since both prevent authentic DNA amplification. They also have to be matched in their GC content and have similar annealing temperatures and be incapable of amplifying unwanted genomic sequences. Manual design of primers is time consuming and often hit or miss, although equations such as the following are still used to derive the annealing temperature (Ta) for each primer:
                 4(G + C) + 2(A + T)  =  Tm
where Tm is the melting temperature (in °C) of the primer/target duplex and G, C, A and T are the numbers of the respective bases in the primer. In general, Ta is set 3–5 °C lower than Tm. On occasions, secondary or primer dimer bands may be observed on the electrophoresis gel in addition to the authentic PCR product. In such situations, Touchdown or Hot start regimes may help. Alternatively, raising Ta closer to Tm can enhance the specificity of the reaction.
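The rule of thumb above is easy to apply in code. The following is a minimal sketch of that formula only; the function names and the example primer are hypothetical, and primer design software typically uses more detailed thermodynamic models.

```python
def melting_temperature(primer):
    """Tm in °C from the rule Tm = 4(G + C) + 2(A + T)."""
    primer = primer.upper()
    gc = primer.count("G") + primer.count("C")
    at = primer.count("A") + primer.count("T")
    if gc + at != len(primer):
        raise ValueError("primer must contain only A, C, G and T")
    return 4 * gc + 2 * at


def annealing_range(primer):
    """Suggested Ta range in °C, set 3-5 °C below Tm as described in the text."""
    tm = melting_temperature(primer)
    return tm - 5, tm - 3


primer = "AGCTTGACCGATACGGTCAA"  # hypothetical 20-mer with 10 G/C and 10 A/T
print(melting_temperature(primer))
print(annealing_range(primer))
```

Running the same function on both primers of a pair gives a quick check that their melting temperatures are matched.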
The increasing use of bioinformatics resources such as Oligo, Generunner and Primer Design Assistant in the design of primers makes the design and the selection of reaction conditions much more straightforward. These computer-based resources allow the sequences to be amplified, primer length, product size, GC content, etc., to be input and, following analysis, provide a choice of matched primer sequences. Indeed, the initial selection and design of primers without the aid of bioinformatics would now be unnecessarily time consuming. Finally, before ordering or synthesising the primers, it is wise to submit proposed sequences to a nucleotide sequence search program such as BLAST, which can be used to interrogate GenBank or other comprehensive public DNA sequence databases to increase confidence that the reaction will be specific for the intended target sequence.

THE POLYMERASE CHAIN REACTION

In some respects, the PCR can be regarded as a form of molecular cloning, since it is a technique analogous to the DNA replication process that takes place in cells and the outcome is the same, namely the generation of new DNA molecules based exactly upon the sequence of the existing ones. PCR is a laboratory technique that is currently a mainstay of molecular biology. One of the reasons for the global adoption of the PCR is the elegant simplicity of the reaction and relative ease of the practical manipulation steps. Indeed, combined with the relevant bioinformatics resources for the design of oligonucleotide primers and for determination of the required experimental conditions, it provides a rapid means for DNA identification and analysis.
One problem with early PCR reactions was that the temperature needed to denature the DNA also denatured the DNA polymerase. However, the availability of a thermostable DNA polymerase enzyme isolated from the thermophilic bacterium Thermus aquaticus, found in hot springs, provided the means to automate the reaction. Taq DNA polymerase has a temperature optimum of 72 °C and survives prolonged exposure to temperatures as high as 96 °C and so is still active after each of the denaturation steps.
The PCR is often used to amplify a fragment of DNA from a complex mixture of starting material, usually termed the template DNA. However, in contrast to conventional cell-based cloning, PCR does require knowledge of the DNA sequences which flank the fragment of DNA to be amplified (target DNA). From this sequence information, two oligonucleotide primers are chemically synthesised, each complementary to a stretch of DNA to the 3' side of the target DNA, one oligonucleotide for each of the two DNA strands. For many applications PCR has replaced the traditional DNA cloning methods as it fulfils the same function, the production of large amounts of DNA from limited starting material; however, this is achieved in a fraction of the time needed to clone a DNA fragment. Although not without its drawbacks, the PCR is a remarkable development which has changed the approach of many scientists to the analysis of nucleic acids and continues to have a profound impact on core genomic and genetic analysis.

cDNA libraries

Detection of recombinant clones

DNA Libraries

‘DNA library’ is the term used to describe a collection of recombinant clones or DNA molecules generated from a specific source of DNA. There are two main types of DNA library which are very distinct in their origin and purpose. DNA from a nucleated cell, whatever the tissue source, from a specific organism is used to make a ‘genomic’ library. The idea of a general genomic library is to produce a set of clones that contain enough DNA fragments so that the entire genome of the organism is represented. Variations of genomic DNA libraries, such as chromosome-specific libraries prepared from chromosomes sorted by flow cytometry, were employed in the Human Genome Project in an attempt to shorten the path between the starting DNA and generation of the genome map. The second type is the cDNA library, which is made from mRNA that has been reverse transcribed by the enzyme reverse transcriptase. Reverse transcriptase produces complementary DNA (or cDNA) fragments which are then cloned into a vector. Therefore, unlike a genomic DNA library, a cDNA library is representative of the expressed genes in a particular cell or tissue type. Thus a skeletal muscle cDNA library contains sequences expressed in the muscle tissue at the time the mRNA was harvested. cDNA libraries are particularly useful for cloning sequences where there is biological information. For example, it is known that mammalian skeletal muscle produces high levels of phosphoglucomutase (PGM1) enzyme activity, hence PGM1 cDNA clones are expected (and found) to be well represented in skeletal muscle cDNA libraries.

Identifying Clones

Bacterial clones containing the sought-after recombinant vectors can be identified by hybridisation with specific radioactively labelled or enzyme-labelled cDNA or genomic DNA probes or alternatively by the immunodetection of protein products (using specialised expression vectors which allow a cloned foreign cDNA to be transcribed to express its protein product). Both approaches are technically straightforward. Both involve the transfer of bacterial colonies from a master agar plate on to carefully orientated nitrocellulose or nylon membranes. The cells are then lysed and the DNA (or protein) from the lysed colonies is immobilised on the membrane, which is used for the probing step. Recombinant colonies can be detected as spots on X-ray film either by autoradiography or enzyme-generated chemiluminescence. The spots on the X-ray film can then be aligned with the agar master plate, allowing the correct colonies to be picked. For protein detection in expression vectors, antibody probes are employed in a manner analogous to an ELISA test. The antibody probe is conjugated to an enzyme such as horseradish peroxidase or alkaline phosphatase. It is the activity of the bound enzyme on its chemiluminescent or chromogenic substrate that reveals the position of the recombinant colonies.

Library Screening

There are several methods of screening for transformed colonies that contain recombinant vectors. For example, where the cloning site in the vector lies within an antibiotic resistance gene, successful integration of the insert will lead to inactivation of the resistance gene and recombinant colonies can be identified by a technique known as replica plating. In this method, the pattern of colonies in the original Petri dish is printed on to a nutrient agar plate containing the selective antibiotic. The position of the recombinant colonies, i.e. those that fail to grow on the selective antibiotic, is noted so that they can be picked from the master plate. In other vectors, a method called blue/white selection can be performed. In blue/white selection, successful integration of a foreign DNA molecule in the vector destroys an enzyme gene (the lacZ gene, which encodes β-galactosidase) that otherwise forms a blue product when the transformed colonies are exposed to the substrate X-gal. Thus recombinant colonies are white and non-recombinants are blue.
Replica plating to detect recombinant plasmids

The Cloning Process

The recombinant vector molecule must be introduced into its ‘matching’ host cell in order to replicate and produce multiple copies. The process by which DNA is introduced into the host cell is known as bacterial transformation. Since ‘naked’ vector DNA is hydrophilic and the bacterial cell wall is normally impermeable to such molecules, the host cell must be made ‘competent’ by treatment with calcium chloride in the early log phase of growth. This causes the cell to become permeable to chloride ions. When competent cells are mixed with DNA and heat shocked at 42 °C, the swollen cells are able to take up the naked DNA molecules. It is believed that only a single DNA molecule is permitted to enter any single cell. Thus each colony of transformed bacteria propagates a single recombinant vector molecule. The bacterial cells are usually grown on selective media so that only transformants survive to form colonies. Thus, for example, if the vector contains an ampicillin resistance gene and the cells are grown on ampicillin-containing media, only the cells containing the vector will form colonies.

Genetic Code Table

Genetic Code

Genes and Proteins

Gregor Mendel might have been surprised to learn that most genes contain nothing more than instructions for assembling proteins. He might have asked what proteins could possibly have to do with the color of a flower, the shape of a leaf, a human blood type, or the sex of a newborn baby. The answer is that proteins have everything to do with these things. Remember that many proteins are enzymes, which catalyze and regulate chemical reactions. A gene that codes for an enzyme to produce pigment can control the color of a flower. Another gene produces an enzyme specialized for the production of a red blood cell surface antigen. This molecule determines your blood type. Genes for certain proteins can regulate the rate and pattern of growth throughout an organism, controlling its size and shape. In short, proteins are microscopic tools, each specifically designed to build or operate a component of a living cell.

The Roles of RNA and DNA

You can compare the different roles played by DNA and RNA molecules in directing protein synthesis to the two types of plans used by builders. A master plan has all the information needed to construct a building. But builders never bring the valuable master plan to the building site, where it might be damaged or lost. Instead, they prepare inexpensive, disposable copies of the master plan called blueprints. The master plan is safely stored in an office, and the blueprints are taken to the job site. Similarly, the cell uses the vital DNA “master plan” to prepare RNA blueprints. The DNA molecule remains within the safety of the nucleus, while RNA molecules go to the protein-building sites in the cytoplasm, the ribosomes.

RNA Editing

Like a writer’s first draft, many RNA molecules require a bit of editing before they are ready to go into action. Remember that an RNA molecule is produced by copying DNA. Surprisingly, the DNA of eukaryotic genes contains sequences of nucleotides, called introns, that are not involved in coding for proteins. The DNA sequences that code for proteins are called exons because they are ‘expressed’ in the synthesis of proteins. When RNA molecules are formed, both the introns and the exons are copied from the DNA. However, the introns are cut out of RNA molecules while they are still in the nucleus. The remaining exons are then spliced back together to form the final mRNA. Why do cells use energy to make a large RNA molecule and then throw parts of it away? That’s a good question, and biologists still do not have a complete answer to it. Some RNA molecules may be cut and spliced in different ways in different tissues, making it possible for a single gene to produce several different forms of RNA. Introns and exons may also play a role in evolution. This would make it possible for very small changes in DNA sequences to have dramatic effects in gene expression.
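Cutting out introns and joining the remaining exons can be pictured as a simple string operation. The sketch below is purely illustrative: the function name, the example sequence and the intron coordinates are made up, and in the cell the intron boundaries are recognised by the splicing machinery rather than supplied as numbers.

```python
def splice(pre_mrna, introns):
    """Remove introns, given as (start, end) index pairs with end exclusive, and join the exons."""
    exons = []
    position = 0
    for start, end in sorted(introns):
        exons.append(pre_mrna[position:start])  # keep the exon before this intron
        position = end                          # skip over the intron
    exons.append(pre_mrna[position:])           # keep the final exon
    return "".join(exons)


# Hypothetical transcript: exon 1 + intron + exon 2
pre_mrna = "AUGGCC" + "GUAAGUCCCAG" + "UUUUAA"
print(splice(pre_mrna, [(6, 17)]))
```

Passing a different set of intervals for the same transcript gives a different mature RNA, which is the idea behind one gene producing several forms of RNA.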

Sunday, April 17, 2011

DNA TOPOLOGY

As DNA is a flexible structure, its exact molecular parameters are a function of both the surrounding ionic environment and the nature of the DNA-binding proteins with which it is complexed. Because their ends are free, linear DNA molecules can freely rotate to accommodate changes in the number of times the two chains of the double helix twist about each other. But if the two ends are covalently linked to form a circular DNA molecule and if there are no interruptions in the sugar-phosphate backbones of the two strands, then the absolute number of times the chains can twist about each other cannot change. Such a covalently closed, circular DNA is said to be topologically constrained. Despite these constraints, DNA participates in numerous dynamic processes in the cell. For example, the two strands of the double helix, which are twisted around each other, must rapidly separate in order for DNA to be duplicated and to be transcribed into RNA. Thus, understanding the topology of DNA and how the cell both accommodates and exploits topological constraints during DNA replication, transcription, and other chromosomal transactions is of fundamental importance in molecular biology.

RNA STRUCTURE

We now turn our attention to RNA, which differs from DNA in three respects. First, the backbone of RNA contains ribose rather than 2'-deoxyribose. That is, ribose has a hydroxyl group at the 2' position. Second, RNA contains uracil in place of thymine. Uracil has the same single-ringed structure as thymine, except that it lacks the 5-methyl group. Thymine is in effect 5-methyluracil. Third, RNA is usually found as a single polynucleotide chain. Except for the case of certain viruses, RNA is not the genetic material and does not need to be capable of serving as a template for its own replication. Rather, RNA functions as the intermediate, the mRNA, between the gene and the protein-synthesizing machinery. Another function of RNA is as an adaptor, the tRNA, between the codons in the mRNA and amino acids. RNA can also play a structural role, as in the case of the RNA components of the ribosome. Yet another role for RNA is as a regulatory molecule, which through sequence complementarity binds to, and interferes with the translation of, certain mRNAs. Finally, some RNAs are enzymes that catalyze essential reactions in the cell. In all of these cases, the RNA is copied as a single strand off only one of the two strands of the DNA template, and its complementary strand does not exist. RNA is capable of forming long double helices, but these are unusual in nature.


Process of Protein Synthesis – translation

Translation – mRNA base sequence to amino acid sequence.
• A ribosome binds to the start point of the mRNA.
• The ribosome will ‘decode’ the mRNA in sets of three bases (a codon).
• Each codon specifies a particular amino acid.
• The sequence of bases on the mRNA determines the sequence of amino acids in the protein – any change = a mutation
• Two codons of the mRNA are exposed in turn.
• Two complementary tRNA molecules attach to these two mRNA triplets.
• The amino acids of the tRNA bond together (peptide bond – condensation reaction)
• The leading tRNA detaches from its amino acid and from the mRNA.
• The ribosome ‘moves’ to the next codon and another complementary tRNA attaches.
• The newly arrived complementary tRNA then adds a new amino acid.
• The process repeats, codon by codon, to the end of the mRNA (until a stop codon is reached).
• The amino acid sequence is now complete.
• The polypeptide (amino acid chain) folds giving the protein its normal functional shape.
• Primary structure = amino-acid sequence (determined by DNA sequence)
• Secondary structure = many H-bonds making
  • Alpha helix (very common) or
  • Beta-pleated sheet (also common – e.g. silk)
  • Thus affected by pH, temperature
• Tertiary structure = disulphide bridges, ionic bonds and further H-bonds – forms active sites
• Quaternary structure = two or more polypeptide chains held together by weak interactions such as van der Waals forces (e.g. haemoglobin, which has four chains)
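
The codon-by-codon steps above can be sketched in a few lines of Python. This is only an illustration, not a model of the ribosome: it assumes translation starts at the first AUG it finds and stops at the first stop codon in that reading frame, and the example mRNA is made up. The codon table is the standard genetic code written in the usual UCAG order.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(mrna: str) -> str:
    """Read codons from the first AUG until a stop codon (or the end of the mRNA)."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon found
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon: the chain is complete
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical mRNA: leader GG, then AUG UCG CAC GGU UAA, then trailer GC.
print(translate("GGAUGUCGCACGGUUAAGC"))  # MSHG
```

The output uses one-letter amino acid symbols (M = methionine, S = serine, H = histidine, G = glycine).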

Process of Protein Synthesis – transcription

Transcription – DNA base sequence to mRNA base sequence
• The ‘code’ for the protein is carried by one of the DNA strands in the gene.
• An enzyme separates the two DNA strands at the gene locus exposing the gene sequence.
• A complementary copy - mRNA - is made of the gene sequence –
  • New nucleotides form a complementary RNA strand
  • Using the DNA gene sequence strand as a ‘master’
  • The enzyme RNA polymerase links the new nucleotides forming mRNA.
• Uracil (U) is the complementary base to adenine in RNA, thymine (T) is not found in RNA.
• The complementary RNA copy is called messenger RNA (mRNA).
• The mRNA separates from the DNA strand and passes from the nucleus to the cytoplasm.
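
As a rough sketch of the base-pairing rule used in transcription, the snippet below copies a DNA template into mRNA. It assumes the template strand is written 3' to 5', so the mRNA comes out 5' to 3' when read left to right; the function name and the example sequence are made up for illustration.

```python
# Pairing rules for copying a DNA template into RNA (U, not T, pairs with A).
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5: str) -> str:
    """Return the mRNA (5'->3') for a DNA template strand written 3'->5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5.upper())


print(transcribe("TACAGCGTGCCA"))  # AUGUCGCACGGU
```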

Ribosomal & Transfer RNA

Ribosomal RNA (rRNA)
• A ribosome is roughly 50% protein and 50% RNA (known as rRNA).
Transfer RNA (tRNA)
• tRNA is found in large amounts in the cytoplasm.
• Single stranded but folded back on itself with three exposed bases (‘anticodon’) at one end and a particular amino acid at the opposite end.
• tRNAs are ‘adapters’ linking amino acids to nucleic acids in protein synthesis.
• There are 64 (4 x 4 x 4) possible triplets;
• 61 of these code for amino acids - the other 3 are ‘stop’ signals
• There are only 20 different amino-acids, so
  • Most amino-acids are coded for by more than one codon;
  • Thus the code is degenerate (or ‘semi-redundant’)
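
The triplet arithmetic above can be checked by building the standard codon table and counting. The sketch below is illustrative only; the helper name `codon_summary` is made up, and amino acids are shown by their one-letter symbols.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def codon_summary():
    """Return (total triplets, stop codons, coding codons, amino acids, single-codon amino acids)."""
    counts = Counter(CODON_TABLE.values())
    stops = counts.pop("*")
    single = sorted(aa for aa, n in counts.items() if n == 1)
    return len(CODON_TABLE), stops, sum(counts.values()), len(counts), single


print(codon_summary())  # (64, 3, 61, 20, ['M', 'W'])
```

Only methionine (M) and tryptophan (W) have a single codon; every other amino acid has between two and six.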
Note: transcription occurs in the nucleus; translation occurs in the cytoplasm.

The Genetic Code

Proteins are made by joining amino acids into long chains called polypeptides. Each polypeptide contains a combination of any or all of the 20 different amino acids. The properties of proteins are determined by the order in which different amino acids are joined together to produce polypeptides. How, you might wonder, can a particular order of nitrogenous bases in DNA and RNA molecules be translated into a particular order of amino acids in a polypeptide? The language of mRNA instructions is called the genetic code. As you know, RNA contains four different bases: A, U, C, and G. In effect, the code is written in a language that has only four “letters.” How can a code with just four letters carry instructions for 20 different amino acids? The genetic code is read three letters at a time, so that each “word” of the coded message is three bases long. Each three-letter “word” in mRNA is known as a codon. A codon consists of three consecutive nucleotides that specify a single amino acid that is to be added to the polypeptide. For example, consider the following RNA sequence:
                       UCGCACGGU
This sequence would be read three bases at a time as:
                       UCG-CAC-GGU
The codons represent the different amino acids:
                       UCG-CAC-GGU
                  Serine-Histidine-Glycine
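
The same reading of the example sequence can be written as a short Python sketch. The lookup table here deliberately holds only the three codons used in the example, and the function name is made up for illustration.

```python
# Only the three codons from the worked example; the full code has 64 entries.
EXAMPLE_CODONS = {"UCG": "Serine", "CAC": "Histidine", "GGU": "Glycine"}


def split_codons(mrna: str) -> list[str]:
    """Split an mRNA into consecutive three-base codons, ignoring leftover bases."""
    usable = len(mrna) - len(mrna) % 3
    return [mrna[i:i + 3] for i in range(0, usable, 3)]


codons = split_codons("UCGCACGGU")
print("-".join(codons))                             # UCG-CAC-GGU
print("-".join(EXAMPLE_CODONS[c] for c in codons))  # Serine-Histidine-Glycine
```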

Protein Synthesis

Gene - A section of DNA containing a particular sequence of bases that codes for a specific protein.
Protein Synthesis - The transcription of a specific DNA base sequence into mRNA and its translation, by a  ribosome, into a particular amino acid sequence forming a protein.
Genetic Code
• The (nearly) universal code that determines the function of all possible triplets of DNA / mRNA.
• Most triplets specify a particular amino acid (= a codon).
• Some triplets function as a start or stop signal for protein synthesis.
• It is a degenerate code as a particular amino acid may be coded for by more than one codon.

DNA Replication

This takes place during the S stage of interphase
• Nucleotides are synthesised in huge quantities in the cytoplasm.
• An enzyme (helicase) unzips the two complementary strands of DNA.
• New complementary nucleotides link to the exposed bases on the separated strands.
• The enzymes that join these nucleotides into a new strand are called DNA polymerases.
• A new complementary strand is built along each ‘old’ strand.
• Two DNAs, identical to the original and each other, are now present.
• Each new DNA molecule is thus ‘half old’ and ‘half new’
  • ‘semi-conservative replication’.
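
A minimal sketch of semi-conservative replication, under simplifying assumptions: the two strands are written aligned so that position i of one pairs with position i of the other, and strand direction is ignored. The function names and the example sequence are made up for illustration.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Build the strand that pairs base-by-base with the given strand."""
    return "".join(COMPLEMENT[base] for base in strand)


def replicate(top: str, bottom: str) -> list[dict[str, str]]:
    """Separate the two strands and build a new partner along each old one."""
    return [
        {"old": top, "new": complement(top)},
        {"old": bottom, "new": complement(bottom)},
    ]


parent_top = "ATGCCGTA"                  # hypothetical sequence
parent_bottom = complement(parent_top)   # TACGGCAT
for daughter in replicate(parent_top, parent_bottom):
    print(daughter)
```

Each daughter keeps one parental strand ("old") paired with one freshly built strand ("new"), and both daughters carry the same pair of sequences as the parent.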
 

Coding & Non-coding Structures of DNA

Coding structures (Exons)
• These are the parts of the DNA that contain the code for the synthesis of protein or RNA.
• These coding sequences are present within genes.
Non-coding Structures.
• This is DNA that does not contain information for the synthesis of protein or RNA.
• The non-coding sequences are found both between genes and within genes (= introns).
• These non-coding sequences have been termed ‘junk DNA’ but they:
  • Do play a role in gene expression (i.e. whether a gene is switched ‘on’ or ‘off’)
  • Act as spacer material,
  • Permit the synthesis of many new proteins and
  • Play an important role in evolution.
• Non-coding DNA makes up more than 95% of human DNA (only about 1-2% codes for protein).
• Non-coding DNA segments within genes are called introns.
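
To illustrate the difference between exons and introns, the sketch below joins the exons of a transcript and drops the intron between them. The sequence and the exon coordinates are invented for the example (the intron is written in lowercase only to make it visible); real splice sites are recognised by the cell's splicing machinery, not by fixed positions.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join the exon segments, given as half-open (start, end) index pairs."""
    return "".join(pre_mrna[start:end] for start, end in exons)


# Hypothetical transcript: exon 1, an intron (lowercase), exon 2.
pre_mrna = "AUGGCC" + "guaaguuuucag" + "UUUUAA"
exons = [(0, 6), (18, 24)]

print(splice(pre_mrna, exons))  # AUGGCCUUUUAA
```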

Differences between DNA and RNA

• DNA is double stranded; RNA is single stranded
• N.B. ATP is also a nucleotide, with ribose as the pentose sugar.
• DNA contains the pentose sugar deoxyribose; RNA contains the pentose sugar ribose.
• DNA has the base Thymine (T) but not Uracil (U); RNA has U but not T.
• DNA is very long (billions of bases); RNA is smaller (hundreds to thousands of bases)
• DNA is self-replicating, RNA is copied from the DNA so it is not self-replicating
The genetic information is held within the base sequence along a DNA strand.
A codon is a sequence of three nucleotides, coding for one amino-acid.
The genetic code is (almost) universal, which is strong evidence that all life had a common ancestor (i.e. evolution)

RNA Protein synthesis

RNA (RNA = ribonucleic acid)
• Three different types of RNA: messenger (mRNA), transfer (tRNA) and ribosomal (rRNA)
• All are made in the nucleus (transcription)
  • ribosomes are synthesised in the nucleolus;
  • mRNA prepared there too – introns removed
• All types of RNA are involved in protein synthesis:
  • mRNA: copies the information from the DNA.
  • tRNA: carries the specific amino acid to the mRNA in contact with the ribosome.
  • rRNA: makes up 55% of ribosomes (the other 45% = protein).

DNA Protein synthesis

DNA (DNA = deoxyribonucleic acid)
• DNA is the genetic material of all living cells and of many viruses.
• DNA is a double helix of two polynucleotide strands.
• The genetic code is the sequence of bases on one of the strands.
• A gene is a specific sequence of bases which has the information for a particular protein.
• DNA is self-replicating - it can make an identical copy of itself.
• Replication allows the genetic information to pass faithfully to the next generation.
• Replication occurs during the ‘S’ (= synthesis) stage of interphase just before nuclear division.
• The chromosomes in the nucleus contain most of the cell’s DNA.
• The remainder is present in mitochondria and chloroplasts.
• Adenine (A) and Guanine (G) are purine bases
• Thymine (T) and Cytosine (C) are pyrimidine bases
• Hydrogen bonds link the complementary base pairs:
  • Two between A and T (A = T)
  • Three between G and C (G ≡ C)
• A single unit in the chain is a nucleotide.
  • This consists of a phosphate group,
  • A pentose sugar (deoxyribose in DNA; ribose in RNA) and
  • An organic base (ATGC = DNA; AUGC = RNA)
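
Using the pairing rule above (two hydrogen bonds per A = T pair, three per G ≡ C pair), the number of hydrogen bonds holding a stretch of double helix together can be counted from one strand alone. This is a small sketch with a made-up function name; it assumes every base is paired with its normal partner.

```python
# Hydrogen bonds contributed by the base pair that each base forms.
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def hydrogen_bonds(strand: str) -> int:
    """Total hydrogen bonds between this strand and its fully paired partner."""
    return sum(H_BONDS[base] for base in strand.upper())


print(hydrogen_bonds("AATT"))  # 8
print(hydrogen_bonds("ATGC"))  # 10
print(hydrogen_bonds("GGCC"))  # 12
```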

Tuesday, April 12, 2011

DNA structure: Revisiting the Watson–Crick double helix

Watson and Crick’s postulation of a double helical structure for DNA heralded a revolution in our understanding of biology at the molecular level.

The right-handed double helical structure proposed in 1953 by Watson and Crick for deoxyribose nucleic acid is the most well-recognized structure for this polymeric molecule [1]. While Watson and Crick were undoubtedly the first to propose an essentially correct model for DNA structure, a wide variety of available data was used by them to arrive at this ‘canonical’ model for DNA, in particular the nucleotide base composition data of Chargaff and information from the X-ray fibre diffraction pattern of B-form DNA, as recorded by Rosalind Franklin [2]. It was commonly believed for several decades that this B-form is the only structure of DNA that has biological relevance, even though Rosalind Franklin’s fibre diffraction data [2,3] for A and B forms had clearly shown that the DNA molecule could readily undergo structural transitions depending on the environment, viz. variation in relative humidity in this case. Fibre diffraction studies in the sixties and seventies also revealed several other forms of DNA structure for synthetic oligo- and polynucleotides, depending on the base sequence and environment. Subsequent biochemical and structural studies showed that regions of genomic DNA, under various physiological conditions, can assume different structures, particularly when some well-defined sequence motifs or repeats occur. It was probably the characterization of such sequence repeats as ‘junk DNA’ by Francis Crick [4] that put a damper on the study of such sequences till recently.

Polynucleotide Chains

In terms of biochemistry, a DNA strand is a polymer, a large molecule built from repeating units. The units in DNA are composed of 2'-deoxyribose (a five-carbon sugar), phosphoric acid, and the four nitrogen-containing bases denoted A, T, G, and C. The chemical structures of the bases are shown in Figure 2.3. Note that two of the bases have a double-ring structure; these are called purines. The other two bases have a single-ring structure; these are called pyrimidines.
• The purine bases are adenine (A) and guanine (G).
• The pyrimidine bases are thymine (T) and cytosine (C).
Pyrimidines
Purines

Nucleoside & Nucleotide

The Molecular Structure of DNA

Modern experimental methods for the manipulation and analysis of DNA grew out of a detailed understanding of its molecular structure and replication. Therefore, to understand these methods, one needs to know something about the molecular structure of DNA. We saw in Chapter 1 that DNA is a helix of two paired, complementary strands, each composed of an ordered string of nucleotides, each bearing one of the bases A (adenine), T (thymine), G (guanine), or C (cytosine). Watson–Crick base pairing between A and T and between G and C in the complementary strands holds the strands together. The complementary strands also hold the key to replication, because each strand can serve as a template for the synthesis of a new complementary strand. We will now take a closer look at DNA structure and at the key features of its replication.
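
Watson–Crick pairing can be turned into a quick test of whether two strands could form a duplex. The sketch below assumes both strands are written 5' to 3' and that the paired strands run in opposite directions (antiparallel), so one strand is compared with the reversed complement of the other. Function names and sequences are made up for illustration.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand: str) -> str:
    """Complement each base and reverse the order, giving the partner strand 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(strand))


def are_complementary(strand_a: str, strand_b: str) -> bool:
    """True if the two 5'->3' strands can pair along their whole length."""
    return strand_b == reverse_complement(strand_a)


print(reverse_complement("ATGC"))         # GCAT
print(are_complementary("ATGC", "GCAT"))  # True
print(are_complementary("ATGC", "TACG"))  # False (complement, but not reversed)
```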

Double Helix RNA Chains

Despite being single-stranded, RNA molecules often exhibit a great deal of double-helical character. This is because RNA chains frequently fold back on themselves to form base-paired segments between short stretches of complementary sequences. If the two stretches of complementary sequence are near each other, the RNA may adopt one of various stem-loop structures in which the intervening RNA is looped out from the end of the double-helical segment as in a hairpin, a bulge, or a simple loop.
Double Helical
Characteristics of RNA
       In an RNA molecule having regions of complementary sequences, the intervening stretches of RNA may become “looped out” to form one of the structures illustrated in the figure. (a) Hairpin (b) Bulge (c) Loop
       The stability of such stem-loop structures is in some instances enhanced by the special properties of the loop. For example, a stem-loop  with the “tetraloop” sequence UUCG is unexpectedly stable due to special base-stacking interactions in the loop. Base pairing can also take place between sequences that are not contiguous to form complex structures aptly named pseudoknots. The regions of base pairing in RNA can be a regular double helix or they can contain discontinuities, such as noncomplementary nucleotides that bulge out from the helix.
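
A stem-loop can be pictured with a short sketch that starts at the loop and counts how many base pairs stack outward from it. The code only recognises the standard A-U and G-C pairs (other pairings that occur in real RNA are left out as a simplifying assumption), and the example sequence is invented; only the UUCG tetraloop comes from the text above.

```python
# Standard RNA base pairs only; other pairings are deliberately ignored here.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def stem_length(rna: str, loop_start: int, loop_end: int) -> int:
    """Count base pairs stacked outward from the loop rna[loop_start:loop_end]."""
    left, right = loop_start - 1, loop_end
    pairs = 0
    while left >= 0 and right < len(rna) and (rna[left], rna[right]) in PAIRS:
        pairs += 1
        left -= 1
        right += 1
    return pairs


hairpin = "GGGCUUCGGCCC"            # hypothetical hairpin with a UUCG tetraloop
loop_start = hairpin.find("UUCG")
loop = hairpin[loop_start:loop_start + 4]
print(loop, stem_length(hairpin, loop_start, loop_start + 4))  # UUCG 4
```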

Twist and Writhe DNA

The linking number is the sum of two geometric components called twist and writhe. Let us consider twist first. Twist is simply the number of helical turns of one strand about the other, that is, the number of times one strand completely wraps around the other strand. Consider a cccDNA that is lying flat on a plane. In this flat conformation, the linking number is fully composed of twist. Indeed, the twist can be easily determined by counting the number of times the two strands cross each other. The helical crossovers (twist) in a right-handed helix are defined as positive, such that the linking number of DNA will have a positive value.
But cccDNA is generally not lying flat on a plane. Rather, it is usually torsionally stressed such that the long axis of the double helix crosses over itself, often repeatedly, in three-dimensional space. This is called writhe. To visualize the distortions caused by torsional stress, think of the coiling of a telephone cord that has been overtwisted. Writhe can take two forms. One form is the interwound or plectonemic writhe, in which the long axis is twisted around itself. The other form of writhe is a toroid or spiral, in which the long axis is wound in a cylindrical manner, as often occurs when DNA wraps around protein. The writhing number (Wr) is the total number of interwound and/or spiral writhes in cccDNA. For example, the molecule shown in the figure has a writhe of 4 from 4 interwound writhes. Interwound writhe and spiral writhe are topologically equivalent to each other and are readily interconvertible geometric properties of cccDNA. Also, twist and writhe are interconvertible. A molecule of cccDNA can readily undergo distortions that convert some of its twist to writhe or some of its writhe to twist without the breakage of any covalent bonds. The only constraint is that the sum of the twist number (Tw) and the writhing number (Wr) must remain equal to the linking number (Lk). This constraint is described by the equation: Lk = Tw + Wr.
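
The constraint Lk = Tw + Wr can be made concrete with a small sketch. The class name and the numbers are hypothetical; the point is only that trading twist for writhe (or the reverse) leaves the linking number unchanged as long as no strand is broken.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CccDNA:
    """A covalently closed circular DNA described by its twist and writhe."""
    twist: float
    writhe: float

    @property
    def linking_number(self) -> float:
        # Lk = Tw + Wr
        return self.twist + self.writhe

    def convert_twist_to_writhe(self, amount: float) -> "CccDNA":
        """Shift `amount` from twist into writhe; no bonds are broken, so Lk is unchanged."""
        return CccDNA(self.twist - amount, self.writhe + amount)


# Hypothetical molecule: 36 helical turns and 4 negative writhes.
molecule = CccDNA(twist=36, writhe=-4)
relaxed_axis = molecule.convert_twist_to_writhe(4)  # all writhe absorbed as reduced twist

print(molecule.linking_number)      # 32
print(relaxed_axis)                 # CccDNA(twist=32, writhe=0)
print(relaxed_axis.linking_number)  # 32
```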

Holliday junction structure

A Holliday junction structure for a decamer with the inverted repeat sequence d(CCGGTACCGG) with one duplex in cyan and the other in yellow. The strand exchange between the two duplexes occurs at the AC step, with very little disruption of the base pairs.

G-quadruplex structure

A G-quadruplex structure, formed by an association of two hairpins and a parallel G-quadruplex with TTA loops. In both structures, the guanine tetrads are shown in cyan, with the loops shown in yellow.

DNA triple helix

A model structure for a DNA triple helix with C.G.G. triplets, along with a triple helix structure reported from NMR, which contains a mixture of T.A.T. and C.G.G. triplets. The Watson–Crick duplex is shown in cyan, while the third strand is shown in yellow, with ribbons tracing the backbone.

crystal structure

Representative crystal structures for A-, B- and Z-DNA. The nucleotides are colour-coded (cytosine in orange, guanine in cyan, thymine in green, and adenine in yellow) and a ribbon is superposed on the backbones, connecting the phosphorus atoms. A- and B-DNA are both right-handed, nearly uniform double helical structures, while Z-DNA is a left-handed helix.



Covalently Closed, Circular DNA

Let us consider the topological properties of covalently closed, circular DNA, which is referred to as cccDNA. Because there are no interruptions in either polynucleotide chain, the two strands of cccDNA cannot be separated from each other without breaking a covalent bond. If we wished to separate the two circular strands without permanently breaking bonds in the sugar-phosphate backbones, we would have to pass one strand through the other strand repeatedly. The number of times one strand would have to be passed through the other strand in order for the two strands to be entirely separated from each other is called the linking number. The linking number, which is always an integer, is an invariant topological property of cccDNA, no matter how much the DNA molecule is distorted.

Topological States of Covalently Closed Circular (ccc) DNA. The figure shows conversion of the relaxed (a) to the negatively supercoiled (b) form of DNA. The strain in the supercoiled form may be taken up by supertwisting (b) or by local disruption of base pairing (c). [Adapted from a diagram provided by Dr. M. Gellert.] (Source: Modified from Kornberg, A. and Baker, T. A. 1992. DNA Replication. Figure 1-21, page 32)

Monday, April 11, 2011

Dependence of DNA Denaturation on G.C Content and on Salt Concentration

Dependence of DNA Denaturation
The greater the G.C content, the higher the temperature must be to denature the DNA strand. DNA from different sources was dissolved in solutions of low (red line) and high (green line) concentrations of salt at pH 7.0. The points represent the temperature at which the DNA denatured, graphed against the G.C content.

DNA Denaturation Curve

         The melting temperature of DNA is a characteristic of each DNA that is determined by the G:C content of the DNA and the ionic strength of the solution. The higher the percentage of G:C base pairs in the DNA, the higher the melting point.

        Likewise, the higher the salt concentration of the solution, the greater the temperature at which the DNA denatures. G:C base pairs contribute more to the stability of DNA than do A:T base pairs because of the greater number of hydrogen bonds for the former, but also importantly because the stacking interactions of G:C base pairs with adjacent base pairs are more favorable than the corresponding interactions of A:T base pairs with their neighboring base pairs. The effect of ionic strength reflects another fundamental feature of the double helix. The backbones of the two DNA strands contain phosphoryl groups, which carry a negative charge. These negative charges are close enough across the two strands that, if not shielded, they tend to cause the strands to repel each other, facilitating their separation. At high ionic strength, the negative charges are shielded by cations, thereby stabilizing the helix. Conversely, at low ionic strength the unshielded negative charges render the helix less stable.
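
Since the melting temperature rises with G:C content, a simple way to compare DNA samples is to compute the fraction of G and C bases in each. The sketch below does only that; it does not predict an actual melting temperature, which also depends on salt concentration as described above. The sample names and sequences are invented.

```python
def gc_fraction(sequence: str) -> float:
    """Fraction of bases in the sequence that are G or C."""
    sequence = sequence.upper()
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


# Hypothetical samples, ranked from lowest to highest expected melting temperature.
samples = {"sample_1": "GGCCGCGC", "sample_2": "AATTATAT", "sample_3": "ATGCATGC"}
for name in sorted(samples, key=lambda n: gc_fraction(samples[n])):
    print(name, gc_fraction(samples[name]))
```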