There is significant advantage to understanding these processes, too. We may find cures for genetic diseases or design new drugs on the computer, rather than experimentally. Genetic engineering holds promise for developing new strains of agricultural plants to help feed an exploding world population. Fortunately for you, you are not expected to solve the world's problems in CS1. We do want to give you enough background, however, to understand the basics of the field.
ACGGGAGGACGGGAAAATTACTACGGATTAGC
A real DNA molecule can contain hundreds of thousands to millions of nucleotides.
A gene is a small portion of a DNA molecule that contains information about how to make one protein. Variations in genes are what make us different from each other. One version of a gene may give a person blond hair, while another version will give a person black hair. A single DNA molecule usually contains many genes.
In 1953, it was discovered that it takes two strands of DNA to make a molecule. They are joined together in a spiral staircase shape called a double helix. The account of this discovery is recorded in the book The Double Helix: A Personal Account of the Discovery of the Structure of DNA by James D. Watson, with a foreword by Lawrence Bragg. This book is a very good read and is highly recommended by your instructors. (It is available in the UW-Parkside library.)
The two strands are known as reverse complements. They must match up nucleotide by nucleotide according to the rules that A pairs with T and C pairs with G:
5' TGCCCTCCTGCCCTTTTAATGATGCCTAATCG 3'
3' ACGGGAGGACGGGAAAATTACTACGGATTAGC 5'
The strands have an identifiable left end, called the five prime (5') end, and a right end, called the three prime (3') end. DNA sequences are always written from the 5' end to the 3' end. When DNA is used to generate proteins, the cell processes it from the 5' end to the 3' end. The ends of each strand match with the opposite ends of the other strand, however. This is shown in the sequences above.
If we were to write the bottom sequence by itself, it should be written in the opposite order, e.g.
5' CGATTAGGCATCATTAAAAGGGCAGGAGGGCA 3'
Each gene appears in only one of the strands of the DNA. When scientists search DNA for genes, it is important that they search both strands, but it is a fairly simple algorithm (i.e., doable in CS 1) to translate one strand of DNA into its reverse complement.
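The reverse complement algorithm can be sketched in a few lines of Python. This is only one possible approach, not the required solution: the names COMPLEMENT and reverse_complement are made up for illustration, and the sketch assumes the input contains only the uppercase letters A, C, G and T, paired A with T and C with G.

```python
# Pairing rules: A pairs with T, and C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(strand):
    """Return the reverse complement of a DNA strand written 5' to 3'."""
    # Walk the strand from its 3' end back to its 5' end, swapping each
    # nucleotide for its partner, so the result also reads 5' to 3'.
    return "".join(COMPLEMENT[base] for base in reversed(strand))


top = "TGCCCTCCTGCCCTTTTAATGATGCCTAATCG"
print(reverse_complement(top))  # CGATTAGGCATCATTAAAAGGGCAGGAGGGCA
```

Applying the function twice returns the original strand, which makes a handy sanity check.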
Translating RNA to protein takes some work. A single strand of RNA may encode one or more proteins. There are only 4 nucleotides in the RNA and they need to form the right combinations to represent 20 different amino acids. If we group the nucleotides 2 at a time, we would only get 16 different possible pairs. Therefore we must take the nucleotides 3 at a time to get enough, but this gives us 64 different combinations, well more than we need. A sequence of 3 nucleotides is called a codon. Scientists have thoroughly worked out which amino acid each codon represents. The following table (boldly stolen from the Internet) lists the amino acids, their single letter codes (SLC) and the codons that represent them:
SLC | Abbreviation | Amino Acid | Codons |
A | Ala | Alanine | GCA GCC GCG GCU |
C | Cys | Cysteine | UGC UGU |
D | Asp | Aspartic Acid | GAC GAU |
E | Glu | Glutamic Acid | GAA GAG |
F | Phe | Phenylalanine | UUC UUU |
G | Gly | Glycine | GGA GGC GGG GGU |
H | His | Histidine | CAC CAU |
I | Ile | Isoleucine | AUA AUC AUU |
K | Lys | Lysine | AAA AAG |
L | Leu | Leucine | UUA UUG CUA CUC CUG CUU |
M | Met | Methionine | AUG |
N | Asn | Asparagine | AAC AAU |
P | Pro | Proline | CCA CCC CCG CCU |
Q | Gln | Glutamine | CAA CAG |
R | Arg | Arginine | AGA AGG CGA CGC CGG CGU |
S | Ser | Serine | AGC AGU UCA UCC UCG UCU |
T | Thr | Threonine | ACA ACC ACG ACU |
V | Val | Valine | GUA GUC GUG GUU |
W | Trp | Tryptophan | UGG |
Y | Tyr | Tyrosine | UAC UAU |
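As a rough sketch of how the table might be used in a program, the Python below turns it into a codon lookup and translates an RNA string three nucleotides at a time. The names CODONS_BY_SLC, CODON_TO_SLC and translate are hypothetical. The table covers 61 of the 64 codons; the remaining three (UAA, UAG and UGA) are stop codons, and this sketch assumes that translation simply ends at the first codon it cannot find in the table.

```python
# Codons copied from the table above, keyed by single letter code (SLC).
CODONS_BY_SLC = {
    "A": "GCA GCC GCG GCU", "C": "UGC UGU", "D": "GAC GAU",
    "E": "GAA GAG", "F": "UUC UUU", "G": "GGA GGC GGG GGU",
    "H": "CAC CAU", "I": "AUA AUC AUU", "K": "AAA AAG",
    "L": "UUA UUG CUA CUC CUG CUU", "M": "AUG", "N": "AAC AAU",
    "P": "CCA CCC CCG CCU", "Q": "CAA CAG",
    "R": "AGA AGG CGA CGC CGG CGU", "S": "AGC AGU UCA UCC UCG UCU",
    "T": "ACA ACC ACG ACU", "V": "GUA GUC GUG GUU", "W": "UGG",
    "Y": "UAC UAU",
}

# Invert the table so that each codon maps to its single letter code.
CODON_TO_SLC = {
    codon: slc
    for slc, codons in CODONS_BY_SLC.items()
    for codon in codons.split()
}


def translate(rna, start=0):
    """Translate RNA to a string of SLCs, beginning at index start."""
    protein = []
    # Step through the RNA one complete codon (3 nucleotides) at a time.
    for i in range(start, len(rna) - 2, 3):
        slc = CODON_TO_SLC.get(rna[i:i + 3])
        if slc is None:
            # Assumption: a codon missing from the table ends translation.
            break
        protein.append(slc)
    return "".join(protein)


print(translate("AUGGCCUGGUAA"))     # MAW
print(translate("AUGGCCUGGUAA", 1))  # WPG
```

The second call shows how much the starting position matters: shifting by one nucleotide produces a completely different protein.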
One tricky part of translating RNA to amino acids is knowing where the protein's representation starts. A protein might start at the zeroth, first or second location of an RNA fragment. The gene might also be in the complementary strand of DNA. Thus, given a DNA sequence, there are two possible RNA sequences and six possible protein sequences.
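The six possibilities can be enumerated with a short Python sketch. All function names here are hypothetical, and the sketch makes one simplifying assumption: the RNA for a strand is taken to be that strand with every T replaced by U. Each reading frame is returned as a list of codons, which could then be looked up in the table above.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(dna):
    """Return the other strand of the molecule, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(dna))


def transcribe(dna):
    """Simplifying assumption: RNA is the DNA strand with T replaced by U."""
    return dna.replace("T", "U")


def codons(rna, offset):
    """Split rna into complete codons, starting at index offset."""
    return [rna[i:i + 3] for i in range(offset, len(rna) - 2, 3)]


def six_reading_frames(dna):
    """Return the codon lists for all six reading frames of dna."""
    frames = []
    # Two strands, each read from three possible starting locations.
    for strand in (dna, reverse_complement(dna)):
        rna = transcribe(strand)
        for offset in range(3):
            frames.append(codons(rna, offset))
    return frames


for frame in six_reading_frames("ATGGCCTGG"):
    print(" ".join(frame))
```

The first three lines printed come from the given strand and the last three from its reverse complement.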
If you think this stuff is cool, consider using the CS breadth requirement to take some Bioinformatics courses in UW-Parkside's Department of Biological Sciences.