Bioinformatics

(Genomics)


To build a house you need bricks and mortar and something else: the "know-how", or "information", as to how to go about your business. The Victorians knew this. But when it came to the "building" of animals and plants, the word, and perhaps the concept, of "information" is difficult to trace in their writings.

    Classical scholars tell us that Aristotle did not have this problem. The "eidos", the form-giving essence that shapes the embryo, "contributes nothing to the material body of the embryo but only communicates its program of development" (see Delbrück's "Aristotle-totle-totle" in Of Microbes and Life, 1971, pp. 50-55).

   William Bateson spoke of a "factor" (gene) having the "power" to bring about the building of the characters which make up an organism. He used the "information" concept, but not the word. He was prepared to believe his factors were molecules of the type we would today call macromolecules, but he did not actually call them "informational macromolecules". I do not know when this gap was bridged (see E. Schrödinger's What is Life?, 1944; L. E. Kay, 1995, Science in Context 8, 609-634). But certainly by the time of the discovery of the double helical structure of DNA in 1953 the concept was established (see Olby's The Path to the Double Helix, 1974, and Portugal and Cohen's A Century of DNA, 1977).

 

    Information has many forms. If you turn down the corner of a page of a book to remind you where you stopped reading (a "bookmark"), then you have left information on the page. In future you would read ("decode") the bookmark with the knowledge that it means "continue here". A future historian might be interested in where you paused in your reading. Coming across the book, he or she would notice creases suggesting that a flap had been turned down. Making assumptions about the code you were employing, the historian could then draw up a plausible map of your pause sites in the book. It might be discovered that you paused at particular sites, say at the ends of chapters. In this case pauses would be correlated with the distribution of the book's "primary information". Or perhaps there was a random element to your pausing ... perhaps when your partner wanted the light out. In this case pausing would be influenced by your pairing relationship.

  A more familiar form of information is the linear form you are now decoding (reading), which is similar to the form you might decode on the page of a book. If a turned-down flap on a page is a large one, it might cover up some of the information. Thus, one form of information might interfere with another form of information. To read the text you would have to correct (fold back) the "secondary structure" of the page (the flap) so that it no longer overlapped the text. Thus, there is a conflict. You can either retain the flap and not read the text, or get rid of the flap and read the text.

  In the case of a book page, the text is imposed on an underlying flat two-dimensional base, the paper. The text (message) and the medium are different. Similarly, in the case of our genetic material, DNA, the "medium" is a chain of two alternating units (phosphate and the sugar deoxyribose) and the most easily recognized "message" is provided by a sequence of "letters" (bases) attached, like beads, to the chain. As in the case of a written text on paper, "flaps" in DNA (secondary structure) can conflict with the base sequence (primary structure). Thus the pressures to convey information (messages) encoded in a particular sequence, and to convey information encoded in a "flap", may be in conflict. The "hand" of evolution has to resolve these apparently intrinsic conflicts while dealing with other (extrinsic) pressures from the environment.

  The stunning novelty of the Watson-Crick model of DNA was not only that it was beautiful, but that it also explained so much of the biology of heredity. There was not just one sequence of letters, but two. These were wrapped round each other in the form of a double helix. One was the complement of the other, so that the sequence of one string (strand) could be inferred from the sequence of the other. If there were damage to one strand of DNA, then that strand could potentially be repaired on the basis of the text of the opposite strand. When the cell divided the two strands would part and separate. New "daughter" strands, synthesized from nucleotide "building blocks" (each consisting of phosphate, deoxyribose and a base), would replace those which had separated, so that duplexes identical to the parental duplex would be created.

  There were two main types of bases, purines (R) and pyrimidines (Y). Thus, disregarding the phosphate-ribose chain, the first nucleic acids to appear in evolution could accurately be represented as a binary sequence such as RYRRYRYYRYR.... Each base would be a "binary digit". Conventionally, we represent binary digits in computer language as strings of 0s and 1s. If a Y and an R were equally likely alternatives in a sequence position, then each could be quantitated as one "bit" of information.
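The binary picture can be made concrete with a minimal sketch. The mapping of R to 1 and Y to 0 is an arbitrary assumption made here for illustration (the opposite choice would serve equally well), and the function names are hypothetical. The information carried by one symbol chosen from n equally likely alternatives is log2(n) bits.

```python
import math

# Assumed, arbitrary convention: purine (R) -> 1, pyrimidine (Y) -> 0.
RY_TO_BIT = {"R": "1", "Y": "0"}

def to_binary(ry_sequence):
    """Rewrite a purine/pyrimidine string as a string of binary digits."""
    return "".join(RY_TO_BIT[symbol] for symbol in ry_sequence)

def bits_per_symbol(n_alternatives):
    """Information per symbol when all alternatives are equally likely."""
    return math.log2(n_alternatives)

print(to_binary("RYRRYRYYRYR"))  # 10110100101
print(bits_per_symbol(2))        # 1.0 bit for a two-letter (R/Y) alphabet
```

By the same formula, an alphabet of four equally likely letters would carry two bits per position.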

  Each base came to acquire two flavours. There are two main types of purines, adenine (A) and guanine (G), and two main types of pyrimidines, cytosine (C) and thymine (T). Thus, the above sequence might now be represented as (say) ACGATGCCGTA.... Chargaff's first parity rule is that, in duplex DNA, the amount of A equals the amount of T and the amount of C equals the amount of G. This follows from the pairing of purines with pyrimidines, specifically A with T and C with G. Thus, a duplex containing this sequence, with pairing between complementary bases in the "top" and "bottom" strands, could be written as:

 
ACGATGCCGTAGCATCGT
TGCTACGGCATCGTAGCA
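The pairing rule is simple enough to express in a few lines. The following is a sketch only; the function names are chosen here for illustration. It derives the bottom strand from the top strand and collapses the first eleven bases back to the purine/pyrimidine form used earlier.

```python
# Watson-Crick pairing: A with T, C with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(strand):
    """Return the paired base for each position, written in the same
    left-to-right order as the input (as in the duplex display above)."""
    return "".join(COMPLEMENT[base] for base in strand)

def to_purine_pyrimidine(strand):
    """Collapse A and G to R (purine), and C and T to Y (pyrimidine)."""
    return "".join("R" if base in "AG" else "Y" for base in strand)

top = "ACGATGCCGTAGCATCGT"
print(top)
print(complement(top))                  # TGCTACGGCATCGTAGCA
print(to_purine_pyrimidine(top[:11]))   # RYRRYRYYRYR
```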

 

 

  It was later realized that, under certain circumstances, this double helix could form "flaps". Thus each of the above two strands ("top" and "bottom") can form stem-loop secondary structures, of the following type, due to pairing with complementary bases in the same strand.

 
        C
ACGATGC   G
TGCTACG   T
        A
 

 

 

 

For this to happen there have to be matching (complementary) bases. Only the bases in the loop (CGTA) are unpaired in this structure. The stem consists of paired bases. Thus Chargaff's parity rule has to apply, to a close approximation, to single strands of DNA. When one examines DNAs from whatever biological source, one invariably finds that the rule applies. We refer to this as Chargaff's second parity rule.
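Both claims can be checked on the example strand. The sketch below assumes a stem length of seven bases, read off the structure drawn above; the helper names are hypothetical. It tests whether the two arms of the strand can pair with each other, and then counts the bases in the single strand.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def reverse_complement(strand):
    """Complement each base and reverse the order, since paired strands
    (or paired arms of one strand) run in opposite directions."""
    return "".join(COMPLEMENT[base] for base in reversed(strand))

def split_stem_loop(strand, stem_length):
    """Split a strand into (left arm, loop, right arm) for an assumed stem length."""
    left = strand[:stem_length]
    loop = strand[stem_length:len(strand) - stem_length]
    right = strand[len(strand) - stem_length:]
    return left, loop, right

top = "ACGATGCCGTAGCATCGT"
left, loop, right = split_stem_loop(top, 7)

# The arms can form a stem only if one is the reverse complement of the other.
print(left, loop, right)                  # ACGATGC CGTA GCATCGT
print(right == reverse_complement(left))  # True

# Within the single strand, A matches T and C matches G in number.
counts = Counter(top)
print(counts["A"], counts["T"], counts["C"], counts["G"])  # 4 4 5 5
```

In this toy example the parity is exact because even the four loop bases happen to contain one of each letter; in real DNA the rule holds only approximately.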

  Returning to our own written textual form of information, the sentence "Mary had a little lamb its fleece was white as snow" contains the information that a person called Mary is in possession of an immature sheep. The same information might be written in Chinese or Greek. Thus, the sentence contains not only its primary information, but secondary information about its origin -- e.g. it is likely that the author is more familiar with English than other languages. Some believe that English is on the way to displacing other languages, so that eventually it (or the form it evolves to) will constitute the only language used by human beings on this planet. Similarly, in the course of early evolution it is likely that a prototypic nucleic acid language displaced contenders.

  It would be difficult to discern a relationship between the English, Chinese and Greek versions of the above sentence, because these languages diverged from primitive root languages thousands of years ago. However, in England, if a person with a Cockney accent were to speak the sentence it would sound like "Miree ader liawl laimb sfloyce wors woyt ers snaa". Cockney English and "regular" English diverged more recently and it is easy to discern similarities.

    Now look at the following text:

 
yewas htbts llem ws arifea ac wMhitte alidsnoe la

irsnwwis aee ar lal larfoMyce b sos woilmyt erdea

 

 

One line of text is the regular English version with the letters shuffled. The other line is the Cockney version with the letters shuffled. Can you tell which is which? If the shuffling was thorough, the primary information has been destroyed. However, there is still some information left. With the knowledge that Cockneys tend to "drop" their Hs, it can be deduced that the upper text is more likely to be from someone who spoke regular English. With a longer text, this could be more precisely quantitated. Languages have characteristic letter frequencies. You can take a segment ("window") and count the various letters in that segment.

  In this way you can identify a text as English, Cockney, Chinese or Greek, without too much trouble. We can call this information "secondary information". There may be various other levels of information in a sequence of symbols. To evaluate the secondary information in DNA (with only four "letters"), you select a "window" (say 1000 bases) and count the number of each base in that window. You can apply the same window to another section of the DNA, or to another DNA molecule from a different biological species, and repeat the count. Then you can compare DNA "accents".
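The counting procedure can be sketched as follows. The function names and the default of non-overlapping windows are assumptions made here for illustration, and a window of 9 bases stands in for the 1000 suggested above because the example strand is only 18 bases long.

```python
from collections import Counter

def letter_count(text, letter):
    """Count occurrences of one letter, ignoring case."""
    return text.lower().count(letter.lower())

def window_counts(sequence, window, step=None):
    """Count the symbols in successive windows of a sequence.
    By default the windows do not overlap (step equals window size)."""
    step = step or window
    return [
        Counter(sequence[start:start + window])
        for start in range(0, len(sequence) - window + 1, step)
    ]

# The two shuffled lines of text: only the upper one retains any H.
upper = "yewas htbts llem ws arifea ac wMhitte alidsnoe la"
lower = "irsnwwis aee ar lal larfoMyce b sos woilmyt erdea"
print(letter_count(upper, "h"), letter_count(lower, "h"))  # 2 0

# Base composition of two 9-base windows of the example strand.
for counts in window_counts("ACGATGCCGTAGCATCGT", window=9):
    print({base: counts[base] for base in "ACGT"})
```

Comparing the per-window counts between two sequences is then a matter of comparing these small tables, for example as proportions of each base.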

  The best understood type of primary information in DNA is the information for proteins. The DNA sequence of bases (one type of "letter") encodes another type of "letter", the "amino acids". There are 20 amino acids, with names such as aspartate, glycine, phenylalanine, serine and valine (which are abbreviated as Asp, Gly, Phe, Ser and Val). Under instructions received from DNA, amino acids are joined together in the same order as they are encoded in DNA, to form proteins. The latter, chains of amino acids which fold in complicated ways, play a major role in determining how we interact with our environment. The proteins determine our "phenotype". For example, in an organism of a particular species ("A") the 21-base DNA sequence:

 
TTTTCATTAGTTGGAGATAAA

 

 

read in sets of three bases ("codons"), conveys primary information for a seven-amino-acid protein fragment (PheSerLeuValGlyAspLys). All members of the species will tend to have the same DNA sequence, and differences between members of the species will tend to be rare and of minor degree. If the protein is fundamental to cell function it is likely that organisms of another species ("B") will have DNA which encodes the same protein fragment. However, when we examine their DNA we might find major differences compared with the DNA of the first species (the two sequences share only 11 of their 21 bases):

 
TTCAGCCTCGTGGGGGACAAG
 

 

This sequence also encodes the above protein fragment, showing that the DNA contains the same primary information as in the first DNA sequence, but it is "spoken" with a different "accent". This secondary information might have some biological role. It is theoretically possible (but unlikely) that all the genes in an organism of species B would have this "accent", yet otherwise encode the same proteins. In this case, organisms of species A and B would be both anatomically and functionally (physiologically) identical, while differing dramatically with respect to secondary information.
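That the two sequences carry the same primary information can be verified by translating them. The sketch below builds the standard genetic code from its conventional T, C, A, G ordering and reports amino acids in one-letter form (F, S, L, V, G, D and K for Phe, Ser, Leu, Val, Gly, Asp and Lys). The variable and function names are chosen here for illustration.

```python
# Standard genetic code; amino acids listed in TCAG order of the three
# codon positions, with "*" marking stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

def translate(dna):
    """Read a DNA string in successive codons; return one-letter amino acids."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))

def shared_positions(seq1, seq2):
    """Number of aligned positions at which two sequences have the same base."""
    return sum(a == b for a, b in zip(seq1, seq2))

species_a = "TTTTCATTAGTTGGAGATAAA"
species_b = "TTCAGCCTCGTGGGGGACAAG"

print(translate(species_a))  # FSLVGDK
print(translate(species_b))  # FSLVGDK
print(shared_positions(species_a, species_b), "of", len(species_a))  # 11 of 21
```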

   On the other hand, consider a single change in the sequence of species A to:

 
TTTTCATTAGTTGGAGTTAAA

 

 

Here the difference (GAT to GTT in the sixth codon) would change one of the seven amino acids. It is likely that such minor changes in a very small number of genes affecting development would be sufficient to cause anatomical and morphological differentiation within species A (e.g. compare a bulldog and a poodle, as "varieties" of dogs, which are able to breed with each other). Yet, in this case the secondary information would be hardly changed.

   The view developed in these pages is that, like the Cockney's dropped H's, the role of secondary information is to initiate, and, for a while, maintain, reproductive isolation. This can occur because the genetic code is a "redundant" or "degenerate" code; for example, the amino acid serine is not encoded by just one codon; there are six possible codons (TCT, TCC, TCA, TCG, AGT, AGC). In the first of the above DNA sequences (A) the amino acid serine (Ser) is encoded by TCA, whereas AGC is used in the second (B). On the other hand, the change in species A from GAT (first sequence) to GTT (third sequence) changes the encoded amino acid from aspartic acid (Asp) to valine (Val), and this should be sufficient to change the properties of the corresponding protein, and hence change the phenotype.
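The redundancy of the code can be read directly from a codon table. As a sketch (the helper names are hypothetical), the following lists every codon for serine and classifies the two kinds of change just described: TCA to AGC leaves the amino acid unchanged, whereas GAT to GTT does not.

```python
# Standard genetic code in TCAG order; "*" marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

def codons_for(amino_acid):
    """All codons that encode the given one-letter amino acid, sorted."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)

def is_synonymous(codon1, codon2):
    """True if both codons encode the same amino acid."""
    return CODON_TABLE[codon1] == CODON_TABLE[codon2]

print(codons_for("S"))              # ['AGC', 'AGT', 'TCA', 'TCC', 'TCG', 'TCT']
print(is_synonymous("TCA", "AGC"))  # True: serine in both species A and B
print(is_synonymous("GAT", "GTT"))  # False: Asp (D) becomes Val (V)
```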

  Thus, the biological interest of linguistic barriers is that they also tend to be reproductive barriers. Even if a Chinese person and an English person are living in the same territory ("sympatrically"), if they do not speak the same language they are unlikely to marry. The Chinese tend to marry Chinese and produce more Chinese. The English tend to marry English and produce more English. Even in England, because of the "class" barriers so colourfully portrayed by George Bernard Shaw, Cockneys tend to marry Cockneys, and the essence of the barrier separating them from people speaking "regular" English is the difference in accent. Because of other ("blending") factors at work in our society it is unlikely that this linguistic speciation will continue to the extent that Cockney will become an independent language. However, the point is that when there is "incipient" linguistic speciation, it is the secondary information (dropped H's), not the primary information, which constitutes the barrier.

  Before the genetic code was deciphered in the early 1960s, researchers such as Wyatt (1952) and Sueoka (1961) studied the base composition of DNAs with a major interest in the primary information -- how a sequence of bases might be related to a sequence of amino acids. However, their results have turned out to be of greater interest with respect to the secondary information in DNA.

All Rights reserved http://www.genetics.i8.com©2002,  http://www.genetics.gq.nu©2003,  http://www.geneticnet.tk©2004, http://genetics.netfirms.com© 2005
