A half year's worth of birfday threads have pushed this worthy thread down and out of sight.
Rather than just bump it, I will take this chance to share with you a design for a GA experiment. Comments please.
I've been thinking recently about the arguments of Polanyi, as filtered by Meyer, and trickling down to the level of Upright Biped and David Abel. Can we show the evolution of a genetic code from nothing (pure noise) to some level of function? I think yes, we can.
Each member of my GA population will have a different genetic code to start. There are 64 codons, and 22 amino acids, so the framework of each genome is a 64*22 array. The content of each array slot will be 10 bits that can be read as an affinity of this codon for this amino acid. (So the total genome size is 64*22*10 bits.)
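A minimal sketch of that genome layout, assuming a plain list-of-lists table and Python's standard `random` module. The names `random_genome`, `N_CODONS`, and so on are illustrative, not part of the design above.

```python
import random

N_CODONS = 64        # rows: one per codon
N_AMINO_ACIDS = 22   # columns: one per amino acid
BITS_PER_SLOT = 10   # each slot holds a 10-bit affinity, 0..1023

def random_genome(rng):
    """Return a 64x22 table of random 10-bit affinities (pure noise)."""
    max_value = 2 ** BITS_PER_SLOT - 1
    return [[rng.randint(0, max_value) for _ in range(N_AMINO_ACIDS)]
            for _ in range(N_CODONS)]

genome = random_genome(random.Random(0))
genome_bits = N_CODONS * N_AMINO_ACIDS * BITS_PER_SLOT
print(genome_bits)  # 14080
```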
Let's say we are looking at the row for AUU. It contains 22 10-bit integers. We can look at these as weights that affect the likelihood this codon will code for a particular amino acid. As an illustration, perhaps the value for AA3 (amino acid 3) is 200 and AA12 is 400, and the rest of the row is 0. In this case, the codon AUU will produce amino acid 3 about 1/3 of the time, and amino acid 12 about 2/3 of the time.
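The AUU illustration can be checked with a short weighted draw. This is only a sketch: the 0-based indexing of amino acids and the uniform fallback for an all-zero row are assumptions added here, and `pick_amino_acid` is a made-up name.

```python
import random

def pick_amino_acid(row, rng):
    """Pick an amino-acid index with probability proportional to its weight."""
    if sum(row) == 0:
        # Assumption: an all-zero row has no preference, so choose uniformly.
        return rng.randrange(len(row))
    return rng.choices(range(len(row)), weights=row, k=1)[0]

# The AUU example from the text: AA3 has weight 200, AA12 has weight 400.
auu_row = [0] * 22
auu_row[3] = 200
auu_row[12] = 400

rng = random.Random(1)
draws = [pick_amino_acid(auu_row, rng) for _ in range(6000)]
share_aa3 = draws.count(3) / len(draws)    # should be close to 1/3
share_aa12 = draws.count(12) / len(draws)  # should be close to 2/3
```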
If we look at the modern genetic code as this kind of array, each of the 64 rows is full of 0 bits in 21 out of 22 slots. In that 22nd slot, the bits are all 1s. Actually, the code is not quite that strong, and sometimes a codon will produce a different amino acid (leucine or valine instead of isoleucine, for example). So starting from randomly filled arrays, we want to see whether an array that resembles the modern one will come to dominate the population. Importantly, we don't care which codon eventually comes to code for which amino acid.
The fitness function will test 640 values created by taking each codon 10 times, and choosing an amino acid based on the weights in the row in the table. This results in 640 codon-AA pairs. Now we score the fitness of the individual based on these pairs.
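A rough sketch of the trial step, assuming codons and amino acids are both represented by integer indices. `run_trials` is a hypothetical helper, and the uniform fallback for an all-zero row is an added assumption.

```python
import random

TRIALS_PER_CODON = 10

def run_trials(genome, rng):
    """Draw 10 amino acids per codon, returning a list of (codon, aa) pairs."""
    pairs = []
    for codon, row in enumerate(genome):
        # Assumption: an all-zero row falls back to uniform weights.
        weights = row if sum(row) > 0 else [1] * len(row)
        for aa in rng.choices(range(len(row)), weights=weights,
                              k=TRIALS_PER_CODON):
            pairs.append((codon, aa))
    return pairs

rng = random.Random(2)
genome = [[rng.randint(0, 1023) for _ in range(22)] for _ in range(64)]
pairs = run_trials(genome, rng)  # 64 codons x 10 trials = 640 pairs
```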
The first criterion is coverage: does the table produce all 22 amino acids? Score one point for each unique AA in the 640 outputs. Maximum score on this criterion = 22.
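Coverage reduces to counting distinct amino acids in the trial output. A small sketch, assuming the trials are held as (codon, AA) index pairs; `coverage_score` is an illustrative name.

```python
def coverage_score(pairs):
    """Number of distinct amino acids seen in the (codon, aa) pairs."""
    return len({aa for _codon, aa in pairs})

# Toy input: three codons that only ever produce AA3 or AA12.
toy_pairs = [(0, 3), (0, 3), (1, 12), (2, 3)]
toy_coverage = coverage_score(toy_pairs)
print(toy_coverage)  # 2
```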
The second criterion is reliability: does the table produce the same AA each time? Calculate the score by first looking at each group of 10 trials for a single codon. For the AA produced most often, how many trials produced that AA? The answer will be 10% to 100%. Average these percentages across all 64 codons. Maximum score is 100% reliability.
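The reliability calculation could look something like this. It is a sketch only, again assuming (codon, AA) index pairs, with the score returned as a fraction between 0 and 1 rather than a percentage.

```python
from collections import Counter, defaultdict

def reliability_score(pairs):
    """Average, over codons, of the share of trials won by the top AA."""
    by_codon = defaultdict(list)
    for codon, aa in pairs:
        by_codon[codon].append(aa)
    fractions = []
    for outputs in by_codon.values():
        top_count = Counter(outputs).most_common(1)[0][1]
        fractions.append(top_count / len(outputs))
    return sum(fractions) / len(fractions)

# Codon 0 always yields AA5; codon 1 splits 6/4 between AA1 and AA2.
toy_pairs = [(0, 5)] * 10 + [(1, 1)] * 6 + [(1, 2)] * 4
toy_reliability = reliability_score(toy_pairs)  # (1.0 + 0.6) / 2
```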
The third criterion is efficiency: how often does each AA get produced? Do six codons produce one AA, while another AA is produced by only one codon? We could create an order to the list of AAs and say the top AA is needed three times more often than the lowest. Or we could simply measure if they are all produced at about the same rate.
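The post leaves the efficiency measure open, so the following is just one possible reading: compare the observed share of each AA against a target share and score 1 minus half the summed absolute differences. Both the formula and the `target` parameter are assumptions made for illustration. A non-uniform target list would cover the "top AA is needed three times more often" variant.

```python
from collections import Counter

def efficiency_score(pairs, n_amino_acids=22, target=None):
    """1.0 when observed AA shares match the target shares exactly."""
    if target is None:
        # Assumption: by default every AA is wanted at the same rate.
        target = [1 / n_amino_acids] * n_amino_acids
    counts = Counter(aa for _codon, aa in pairs)
    total = len(pairs)
    distance = 0.5 * sum(abs(counts[aa] / total - target[aa])
                         for aa in range(n_amino_acids))
    return 1 - distance

even_pairs = [(0, aa) for aa in range(22)]  # every AA produced once
skewed_pairs = [(0, 0)] * 22                # only AA0 is ever produced
even_score = efficiency_score(even_pairs)
skewed_score = efficiency_score(skewed_pairs)
```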
The fourth criterion is resilience: if there is a mutation in a codon, will the same AA still be produced?
The fifth criterion is like a second-order resilience: if a mutation creates a different AA, is that AA still in the same polar/nonpolar, hydrophobic/hydrophilic class?
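One way the two resilience criteria might be scored together. Several things here are assumptions, not part of the post: a "mutation" is taken to be a single-base substitution, each codon is represented by its highest-weight AA, rows are ordered by the `CODONS` list, and `aa_class` is a hypothetical mapping from AA index to a class label.

```python
from itertools import product

BASES = "ACGU"
CODONS = ["".join(p) for p in product(BASES, repeat=3)]  # 64 codons
CODON_INDEX = {c: i for i, c in enumerate(CODONS)}

def neighbors(codon):
    """The 9 codons that differ from `codon` at exactly one position."""
    return [codon[:i] + b + codon[i + 1:]
            for i in range(3) for b in BASES if b != codon[i]]

def preferred_aa(row):
    """Index of the highest-weight amino acid in a row."""
    return max(range(len(row)), key=row.__getitem__)

def resilience_scores(genome, aa_class):
    """Return (share of mutations keeping the AA,
               share of AA-changing mutations keeping the class)."""
    same_aa = same_class = changed = total = 0
    for codon in CODONS:
        aa = preferred_aa(genome[CODON_INDEX[codon]])
        for other in neighbors(codon):
            other_aa = preferred_aa(genome[CODON_INDEX[other]])
            total += 1
            if other_aa == aa:
                same_aa += 1
            else:
                changed += 1
                if aa_class[other_aa] == aa_class[aa]:
                    same_class += 1
    first_order = same_aa / total
    second_order = same_class / changed if changed else 1.0
    return first_order, second_order

# Toy table: the AA depends only on the first two bases of the codon.
toy_genome = []
for codon in CODONS:
    row = [0] * 22
    row[BASES.index(codon[0]) * 4 + BASES.index(codon[1])] = 1023
    toy_genome.append(row)

one_class = [0] * 22  # hypothetical: every AA in the same class
first_order, second_order = resilience_scores(toy_genome, one_class)
```

In the toy table only third-position changes keep the AA, so 3 of every 9 single-base mutations are silent.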
Take a weighted average of the criteria. I would weight them so that the first one is most important.
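A sketch of the weighted average, assuming each criterion has first been scaled to the range 0 to 1 (for example, coverage divided by 22). The specific weights and scores below are placeholders chosen only so that coverage counts most. They are not values proposed in the post.

```python
def combined_fitness(scores, weights):
    """Weighted average of criterion scores, each assumed to lie in 0..1."""
    total_weight = sum(weights.values())
    return sum(weights[name] * scores[name] for name in weights) / total_weight

# Placeholder weights: coverage first, then descending importance.
weights = {"coverage": 5, "reliability": 4, "efficiency": 3,
           "resilience": 2, "class_resilience": 1}

# Placeholder scores for one individual.
scores = {"coverage": 11 / 22, "reliability": 0.8, "efficiency": 0.5,
          "resilience": 0.25, "class_resilience": 1.0}

fitness = combined_fitness(scores, weights)  # 8.7 / 15
```

Keeping the weights in one dictionary makes it easy to rerun the experiment with different priorities.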
Since the genome is about 14 Kbits (64*22*10 = 14,080), I would use a large population, at least 1,000 individuals, and be prepared for some long run times before seeing convergence on a solution. However, I see no reason to expect that a single code will not eventually emerge the winner. The open question is the structure of that winner.
Yes, this is a complex design, maybe overdesigned, but I think it will work. It will also support changing the weights of the different criteria and seeing whether the winning table has a significantly different structure as a result.
I'm referring to evolution, not changes in allele frequencies. - Cornelius Hunter
I'm not an evolutionist, I'm a change in allele frequentist! - Nakashima