The Blind Wordmaker: An Evolutionary Game
Sometimes you find something while looking for something else. A few weeks ago I was looking at Wesley Elsberry's version of Dawkins' Weasel and began wondering whether it was possible to make a game out of an evolutionary search. I played with the options on the Weasel page and tried searching for single words instead of full sentences. I discovered that individual words could be found in a reasonable number of generations.
So I began the quest for an evolutionary game. As a game, it needed to be fun to play. As a demonstration of cumulative selection, it needed to be faithful to the core of Dawkins' algorithm. The variation generator had to be blind to the past and the future: every generation had to be derived from a parent without regard to goals and without memory of past generations.
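A variation generator that follows these rules can be sketched in a few lines of Python. This is an illustration, not the game's ASP code: the names make_children, count and rate are hypothetical, and the mutation rate and brood size are chosen only for the example. Each child is copied from the current parent alone, letter by letter, with no target and no memory of earlier generations:

```python
import random
import string

ALPHABET = string.ascii_uppercase


def make_children(parent: str, count: int, rate: float, rng: random.Random) -> list[str]:
    """Return `count` imperfect copies of `parent`.

    Each letter is independently replaced by a random letter with
    probability `rate`. Only the current parent is consulted: there is
    no target word and no record of earlier generations.
    """
    children = []
    for _ in range(count):
        letters = [rng.choice(ALPHABET) if rng.random() < rate else ch for ch in parent]
        children.append("".join(letters))
    return children


rng = random.Random(42)
print(make_children("WEASEL", 5, 0.2, rng))
```

A replacement can draw the same letter it replaces, so the effective change rate is slightly below `rate`; that does not affect the blindness of the generator.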
I suppose everyone who argues for the theory of evolution gets tired of saying that evolution can't be observed because it takes too long and involves too many generations and too many individuals. This is true in biology and largely true of genetic algorithms. A possible exception is the Biomorph demo, which quickly builds attractive graphical objects. I wondered whether something like this could be done with words.
The advantage of playing with words rather than graphics is that many people have a strong interest in words and word games. There are thousands of websites devoted to Scrabble, both as a competitive game and as a solitaire-like pastime. An idea began to form: a game in which a human sees every child in each generation of a Dawkins-style program and makes the selections.
Obviously the population size must be limited to what can reasonably be viewed by a person. And the target must be short enough to be reached in a reasonable number of generations. My first effort was Weasel Words. It plays well enough, but lacks some critical elements needed to sustain interest.
The first necessity is scoring. It is one thing to make words, but people respond to having some tangible token of success. So I set out on a quest for a dictionary. After finding suitable word lists, I ran into the need for more complex programming. It is not reasonable to include a word list in a web page; it must reside on a server and be accessed by a server-based program.
So I rewrote the child generator in ASP, built a database, and acquired a domain name and host. Things were getting more involved.
I drafted my family as beta testers. Naturally they wanted more features. In addition to a scoring system, they wanted to be able to score words embedded within words. My first response was to worry about database performance on a shared server, and about the programming it would require. In the end I gave in to my children's demands and devised a system for scoring embedded words. I called them Endogenous RetroWeasels; I needed something to lighten the task of programming.
About this time I got the idea of experimenting with an automated player. Everyone knows about letter frequency from the Sherlock Holmes story "The Adventure of the Dancing Men". I thought that if I could count how often each letter occurs in each word position, I could score the fitness of word candidates and possibly generate words automatically. Since I already had a database, I could automate the counting and build a table of letters, positions and frequencies. This was gene analysis 1.0, and the results fit in a table small enough to be embedded in a program. To keep things honest, fitness is scored only after the whole child population has been generated.
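Gene analysis 1.0 can be sketched roughly as follows. The five-word list is a tiny stand-in for the real dictionary, and letter_position_table and letter_fitness are hypothetical names; the sketch only shows the idea of counting (position, letter) occurrences and summing them as a fitness score, applied after the brood already exists:

```python
from collections import Counter


def letter_position_table(words):
    """Count how often each letter occurs at each position (0-based)."""
    table = Counter()
    for word in words:
        for pos, letter in enumerate(word.upper()):
            table[(pos, letter)] += 1
    return table


def letter_fitness(child, table):
    """Sum the positional frequencies of a child's letters (unseen combinations score 0)."""
    return sum(table[(pos, letter)] for pos, letter in enumerate(child))


# Tiny stand-in word list; the real game counts over a full dictionary.
words = ["WEASEL", "WEASELS", "EASEL", "LEASE", "SEAL"]
table = letter_position_table(words)

# Score only after the whole brood exists.
children = ["WXQSEL", "WEASEL", "EEEEEE"]
scores = {child: letter_fitness(child, table) for child in children}
print(scores)  # {'WXQSEL': 10, 'WEASEL': 18, 'EEEEEE': 9}
```

Even in this tiny example, EEEEEE scores nearly as well as WXQSEL, which shares four letters with WEASEL: single-letter scores reward common letters wherever they happen to fall.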
This first effort was disappointing. It cleared the population of Xs and Qs, but didn't produce many words. Most of the population turned into Es and Is.
So I got ambitious. If single letters were not sufficient, what about letter pairs? I proceeded to count all the letter pairs found in the English word list and build a database of their frequency in each word position. Of course the frequency depends on the length of the word, so the table must include both word length and position.
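The pair count can be sketched the same way, keying each count on word length, starting position and the pair itself. The key layout and the four-word list are assumptions for illustration; the game keeps its counts in a server-side database:

```python
from collections import Counter


def pair_table(words):
    """Count letter pairs keyed by (word length, starting position, pair)."""
    table = Counter()
    for word in words:
        w = word.upper()
        for pos in range(len(w) - 1):
            table[(len(w), pos, w[pos:pos + 2])] += 1
    return table


# Tiny stand-in word list for illustration only.
words = ["WEASEL", "EASELS", "LEASED", "SEASON"]
table = pair_table(words)
print(table[(6, 1, "EA")])  # 3: WEASEL, LEASED and SEASON
print(table[(6, 0, "EA")])  # 1: EASELS
```

Keying on word length keeps "EA" at the start of a six-letter word separate from "EA" at the start of a five-letter word, which is the dependence on length described above.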
First, some terminology. The Blind Wordmaker thinks of the word candidates as genomes having some level of fitness. They are instances of a genetic code. Codes have no meaning in and of themselves; they are interpreted. In the game, the ultimate interpreter is the scoring program, which awards points for children that are in the dictionary. The question addressed by the Blind Wordmaker is: can a simple algorithm evaluate progress toward wordness without a massive database of all 26x26x26x26x26x26x26 (about eight billion) possible seven-letter children?
The answer proposed here is yes, by applying a bit of pseudo-biological thinking. The Blind Wordmaker thinks of its children as genomes composed of two-letter genes, with each gene having 26x26 = 676 possible variations, or alleles, formed by the letter pairs of the English alphabet. A child having n letters has n-1 genes, because each gene is a pair of adjacent letters and neighboring genes overlap by one letter. So the genes of "WEASEL" are positions 1+2, 2+3, 3+4, 4+5 and 5+6, or "WE", "EA", "AS", "SE" and "EL". The Blind Wordmaker algorithm breaks each child down into its genes and scores each gene by the occurrence frequency of its allele.
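Breaking a child into its genes amounts to taking every overlapping pair of adjacent letters. A minimal sketch (genes is a hypothetical helper name):

```python
def genes(child: str) -> list[str]:
    """Split a child into overlapping two-letter genes (n letters give n - 1 genes)."""
    return [child[i:i + 2] for i in range(len(child) - 1)]


print(genes("WEASEL"))  # ['WE', 'EA', 'AS', 'SE', 'EL']
print(26 * 26)          # 676 possible alleles per gene
```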
In reality, about 80 percent of the possible alleles are used in English, but not with equal frequency. The Blind Wordmaker knows how often each allele occurs in each possible position, and uses this information to score genes for fitness. The fitness of a child is the sum of the fitness scores of its genes. The database has only a few thousand entries: far fewer than the number of dictionary words, and enormously fewer than the number of possible children.
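The scoring rule can be sketched as follows, assuming the table is keyed on word length, gene position and allele, and that raw counts serve directly as gene scores (the game may weight them differently). build_allele_table and fitness are hypothetical names, and the four-word list is a stand-in:

```python
from collections import Counter


def build_allele_table(words):
    """Frequency of each allele (letter pair) by word length and gene position."""
    table = Counter()
    for word in words:
        w = word.upper()
        for pos in range(len(w) - 1):
            table[(len(w), pos, w[pos:pos + 2])] += 1
    return table


def fitness(child, table):
    """Sum of the frequency scores of the child's genes; unseen alleles score 0."""
    n = len(child)
    return sum(table[(n, pos, child[pos:pos + 2])] for pos in range(n - 1))


words = ["WEASEL", "EASELS", "LEASED", "SEASON"]  # stand-in word list
table = build_allele_table(words)
print(fitness("WEASEL", table))  # 10 = 1 + 3 + 3 + 2 + 1
print(fitness("WQXZEL", table))  # 1: only the final EL gene has been seen
```

The table holds one entry per (length, position, allele) actually observed, here 15 entries, which is why it stays small compared with the space of possible children.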
So is this smuggling information into the Weasel program? If the question applies to the imperfect replicator, the answer is absolutely not. All the intelligence is in the selectors: the Blind Wordmaker and the human player. And isn't this what the theory of evolution has said all along? The Blind Wordmaker has a number of interesting behaviors. It can form words, but doesn't necessarily recognize them. It quickly forms pronounceable letter strings, most of which are not words. It often scores the same string as best several times, without knowing that it is no longer eligible for game scoring. It can get stuck, scoring non-words as best because their genes, taken individually, score highest. The Blind Wordmaker is blind to the ultimate selecting environment.
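To see the selector's blindness concretely, here is a hedged end-to-end sketch that pairs a blind mutation step with selection by allele fitness. The word list, brood size, mutation rate and generation count are illustrative assumptions, and all function names are hypothetical. The dictionary is used only afterwards, to check whether a string happens to be a word:

```python
import random
import string
from collections import Counter

ALPHABET = string.ascii_uppercase


def build_allele_table(words):
    """Frequency of each allele (letter pair) by word length and gene position."""
    table = Counter()
    for word in words:
        w = word.upper()
        for pos in range(len(w) - 1):
            table[(len(w), pos, w[pos:pos + 2])] += 1
    return table


def fitness(child, table):
    """Sum of gene scores; the dictionary itself is never consulted."""
    n = len(child)
    return sum(table[(n, pos, child[pos:pos + 2])] for pos in range(n - 1))


def mutate(parent, rate, rng):
    """Blind copy: each letter may be replaced at random, with no goal in view."""
    return "".join(rng.choice(ALPHABET) if rng.random() < rate else ch for ch in parent)


def run(start, table, generations=200, brood=20, rate=0.15, seed=0):
    """Alternate blind variation with selection by allele fitness alone."""
    rng = random.Random(seed)
    parent = start
    for _ in range(generations):
        children = [mutate(parent, rate, rng) for _ in range(brood)]
        parent = max(children, key=lambda c: fitness(c, table))
    return parent


words = ["WEASEL", "EASELS", "LEASED", "SEASON"]  # stand-in word list
table = build_allele_table(words)
dictionary = set(words)

result = run("XQZJVK", table)
print(result, fitness(result, table), result in dictionary)

# A non-word can tie a real word, because each of its genes is common.
for candidate in ["WEASEL", "SEASEL"]:
    print(candidate, fitness(candidate, table), candidate in dictionary)
# The loop prints:
# WEASEL 10 True
# SEASEL 10 False
```

With this four-word table, SEASEL and WEASEL score identically, so a selector working gene by gene has no way to prefer the real word.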
But does it work? Does it make words?
In short, yes. The Blind Wordmaker has been extended to include French, German, Welsh and Spanish word lists. It is possible to switch languages at any time and watch the allele frequencies of the population change within a few generations, producing populations that fit the chosen language environment. The Blind Wordmaker not only makes words, but also makes an effective demonstration of the power of selection to alter allele frequencies. It might even be fun to play.
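One way to watch that shift is to tally the genes of each generation. The sketch below assumes that "allele frequency" means the share of all genes in the current population carried by each letter pair, ignoring position; allele_frequencies is a hypothetical name. Comparing these tallies before and after a switch of word lists would show the population moving toward the new language:

```python
from collections import Counter


def allele_frequencies(population):
    """Share of all genes in the population carried by each allele (letter pair)."""
    counts = Counter(
        child[i:i + 2] for child in population for i in range(len(child) - 1)
    )
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


population = ["WEASEL", "WEASEL", "EASELS"]
freqs = allele_frequencies(population)
print(round(freqs["EA"], 3))  # 0.2: 3 of the 15 genes are EA
```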
The Blind Wordmaker may be found at:
The Elsberry Weasel may be found here:
Biomorphs are found at:
The Wikipedia article on Dawkins' Weasel may be found at: