Printable Version of Topic

-Antievolution.org Discussion Board
+--Forum: After the Bar Closes...
+---Topic: Evolutionary Computation started by Wesley R. Elsberry


Posted by: Wesley R. Elsberry on Mar. 17 2009,11:00

This thread is for comments on evolutionary computation.
Posted by: Wesley R. Elsberry on Mar. 17 2009,11:15

"weasel" math

Given:

Base set size K (number of possible characters at each position)

Target string length L

Mutation rate (per site) u

Population size N

here are some basic probabilities to go with a "weasel" run.

Per base:

P(incorrect -> correct) = 1 / K

P(correct -> incorrect) = (K - 1) / K

Blind search:

P(a try is all correct) = K^-L

P(a try in the population is all correct) = N * K^-L

Expected number of correct bases when all bases are changed = L / K

Expected number of correct bases when a genome is produced via copy with mutation = u * L / K
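As a quick numeric sanity check (my sketch, not part of the original post), the blind-search quantities for the canonical Dawkins target, K = 27 and L = 28, work out like this:

```python
# Blind-search probabilities for the canonical "weasel" setup.
# K = 27 characters (A-Z plus space), L = 28 positions.
K = 27
L = 28
N = 50  # example population size; a value used later in the thread

p_one_try = K ** -L         # P(a single random string is all correct)
p_population = N * K ** -L  # P(some string in a population of N is all correct)
expected_correct = L / K    # expected correct positions in one random string

print(f"P(one try all correct)      = {p_one_try:.3e}")
print(f"P(population all correct)   = {p_population:.3e}")
print(f"E[correct in random string] = {expected_correct:.3f}")
```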

In "weasel" run:

Expected number of correct bases given a partially matching string:
Given C as number of correct matching bases

expected correct bases after mutation = C + (u * (L - C) / K) - (u * C * (K - 1) / K)
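Iterating that expectation (a sketch of mine, ignoring selection entirely) shows what mutation alone does: the update has a fixed point at C = L / K, so without selection the expected number of correct bases decays toward roughly one:

```python
K, L, u = 27, 28, 0.05

def expected_after_mutation(C):
    """Expected correct bases in a mutated copy of a parent with C correct."""
    return C + u * (L - C) / K - u * C * (K - 1) / K

C = 28.0  # start from a fully correct parent
for _ in range(200):
    C = expected_after_mutation(C)
print(f"expectation after 200 generations of mutation only: {C:.3f}")
# setting gain = loss and solving gives the fixed point C = L / K
```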

There are a few more items to derive to pull in the population parameter, but I need to go now.

Edit: Equations for P(a try is all correct) and dependencies per PT comment by Mike Elzinga.


Posted by: Wesley R. Elsberry on Mar. 18 2009,11:03

More "weasel" math

Probability that a candidate will retain all the correct letters from its parent: (1 - (u * (K - 1) / K))^C
Posted by: Richardthughes on Mar. 18 2009,11:24

Can you derive an optimal mutation rate?
Posted by: Wesley R. Elsberry on Mar. 18 2009,11:47

Quote (Richardthughes @ Mar. 18 2009,11:24)
Can you derive an optimal mutation rate?
---------------------QUOTE-------------------


I'll have to think about that some. Later.
Posted by: Wesley R. Elsberry on Mar. 18 2009,13:19

More "weasel" math:

Probability of a candidate changing at least one of its parent's correct bases to an incorrect base = P(candidate C2I) =

1 - (1 - (u * (K - 1) / K))^C

Probability that a population will have at least one candidate that preserves all the correct bases from the parent of the previous generation = P(population C2C) =

1 - (P(candidate C2I))^N
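These closed forms are easy to check; here is a sketch (mine, not the script that produced the output below) that reproduces the "calc" columns in the tables:

```python
def p_candidate_c2c(u, K, C):
    """P(a mutated copy keeps all C correct bases of its parent)."""
    return (1 - u * (K - 1) / K) ** C

def p_candidate_c2i(u, K, C):
    """P(a mutated copy loses at least one of the C correct bases)."""
    return 1 - p_candidate_c2c(u, K, C)

def p_population_c2c(u, K, C, N):
    """P(at least one of N candidates keeps all C correct bases)."""
    return 1 - p_candidate_c2i(u, K, C) ** N

print(f"{p_candidate_c2c(0.05, 27, 10):.5f}")      # 0.61051, first table, C=10
print(f"{p_population_c2c(0.18, 27, 15, 12):.5f}")  # 0.50891, second table, C=15
```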

Checked via Monte Carlo methods, and using the N=50 and u=0.05 values that (IIRC) ROb was often using:



---------------------CODE SAMPLE-------------------

1000 runs, 00 correct : p_c2c calc = 1.00000, MC = 1.00000; p_c2i calc = 0.00000, MC = 0.00000

1000 runs, N=50, u=0.05000, K=27, C=0, p_pop_c2c calc = 1.00000, MC = 1.00000
Proportion of candidates w/C2I bases = 0.00000

1000 runs, 01 correct : p_c2c calc = 0.95185, MC = 0.93800; p_c2i calc = 0.04815, MC = 0.06200

1000 runs, N=50, u=0.05000, K=27, C=1, p_pop_c2c calc = 1.00000, MC = 1.00000
Proportion of candidates w/C2I bases = 0.04726

1000 runs, 02 correct : p_c2c calc = 0.90602, MC = 0.90400; p_c2i calc = 0.09398, MC = 0.09600

1000 runs, N=50, u=0.05000, K=27, C=2, p_pop_c2c calc = 1.00000, MC = 1.00000
Proportion of candidates w/C2I bases = 0.09476

1000 runs, 03 correct : p_c2c calc = 0.86240, MC = 0.85900; p_c2i calc = 0.13760, MC = 0.14100

1000 runs, N=50, u=0.05000, K=27, C=3, p_pop_c2c calc = 1.00000, MC = 1.00000
Proportion of candidates w/C2I bases = 0.13592

1000 runs, 04 correct : p_c2c calc = 0.82088, MC = 0.82400; p_c2i calc = 0.17912, MC = 0.17600

1000 runs, N=50, u=0.05000, K=27, C=4, p_pop_c2c calc = 1.00000, MC = 1.00000
Proportion of candidates w/C2I bases = 0.17810

1000 runs, 05 correct : p_c2c calc = 0.78135, MC = 0.80400; p_c2i calc = 0.21865, MC = 0.19600

1000 runs, N=50, u=0.05000, K=27, C=5, p_pop_c2c calc = 1.00000, MC = 1.00000
Proportion of candidates w/C2I bases = 0.21618

1000 runs, 06 correct : p_c2c calc = 0.74373, MC = 0.75900; p_c2i calc = 0.25627, MC = 0.24100

1000 runs, N=50, u=0.05000, K=27, C=6, p_pop_c2c calc = 1.00000, MC = 1.00000
Proportion of candidates w/C2I bases = 0.25848

1000 runs, 07 correct : p_c2c calc = 0.70792, MC = 0.74400; p_c2i calc = 0.29208, MC = 0.25600

1000 runs, N=50, u=0.05000, K=27, C=7, p_pop_c2c calc = 1.00000, MC = 1.00000
Proportion of candidates w/C2I bases = 0.29226

1000 runs, 08 correct : p_c2c calc = 0.67384, MC = 0.67200; p_c2i calc = 0.32616, MC = 0.32800

1000 runs, N=50, u=0.05000, K=27, C=8, p_pop_c2c calc = 1.00000, MC = 1.00000
Proportion of candidates w/C2I bases = 0.32460

1000 runs, 09 correct : p_c2c calc = 0.64139, MC = 0.62100; p_c2i calc = 0.35861, MC = 0.37900

1000 runs, N=50, u=0.05000, K=27, C=9, p_pop_c2c calc = 1.00000, MC = 1.00000
Proportion of candidates w/C2I bases = 0.36086

1000 runs, 10 correct : p_c2c calc = 0.61051, MC = 0.61000; p_c2i calc = 0.38949, MC = 0.39000

1000 runs, N=50, u=0.05000, K=27, C=10, p_pop_c2c calc = 1.00000, MC = 1.00000
Proportion of candidates w/C2I bases = 0.38966

1000 runs, 11 correct : p_c2c calc = 0.58112, MC = 0.59500; p_c2i calc = 0.41888, MC = 0.40500

1000 runs, N=50, u=0.05000, K=27, C=11, p_pop_c2c calc = 1.00000, MC = 1.00000
Proportion of candidates w/C2I bases = 0.42060

1000 runs, 12 correct : p_c2c calc = 0.55314, MC = 0.54600; p_c2i calc = 0.44686, MC = 0.45400

1000 runs, N=50, u=0.05000, K=27, C=12, p_pop_c2c calc = 1.00000, MC = 1.00000
Proportion of candidates w/C2I bases = 0.44794

1000 runs, 13 correct : p_c2c calc = 0.52650, MC = 0.52000; p_c2i calc = 0.47350, MC = 0.48000

1000 runs, N=50, u=0.05000, K=27, C=13, p_pop_c2c calc = 1.00000, MC = 1.00000
Proportion of candidates w/C2I bases = 0.47050

1000 runs, 14 correct : p_c2c calc = 0.50115, MC = 0.50900; p_c2i calc = 0.49885, MC = 0.49100

1000 runs, N=50, u=0.05000, K=27, C=14, p_pop_c2c calc = 1.00000, MC = 1.00000
Proportion of candidates w/C2I bases = 0.50130

1000 runs, 15 correct : p_c2c calc = 0.47702, MC = 0.45800; p_c2i calc = 0.52298, MC = 0.54200

1000 runs, N=50, u=0.05000, K=27, C=15, p_pop_c2c calc = 1.00000, MC = 1.00000
Proportion of candidates w/C2I bases = 0.51658

1000 runs, 16 correct : p_c2c calc = 0.45406, MC = 0.48200; p_c2i calc = 0.54594, MC = 0.51800

1000 runs, N=50, u=0.05000, K=27, C=16, p_pop_c2c calc = 1.00000, MC = 1.00000
Proportion of candidates w/C2I bases = 0.54270

1000 runs, 17 correct : p_c2c calc = 0.43219, MC = 0.41800; p_c2i calc = 0.56781, MC = 0.58200

1000 runs, N=50, u=0.05000, K=27, C=17, p_pop_c2c calc = 1.00000, MC = 1.00000
Proportion of candidates w/C2I bases = 0.56708

1000 runs, 18 correct : p_c2c calc = 0.41139, MC = 0.41200; p_c2i calc = 0.58861, MC = 0.58800

1000 runs, N=50, u=0.05000, K=27, C=18, p_pop_c2c calc = 1.00000, MC = 1.00000
Proportion of candidates w/C2I bases = 0.59218

1000 runs, 19 correct : p_c2c calc = 0.39158, MC = 0.35000; p_c2i calc = 0.60842, MC = 0.65000

1000 runs, N=50, u=0.05000, K=27, C=19, p_pop_c2c calc = 1.00000, MC = 1.00000
Proportion of candidates w/C2I bases = 0.61070

1000 runs, 20 correct : p_c2c calc = 0.37272, MC = 0.37200; p_c2i calc = 0.62728, MC = 0.62800

1000 runs, N=50, u=0.05000, K=27, C=20, p_pop_c2c calc = 1.00000, MC = 1.00000
Proportion of candidates w/C2I bases = 0.62762

1000 runs, 21 correct : p_c2c calc = 0.35478, MC = 0.33300; p_c2i calc = 0.64522, MC = 0.66700

1000 runs, N=50, u=0.05000, K=27, C=21, p_pop_c2c calc = 1.00000, MC = 1.00000
Proportion of candidates w/C2I bases = 0.64048

1000 runs, 22 correct : p_c2c calc = 0.33770, MC = 0.32200; p_c2i calc = 0.66230, MC = 0.67800

1000 runs, N=50, u=0.05000, K=27, C=22, p_pop_c2c calc = 1.00000, MC = 1.00000
Proportion of candidates w/C2I bases = 0.66146

1000 runs, 23 correct : p_c2c calc = 0.32144, MC = 0.31500; p_c2i calc = 0.67856, MC = 0.68500

1000 runs, N=50, u=0.05000, K=27, C=23, p_pop_c2c calc = 1.00000, MC = 1.00000
Proportion of candidates w/C2I bases = 0.67854

1000 runs, 24 correct : p_c2c calc = 0.30596, MC = 0.28900; p_c2i calc = 0.69404, MC = 0.71100

1000 runs, N=50, u=0.05000, K=27, C=24, p_pop_c2c calc = 1.00000, MC = 1.00000
Proportion of candidates w/C2I bases = 0.69380

1000 runs, 25 correct : p_c2c calc = 0.29123, MC = 0.28000; p_c2i calc = 0.70877, MC = 0.72000

1000 runs, N=50, u=0.05000, K=27, C=25, p_pop_c2c calc = 1.00000, MC = 1.00000
Proportion of candidates w/C2I bases = 0.70792

1000 runs, 26 correct : p_c2c calc = 0.27721, MC = 0.27700; p_c2i calc = 0.72279, MC = 0.72300

1000 runs, N=50, u=0.05000, K=27, C=26, p_pop_c2c calc = 1.00000, MC = 1.00000
Proportion of candidates w/C2I bases = 0.72154

1000 runs, 27 correct : p_c2c calc = 0.26386, MC = 0.23500; p_c2i calc = 0.73614, MC = 0.76500

1000 runs, N=50, u=0.05000, K=27, C=27, p_pop_c2c calc = 1.00000, MC = 1.00000
Proportion of candidates w/C2I bases = 0.73644

1000 runs, 28 correct : p_c2c calc = 0.25116, MC = 0.24500; p_c2i calc = 0.74884, MC = 0.75500

1000 runs, N=50, u=0.05000, K=27, C=28, p_pop_c2c calc = 1.00000, MC = 1.00000
Proportion of candidates w/C2I bases = 0.74932

---------------------CODE SAMPLE-------------------



The above explains why a list showing the best candidate from each generation is highly unlikely to show any change in a correct character once a candidate bearing it has been selected as best in a previous generation. The proportion of candidates that change a correct character to an incorrect one nonetheless rises to almost three-quarters of each generation when nearly all characters are correct.
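For anyone wanting to repeat the check, a minimal Monte Carlo sketch (mine; the original test harness isn't shown in the thread) for the per-candidate retention probability:

```python
import random

def mc_p_c2c(u, K, C, runs=10000, seed=1):
    """Estimate P(a mutated copy keeps all C correct bases) by simulation."""
    rng = random.Random(seed)
    keep = 0
    for _ in range(runs):
        ok = True
        for _ in range(C):
            # a correct base is lost if it mutates (prob u) and the mutation
            # lands on one of the K-1 wrong characters (prob (K-1)/K)
            if rng.random() < u and rng.random() < (K - 1) / K:
                ok = False
                break
        keep += ok
    return keep / runs

est = mc_p_c2c(0.05, 27, 10)
print(f"MC estimate: {est:.3f}  (closed form: 0.61051)")
```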

Now doing the Monte Carlo methods on the situation with N=12 and u=0.18, where I picked N and u in order to get a range of values for the population that went down to a relatively small probability.



---------------------CODE SAMPLE-------------------

1000 runs, 00 correct : p_c2c calc = 1.00000, MC = 1.00000; p_c2i calc = 0.00000, MC = 0.00000

1000 runs, N=12, u=0.18000, K=27, C=0, p_pop_c2c calc = 1.00000, MC = 1.00000
Proportion of candidates w/C2I bases = 0.00000

1000 runs, 01 correct : p_c2c calc = 0.82667, MC = 0.82700; p_c2i calc = 0.17333, MC = 0.17300

1000 runs, N=12, u=0.18000, K=27, C=1, p_pop_c2c calc = 1.00000, MC = 1.00000
Proportion of candidates w/C2I bases = 0.17058

1000 runs, 02 correct : p_c2c calc = 0.68338, MC = 0.69300; p_c2i calc = 0.31662, MC = 0.30700

1000 runs, N=12, u=0.18000, K=27, C=2, p_pop_c2c calc = 1.00000, MC = 1.00000
Proportion of candidates w/C2I bases = 0.32233

1000 runs, 03 correct : p_c2c calc = 0.56493, MC = 0.56600; p_c2i calc = 0.43507, MC = 0.43400

1000 runs, N=12, u=0.18000, K=27, C=3, p_pop_c2c calc = 0.99995, MC = 1.00000
Proportion of candidates w/C2I bases = 0.44117

1000 runs, 04 correct : p_c2c calc = 0.46701, MC = 0.48600; p_c2i calc = 0.53299, MC = 0.51400

1000 runs, N=12, u=0.18000, K=27, C=4, p_pop_c2c calc = 0.99947, MC = 1.00000
Proportion of candidates w/C2I bases = 0.53508

1000 runs, 05 correct : p_c2c calc = 0.38606, MC = 0.39800; p_c2i calc = 0.61394, MC = 0.60200

1000 runs, N=12, u=0.18000, K=27, C=5, p_pop_c2c calc = 0.99713, MC = 0.99700
Proportion of candidates w/C2I bases = 0.61167

1000 runs, 06 correct : p_c2c calc = 0.31914, MC = 0.32100; p_c2i calc = 0.68086, MC = 0.67900

1000 runs, N=12, u=0.18000, K=27, C=6, p_pop_c2c calc = 0.99008, MC = 0.99200
Proportion of candidates w/C2I bases = 0.67967

1000 runs, 07 correct : p_c2c calc = 0.26382, MC = 0.25100; p_c2i calc = 0.73618, MC = 0.74900

1000 runs, N=12, u=0.18000, K=27, C=7, p_pop_c2c calc = 0.97466, MC = 0.97200
Proportion of candidates w/C2I bases = 0.73475

1000 runs, 08 correct : p_c2c calc = 0.21809, MC = 0.23600; p_c2i calc = 0.78191, MC = 0.76400

1000 runs, N=12, u=0.18000, K=27, C=8, p_pop_c2c calc = 0.94778, MC = 0.95500
Proportion of candidates w/C2I bases = 0.78383

1000 runs, 09 correct : p_c2c calc = 0.18029, MC = 0.19100; p_c2i calc = 0.81971, MC = 0.80900

1000 runs, N=12, u=0.18000, K=27, C=9, p_pop_c2c calc = 0.90797, MC = 0.91400
Proportion of candidates w/C2I bases = 0.81492

1000 runs, 10 correct : p_c2c calc = 0.14904, MC = 0.16000; p_c2i calc = 0.85096, MC = 0.84000

1000 runs, N=12, u=0.18000, K=27, C=10, p_pop_c2c calc = 0.85582, MC = 0.85600
Proportion of candidates w/C2I bases = 0.85667

1000 runs, 11 correct : p_c2c calc = 0.12321, MC = 0.12900; p_c2i calc = 0.87679, MC = 0.87100

1000 runs, N=12, u=0.18000, K=27, C=11, p_pop_c2c calc = 0.79357, MC = 0.78600
Proportion of candidates w/C2I bases = 0.88083

1000 runs, 12 correct : p_c2c calc = 0.10185, MC = 0.09800; p_c2i calc = 0.89815, MC = 0.90200

1000 runs, N=12, u=0.18000, K=27, C=12, p_pop_c2c calc = 0.72446, MC = 0.72100
Proportion of candidates w/C2I bases = 0.89900

1000 runs, 13 correct : p_c2c calc = 0.08420, MC = 0.08100; p_c2i calc = 0.91580, MC = 0.91900

1000 runs, N=12, u=0.18000, K=27, C=13, p_pop_c2c calc = 0.65196, MC = 0.66100
Proportion of candidates w/C2I bases = 0.91458

1000 runs, 14 correct : p_c2c calc = 0.06960, MC = 0.07600; p_c2i calc = 0.93040, MC = 0.92400

1000 runs, N=12, u=0.18000, K=27, C=14, p_pop_c2c calc = 0.57925, MC = 0.55400
Proportion of candidates w/C2I bases = 0.93192

1000 runs, 15 correct : p_c2c calc = 0.05754, MC = 0.06100; p_c2i calc = 0.94246, MC = 0.93900

1000 runs, N=12, u=0.18000, K=27, C=15, p_pop_c2c calc = 0.50891, MC = 0.50000
Proportion of candidates w/C2I bases = 0.94333

1000 runs, 16 correct : p_c2c calc = 0.04756, MC = 0.03900; p_c2i calc = 0.95244, MC = 0.96100

1000 runs, N=12, u=0.18000, K=27, C=16, p_pop_c2c calc = 0.44278, MC = 0.45400
Proportion of candidates w/C2I bases = 0.95083

1000 runs, 17 correct : p_c2c calc = 0.03932, MC = 0.03800; p_c2i calc = 0.96068, MC = 0.96200

1000 runs, N=12, u=0.18000, K=27, C=17, p_pop_c2c calc = 0.38206, MC = 0.38200
Proportion of candidates w/C2I bases = 0.96183

1000 runs, 18 correct : p_c2c calc = 0.03250, MC = 0.03600; p_c2i calc = 0.96750, MC = 0.96400

1000 runs, N=12, u=0.18000, K=27, C=18, p_pop_c2c calc = 0.32736, MC = 0.32200
Proportion of candidates w/C2I bases = 0.96875

1000 runs, 19 correct : p_c2c calc = 0.02687, MC = 0.03200; p_c2i calc = 0.97313, MC = 0.96800

1000 runs, N=12, u=0.18000, K=27, C=19, p_pop_c2c calc = 0.27881, MC = 0.25300
Proportion of candidates w/C2I bases = 0.97600

1000 runs, 20 correct : p_c2c calc = 0.02221, MC = 0.02000; p_c2i calc = 0.97779, MC = 0.98000

1000 runs, N=12, u=0.18000, K=27, C=20, p_pop_c2c calc = 0.23629, MC = 0.23400
Proportion of candidates w/C2I bases = 0.97817

1000 runs, 21 correct : p_c2c calc = 0.01836, MC = 0.01800; p_c2i calc = 0.98164, MC = 0.98200

1000 runs, N=12, u=0.18000, K=27, C=21, p_pop_c2c calc = 0.19941, MC = 0.19100
Proportion of candidates w/C2I bases = 0.98308

1000 runs, 22 correct : p_c2c calc = 0.01518, MC = 0.01600; p_c2i calc = 0.98482, MC = 0.98400

1000 runs, N=12, u=0.18000, K=27, C=22, p_pop_c2c calc = 0.16769, MC = 0.17300
Proportion of candidates w/C2I bases = 0.98458

1000 runs, 23 correct : p_c2c calc = 0.01255, MC = 0.00600; p_c2i calc = 0.98745, MC = 0.99400

1000 runs, N=12, u=0.18000, K=27, C=23, p_pop_c2c calc = 0.14061, MC = 0.15400
Proportion of candidates w/C2I bases = 0.98617

1000 runs, 24 correct : p_c2c calc = 0.01037, MC = 0.01400; p_c2i calc = 0.98963, MC = 0.98600

1000 runs, N=12, u=0.18000, K=27, C=24, p_pop_c2c calc = 0.11762, MC = 0.11200
Proportion of candidates w/C2I bases = 0.99042

1000 runs, 25 correct : p_c2c calc = 0.00858, MC = 0.00600; p_c2i calc = 0.99142, MC = 0.99400

1000 runs, N=12, u=0.18000, K=27, C=25, p_pop_c2c calc = 0.09819, MC = 0.10400
Proportion of candidates w/C2I bases = 0.99075

1000 runs, 26 correct : p_c2c calc = 0.00709, MC = 0.01200; p_c2i calc = 0.99291, MC = 0.98800

1000 runs, N=12, u=0.18000, K=27, C=26, p_pop_c2c calc = 0.08183, MC = 0.08000
Proportion of candidates w/C2I bases = 0.99283

1000 runs, 27 correct : p_c2c calc = 0.00586, MC = 0.00700; p_c2i calc = 0.99414, MC = 0.99300

1000 runs, N=12, u=0.18000, K=27, C=27, p_pop_c2c calc = 0.06810, MC = 0.07400
Proportion of candidates w/C2I bases = 0.99358

1000 runs, 28 correct : p_c2c calc = 0.00484, MC = 0.00600; p_c2i calc = 0.99516, MC = 0.99400

1000 runs, N=12, u=0.18000, K=27, C=28, p_pop_c2c calc = 0.05661, MC = 0.04400
Proportion of candidates w/C2I bases = 0.99617

---------------------CODE SAMPLE-------------------



The above shows that in order to have low probabilities that the best candidate in a generation will retain all the characters that were correct in the parent, one must have small N and relatively high u values.
Posted by: Wesley R. Elsberry on Mar. 18 2009,14:17

Here's something for David...



Population size is on the X axis, running from 1 to 500. Mutation rate is on the Y axis, running from 0.0 (bottom of image) to 1.0. The lighter the pixel, the better the chance of convergence. This was generated by finding P(population C2C) at C = K - 1 for each condition represented by the pixel and scaling that probability over 1,024 grayscale values.

As expected, there is no local sensitivity to change in parameters.
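A sketch (mine, using numpy; the image-writing step is omitted) of how such a probability grid can be computed:

```python
import numpy as np

K = 27
C = K - 1  # retention probability evaluated at C = K - 1 correct characters

pops = np.arange(1, 501)             # population size, X axis: 1..500
rates = np.linspace(0.0, 1.0, 500)   # mutation rate, Y axis: 0.0..1.0

u = rates[:, None]                   # broadcast rates down the rows
N = pops[None, :]                    # broadcast population sizes across columns
p_c2i = 1 - (1 - u * (K - 1) / K) ** C  # per-candidate loss probability
p_pop = 1 - p_c2i ** N               # P(some candidate retains all C bases)

gray = np.round(p_pop * 1023).astype(int)  # scale over 1,024 grayscale values
print(gray.shape, gray.min(), gray.max())
```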

Expanding the population scale by ten gives this:



Posted by: dvunkannon on Mar. 18 2009,14:57

Some ID-ists have trouble understanding why abstractions like GA/EC are relevant - i.e., "but it ain't wet!" An important point for these folks (and others) is that a GA isn't a model of evolution, it _IS_ evolution.
Posted by: Wesley R. Elsberry on Mar. 18 2009,15:45

Huh... I just realized that I should have done the graphs up for (L-1) instead of (K-1). It's the difference between 26 and 27, so it won't make a big shift, but I'll generate those later when I get a chance.
Posted by: AmandaHuginKiss on Mar. 18 2009,16:33

Quote (dvunkannon @ Mar. 19 2009,07:57)
Some ID-ists have trouble understanding why abstractions like GA/EC are relevant - i.e., "but it ain't wet!" An important point for these folks (and others) is that a GA isn't a model of evolution, it _IS_ evolution.
---------------------QUOTE-------------------


One thing that I would like to try, if I had time, is to model the wet evolution.
Posted by: dvunkannon on Mar. 18 2009,17:01

Quote (Wesley R. Elsberry @ Mar. 18 2009,15:17)
Here's something for David...



Population size is on the X axis, running from 1 to 500. Mutation rate is on the Y axis, running from 0.0 (bottom of image) to 1.0. The lighter the pixel, the better the chance of convergence. This was generated by finding P(population C2C) at C = K - 1 for each condition represented by the pixel and scaling that probability over 1,024 grayscale values.

As expected, there is no local sensitivity to change in parameters.

Expanding the population scale by ten gives this:



---------------------QUOTE-------------------


Thank you Wes!

Sometimes people are stunned by complexity, but these images are so simple that most people don't see the significance. Evolution just works.
Posted by: Wesley R. Elsberry on Mar. 19 2009,09:33

"weasel" graph of PPopulation_C2C(L-1):



ETA: Again, population from 1 to 500 is on the X axis, and mutation probability from 0 to 1.0 is on the Y axis.

Comparison of (K-1) v. (L-1) versions of the graph (lighter is less different):




Posted by: Freelurker on Mar. 19 2009,11:35

Quote (dvunkannon @ Mar. 18 2009,15:57)
Some ID-ists have trouble understanding why abstractions like GA/EC are relevant - i.e., "but it ain't wet!" An important point for these folks (and others) is that a GA isn't a model of evolution, it _IS_ evolution.
---------------------QUOTE-------------------


This is true in one sense, but let's not lose the distinction between genetic optimization algorithms and simulations of biological evolution.

It seems to me that Dembski makes mischief in just this way. All this criticism of modelers "sneaking in" information just isn't relevant to simulation models. The entire model, every bit of it, came from the modeler. The real issue is the fidelity of the model: does it match reality sufficiently to justify the conclusions one draws from it?
Posted by: dvunkannon on Mar. 19 2009,11:48

Quote (Freelurker @ Mar. 19 2009,12:35)
Quote (dvunkannon @ Mar. 18 2009,15:57)
Some ID-ists have trouble understanding why abstractions like GA/EC are relevant - i.e., "but it ain't wet!" An important point for these folks (and others) is that a GA isn't a model of evolution, it _IS_ evolution.
---------------------QUOTE-------------------


This is true in one sense, but let's not lose the distinction between genetic optimization algorithms and simulations of biological evolution.

It seems to me that Dembski makes mischief in just this way. All this criticism of modelers "sneaking in" information just isn't relevant to simulation models. The entire model, every bit of it, came from the modeler. The real issue is the fidelity of the model: does it match reality sufficiently to justify the conclusions one draws from it?
---------------------QUOTE-------------------


I agree. There are folks who deny evolution can exist at all, and there are those who deny what biology does is evolution.
Posted by: Richardthughes on Mar. 19 2009,16:05

EIL's math page for 'Weasel':

< http://www.evoinfo.org/WeaselMath.html >
Posted by: Wesley R. Elsberry on Mar. 19 2009,18:39

Did you notice that there wasn't any math there for the "weasel" as described by Dawkins? Just Dembski/Marks "partitioned search" and "deterministic search".
Posted by: Richardthughes on Mar. 19 2009,18:50

Quote (Wesley R. Elsberry @ Mar. 19 2009,18:39)
Did you notice that there wasn't any math there for the "weasel" as described by Dawkins? Just Dembski/Marks "partitioned search" and "deterministic search".
---------------------QUOTE-------------------


Yes. They're very keen to frame it as something it isn't. I'm convinced Dembski still doesn't 'get' GAs.
Posted by: Wesley R. Elsberry on Mar. 20 2009,00:30

Here's an interesting graph:



I've put a 20 pixel border around this. On the X axis, there is the number of correct letters (treated as a continuous scale), and mutation rate is on the Y axis. I've taken terms from the "expected number of correct letters in a mutated string" calculation and subtracted the term for expected conversion of correct to incorrect from the expected conversion of incorrect to correct. Black is a net 28 expected new incorrect letters, white is a net 2 expected new correct letters, and the border color is where the two terms cancel each other out. One can see at a glance that as one considers candidates with more matching letters, only lower mutation rates are going to give a good chance of matching all the letters.
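Tabulating the subtraction makes the crossover explicit; in this sketch (mine), the gain and loss terms cancel at C = L / K regardless of u, so for any C above about 1 the net expectation is a loss whose size scales with the mutation rate:

```python
K, L = 27, 28

def net_expected_change(C, u):
    """Gain term minus loss term from the expectation formula."""
    gain = u * (L - C) / K       # expected incorrect -> correct conversions
    loss = u * C * (K - 1) / K   # expected correct -> incorrect conversions
    return gain - loss           # algebraically (u / K) * (L - C * K)

for C in (0, 1, 14, 27):
    print(C, f"{net_expected_change(C, 0.5):+.3f}")
# zero crossing: L - C * K = 0, i.e. C = L / K, about 1.04 here
```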

And here's the same graph, but with the net 1 expected new incorrect values shifted to black, too, making a contour visible, and showing how the mutation rate interacts with expectations for new candidate strings:


Posted by: Wesley R. Elsberry on Mar. 25 2009,15:25

"weasel" versus "partitioned search"

I derived an equation for expectation of correct bases following mutation in "weasel" (see above for earlier reference):

 

---------------------QUOTE-------------------

expected correct bases after mutation in "weasel" = C + (u * (L - C) / K) - (u * C * (K - 1) / K)

---------------------QUOTE-------------------



"Partitioned search" would be the case where:

 

---------------------QUOTE-------------------

expected correct bases after mutation in PS = C + (u * (L - C) / K) - (0 * u * C * (K - 1) / K)

= C + (u * (L - C) / K) - 0

= C + (u * (L - C) / K)

---------------------QUOTE-------------------



"Locking" or "latching" is the same as removing the term that allows for correct bases to mutate to incorrect ones. What remains is an expectation that the number of correct bases can only monotonically increase.
Posted by: Jkrebs on Mar. 25 2009,15:53

Hi.
Posted by: dvunkannon on Mar. 25 2009,18:49

Quote (Richardthughes @ Mar. 18 2009,12:24)
Can you derive an optimal mutation rate?
---------------------QUOTE-------------------


Start with an optimal population size. Goldberg's research suggests N = 1.4L, where L is the length of the problem description (and therefore of the population members) in bits. That is a good bit higher than the commonplace 50.
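The arithmetic for the weasel problem, sketched out (mine; the thread's rough estimate of log2(27) as about 4.5 gives the ~177 figure discussed below, while the exact logarithm gives a slightly larger N):

```python
import math

chars, positions = 27, 28
bits_per_char = math.log2(chars)     # about 4.755 bits per character
L_bits = positions * bits_per_char   # problem description length in bits
N = 1.4 * L_bits                     # Goldberg's population-sizing heuristic

print(f"L = {L_bits:.1f} bits, N = {N:.0f}")
# with the rough estimate log2(27) ~ 4.5: N = 1.4 * 28 * 4.5, about 176
```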
Posted by: Wesley R. Elsberry on Mar. 25 2009,18:59

I get N=178 for that. Is that what you get?
Posted by: dvunkannon on Mar. 25 2009,20:09

Quote (Wesley R. Elsberry @ Mar. 25 2009,19:59)
I get N=178 for that. Is that what you get?
---------------------QUOTE-------------------


Yeah, I guessed log2(27) was around 4.5, so I got 177.
Posted by: dvunkannon on Mar. 26 2009,09:22

Quote (dvunkannon @ Mar. 25 2009,21:09)
Quote (Wesley R. Elsberry @ Mar. 25 2009,19:59)
I get N=178 for that. Is that what you get?
---------------------QUOTE-------------------


Yeah, I guessed log2(27) was around 4.5, so I got 177.
---------------------QUOTE-------------------


I should mention that most of Goldberg's research is in GAs using only a selection operator and a recombination operator, no mutation. This despite publishing papers (see the "Ready to Rumble" series) that show mutation is the more efficient operator in some broad classes of problems.

Since Weasel is really a (1,n)-ES, not a selectorecombinative GA, that population sizing heuristic might not be completely appropriate. But I don't know of other work with as firm a footing.
Posted by: Wesley R. Elsberry on Mar. 26 2009,09:33

While statistically unlikely, successive recombination operations can produce the same changes as point mutation can (dependent, of course, on having a population with good diversity), so it isn't surprising that recombination might be used as the sole mechanism of change.
Posted by: Wesley R. Elsberry on Mar. 26 2009,17:23

Avida applied to evolutionary biology

The Beneficial Effect of Deleterious Mutations

If they put something up on the work experimenting to test Sewall Wright's shifting-balance theory, I'll post the link.


Posted by: Henry J on Mar. 27 2009,13:28

But wouldn't recombination by itself continually reduce the amount of diversity in the gene pool, and eventually produce a deficit of it?

Henry
Posted by: Richardthughes on Mar. 27 2009,17:08

Pharyngula on "Weasel":

< http://scienceblogs.com/pharyng....put.php >
Posted by: Wesley R. Elsberry on Mar. 27 2009,20:49

Quote (Henry J @ Mar. 27 2009,13:28)
But wouldn't recombination by itself continually reduce the amount of diversity in the gene pool, and eventually produce a deficit of it?

Henry
---------------------QUOTE-------------------


Both genetic drift and natural selection reduce variation, but I wouldn't think it is primarily the choice of mutation modality that affects that.
Posted by: dvunkannon on Mar. 27 2009,23:40

Quote (Henry J @ Mar. 27 2009,14:28)
But wouldn't recombination by itself continually reduce the amount of diversity in the gene pool, and eventually produce a deficit of it?

Henry
---------------------QUOTE-------------------


Yes, recombination and selection lead to convergence, hopefully on the correct allele. Goldberg's The Design of Innovation is a great resource on these issues in GAs.
Posted by: dvunkannon on Mar. 29 2009,08:24

In the past, I've worked with the ECJ package from Sean Luke's group at George Mason University. It has support for (mu, lambda)-ES built in. I might have time to build a weasel in ECJ. I think it might just be a matter of setting up the parameter file right; all the code is already there.
Posted by: dvunkannon on Mar. 29 2009,08:45

Not specifically Weasel...

I just read this paper < on sexual selection in GA >. I was thinking of trying to reproduce (ahem) some of the results. It seems the researchers made a bunch of changes to the standard GA, and I'd like to see which were responsible for the positive variations they report.
Posted by: Wesley R. Elsberry on June 10 2009,20:11

The antievolutionist software, < Mendel's Accountant >, asserts that it "allows realistic numerical simulation of the mutation/selection process over time".

Discussion at < Theology Web >, though, indicates that the program may not deliver results in accordance with known population genetics.

Gary Hurd suggested that this would be a good topic for a TOA FAQ. We can use this thread to help coordinate people working on an analysis of Mendel's Accountant.
Posted by: AnsgarSeraph on June 10 2009,22:07

Hi, all —

I'm another migrant from TWeb; I've got Mendel's Accountant set up on a 32-bit and a 64-bit system. I certainly won't be much help with any actual knowledge but I'm very willing to run simulations for anyone who doesn't have/doesn't want Mendel set up on their computer.

I've found that Mendel will allow my 4GB setup to run small populations (~1000) for about 40,000 generations or larger populations (~10,000) for less than 10,000 generations. I plan on purchasing some extra RAM soon so I might be able to extend the runs a bit. I did not see much difference in latitude running a 64-bit setup but extra RAM might change that.

The user manual for Mendel's Accountant is here. If helpful, I can (hopefully) attach screen grabs of the advanced Mendel settings so people don't need to hunt through the manual.

—Sam
Posted by: AnsgarSeraph on June 10 2009,22:10

Sorry. The manual is HERE:

< Mendel's Accountant User Manual >

There is also a Linux how-to but the SourceForge page does not have the tarball listed. Some advanced features of MENDEL require Linux. They are almost certainly unnecessary for a FAQ but I'll e-mail the maintainer of the code (Dr. Brewer, I think) and try to get that.

—Sam
Posted by: Dr.GH on June 10 2009,22:36

Howdy Sam. Glad to see you here. I'll assume that you have seen the prior discussion on this site starting < about here >.

I hope to be away all day tomorrow, so I hope that everyone will start right to work while I go fishing.  :D
Posted by: Richardthughes on June 10 2009,22:46

Also:

< http://www.facebook.com/group.php?gid=5381573897 >
Posted by: utidjian on June 11 2009,00:18

Ok... I downloaded the .exe files. Both the earlier version and the update from sourceforge.

I am running Linux (Fedora 10) on an Intel iMac with 1G of RAM.

I unpacked the files with Wine. First the older version and then the newer one. Man it installs a lot of stuff.

The Linux source is in
/home/utidjian/.wine/drive_c/Mendel/Source
on my system. Not much in there.

listing:


---------------------CODE SAMPLE-------------------

[utidjian@istrain Source]$ ls -ogh
total 416K
-rw-rw-rw- 1 4.5K 2008-09-13 17:38 common.h
-rw-rw-rw- 1  587 2008-09-07 22:00 Interface back-end.lnk
-rw-rw-rw- 1  661 2008-09-07 22:00 Interface front-end.lnk
-rw-rw-rw- 1  985 2008-09-01 20:51 Makefile
-rw-rw-rw- 1 165K 2008-10-01 06:15 mendel.f
-rw-rw-rw- 1 163K 2008-09-05 22:00 mendel.f.bak
-rw-rw-rw- 1 1.9K 2008-09-18 18:57 mendel.in
-rw-rw-rw- 1 1.5K 2008-09-04 03:20 mpi_mendel.f
-rw-rw-rw- 1  42K 2006-03-01 13:02 random_pkg.f90
-rw-rw-rw- 1 1.3K 2007-01-15 09:50 sort.f90

---------------------CODE SAMPLE-------------------



The main file in there is mendel.f. Lots of comments. I can "read" Fortran but I don't know diddly about population genetics.

Time for bed.

-DU-
Posted by: Zachriel on June 11 2009,07:08

I upgraded to Mendel 1.4.1. With the parameters I tried, it behaves differently from the original version. I started with the defaults, changing only the fraction of beneficial mutations and maximum effect of beneficial mutations parameters.

    1.0000000 frac_fav_mutn
    1.0000000 max_fav_fitness_gain


Now the fitness increases in a linear fashion. So they must have fixed a major bug in between versions. I'll continue to test over the next few days.
Posted by: Dr.GH on June 11 2009,11:22

I collected the various "articles" written about MA:

Larry Vardiman
2008. “The "Fatal Flaws" of Darwinian Theory.” Acts & Facts 37 (7): 6. Institute for Creation Research.
< http://www.icr.org/article/fatal-flaws-darwinian-theory/ >

Money quote:
   

---------------------QUOTE-------------------
“Mendel's Accountant provides overwhelming empirical evidence that all of the "fatal flaws" inherent in evolutionary genetic theory are real. This leaves evolutionary genetic theory effectively falsified--with a degree of certainty that should satisfy any reasonable and open-minded person.”
---------------------QUOTE-------------------



John Sanford, John Baumgardner, Wesley Brewer, Paul Gibson, Walter ReMine
2008a “Using Numerical Simulation to Test the Validity of Neo-Darwinian Theory” In A. A. Snelling (Ed.) (2008). Proceedings of the Sixth International Conference on Creationism (pp. 165–175). Pittsburgh, PA: Creation Science Fellowship and Dallas, TX: Institute for Creation Research.
< Download PDF >


Baumgardner, J., Sanford, J., Brewer, W., Gibson, P., & ReMine, W.
2008b “Mendel’s Accountant: A new population genetics simulation tool for studying mutation and natural selection.” In A. A. Snelling (Ed.), Proceedings of the sixth international conference on creationism (pp. 87–98). Pittsburgh, Pennsylvania: Creation Science Fellowship & Dallas, Texas: Institute for Creation Research.
< Download PDF >

Sanford, J., Baumgardner, J., Gibson, P., Brewer, W., & ReMine, W.
(2007a). Mendel’s Accountant: A biologically realistic forward-time population genetics program. Scalable Computing: Practice and Experience 8(2), 147–165. < http://www.scpe.org >
< Download PDF >

Sanford, J., Baumgardner, J., Gibson, P., Brewer, W., & ReMine, W.
(2007b). Using computer simulation to understand mutation accumulation dynamics and genetic load. In Y. Shi, G. D. van Albada, J. Dongarra, & P. M. A. Sloot (Eds.), International Conference on Computational Science 2007, Part II, Lecture Notes in Computer Science 4488 (pp. 386–392). Springer-Verlag: Berlin, Heidelberg.
< Download PDF >


Posted by: Wesley R. Elsberry on June 11 2009,11:29

So, how much "evolution sucks" verbiage was generated by the people running the pre-1.4.1 versions? Was there any notice that people should re-run their experiments due to a pretty drastic change in program behavior?
Posted by: Dr.GH on June 11 2009,11:35

Quote (Wesley R. Elsberry @ June 11 2009,09:29)
So, how much "evolution sucks" verbiage was generated by the people running the pre-1.4.1 versions? Was there any notice that people should re-run their experiments due to a pretty drastic change in program behavior?
---------------------QUOTE-------------------


Exactly!

PS: The links all work now, and I am going fishing.
Posted by: AnsgarSeraph on June 11 2009,11:41

Quote (Wesley R. Elsberry @ June 11 2009,11:29)
So, how much "evolution sucks" verbiage was generated by the people running the pre-1.4.1 versions? Was there any notice that people should re-run their experiments due to a pretty drastic change in program behavior?
---------------------QUOTE-------------------


To be fair (to a shoddy program? How odd), v. 1.4.1 still "demonstrates" all the genetic entropy problems that Sanford makes noise about. Whatever they fixed in terms of beneficial mutations, our runs at TWeb using 1.4.1 indicate that it's nowhere near enough; a 90% beneficial mutation rate with default "maximal benefit effect" still reduces fitness in a linear manner.

—Sam
Posted by: utidjian on June 11 2009,12:17

Is anyone else playing with the source (or even reading it)?
I also found this file in the Source folder:



---------------------CODE SAMPLE-------------------

[utidjian@buttle Source]$ cat mendel.in
       1000    pop_size
        500    num_generations
          1    fitness_distrib_type:exponential_mutation_effect
          2    selection_scheme:unrestricted_probability_selection
         23    haploid_chromosome_number
       1000    num_linkage_subunits
   0.000000    pop_growth_rate
          0    pop_growth_model:fixed_population
  3.000e+08    haploid_genome_size
  6.0000000    offspring_per_female
  0.0000000    fraction_random_death
  0.0000000    fraction_self_fertilization
 10.0000000    new_mutn_per_offspring
  0.0010000    high_impact_mutn_fraction
  0.1000000    high_impact_mutn_threshold
  0.0010000    uniform_fitness_effect_del
  0.0000000    multiplicative_weighting
  1.000e-05    tracking_threshold
  0.0000000    fraction_recessive
  0.0000000    recessive_hetero_expression
  0.5000000    dominant_hetero_expression
  0.0000000    frac_fav_mutn
  0.0010000    max_fav_fitness_gain
  0.2000000    heritability
  0.0000000    non_scaling_noise
  0.5000000    partial_truncation_value
          0    num_contrasting_alleles
  0.0000000    initial_alleles_mean_effect
  0.9000000    linked_mutn_se_fraction
  1.0000000    se_scaling_factor
          0    synergistic_epistasis
          0    clonal_reproduction
          0    clonal_haploid
          1    dynamic_linkage
          0    fitness_dependent_fertility
          0    is_parallel
          0    bottleneck_yes
       1000    bottleneck_generation
        100    bottleneck_pop_size
        500    num_bottleneck_generations
          0    num_initial_fav_mutn
          1    num_indiv_exchanged
          1    migration_generations
          1    migration_model
          1 homogenous_tribes
      47469 max_tracked_mutn_per_indiv
         42 random_number_seed
          0 write_dump
          0 restart_case
          1 restart_dump_number
test01 case_id
/.
          2 num_tribes
          2 num_procs
          0 plot_avg_data
          0 restart_case_id
          1 restart_append
batch run_queue
          0 c_engine

---------------------CODE SAMPLE-------------------



Anything interesting in there?

-DU-
Posted by: midwifetoad on June 11 2009,12:37

Quote (Richardthughes @ Mar. 27 2009,17:08)
Pharyngula on "Weasel":

< http://scienceblogs.com/pharyng....put.php >
---------------------QUOTE-------------------




---------------------QUOTE-------------------
As Ian Musgrave shows, the program is trivial, and even us biologists can whip one out in minutes.
---------------------QUOTE-------------------



Even non-biologists...

< http://www.itatsi.com >
Posted by: Wesley R. Elsberry on June 11 2009,12:41

If anybody is trying to install MA, but has a more recent Perl and unchecks the Perl install, be warned that the CGI has a fixed location that it expects to launch the Perl executable from, C:\Mendel\Perl\bin\perl.exe
Posted by: AnsgarSeraph on June 11 2009,12:52

Quote (utidjian @ June 11 2009,12:17)
Is anyone else playing with the source (or even reading it)?
I also found this file in the Source folder:



---------------------CODE SAMPLE-------------------

      47469 max_tracked_mutn_per_indiv
         
---------------------CODE SAMPLE-------------------



Anything interesting in there?

-DU-
---------------------QUOTE-------------------


That number is considerably higher than the input parameters on my runs; ranging from a population of 1,000 to a population of 10,000, I think the highest setting on my runs for that was ~25,000 tracked mutations per individual. I'm not sure how they calculate that parameter. This might be one of the limiting factors in testing MENDEL, as I have already run up against a "Favorable mutation count exceeds limit" error.

—Sam
Posted by: Steve Schaffner on June 11 2009,13:02

I may have missed something, but I haven't seen anything in the v. 1.4.1 runs that was clearly wrong, i.e. that suggested a bug in the implementation (other than the broken option for fixed selection coefficient). Which doesn't mean there aren't any bugs, but they are not obvious.

What is clear is that the default parameters for beneficial alleles are very low. Their justification for having such a low maximum beneficial effect strikes me as plausible-sounding nonsense.

It's also clear that their basic model is not one from evolutionary biology. The essential process they're modeling is the accumulation of mildly deleterious mutations, ones that have such a small functional effect that they are invisible to natural selection. This only occurs because the population starts out in a state of genetic perfection, compared to which the new mutations are deleterious. A real population would never have become that optimized, precisely because the different choices of allele are indistinguishable by NS.
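The dynamic in that paragraph is easy to demonstrate outside Mendel. Here is a minimal Wright-Fisher-style sketch of my own (the population size, mutation probability, and tiny selection coefficient are illustrative assumptions, not Mendel's values):

```python
import random

# Toy sketch: mutations with |s| much smaller than 1/N drift as if neutral,
# so a population seeded at genetic "perfection" steadily accumulates them.

def next_generation(loads, s, mu, rng):
    """loads[i] = deleterious mutation count of individual i."""
    n = len(loads)
    weights = [(1.0 - s) ** k for k in loads]       # multiplicative fitness
    parents = rng.choices(range(n), weights=weights, k=n)
    # each offspring gains a new mutation with probability mu
    return [loads[p] + (1 if rng.random() < mu else 0) for p in parents]

rng = random.Random(42)
pop = [0] * 200                          # start at genetic "perfection"
for _ in range(500):
    pop = next_generation(pop, s=1e-6, mu=0.5, rng=rng)   # s << 1/N
mean_load = sum(pop) / len(pop)          # climbs by roughly mu per generation
```

With s = 1e-6 and N = 200, selection cannot distinguish the genotypes, so the mean load climbs by about mu per generation — exactly the accumulation described above, and it only looks like "decay" because the starting state was perfect.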

For those who have the program running . . . Can it provide more output? Comparing results with theory would be much more straightforward if one could count only the number of mutations that have fixed, rather than all present in the population; it would also be useful to see the allele frequency spectrum. (And if it can't do those things, then it is of no interest as a population genetics tool.)
Posted by: AnsgarSeraph on June 11 2009,13:23

Quote (Steve Schaffner @ June 11 2009,13:02)
For those who have the program running . . . Can it provide more output? Comparing results with theory would be much more straightforward if one could count only the number of mutations that have fixed, rather than all present in the population; it would also be useful to see the allele frequency spectrum. (And if it can't do those things, then it is of no interest as a population genetics tool.)
---------------------QUOTE-------------------


Is the allele frequency spectrum you're looking for in here?

< Box.net - human1 sample Folder >

I think what you're looking for might be titled "human1_plm.png".

The output files in MENDEL do give a count for the number of fixed alleles; it's near the bottom of the output file and looks like this:



---------------------CODE SAMPLE-------------------

Allele summary statistics (tracked mutations only):
   (Statistics are based on       891517 tracked deleterious mutations
                        and            0 tracked   favorable mutations.)
    Very rare   Polymorphic     Fixed      Total
      (0-1%)      (1-99%)      (100%)
       15107        8202           0       23309 deleterious
           0           0           0           0 favorable
---------------------CODE SAMPLE-------------------



—Sam
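For what it's worth, the binning in that table is simple to reproduce; this is a sketch of my own, with the boundary handling (whether exactly 1% counts as "very rare") an assumption inferred from the column headings:

```python
def classify_alleles(freqs):
    """Bin allele frequencies the way the quoted summary table does:
    very rare (0-1%), polymorphic (1-99%), fixed (100%).
    Frequencies are fractions in [0, 1]."""
    bins = {"very_rare": 0, "polymorphic": 0, "fixed": 0}
    for f in freqs:
        if f >= 1.0:
            bins["fixed"] += 1
        elif f <= 0.01:
            bins["very_rare"] += 1
        else:
            bins["polymorphic"] += 1
    return bins
```

For example, classify_alleles([0.005, 0.5, 1.0]) puts one allele in each bin; note that the three columns in the output above do sum to the 23309 total.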
Posted by: Henry J on June 11 2009,13:27



---------------------QUOTE-------------------
I can "read" Fortran
---------------------QUOTE-------------------


I used to be able to do that, but college was decades ago and the language probably "evolved" since then.



---------------------QUOTE-------------------
Anything interesting in there?

-DU-
---------------------QUOTE-------------------


Yes:



---------------------QUOTE-------------------
10.0000000    new_mutn_per_offspring
---------------------QUOTE-------------------


I'm no biologist, but that sounds high to me.

Henry
Posted by: midwifetoad on June 11 2009,13:40



---------------------QUOTE-------------------
What is clear is that the default parameters for beneficial alleles are very low. Their justification for having such a low maximum beneficial effect strikes me as plausible-sounding nonsense.
---------------------QUOTE-------------------



From my own dabbling I think the only really critical factor in getting a GA to "work" is an effective fitness function. Assuming at least some offspring are viable, the fitness function must be able to see and score alleles, either by seeing them directly or by having some means of scoring phenotypes.

All the other aspects are pretty much irrelevant.

My own effort tries to evolve a population of letter strings that "look like" words, without having a fixed target. The trick is having a fitness function that can score relative wordness without requiring enormous computational resources.
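One cheap way to score "relative wordness" without a fixed target is bigram frequency. A sketch under my own assumptions (the training list and scoring scheme are illustrative, not midwifetoad's actual code):

```python
from collections import Counter

# Illustrative training vocabulary; a real scorer would use a dictionary.
TRAINING_WORDS = ["weasel", "methinks", "like", "evolution", "string",
                  "letter", "random", "select", "mutate", "population"]

def bigram_counts(words):
    """Count adjacent letter pairs across the training words."""
    counts = Counter()
    for w in words:
        for i in range(len(w) - 1):
            counts[w[i:i + 2]] += 1
    return counts

BIGRAMS = bigram_counts(TRAINING_WORDS)

def wordness(s):
    """Average bigram frequency of s; higher means more word-like.
    Counter returns 0 for unseen bigrams, so junk strings score 0."""
    if len(s) < 2:
        return 0.0
    return sum(BIGRAMS[s[i:i + 2]] for i in range(len(s) - 1)) / (len(s) - 1)
```

Strings built from common English letter pairs outscore random ones (wordness("tion") beats wordness("xqzj")), and the lookup is linear in string length — no supercomputer required.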
Posted by: Wesley R. Elsberry on June 11 2009,13:46

I just ran Zachriel's modified parameter set under v1.2.1 and v1.4.1. Both used exactly the same "mendel.in" configuration file. Only one run each, but the v1.4.1 run does go to completion and shows the accumulation and fixation of favorable mutations. The v1.2.1 run, by contrast, shows a declining population that only lasts to generation 31. The value for "before sel: geno fitness" looks particularly strange; in the final generation, the value was -90.5. In the v1.4.1 run, that value was never less than 1.0, and at generation 500 had reached a little over 20. I'm assuming at the moment that the "fitness" value is always relative to the absolute value at the start of the run.

Whatever else may be going on, it does seem that MA's treatment of favorable mutations changed rather radically between those versions. I wouldn't want to validate v1.4.1 on this basis, but it is comparatively doing a much better job than v1.2.1, and I think my earlier comment stands: outcomes of experiments performed with MA v1.2.1 (and perhaps earlier versions) should be treated with skepticism until independently confirmed, preferably with a package that can be validated against actual popgen results.

ETA: Using Mendel's Accountant v.1.2.1 is like using a bank that inexplicably only records your withdrawals and fails to record your deposits.


Posted by: oldmanintheskydidntdoit on June 11 2009,13:47

Quote (midwifetoad @ June 11 2009,13:40)
The trick is having a fitness function that can score relative wordness without requiring enormous computational resources.
---------------------QUOTE-------------------


I wonder if google could help, after all it "suggests" words when it cannot match your search term. No suggestion = not "wordy" enough.

OK it's probably not practical, not for running quickly anyway.

Still....
Posted by: midwifetoad on June 11 2009,13:51

Quote (oldmanintheskydidntdoit @ June 11 2009,13:47)
Quote (midwifetoad @ June 11 2009,13:40)
The trick is having a fitness function that can score relative wordness without requiring enormous computational resources.
---------------------QUOTE-------------------


I wonder if google could help, after all it "suggests" words when it cannot match your search term. No suggestion = not "wordy" enough.

OK it's probably not practical, not for running quickly anyway.

Still....
---------------------QUOTE-------------------


Google runs pretty quickly for me. I'm a bit in awe of their ability to suggest words from misspellings, but after my experience evolving words I think I know how they do it. Or at least one approach that doesn't require a supercomputer.

I think my approach is more effective than that used by most spelling checkers.
Posted by: sledgehammer on June 11 2009,14:05

I haven't run the program, but perusing the description of the algorithm, it seems to me that the section describing how fitness is assigned is the part that determines the ultimate behavior of the model.  They have clearly built in an asymmetry between the fitness of beneficial and deleterious mutations, and their justifications of the asymmetry smell fishy to me, but IANAB. (bolding mine)
 

---------------------QUOTE-------------------
To provide users of Mendel even more flexibility in specifying the fitness effect distribution, we have chosen to use a
form of the Weibull function [12] that is a generalization of the more usual exponential function. Our function, expressed
by eq. (3.1), maps a random number x, drawn from a set of uniformly distributed random numbers, to a fitness effect d(x)
for a given random mutation.
d(x) = (dsf) exp(-a x^gamma), 0 < x < 1. (3.1)
Here (dsf) is the scale factor which is equal to the extreme value which d(x) assumes when x = 0. We allow this scale
factor to have two separate values, one for deleterious mutations and the other for favorable ones.
These scale factors are meaningful relative to the initial fitness value assumed for the population before we introduce new mutations. In Mendel we assume this initial fitness value to be 1.0. For deleterious mutations, since lethal mutations exist, we choose dsf_del = -1. For favorable mutations, we allow the user to specify the (positive) scale factor dsf_fav. Normally, this would be a small value (e.g., 0.01 to 0.1), since it is only in very special situations that a single beneficial mutation would
have a very large effect.

The parameters a and gamma, both positive real numbers, determine the shape of the fitness effect distribution. We apply the same values of a and gamma to both favorable and deleterious mutations. The parameter a determines the minimum absolute values for d(x), realized when x = 1. We choose to make the minimum absolute value of d(x) the inverse of the haploid genome size G (measured in number of nucleotides) by choosing a = log_e(G). For example, for the human genome, G = 3 × 10^9, which means that for the case of deleterious mutations, d(1) = -1/G = -3 × 10^-10. For large genomes,
this minimum value is essentially 0. For organisms with smaller genomes such as yeast, which has a value for G on
the order of 107, the minimum absolute effect is larger. This is consistent with the expectation that each nucleotide in a smaller genome on average plays a greater relative role in the organism’s fitness.
The second parameter, gamma, can be viewed as controlling the fraction of mutations that have a large absolute fitness
effect. Instead of specifying gamma directly, we select two quantities that are more intuitive and together define gamma. The first is theta, a threshold value that defines a “high-impact mutation”. The second is q, the fraction of mutations that exceed this threshold in their effect. For example, a user can first define a high-impact mutation as one that results in 10% or more change in fitness (theta = 0.1) relative to the scale factor and then specify that 0.001 of all mutations (q = 0.001) be in this category. Inside the code the value of gamma is computed that satisfies these requirements. We reiterate that Mendel uses the same value for gamma, and thus the same values for theta and q, for both favorable and deleterious mutations. Figure 3.1 shows the effect of the parameter q on the shape of the distribution of fitness effect. Note that for each of the cases displayed the large majority of mutations are nearly neutral, that is, they have very small effects. Since a mutation’s effect on fitness can be measured experimentally only if it is sufficiently large, our strategy for parameterizing the fitness effect distribution in terms of high-impact situations provides a means for the Mendel user to relate the numerical model input more directly to available data regarding the actual measurable frequencies of mutations in a given biological context.
---------------------QUOTE-------------------


Part of the justification for asymmetry is that some mutations are lethal, meaning that individual has zero probability of reproducing.  OK, but the maximum fitness benefit of a beneficial mutation is "a very small number like 0.001", which is then subject to "heritability factor", typically 0.2, and other probabilities that severely limit its ability to propagate.
 To make matters worse, for some unjustified reason, the same distribution for beneficial and deleterious is used, after severely skewing the results with the above.
Again, IANAB, but it seems to me that a single beneficial mutation can, in many situations like disease resistance, blonde hair, big boobs, etc., virtually guarantee mating success, just like a deleterious mutation can be reproductively lethal.
 I can see easily how the skewed treatment of beneficial vs deleterious mutations could virtually guarantee "genetic entropy", as evidenced by monotonically decreasing population fitness caused by accumulation of deleterious mutational load.

ETA source.  link is above
Sanford, J., Baumgardner, J., Gibson, P., Brewer, W., & ReMine, W.
(2007a). Mendel’s Accountant: A biologically realistic forward-time population genetics program. Scalable Computing: Practice and Experience 8(2), 147–165.
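The quoted fitness-effect machinery is compact enough to sketch numerically. Solving for gamma from theta and q is my own derivation from eq. (3.1), not Mendel's Fortran:

```python
import math

def shape_gamma(theta, q, G):
    """Solve for gamma given the high-impact threshold theta and the
    fraction q of mutations exceeding it (my derivation from eq. 3.1)."""
    a = math.log(G)                     # a = ln(G), so |d(1)| = 1/G
    # P(|d(x)| >= theta*|dsf|) = q  =>  (ln(1/theta)/a) ** (1/gamma) = q
    return math.log(math.log(1.0 / theta) / a) / math.log(q)

def fitness_effect(x, dsf, G, gamma):
    """d(x) = dsf * exp(-a * x^gamma), the quoted Weibull-style mapping."""
    return dsf * math.exp(-math.log(G) * x ** gamma)

G = 3e9                                 # human haploid genome size, as quoted
gamma = shape_gamma(theta=0.1, q=0.001, G=G)
```

By construction the mutation drawn at x = q lands exactly on the high-impact threshold (d = -theta for deleterious mutations with dsf = -1), and d(1) = -1/G, matching the quoted minimum effect. That makes it easy to see how nearly all draws end up effectively neutral.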
Posted by: Zachriel on June 11 2009,14:12

Quote (AnsgarSeraph @ June 11 2009,11:41)
   
Quote (Wesley R. Elsberry @ June 11 2009,11:29)
So, how much "evolution sucks" verbiage was generated by the people running the pre-1.4.1 versions? Was there any notice that people should re-run their experiments due to a pretty drastic change in program behavior?
---------------------QUOTE-------------------


To be fair (to a shoddy program? How odd), v. 1.4.1 still "demonstrates" all the genetic entropy problems that Sanford makes noise about. Whatever they fixed in terms of beneficial mutations, our runs at TWeb using 1.4.1 indicate that it's nowhere near enough; a 90% beneficial mutation rate with default "maximal benefit effect" still reduces fitness in a linear manner.

—Sam
---------------------QUOTE-------------------


I think a key to understanding Mendel's Accountant is the so-called "Maximal beneficial mutation effects". It defaults to an extremely low number.



---------------------QUOTE-------------------
< Mendel's Accountant User Manual >: Maximal beneficial mutation effects – A realistic upper limit must be placed upon beneficial mutations. This is because a single nucleotide change can expand total biological functionality of an organism only to a limited degree. The larger the genome and the greater the total genomic information, the less a single nucleotide is likely to increase the total. Researchers must make a judgment for themselves of what is a reasonable maximal value for a single base change. The MENDEL default value for this limit is 0.001. This limit implies that a single point mutation can increase total biological functionality by as much as 0.1%. In a genome such as man’s, assuming only 10% of the genome is functional, such a maximal impact point mutation might be viewed as equivalent to adding 300,000 new information-bearing base pairs each of which had the genome-wide average fitness contribution. Researchers need to honestly define the upper limit they feel is realistic for their species. However it should be obvious that, in all cases, the upper limit for beneficial mutation effects ought to correspond to a very small fraction of the total genomic information (i.e. a small number relative to one).
---------------------QUOTE-------------------


There is something wrong with the analysis. They're comparing the selective value of a change to adding thousands of new bases to the genome. But adding 10% to a genome doesn't necessarily make an organism 10% fitter. On the other hand, a small change can often have a very high selective value. Consider a mutation making someone resistant to plague. Maybe he just tastes bad to fleas.

Also, I'm not sure what the number is supposed to represent. Does a value of 1 mean a change in fitness of 1? Shouldn't this scale with absolute fitness? Or is it fractional? So does 1 represent 100%, a doubling of fitness? PS. I'm guessing the former, but my Accounting time has been somewhat limited.
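On the scaling question: Mendel's input file exposes a multiplicative_weighting parameter, which suggests effects can combine either additively or multiplicatively. A sketch of the two conventions (my illustration of the standard definitions, not Mendel's code):

```python
def combined_fitness(effects, multiplicative=False):
    """Combine per-mutation fitness effects onto a baseline fitness of 1.0,
    either additively or multiplicatively (the two standard conventions)."""
    if multiplicative:
        f = 1.0
        for e in effects:
            f *= (1.0 + e)       # each effect scales fitness
        return f
    return 1.0 + sum(effects)    # each effect shifts fitness
```

For small effects the two agree to first order (1.002 vs 1.002001 for two +0.001 beneficials), which is why the ambiguity only bites when individual effects are large.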
Posted by: sledgehammer on June 11 2009,14:23

Call me skeptical, but I think  that they put the various "hooks" to skew the beneficial vs deleterious effects into the program for one reason only.
Posted by: Wesley R. Elsberry on June 11 2009,15:02

Has anyone seen anything in MA that deals with compensatory mutations?
Posted by: deadman_932 on June 11 2009,15:57

For those that need a fair (and free!) Population Genetics text, here's one available in PDF form, in a RAR-compressed file, for personal use:  


Alan R. Templeton, “Population Genetics and Microevolutionary Theory”
Wiley-Liss; 1st edition (September 29, 2006) 705 pages

< http://depositfiles.com/en/files/2390240 > (8.05 MB)

Click on "free downloading" and wait for 60-second clock countdown, then download PDF to a folder, and un-rar.

If you need a (clean) free PDF reader, try Foxit : < http://www.foxitsoftware.com/pdf/reader/reader-interstitial.html >

Sanford's "genomic (mutational) meltdown" scenarios are a hoot. Even DaveScot was bright enough to see that Sanford's proposed mutation rates < were out of line with reality >: fast-reproducing sexual species that have existed a few million should have all been extinct by now, but they're not. Sanford inflates deleterious mutation rates and disregards compensatory mechanisms.
Posted by: mammuthus on June 11 2009,18:08

Quote (deadman_932 @ June 11 2009,15:57)
Sanford's "genomic (mutational) meltdown" scenarios are a hoot. Even DaveScot was bright enough to see that Sanford's proposed mutation rates < were out of line with reality >: fast-reproducing sexual species that have existed a few million should have all been extinct by now, but they're not. Sanford inflates deleterious mutation rates and disregards compensatory mechanisms.
---------------------QUOTE-------------------


His argument is a little more involved than that.  It seems to revolve around genome size: the smaller genome of something like P. falciparum prevents genetic meltdown, but meltdown would occur in mammals with their larger genomes.  So genetic entropy is a problem for the latter (if not on Sanford's YEC timescales).  You can't just take the failure of genetic entropy in a fast-reproducing case like P. falciparum and apply it widely.  At least that's how I read it.



---------------------QUOTE-------------------
It occurred to me recently that Sanford’s projected rate of genetic decay doesn’t square with the observed performance of P.falciparum. P.falciparum’s genome is about 23 million nucleotides. At Sanford’s lowest given rate of nucleotide copy errors that means each individual P.falciparum should have, on average, about 3 nucleotide errors compared to its immediate parent. If those are nearly neutral but slightly deleterious mutations (as the vast majority of eukaryote mutations appear to be) then the number should be quite sufficient to cause a genetic meltdown from their accumulation over the course of billions of trillions of replications. Near neutral mutations are invisible to natural selection but the accumulation of same will eventually become selectable. If all individuals accumulate errors the result is decreasing fitness and natural selection will eventually kill every last individual (extinction). Yet P.falciparum clearly didn’t melt down but rather demonstrated an amazing ability to keep its genome perfectly intact. How?

After thinking about it for a while I believe I found the answer - the widely given rate of eukaryote replication errors is correct. If P.falciparum individuals get an average DNA copy error rate of one in one billion nucleotides then it follows that approximately 97% of all replications result in a perfect copy of the parent genome. That’s accurate enough to keep a genome that size intact. An environmental catastrophe such as an ice age which lowers temperatures even at the equator below the minimum of ~60F in which P.falciparum can survive would cause it to become extinct while genetic meltdown will not. Mammals however, with an average genome size 100 times that of P.falciparum, would have an average of 3 replication errors in each individual. Thus mammalian genomes would indeed be subject to genetic decay over a large number of generations which handily explains why the average length of time between emergence to extinction for mammals and other multicelled organisms with similar genome sizes is about 10 million years if the fossil and geological evidence paints an accurate picture of the past. I DO believe the fossil and geological records present us with an incontrovertible picture of progressive phenotype evolution that occurred over a period of billions of years. I don’t disbelieve common ancestry and phenotype evolution by descent with modification - I question the assertion that random mutation is the ultimate source of modification which drove phylogenetic diversification.
---------------------QUOTE-------------------
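The back-of-envelope numbers in that quote can be checked under a Poisson model of copy errors (the Poisson step and the exact per-nucleotide rate are assumptions on my part; the genome sizes are from the quote):

```python
import math

genome = 23e6                        # P. falciparum genome, ~23 Mb (per the quote)
rate = 1e-9                          # assumed per-nucleotide copy error rate
mean_errors = genome * rate          # ~0.023 expected errors per replication
p_perfect = math.exp(-mean_errors)   # Poisson P(0 errors) ~ 0.977

mammal_errors = 100 * mean_errors    # genome 100x larger -> ~2.3 errors per copy
```

This is consistent with the quote's "approximately 97% ... perfect copy" figure, and a mammal-sized genome gives roughly 2-3 expected errors per replication, close to the quoted "average of 3".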


Posted by: midwifetoad on June 11 2009,18:11



---------------------QUOTE-------------------
fast-reproducing sexual species that have existed a few million years should all have been extinct by now, but they're not.
---------------------QUOTE-------------------



Isn't this the ultimate test of a simulation -- that it must model the fact that populations don't go extinct simply because their genes degrade?

Any simulation where this happens is obviously flawed. History trumps any theory that says something that has happened can't happen.
Posted by: Wesley R. Elsberry on June 11 2009,18:30

OK, why are there still < Amoeba dubia > around? I haven't found an explicit statement of average generation time for the species, but it is likely on the order of 24 hours based on generation times for other amoebae. Its genome is about 670 billion base pairs. That would seem to qualify as a large genome, wouldn't it?
Posted by: mammuthus on June 11 2009,18:47

Quote (Wesley R. Elsberry @ June 11 2009,18:30)
OK, why are there still < Amoeba dubia > around? I haven't found an explicit statement of average generation time for the species, but it is likely on the order of 24 hours based on generation times for other amoebae. Its genome is about 670 billion base pairs. That would seem to qualify as a large genome, wouldn't it?
---------------------QUOTE-------------------


Right, that's that objection answered then!
Posted by: mammuthus on June 11 2009,19:05

By the way, all this genetic entropy (why the stupid name, why not just Muller's Ratchet?) stuff relates to the work of Laurence Loewe at Edinburgh.  He's done a lot of research on Muller's Ratchet, well worth checking out:

< http://evolutionary-research.net/people/lloewe >

also see these classic papers by Michael Lynch:

Lynch, M. et al. 1993. Mutational meltdowns in asexual populations. J. Heredity 84: 339-344

< http://www.indiana.edu/~lynchlab/PDF/Lynch58.pdf >

Gabriel, W. et al. 1993. Muller's ratchet and mutational meltdowns. Evolution 47: 1744-1757.

< http://www.indiana.edu/~lynchlab/PDF/Lynch62.pdf >

I'm not a population geneticist or indeed any kind of evolutionary biologist whatsoever.  But it's my impression that Sanford is saying nothing new; he's just trying to repackage issues that pop gen people have known about for decades.  Indeed, occasional creationist basher Joe Felsenstein published one of the classic papers in this respect:

Felsenstein, J. (1974). The Evolutionary Advantage of Recombination. Genetics, 78, 737–756

Some time ago on PandasThumb, Felsenstein said he'd probably better read the Sanford book as creationists would be using it.  S Cordova offered to send it to him.  It'd be great to get his thoughts.  I think this is the discussion:

< http://pandasthumb.org/archives/2008/05/gamblers-ruin-i.html >
Posted by: mammuthus on June 11 2009,19:08

Aaah yes, found it on the final page of comments:





---------------------QUOTE-------------------
Dr. Felsenstein,

I sent you a copy of John Sanford’s Genetic Entropy.

Let me know if you received it or not. The admins at PT should have my e-mail.

Thank you again for taking time to read what I wrote at UD and for taking the time to respond. I’m deeply honored.

regards, Salvador Cordova
---------------------QUOTE-------------------





---------------------QUOTE-------------------
Sorry for the delay, I didn’t notice this inquiry until recently. Yes, the book arrived. Thanks for sending it. It will be helpful to have it, I am sure.
---------------------QUOTE-------------------


Posted by: Zachriel on June 11 2009,20:41

Take a look at the distribution of beneficial mutations. {The parameters are as on the image and Maximal beneficial mutation effects = 0.1} Beneficial mutations spike, then disappear.


                   Generation 3970, Fitness 0.106, Deleterious 38398, Favorable 0.

The program doesn't seem to use my available memory and quits well before the specified generations. It doesn't seem to reseed the randomizer with each run.


                 Generation 3972, Fitness 0.101, Deleterious 38422, Favorable 0.

Just look at those graphs. That just doesn't look right at all.
Posted by: AnsgarSeraph on June 11 2009,20:50

Quote (Zachriel @ June 11 2009,20:41)
The program doesn't seem to use my available memory and quits well before the specified generations.
---------------------QUOTE-------------------


With a fitness level at 0.1, I'm sure your populations went extinct. I can't keep populations below 1000 alive for very long; they certainly won't last for more than 20,000 generations.

—Sam
Posted by: Zachriel on June 11 2009,21:01

I manually changed the seed. This is what I got with the same parameters.


                   Generation 4376, Fitness 0.111, Deleterious 42350, Favorable 0.

It's very odd having to change the seed every time. A common method of investigation is to rerun the same parameters to help distinguish trends from flukes.

There's something odd about the distribution. That might be due to the small population, though.
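The reseeding issue is easy to illustrate with a minimal Python sketch (the `run_trial` function is hypothetical, standing in for one simulation run). A simulator that never reseeds behaves like the fixed-seed case below: every run retraces the same trajectory, which is useful for re-examining one run but useless for getting independent replicates.

```python
import random

def run_trial(seed=None):
    """One mock simulation run; a fixed seed reproduces it exactly."""
    rng = random.Random(seed)  # independent generator, not the shared global one
    return [rng.random() for _ in range(5)]

# Same seed -> identical trajectory (rerun one run to study it);
# different seeds -> independent replicates (distinguish trends from flukes).
assert run_trial(seed=42) == run_trial(seed=42)
assert run_trial(seed=42) != run_trial(seed=43)
```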
Posted by: deadman_932 on June 11 2009,22:53

Quote (mammuthus @ June 11 2009,18:08)
     
Quote (deadman_932 @ June 11 2009,15:57)
Sanford's "genomic (mutational) meltdown" scenarios are a hoot. Even DaveScot was bright enough to see that Sanford's proposed mutation rates < were out of line with reality >: fast-reproducing sexual species that have existed for a few million years should all have been extinct by now, but they're not. Sanford inflates deleterious mutation rates and disregards compensatory mechanisms.
---------------------QUOTE-------------------


His argument is a little more involved than that.  It seems to revolve around genome size: the smaller genome of something like P. falciparum prevents genetic meltdown, but meltdown would occur in mammals with their larger genomes.  So genetic entropy is a problem for the latter (if not on Sanford's YEC timescales), and you can't just take a fast-reproducing organism like P. falciparum and generalize the failure of Genetic Entropy in that case more widely.  At least that's how I read it.
---------------------QUOTE-------------------


Well, Wes mentioned one example of "large"-genomed rapidly-reproducing species, and there's a lot more available. Mammal genomes average between 2 and 3 gigabases (Gb), but lots of insect and plant genomes can be larger: around 16 Gb in wheat or the grasshopper Podisma pedestris -- five times larger than the human genome.

Nailing Sanford down on questions about interesting populations like California condors would be fun -- they're the only North American remnant of Gymnogyps, been around since the early Pleistocene, and their population dropped down to 22 individuals not very long ago... and their est. genome size is 1.5 Gb. They should have accumulated enough deleterious mutations that such a small, closely-related group would produce nothin' but dead young, right? Or how about Przewalski's horse?

Sanford is a YEC of sorts, so he skewed his parameters to fit his skewed view of the Earth's entire biome being less than 100 K years old, as I recall ( I may be wrong with the exact figure there).

-------------------------------------------

ETA: I was curious about known recessives in the existing condors and there is one identified (chondrodystrophy) that results in fatal abnormalities:

J. Geyer, O.A. Ryder, L.G. Chemnick and E.A. Thompson, Analysis of relatedness in the California condors: from DNA fingerprints, Mol. Biol. Evol. 10 (1993), pp. 571–589

Romanov MN, Koriabine M, Nefedov M, de Jong PJ, Ryder OA (2006) Construction of a California Condor BAC Library and First-generation Chicken-condor Comparative Physical Map as an Endangered Species Conservation Genomics Resource, Genomics, 88 (6), 711-8
Posted by: Steve Schaffner on June 11 2009,23:09

Quote (Zachriel @ June 11 2009,20:41)
Take a look at the distribution of beneficial mutations.
---------------------QUOTE-------------------


Looks right to me, given your parameters. You're getting 10 mutations/individual for 100 individuals, or 1000 mutations per generation. Of those, 1/100,000 is beneficial, so you're only getting one beneficial mutation every 100 generations. Those are the tiny blips. Once in a while one or two of them drift up to an appreciable frequency, and the mean number of beneficial alleles per individual climbs above 1.0.

None of them fix, though, which is not surprising, since they're almost all effectively neutral. That means you should have one fixing by chance every 20,000 generations, plus some probability from the tail at higher selection coefficients.
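The arithmetic above can be checked in a few lines (a sketch in Python; the variable names are mine, the parameter values are the ones stated in the post):

```python
# Back-of-envelope check of the numbers above.
mut_per_ind = 10          # new mutations per individual per generation
N = 100                   # population size
frac_beneficial = 1e-5    # fraction of new mutations that are beneficial

muts_per_gen = mut_per_ind * N                       # 1000 new mutations per generation
beneficial_per_gen = muts_per_gen * frac_beneficial  # ~0.01
gens_per_beneficial = 1 / beneficial_per_gen         # ~100 generations per beneficial

# A new effectively-neutral mutation fixes with probability 1/(2N) in a
# diploid population, so chance fixations arrive roughly every
# 100 * 2N = 20,000 generations:
p_fix = 1 / (2 * N)
gens_per_fixation = gens_per_beneficial / p_fix

assert abs(gens_per_beneficial - 100) < 1e-6
assert abs(gens_per_fixation - 20_000) < 1e-3
```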
Posted by: Occam's Aftershave on June 11 2009,23:47

Over at TWeb where this started I asked the same question: why haven't all the fast-reproducing mammal species died out from genetic meltdown yet?  The topic of mice was raised because, while mice have a genome roughly the size of the human genome (approx. 3 Gb), they have a generation time some 170x faster (6 weeks vs. 20 years).  So why haven't all the mice gone extinct by now?

I made the statement "All other things being equal, the population that breeds faster will accumulate mutations faster."

Jorge Fernandez (a YEC who was acting as a go-between to Sanford) supposedly forwarded my questions to Sanford and got this reply:

Sanford:  " No, it is just the opposite, short generation times means more frequent and better selective filtering."

Which makes zero sense and is trivially easy to refute with their own program:

Run Mendel with two populations that are identical in every way (i.e. genome size, mutation rate, selection pressure, etc.) except make one generation time 2x the other, say two generations per year vs. one per year.

If you run them both for 1000 generations, both will end up with the same (lower) fitness level, but the two per year will only take 500 years to get there.

If you run them both for 1000 years, the once per year will end up in the exact same fitness as the first trial, but the two per year will have 2000 generations and end up with an even lower fitness level, if it doesn't just go extinct first.
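The two trials above can be sketched with a toy model (assuming, per the argument, an identical per-generation decline in both populations; the decline rate is an arbitrary illustrative value):

```python
def fitness_after(generations, decline_per_gen=1e-4, start=1.0):
    """Toy model: mean fitness erodes by a fixed factor each generation
    as effectively-neutral deleterious mutations accumulate."""
    return start * (1 - decline_per_gen) ** generations

# Trial 1: run both populations for 1000 *generations*. The two-per-year
# population gets there in 500 years, the one-per-year in 1000 years, but
# both end at the same fitness:
assert fitness_after(1000) == fitness_after(1000)

# Trial 2: run both for 1000 *years*. Now the faster breeder has 2000
# generations on the clock and ends up worse off:
assert fitness_after(2000) < fitness_after(1000)
```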

These guys are busted, and they know they're busted.  Now it's just a question of how far they can push this shit and how much money they can make before the errors become well known.
Posted by: k.e.. on June 12 2009,00:06

So they have gone from shining shit to simulating shit?

As a game strategy it could be a winner.

More obscurantism in the tard market makes it easier to collect loose fundy shekels.
Posted by: mammuthus on June 12 2009,04:18

This new paper may be of interest:



---------------------QUOTE-------------------
Mustonen, V. and Lassig, M.  (2009) From fitness landscapes to seascapes: non-equilibrium dynamics of selection and adaptation.  Trends in Genetics, 25, 111-119.

Evolution is a quest for innovation. Organisms adapt to changing natural selection by evolving new phenotypes. Can we read this dynamics in their genomes? Not every mutation under positive selection responds to a change in selection: beneficial changes also occur at evolutionary equilibrium, repairing previous deleterious changes and restoring existing functions. Adaptation, by contrast, is viewed here as a non-equilibrium phenomenon: the genomic response to time-dependent selection. Our approach extends the static concept of fitness landscapes to dynamic fitness seascapes. It shows that adaptation requires a surplus of beneficial substitutions over deleterious ones. Here, we focus on the evolution of yeast and Drosophila genomes, providing examples where adaptive evolution can and cannot be inferred, despite the presence of positive selection.
---------------------QUOTE-------------------



there's a section on Muller's Ratchet:



---------------------QUOTE-------------------
Here, we argue for a sharpened concept of adaptive evolution at the molecular level. Adaptation requires positive selection, but not every mutation under positive selection is adaptive. Selection and adaptation always refer to a molecular phenotype depending on a single genomic locus or on multiple loci, such as the energy of a transcription-factor-binding site in our first example. This correlates the direction of selection at all loci contributing to the phenotype and calls for the distinction between adaptation and compensation. The infinite-sites approximation, which is contained in many population-genetic models, neglects such correlations and is therefore not optimally suited to infer adaptation [16] and [23]. Here, we address this problem by a joint dynamical approach to selection and genomic response in a genome with finite number of sites. In this approach, adaptive evolution is characterized by a positive fitness flux φ, which measures the surplus of beneficial over deleterious substitutions.

It is instructive to contrast this view of adaptive evolution with Muller's ratchet, a classical model of evolution by deleterious substitutions [53] and [54]. This model postulates a well-adapted initial state of the genome so that all, or the vast majority of, mutations have negative fitness effects. Continuous fixations of slightly deleterious changes then lead to a stationary decline in fitness (i.e. to negative values of φ). Similarly to the infinite-sites approximation, this model neglects compensatory mutations. In a picture of a finite number of sites, it becomes clear that every deleterious substitution leads to the opportunity for at least one compensatory beneficial mutation (or more, if the locus contributes to a quantitative trait), so that the rate of beneficial substitutions increases with decreasing fitness. Therefore, assuming selection is time-independent, decline of fitness (φ < 0) is only a transient state and the genome will eventually reach detailed balance between deleterious and beneficial substitutions, that is, evolutionary equilibrium (φ = 0). As long as selection is time-independent, an equilibrium state exists for freely recombining loci and in a strongly linked (i.e. weakly recombining) genome, although its form is altered in the latter case by interference selection [55] and [56]. Conversely, an initially poorly adapted system will have a transient state of adaptive evolution (φ > 0) before reaching equilibrium. Time-dependent selection, however, continuously opens new windows of positive selection, the genome is always less adapted than at equilibrium and the adaptive state becomes stationary. Thus, we reach a conclusion contrary to Muller's ratchet. 
Because selection in biological systems is generically time-dependent, decline of fitness is less likely even as a transient state than suggested by Muller's ratchet: the model offers no explanation of how a well-adapted initial state without opportunities of beneficial mutations is reached in the first place.

As a minimal model for adaptive evolution, we have introduced the Fisher-Wright process in a macro-evolutionary fitness seascape, which is defined by stochastic changes of selection coefficients at individual genomic positions on time scales larger than the fixation time of polymorphisms (and is thus different from micro-evolutionary selection fluctuations and genetic draft). Time-dependence of selection is required to maintain fitness flux: the seascape model is the simplest model that has a non-equilibrium stationary state with positive φ. The two parameters of the minimal model (strength and rate of selection changes) are clearly just summary variables for a much more complex reality. The vastly larger genomic datasets within and across species will enable us to infer the dynamics of selection beyond this minimal model.
---------------------QUOTE-------------------


Posted by: damitall on June 12 2009,04:33

Quote (mammuthus @ June 11 2009,18:08)
 
Quote (deadman_932 @ June 11 2009,15:57)
Sanford's "genomic (mutational) meltdown" scenarios are a hoot. Even DaveScot was bright enough to see that Sanford's proposed mutation rates < were out of line with reality >: fast-reproducing sexual species that have existed for a few million years should all have been extinct by now, but they're not. Sanford inflates deleterious mutation rates and disregards compensatory mechanisms.
---------------------QUOTE-------------------


His argument is a little more involved than that.  It seems to revolve around genome size: the smaller genome of something like P. falciparum prevents genetic meltdown, but meltdown would occur in mammals with their larger genomes.  So genetic entropy is a problem for the latter (if not on Sanford's YEC timescales), and you can't just take a fast-reproducing organism like P. falciparum and generalize the failure of Genetic Entropy in that case more widely.  At least that's how I read it.

   

---------------------QUOTE-------------------
It occurred to me recently that Sanford’s projected rate of genetic decay doesn’t square with the observed performance of P.falciparum. P.falciparum’s genome is about 23 million nucleotides. At Sanford’s lowest given rate of nucleotide copy errors that means each individual P.falciparum should have, on average, about 3 nucleotide errors compared to its immediate parent. If those are nearly neutral but slightly deleterious mutations (as the vast majority of eukaryote mutations appear to be) then the number should be quite sufficient to cause a genetic meltdown from their accumulation over the course of billions of trillions of replications. Near neutral mutations are invisible to natural selection but the accumulation of same will eventually become selectable. If all individuals accumulate errors the result is decreasing fitness and natural selection will eventually kill every last individual (extinction). Yet P.falciparum clearly didn’t melt down but rather demonstrated an amazing ability to keep its genome perfectly intact. How?

After thinking about it for a while I believe I found the answer - the widely given rate of eukaryote replication errors is correct. If P.falciparum individuals get an average DNA copy error rate of one in one billion nucleotides then it follows that approximately 97% of all replications result in a perfect copy of the parent genome. That’s accurate enough to keep a genome that size intact. An environmental catastrophe such as an ice age which lowers temperatures even at the equator below the minimum of ~60F in which P.falciparum can survive would cause it to become extinct while genetic meltdown will not. Mammals however, with an average genome size 100 times that of P.falciparum, would have an average of 3 replication errors in each individual. Thus mammalian genomes would indeed be subject to genetic decay over a large number of generations which handily explains why the average length of time between emergence to extinction for mammals and other multicelled organisms with similar genome sizes is about 10 million years if the fossil and geological evidence paints an accurate picture of the past. I DO believe the fossil and geological records present us with an incontrovertible picture of progressive phenotype evolution that occurred over a period of billions of years. I don’t disbelieve common ancestry and phenotype evolution by descent with modification - I question the assertion that random mutation is the ultimate source of modification which drove phylogenetic diversification.
---------------------QUOTE-------------------


< Here > is an abstract which might inform this particular question
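The perfect-copy figure in the quoted argument checks out under the Poisson approximation (a sketch; the 23 Mb genome size and the two error rates are the ones given in the quote):

```python
import math

GENOME = 23_000_000  # P. falciparum genome size in nucleotides, per the quote

def perfect_copy_fraction(error_rate_per_nt, genome=GENOME):
    """Poisson approximation: P(zero copy errors) = exp(-expected errors)."""
    return math.exp(-error_rate_per_nt * genome)

# Widely cited eukaryote rate of ~1e-9 errors/nt gives ~0.023 expected errors
# per replication, so roughly 97-98% of replications are letter-perfect:
standard = perfect_copy_fraction(1e-9)

# A rate yielding ~3 errors per replication (the "Sanford" figure in the
# quote) leaves almost no perfect copies:
sanford_like = perfect_copy_fraction(3 / GENOME)

assert 0.97 < standard < 0.99
assert sanford_like < 0.06   # exp(-3) is about 0.05
```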
Posted by: Lou FCD on June 12 2009,06:35

You've all forgotten the most important part of the simulation, and that's why your results are skewed.

You have to throw the computer off a cliff to get an accurate simulation.

duh.
Posted by: deadman_932 on June 12 2009,09:28

Quote (Lou FCD @ June 12 2009,06:35)
You've all forgotten the most important part of the simulation, and that's why your results are skewed.

You have to throw the computer off a cliff to get an accurate simulation.

duh.
---------------------QUOTE-------------------


Lou = absotively correckt. Heck, even checker-playing computers have to be painted in squares. Everyone knows that.
Posted by: mammuthus on June 12 2009,10:58

< Jorge Fernandez > at TWeb is in contact with Sanford.  He just posted the following from Sanford:



---------------------QUOTE-------------------
Hi Jorge - I have been traveling ... The comment ... about "cooking the books" is, of course, a false accusation. The issue has to do with memory limits. Before a Mendel run starts it allocates the memory needed for different tasks. With deleterious mutations this is straight-forward - the upper range of mutation count is known. With beneficials it is harder to guess final mutation count - some beneficials can be vastly amplified. Where there is a high rate of beneficials they can quickly exhaust RAM and the run crashes. Wesley Brewer [one of the creators of Mendel] has tried to avoid this by placing certain limits - but fixing this is a secondary priority and will not happen right away. With more RAM we can do bigger experiments. It is just a RAM issue.

Best - John
---------------------QUOTE-------------------



This is in response to - "Wes Elsberry made a comment that I think could be a good title, 'Mendel's Accountant cooks the books.'"  I assume that they're talking about the failure of the program to increase fitness when a high number of beneficial mutations are specified.

I guess Sanford et al would argue that this problem isn't a big issue, since there's never a case in which there are loads (e.g. 90%) of beneficial mutations.  Deleterious or slightly deleterious mutations are in the majority in reality, there's no RAM problem with those, and so the main conclusion they draw from Mendel is unaffected by the problems shown with beneficial mutations.  At least I guess that's what he'd say.

Sanford also says:



---------------------QUOTE-------------------
The fact that our runs crash when we run out of RAM is not by design. If someone can help us solve this problem we would be very grateful. We typically need to track hundreds of millions of mutations. Beneficials create a problem for us because they amplify in number. We are doing the best we can.

I would urge your colleagues [Heaven help me - John is under the impression that you people are my colleagues ... brrrrrrrr!] to use more care. In science we should be slow to raise claims of fraud without first talking to the scientist in question to get their perspective. Otherwise one might unwittingly be engaging in character assassination.
---------------------QUOTE-------------------



< http://www.theologyweb.com/campus....unt=131 >
Posted by: Wesley R. Elsberry on June 12 2009,11:06



---------------------QUOTE-------------------

I guess Sanford et al would argue that this problem isn't a big issue, since there's never a case in which there are loads (e.g. 90%) of beneficial mutations.

---------------------QUOTE-------------------



No, the problem is quantitative and not qualitative. If the program doesn't handle the 90% case correctly, it isn't handling the 0.001% case correctly, either. And we know that v1.2.1 did not handle it correctly. If you are going around claiming to have produced an "accurate" simulation, you are on the hook for that.

The 90% case just makes the error blatantly obvious.

Speaking of hypocrisy, how careful is Sanford in not making sweeping generalizations about biologists having gotten things wrong?
Posted by: Wesley R. Elsberry on June 12 2009,11:13

As demonstrated in the two runs I did comparing the output of v1.2.1 and v1.4.1 on the very same configuration, v1.2.1 has a major error in its handling of beneficial mutations. This has nothing at all to do with memory limits; I also ran both with the default case, and the experimental case used in both merely changed the two parameters as specified by Zachriel above. The memory usage was under 130MB for all cases I ran; the memory I had was sufficient and the simulations ran to completion. Sanford either was given a garbled account of the issue or is deploying a meaningless digression as a response.

ETfix: 130,000KB = 130MB


Posted by: mammuthus on June 12 2009,11:18

Quote (Wesley R. Elsberry @ June 12 2009,11:06)


---------------------QUOTE-------------------

I guess Sanford et al would argue that this problem isn't a big issue, since there's never a case in which there are loads (e.g. 90%) of beneficial mutations.

---------------------QUOTE-------------------



No, the problem is quantitative and not qualitative. If the program doesn't handle the 90% case correctly, it isn't handling the 0.001% case correctly, either. And we know that v1.2.1 did not handle it correctly. If you are going around claiming to have produced an "accurate" simulation, you are on the hook for that.

The 90% case just makes the error blatantly obvious.

Speaking of hypocrisy, how careful is Sanford in not making sweeping generalizations about biologists having gotten things wrong?
---------------------QUOTE-------------------


Ok, thanks Wesley.  I know nothing about programming, so a lot of what I have to say on related subjects will be utter nonsense!

I totally concur about Sanford's sweeping generalisations.  He claims that Mendel's Accountant has "falsified" Neo-Darwinian evolution:



---------------------QUOTE-------------------
When any reasonable set of biological parameters are used, Mendel provides overwhelming empirical evidence that all of the “fatal flaws” inherent in evolutionary genetic theory are real. This leaves evolutionary genetic theory effectively falsified—with a degree of certainty which should satisfy any reasonable and open-minded person.
---------------------QUOTE-------------------



and



---------------------QUOTE-------------------
As a consequence, evolutionary genetic theory now has no theoretical support—it is an indefensible scientific model. Rigorous analysis of evolutionary genetic theory consistently indicates that the entire enterprise is actually bankrupt. In this light, if science is to actually be self-correcting, geneticists must “come clean” and acknowledge the historical error, and must now embrace honest genetic accounting procedures.
---------------------QUOTE-------------------



< http://www.icr.org/i....ory.pdf >

I have zero respect for anyone who provides such rhetoric without actually submitting their claims to review by the scientific community - the very people they are lambasting.  That is fundamentally dishonest.

Sam at TWeb has emailed Sanford to see if he will engage directly at that messageboard.  Could be interesting.
Posted by: mammuthus on June 12 2009,11:22

Oh and an additional response from Sanford.  This is an explanation as to why such low population sizes (1000) were used and how this doesn't affect their conclusions.  In addition it's a response to the question of why mice (as an example of a pretty fast reproducing species) have not yet gone extinct.



---------------------QUOTE-------------------
Hi Jorge - Please tell these folks that I appreciate their interest in Mendel, and if they see certain ways we can make it more realistic, we will try and accommodate them.

Mendel is fundamentally a research tool, and so offers a high degree of user-specification. There is no inherently "realistic" population size - it just depends on what circumstance you wish to study. The default setting for population size is set at 1000 because it is convenient - whether you are using the Windows version on your laptop, or any other computer, you are less likely to run out of memory. We are proceeding to study population size and also population sub-structure. I believe larger populations should realistically be set up as multiple tribes with a given migration rate between tribes. Under these conditions we see little improvement with larger population sizes. But they are welcome to do bigger runs if they have the memory resources.

The mouse question is interesting. I think one would need
to change various parameters for mouse - each species is
different. I would like to know the maximal (not minimal)
generation time - do they know? This would define the
maximal time to extinction. I have read that the per
generation mutation rate is about an order of magnitude
lower in mouse - which makes sense if there are fewer cell
divisions in the generative cells between generations.
I would be happy to do such experiments when I get the
input data.

Best - John
---------------------QUOTE-------------------



< http://www.theologyweb.com/campus....unt=134 >
Posted by: Wesley R. Elsberry on June 12 2009,11:25

Does anyone know of an open-source UML system that takes FORTRAN code as input?
Posted by: deadman_932 on June 12 2009,11:47

Quote (mammuthus @ June 12 2009,10:58)
Jorge Fernandez at TWeb is in contact with Sanford.  He just posted the following from Sanford:

   

---------------------QUOTE-------------------
Hi Jorge - I have been traveling ...The comment...about "cooking the books" is, of course, a false accusation. The issue has to do with memory limits. Before a Mendel run starts it allocates the memory needed for different tasks. With deleterious mutations this is straight-forward - the upper range of mutation count is known. With beneficials it is harder to guess final mutation count - some beneficials can be vastly amplified. Where there is a high rate of beneficials they can quickly exhaust RAM and the run crashes. Wesley Brewer [one of the creators of Mendel] has tried to avoid this by placing certain limits - but fixing this is a secondary priority and will not happen right away. With more RAM we can do bigger experiments. It is just a RAM issue.

Best - John
---------------------QUOTE-------------------



This is in response to - "Wes Elsberry made a comment that I think could be a good title, 'Mendel's Accountant cooks the books.'"  I assume that they're talking about the failure of the program to increase fitness when a high number of beneficial mutations are specified...
[snip]

Sanford also says:
 

---------------------QUOTE-------------------
"The fact that our runs crash when we run out of RAM is not by design. If someone can help us solve this problem we would be very grateful. We typically need to track hundreds of millions of mutations. Beneficials create a problem for us because they amplify in number. We are doing the best we can. I would urge your colleagues [Heaven help me - John is under the impression that you people are my colleagues ... brrrrrrrr!] to use more care. In science we should be slow to raise claims of fraud without first talking to the scientist in question to get their perspective. Otherwise one might unwittingly be engaging in character assassination."
---------------------QUOTE-------------------


< http://www.theologyweb.com/campus....unt=131 >
---------------------QUOTE-------------------


That's interesting, because the 2008 ICR "Proceedings of the Sixth International Conference on Creationism" (pp. 87–98) has a "paper" by John Baumgardner, John Sanford, Wesley Brewer, Paul Gibson and Wally Remine.

The title of that paper is  < "Mendel’s Accountant: A New Population Genetics Simulation Tool for Studying Mutation and Natural Selection" >  (.PDF link)

So what does John Sanford say there? Well, he says this:  

 

---------------------QUOTE-------------------
Mendel represents an advance in forward-time simulations by incorporating several improvements over previous simulation tools...
Mendel is tuned for speed, efficiency and memory usage to handle large populations and high mutation rates....
We recognized that to track millions of individual mutations in a sizable population over many generations, efficient use of memory would be a critical issue – even with the large amount of memory commonly available on current generation computers. We therefore selected an approach that uses a single 32-bit (four-byte) integer to encode a mutation’s fitness effect, its location in the genome, and whether it is dominant or recessive. Using this approach, given 1.6 gigabytes of memory on a single microprocessor, we can accommodate at any one time some 400 million mutations... This implies that, at least in terms of memory, we can treat reasonably large cases using a single processor of the type found in many desktop computers today.
---------------------QUOTE-------------------



I await the actual achievement of these claims with un-bated breath. All emphases are mine.
Posted by: deadman_932 on June 12 2009,12:11

Quote (Wesley R. Elsberry @ June 12 2009,11:25)
Does anyone know of an open-source UML system that takes FORTRAN code as input?
---------------------QUOTE-------------------


You might want to look through these: < http://olex.openlogic.com/wazi....lopment >

ETA:  Sorry, Nope, I can't find anything open-source... and I looked quite a bit at various fora, etc.
Posted by: Wesley R. Elsberry on June 12 2009,12:14

Acceleo seems to be able to generate FORTRAN from UML, but I'm looking for a free tool to generate UML from FORTRAN.
Posted by: Wesley R. Elsberry on June 12 2009,13:42

Mutations are not beneficial, neutral, or detrimental on their own, nor is their contribution to fitness fixed for all time. Mutations contribute to fitness in a context, and as the context changes, so may the value of its contribution to fitness. Fitness is a value that applies to the phenotype in ensemble. Mendel's Accountant appears instead to assume that mutations have a fixed value that cannot be changed by context. Thus, Mendel's Accountant appears to completely ignore research on compensatory mutations.

Because the value of a mutation depends on context, a particular mutation may be beneficial, neutral, or detrimental at initial appearance, but later become part of a different class as other mutations come into play. Mendel's Accountant treats mutations as falling into a fixed class.

These faults alone suffice to disqualify Mendel's Accountant from any claim to providing an accurate simulation of biological evolution.
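The point about context-dependence can be made concrete with a hypothetical two-locus example (all fitness values below are invented for illustration; this is the kind of sign epistasis the compensatory-mutation literature describes):

```python
# Hypothetical two-locus fitness table. Mutation A alone is deleterious;
# mutation B alone is deleterious; together, B compensates for A.
EFFECTS = {
    frozenset():           1.00,  # wild type
    frozenset({"A"}):      0.95,  # A alone: deleterious
    frozenset({"B"}):      0.97,  # B alone: deleterious
    frozenset({"A", "B"}): 1.00,  # B on an A background: compensatory
}

def fitness(genotype):
    """Fitness of the phenotype in ensemble, not a sum of fixed per-mutation values."""
    return EFFECTS[frozenset(genotype)]

# The "class" of mutation B is not fixed: deleterious on the wild-type
# background, beneficial once A is present.
assert fitness({"B"}) < fitness(set())        # deleterious alone
assert fitness({"A", "B"}) > fitness({"A"})   # beneficial after A
```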
Posted by: Wesley R. Elsberry on June 12 2009,16:28

Of course, I tend to think that a good approach to critique of a program to do a particular task is to actually produce a program that does that task better. I think that is something that we could give some thought to here. Much of the same background work applies to analysis of MA or design of an alternative.

Some ideas:

- Develop a test suite based on published popgen findings in parallel with development

- Base it on the most general, abstract principles for broad applicability

- Aim for number of generations to be limited only by amount of disk or other long-term storage available

- Consider means for handling large population sizes

- Start with a simple system, either as run-up to version 1 or with sufficient generality to be extensible to more complex systems

It seems to me that producing a thoroughly-vetted and tested platform that covers fewer cases is far better than producing a large, unwieldy, and bug-ridden product whose output cannot be trusted.
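One way to realize the first idea on the list -- a test suite anchored to published popgen findings -- is to check a simulator's output against a known analytic result, such as a neutral allele fixing with probability equal to its starting frequency. A minimal sketch (the bare Wright-Fisher model here stands in for whatever simulator is under test; population size and replicate count are kept small for speed):

```python
import random

def wright_fisher_fix(n_copies, rng):
    """One neutral Wright-Fisher trajectory from a single copy among
    n_copies gene copies; returns True if the allele fixes."""
    count = 1
    while 0 < count < n_copies:
        p = count / n_copies
        count = sum(rng.random() < p for _ in range(n_copies))
    return count == n_copies

def fixation_fraction(n_copies=20, reps=5000, seed=1):
    rng = random.Random(seed)
    return sum(wright_fisher_fix(n_copies, rng) for _ in range(reps)) / reps

# Classic result: a neutral allele starting at frequency 1/n_copies fixes
# with probability 1/n_copies (here 0.05). A simulator that disagrees far
# beyond sampling error fails the test.
frac = fixation_fraction()
assert abs(frac - 1 / 20) < 0.015
```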
Posted by: Bob O'H on June 12 2009,17:02



---------------------QUOTE-------------------
I'm not a population geneticist or indeed any kind of evolutionary biologist whatsoever.  But it's my impression that Sanford is saying nothing new; he's just trying to repackage issues that pop gen people have known about for decades.  
---------------------QUOTE-------------------


What's new is his claim that meltdown affects sexual populations.  I should check the evolution-of-sex literature; I'm sure they (Sally Otto and Nick Barton, amongst others) showed that it doesn't happen.  In his book Sanford ignores the recent evolution-of-sex literature.

Wes -


---------------------QUOTE-------------------
OK, why are there still Amoeba dubia around?
---------------------QUOTE-------------------


Indeed - hasn't it turned into Amoeba dubya?

Anyway, remember that Sanford is a YEC, so millions of years aren't relevant for him.
Posted by: Dr.GH on June 12 2009,20:44

Quote (Wesley R. Elsberry @ June 12 2009,14:28)
Of course, I tend to think that a good approach to critique of a program to do a particular task is to actually produce a program that does that task better. I think that is something that we could give some thought to here. Much of the same background work applies to analysis of MA or design of an alternative.

Some ideas:

- Develop a test suite based on published popgen findings in parallel with development

- Base it on the most general, abstract principles for broad applicability

- Aim for number of generations to be limited only by amount of disk or other long-term storage available

- Consider means for handling large population sizes

- Start with a simple system, either as run-up to version 1 or with sufficient generality to be extensible to more complex systems

It seems to me that producing a thoroughly-vetted and tested platform that covers fewer cases is far better than producing a large, unwieldy, and bug-ridden product whose output cannot be trusted.
---------------------QUOTE-------------------


Wes, how would your proposed project improve on other programs? For example, of the goals that you list, does existing software such as AVIDA or other models not already address your criticisms?

Next, I see that there are two goals. The first is to refute lame ass creatocrap like "Mendel's Accountant provides overwhelming empirical evidence that all of the 'fatal flaws' inherent in evolutionary genetic theory are real. This leaves evolutionary genetic theory effectively falsified--with a degree of certainty that should satisfy any reasonable and open-minded person."

The second would be to actually advance the scientific work of evo simulations.

I might be able to assist the first, and I am happy to leave the second to the rest of you.

Your list of ideas does add to the refutation of the creatocrap, as they are features of what a good simulator should be able to do.
Posted by: Steve Schaffner on June 12 2009,21:56

There may be some value in checking Mendel's Accountant, to see whether it really implements the model that it claims to, but I don't see much point in trying to cobble together a new program to simulate evolution here. That is a major research project, with many unknown parameters, i.e. a truly realistic simulation of evolution isn't possible yet.

The important questions about MA, assuming the program isn't simply fatally flawed, concern the model that it is implementing. For the default values, you don't have to run the program to know that it will produce genetic collapse of the population -- that's inevitable, given the assumptions of the model. The model assumes a large number of mildly deleterious mutations, so mild that they are unaffected by purifying selection. It also assumes purely hard selection, in which lower fitness translates directly into loss of fertility for the population, and few beneficial mutations (which are also of small effect), independent of the fitness of the population (i.e. no compensating mutations). Given those assumptions, the population will inevitably decline towards extinction, since there is no force counteracting the relentless accumulation of deleterious mutations. The model stands or falls on those assumptions; the code is a side-issue.
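The inevitability argument above can be made concrete with a toy simulation (an illustration of the stated assumptions, not Mendel's Accountant's actual code): if every mutation is additive, mildly deleterious, and far too weak for selection to distinguish individuals, and there are no compensating beneficials, then nothing opposes accumulation and mean fitness must decline at roughly mu * s per generation.

```python
import math
import random

def poisson(rng, lam):
    # Knuth's algorithm for Poisson draws; fine for small lam
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while p > limit:
        k += 1
        p *= rng.random()
    return k - 1

def simulate(generations=200, pop_size=100, mu=1.0, s=-1e-4, seed=42):
    rng = random.Random(seed)
    fitness = [1.0] * pop_size
    for _ in range(generations):
        # effects this small are invisible to selection, so parents are
        # effectively drawn at random (pure drift); each offspring adds
        # a Poisson number of new deleterious mutations of effect s
        fitness = [rng.choice(fitness) + s * poisson(rng, mu)
                   for _ in range(pop_size)]
    return sum(fitness) / pop_size

mean_fitness = simulate()   # roughly 1.0 + 200 * 1.0 * (-1e-4) = 0.98
```

The decline rate here is baked into the assumptions, which is exactly the point: the code is a side-issue, and the model's conclusion follows before a single line runs.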
Posted by: Wesley R. Elsberry on June 12 2009,22:07

I'll check with Reed Cartwright about existing popgen packages.

Avida is not a package that aims to directly model biological population genetics. There is quite a lot of conceptual overlap between Avida evolution and biological evolution, but it isn't at the low-level that MA claims to operate at. For example, current research in Avida is looking at the role of compensatory mutations. But each mutation in an Avidian is an instruction, not a base as in DNA. The Avida research can provide another line of evidence that complements that of biological research on compensatory mutations, but it isn't aimed at answering questions like, "At what rate should we expect compensatory mutations to fix in species X?" It's that kind of question that the folks pushing MA position it as a tool to answer, or would if they took any note of compensatory mutations at all. I think you get the drift, though.

Avida, by the way, has no difficulty in cranking out data on generation after generation for a set population size. Most Avida work is done on population sizes between 900 and 10,000 Avidians. However, I'm working on extending the Avida-ED program, and one part is to allow up to 90,000 organisms in the population. That is, by the way, accomplished with a change to the graphical user interface to allow selection of a grid size of up to 300 by 300, where the current version's grid-size slider only goes up to 100. The underlying Avida instance is unchanged. Tracking mutations is possible in Avida as well, but is done in analysis after the run finishes. Runs can go into the millions of updates. I don't know that anyone has tried to find an upper limit. Avida's ability to do this is because it only needs to hold the current population and grid in memory. Everything else gets written to disk.
Posted by: midwifetoad on June 12 2009,23:43

Who woulda thunk a few years ago that entry level computers would have four gigs of memory, or that the retail price of four gigs would be about $19.95?
Posted by: dvunkannon on June 13 2009,00:12

Quote (Wesley R. Elsberry @ June 12 2009,23:07)
I'll check with Reed Cartwright about existing popgen packages.

Avida is not a package that aims to directly model biological population genetics. There is quite a lot of conceptual overlap between Avida evolution and biological evolution, but it isn't at the low-level that MA claims to operate at. For example, current research in Avida is looking at the role of compensatory mutations. But each mutation in an Avidian is an instruction, not a base as in DNA. The Avida research can provide another line of evidence that complements that of biological research on compensatory mutations, but it isn't aimed at answering questions like, "At what rate should we expect compensatory mutations to fix in species X?" It's that kind of question that the folks pushing MA position it as a tool to answer, or would if they took any note of compensatory mutations at all. I think you get the drift, though.

Avida, by the way, has no difficulty in cranking out data on generation after generation for a set population size. Most Avida work is done on population sizes between 900 and 10,000 Avidians. However, I'm working on extending the Avida-ED program, and one part is to allow up to 90,000 organisms in the population. That is, by the way, accomplished with a change to the graphical user interface to allow selection of a grid size of up to 300 by 300, where the current version's grid-size slider only goes up to 100. The underlying Avida instance is unchanged. Tracking mutations is possible in Avida as well, but is done in analysis after the run finishes. Runs can go into the millions of updates. I don't know that anyone has tried to find an upper limit. Avida's ability to do this is because it only needs to hold the current population and grid in memory. Everything else gets written to disk.
---------------------QUOTE-------------------


Umm, what happened to Model-View-Controller? The idea that the model is constrained by the UI is pretty scary.
Posted by: Wesley R. Elsberry on June 13 2009,00:37

Avida-ED is not meant to be how researchers usually use Avida. Avida-ED is a GUI interface aimed at making use of a limited set of Avida options easy for pedagogy. There are quite a number of simplifying restrictions that Avida-ED imposes for the runs that can be made with it, but there are still enough parameters under instructor or student control to allow a great many different experiments to be done with it.

< Links to both Avida and Avida-ED via the Devolab >
Posted by: Steve Schaffner on June 13 2009,07:01

There is a good list of forward simulation (as opposed to coalescent) programs < here >. The ones I'm familiar with (apart from MA) are intended to model the behavior of sequences and mutations, not the global evolution of species.
Posted by: Dr.GH on June 13 2009,09:21

Quote (Wesley R. Elsberry @ June 12 2009,14:28)
- Develop a test suite based on published popgen findings in parallel with development
---------------------QUOTE-------------------


This seems to me to be an excellent way to test MA: known data parameters from known populations.

Also, what were the release dates for the different versions of MA compared to the different publication/conference dates?


Posted by: Tracy P. Hamilton on June 13 2009,10:28

Quote (Bob O'H @ June 12 2009,17:02)
 

What's new is his claim that meltdown affects sexual populations.  I should check the evolution of sex literature, I'm sure they (Sally Otto and Nick Barton, amongst others) showed that it doesn't happen.  In his book Sanford ignores the recent evolution of sex literature.
---------------------QUOTE-------------------



Is Sanford practicing abstinence from the sex literature?  :)
Posted by: Zachriel on June 14 2009,08:47

When each child has more than a single mutation, average fitness can decrease over time. Even though fitness is a relative term, this decreasing fitness can impact essential biological mechanisms.

If the genome is 3e8 bases in size (or any such large number) and there is an average of one mutation per child, then we expect that ~1/3 of the children will *not* have mutations. If each mother produces 6 children, then chances are that each new generation will include many individuals without mutations. (E.g. mice often have several litters of 4-10 pups.)
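The "~1/3" figure follows from the Poisson distribution; a quick check of the arithmetic:

```python
import math

# With an average of one new mutation per child, Poisson-distributed,
# the fraction of mutation-free children is e^-1, about 37% (the ~1/3
# quoted above).
p_zero = math.exp(-1.0)                       # P(k = 0) for Poisson(1)

# With 6 children per mother, the chance that at least one child
# carries no new mutations at all:
p_some_child_clean = 1.0 - (1.0 - p_zero) ** 6
```

With six offspring, a mutation-free child turns up in roughly 94% of families, which is why high fecundity matters so much in this argument.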

If we use truncated selection, heritability=1, mutations=1, seed=30, all else default, this is what we see.



It's interesting to see how the deleterious mutations ride along with the beneficial mutations until fixation before being weeded out.

In nature, we expect that if the mother produces enough children, then there should be sufficient healthy progeny to prevent genetic meltdown and to allow the positive ratcheting of beneficial mutations. We might also expect that species will tend to push the envelope with regard to mutational limitations. That means that, near the margins, small changes in parameters will allow it to tip one way or the other. With slow reproducers, we can then expect various factors that mitigate the long-term evolutionary trajectory with regard to the accumulation of deleterious mutations. One such factor is sexual selection, which is prevalent in nearly all taxa of interest.

We can reasonably show that selection of a wide number of parameters avoids mutational meltdown and we would expect life to evolve to explore the limits of these parameters. Hence, to claim that a simplified simulation such as Mendel's Accountant can disprove evolution is not justified.

We're still rather curious about the selection parameters,

   Truncation
   Unrestricted probability
   Strict proportionality probability
   Partial truncation

i.e. exactly what each selection criterion is doing.
Posted by: Zachriel on June 14 2009,09:48

Also curious about the "heritability" parameter.



---------------------QUOTE-------------------
< Mendel's Accountant User Manual >: Individual genetic fitness values are calculated based upon each individual’s total mutation inventory. Individual genetic fitness is defined as 1.0, adjusted by the positive and negative effects of all its mutations. To obtain phenotypic fitness the genetic fitness is modified using the specified heritability to account for nonheritable factors such as variations in the environment.
---------------------QUOTE-------------------


What is the other 80%? Randomness?
Posted by: Bob O'H on June 14 2009,10:20

It should be.  I wonder how they calculate it, though.  The description makes it look like they calculate the genotypic variation in fitness, and then multiply it by 4 to get the environmental variance, and create a random environmental effect for each individual with that variance.  But that's nuts, because it means that the amount of environmental variation in fitness is directly tied to the amount of genetic variation.

It's possible, though, that they're doing nothing like this: the description is rather confused.
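A sketch of that reading, purely as an assumption about what MA might be doing (nothing here is confirmed from its source): with heritability h2 = 0.2, the environmental variance comes out as (1 - h2) / h2 = 4 times the genetic variance in fitness.

```python
import random

h2 = 0.2
var_genetic = 0.01                        # hypothetical genetic variance in fitness
var_env = var_genetic * (1.0 - h2) / h2   # = 4 * var_genetic

# Phenotypic fitness would then be genetic fitness plus a random
# environmental deviate with that variance:
rng = random.Random(0)
genetic_fitness = 1.0
phenotypic_fitness = genetic_fitness + rng.gauss(0.0, var_env ** 0.5)
```

Note how this ties the environmental noise directly to the current genetic variance, which is the nutty part: as genetic variation shrinks, the environment conveniently quiets down with it.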
Posted by: mammuthus on June 15 2009,18:38

More from Sanford.  Apparently, MA is the "state of the art" in numerical genetics. Also, he wants to collaborate with y'all. According to Jorge:
 

---------------------QUOTE-------------------
The last thing that JS did was ask me for the email address of the people performing these simulations & asking the questions so as to jointly work towards the goal of a more realistic & acceptable-by-all Mendel program. Collaborative science at its best.
---------------------QUOTE-------------------


Anyway, here is his latest.  In italics are the points being responded to (from Steve Schaffner)


---------------------QUOTE-------------------
Dear Colleague - If we can make the program more
realistic, we will. Please explain what you would like
done ... How would you have us model soft selection?

I fail to see why mutations should not cause
extinction, especially given the additive model. As we
approach zero mean fitness, many individuals will
have a fitness of zero or less - we are forced to
truncate them (if you are dead you should not
realistically reproduce), causing population size to
start to rapidly shrink. When there are less than two
individuals, we consider the population extinct.

2) The default value for the maximum beneficial value of mutations is much too low.
Real-world estimates of positive selection coefficients for humans are in the range
of 0.1, not 0.001.


That is easily re-set, but one has to consider if
it is reasonable to realistically build up a genome
by increments of 10% (I am speaking of internal
complexity - not adaptation to an external
environmental factor). I think that is like going
up Mt. Improbable using a helicopter.

3) The starting population is genetically perfect, and all deviations from that state
increase the chance of extinction. This does not accurately model an evolutionary
process, in which no population ever achieves perfection, merely adequacy.
The fact that an ideal organism would have a major competitive advantage
compared to the real one does not imply that the real one is nonfunctional or
doomed to extinction. This is not a model of biological evolution.


We do not assume an ideal starting genotype - we
assume a uniform population after a population
bottleneck - with fitness set arbitrarily at 1.0.

Finally, I also have a technical problem with the program as a software tool.
It does not seem to be possible to run it indefinitely, nor have I seen any cases
where it has even been able to run to equilibrium (or better, steady state).
Whether that is because it continues to track mutations after they fix I don't
know (that's my guess), but it means it is essentially useless as a research tool.
It should be possible to simulate a population of size, say, 20,000 for 200,000
generations. What would the memory requirements for that set of parameters be?
Is the program really able to use the extra memory?


We can turn off individual mutation tracking and just
track the net fitness of each linkage block. We get
nearly indeterminate processing - but we lose lots of
interesting data. I would be happy to cooperate with
you - if you are interested. As far as I can determine,
Mendel is now the "state of the art" in genetic
numerical simulation, and it improves every
month. Are you aware of a better research platform?

Best wishes - John Sanford
---------------------QUOTE-------------------



< http://www.theologyweb.com/campus....unt=161 >

For full context, here is the material Sanford is responding to:


---------------------QUOTE-------------------
I haven't raised any claims of fraud, nor am I clamoring for an immediate response. I have the following problems with model, based on what I've seen here.

1) There does not seem to be an option for true soft selection. Even if deleterious alleles do not affect fertility, they still cause the population to become extinct. This is not an accurate model of real genetics.

2) The default value for the maximum beneficial value of mutations is much too low. Real-world estimates of positive selection coefficients for humans are in the range of 0.1, not 0.001.

3) The starting population is genetically perfect, and all deviations from that state increase the chance of extinction. This does not accurately model an evolutionary process, in which no population ever achieves perfection, merely adequacy. The fact that an ideal organism would have a major competitive advantage compared to the real one does not imply that the real one is nonfunctional or doomed to extinction. This is not a model of biological evolution.

Finally, I also have a technical problem with the program as a software tool. It does not seem to be possible to run it indefinitely, nor have I seen any cases where it has even been able to run to equilibrium (or better, steady state). Whether that is because it continues to track mutations after they fix I don't know (that's my guess), but it means it is essentially useless as a research tool. It should be possible to simulate a population of size, say, 20,000 for 200,000 generations. What would the memory requirements for that set of parameters be? Is the program really able to use the extra memory?

If Sanford (or co-author) wishes to address these criticisms, I would welcome the response. As it stands, however, I do not see how one can use this model to make any statements about the likely behavior of evolving populations in the real world.
---------------------QUOTE-------------------


Posted by: Zachriel on June 15 2009,21:00



---------------------QUOTE-------------------
John Sanford: That is easily re-set, but one has to consider if it is reasonable to realistically build up a genome by increments of 10% (I am speaking of internal complexity - not adaptation to an external environmental factor). I think that is like going up Mt. Improbable using a helicopter.
---------------------QUOTE-------------------


Which goes to show that he doesn't understand his own simulation. Mendel's Accountant doesn't model "internal complexity". It purports to abstract selective differences.

A specific limit to beneficial mutations may not make sense. Some mutations may sweep over a population rapidly. It is certainly conceivable that a seemingly minor mutation could dramatically increase reproductive success, perhaps many-fold, such as when there is only one male that reproduces.
Posted by: midwifetoad on June 16 2009,01:11

There are two flavors of Creation Math:

Proving that observed phenomena are impossible, and proving that unobserved phenomena are inevitable.
Posted by: Dr.GH on June 16 2009,10:26

Quote (mammuthus @ June 15 2009,16:38)
"More from Sanford.  Apparently, MA is the "state of the art" in numerical genetics. Also, he wants to collaborate with y'all. According to Jorge: ..."
---------------------QUOTE-------------------


This is interesting, but it is a wasted effort if filtered through Jorge F.

If Sanford wanted to, he would register at TWeb. I don't see the need to invite him here. But, that might be preferable to TWeb because we could be better assured of moderator/admin issues (i.e. no need to bother filtering the Jorge crap).
Posted by: sledgehammer on June 16 2009,14:42

Quote (Zachriel @ June 15 2009,19:00)
         

---------------------QUOTE-------------------
John Sanford: That is easily re-set, but one has to consider if it is reasonable to realistically build up a genome by increments of 10% (I am speaking of internal complexity - not adaptation to an external environmental factor). I think that is like going up Mt. Improbable using a helicopter.
---------------------QUOTE-------------------


Which goes to show that he doesn't understand his own simulation. Mendel's Accountant doesn't model "internal complexity". It purports to abstract selective differences.

A specific limit to beneficial mutations may not make sense. Some mutations may sweep over a population rapidly. It is certainly conceivable that a seemingly minor mutation could dramatically increase reproductive success, perhaps many-fold, such as when there is only one male that reproduces.
---------------------QUOTE-------------------


I think it is even worse than that, Zach, for three reasons:

Firstly, the parameter in question is the "maximal fitness effect of a beneficial mutation". This has nothing to do with the physical makeup of the genome, as Dr Sanford seems to imply.  It seems absurd to link this parameter to some notion of "affected base pairs", much less some inferred "infusion of information" into the genome, as the description of this parameter in the MA manual below suggests:
         

---------------------QUOTE-------------------
< Mendel's Accountant User Manual >: Maximal beneficial mutation effects – A realistic upper limit must be placed upon beneficial mutations. This is because a single nucleotide change can expand total biological functionality of an organism only to a limited degree. The larger the genome and the greater the total genomic information, the less a single nucleotide is likely to increase the total. Researchers must make a judgment for themselves of what is a reasonable maximal value for a single base change. The MENDEL default value for this limit is 0.001. This limit implies that a single point mutation can increase total biological functionality by as much as 0.1%. In a genome such as man’s, assuming only 10% of the genome is functional, such a maximal impact point mutation might be viewed as equivalent to adding 300,000 new information-bearing base pairs each of which had the genome-wide average fitness contribution. Researchers need to honestly define the upper limit they feel is realistic for their species. However it should be obvious that, in all cases, the upper limit for beneficial mutation effects ought to correspond to a very small fraction of the total genomic information (i.e. a small number relative to one).
---------------------QUOTE-------------------



Call me cynical, but when this much handwaving is applied to a point,  I suspect it means that this is a crucial parameter when it comes to justifying the conclusion that "genetic entropy" leads inevitably to genetic meltdown.

Secondly, there seems to be no reason to limit the maximal fitness effect of a beneficial mutation to a very small number.  Clearly, as pointed out by Z and others, in a competitive environment, there seems to be no reason a single beneficial mutation cannot virtually guarantee reproductive success. In other words, why can't the maximal fitness benefit of a beneficial mutation be something close to unity?

Lastly, the parameter in question is implemented as a scale factor on the probability distribution of fitness effect, which, for no apparent reason, is hard coded to be identical in shape to the PDF of fitness effects of deleterious mutations (whose scale factor is hard coded to be -1, i.e. instant death). The effect of this scaling is even more significant when one considers that the PDF shape is already heavily skewed so that the vast majority of mutations fall "under the radar" of selection. So now, all beneficial mutations are 1000-fold (default value 0.001) less likely to become fixed in the population through selection, hard or soft.  That seems unrealistic to me, to say the least.

To mitigate the above bias against beneficials, I recommend setting the maximal beneficial fitness parameter close to unity, which will symmetrize the fitness effect probability distribution, and then playing with the parameter that determines the proportion of beneficial mutations (i.e. set the ratio of beneficial to deleterious mutations to 0.001 or whatever). My guess is that this will "level the playing field" and have a significant effect on the overall fitness trend.
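The scaling argument can be illustrated with a small sketch (an assumed skewed shape for the effect distribution, not MA's actual code): draw effect magnitudes from one heavily skewed shape on (0, 1], then scale deleterious draws by -1 and beneficial draws by the "maximal beneficial effect" parameter (default 0.001). The shapes are identical; only the scale differs.

```python
import random

rng = random.Random(1)

def skewed_magnitude(rng, gamma=10.0):
    # strongly skewed toward tiny values, so most effects fall
    # "under the radar" of selection (gamma is a made-up shape knob)
    return rng.random() ** gamma

MAX_DELETERIOUS = 1.0     # deleterious scale, hard coded per the post above
MAX_BENEFICIAL = 0.001    # the default "maximal beneficial effect"

deleterious = [-MAX_DELETERIOUS * skewed_magnitude(rng) for _ in range(10000)]
beneficial = [MAX_BENEFICIAL * skewed_magnitude(rng) for _ in range(10000)]

# every beneficial draw is 1000-fold smaller in magnitude than a
# deleterious draw from the same underlying shape
mean_ratio = (-sum(deleterious) / len(deleterious)) / (sum(beneficial) / len(beneficial))
```

Setting MAX_BENEFICIAL near unity instead symmetrizes the two distributions, which is exactly the experiment proposed above.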

Maybe someone who has MA up and running (Zach, Sam?) could try this and report?

P.S. It occurs to me that the PDF shape symmetry might also help account for the effects of the mutational "flipping" of deleterious to beneficial and vice versa.

ETA clarification: "be fixed by selection" is now "become fixed in the population through selection"
Posted by: JohnW on June 16 2009,15:47

Quote (sledgehammer @ June 16 2009,12:42)
I think it is even worse than that, Zach, for three reasons:

<snip>
---------------------QUOTE-------------------


I think you're right, sledgehammer.  It looks like Sanford thinks the beneficial effect of a mutation is constrained by its "size" relative to the total size of the genome - the bigger the genome, the smaller the effect of a single mutation.  

Assuming you and I are not misinterpreting, I think there are two possibilities:
(a) - he really is that dumb (perhaps he thinks giraffes have more neck genes than humans);
(b) - this is yet another silly exercise in apologetics - slosh a lot of sciency talk around, but fix it to make sure you get the answer Jesus wants you to get.

I'm strongly leaning (b).
Posted by: AnsgarSeraph on June 16 2009,17:01

Quote (sledgehammer @ June 16 2009,14:42)
Maybe someone who has MA up and running (Zach, Sam?) could try this and report?
---------------------QUOTE-------------------


I was actually running simulations on this idea earlier today and will be posting some graphs up on TWeb within the next hour. The runs were identical, save that one run had a "maximal fitness effect" of 0.001 and the other a maximal fitness effect of 0.01. The first population, after ~6000 generations, had a fitness of 0.844. The second, after the same number of generations, had a fitness of 1.350. The number of beneficial mutations was cranked up to 75% to ensure the runs went for a significant number of generations.

I also received an e-mail from Dr. Sanford today, part of which dealt in this area . . . I am waiting to hear from him whether I have permission to forward his entire e-mail on forums; there wasn't any revelatory information, in any case. Dr. Sanford feels that 0.001 is an appropriate setting but is willing to discuss whether that is the case or not.

—Sam
Posted by: sledgehammer on June 16 2009,17:46

Thanks Sam. Overall fitness switching from a negative trend to positive as a result of a 10X increase in "maximal beneficial fitness" seems to confirm that it is one of the most important parameters as far as the trend is concerned. Unfortunately, I can't see the attachments you posted on TWeb without registering.  Any chance you could post them here as well?  (Post them to < my photobucket page >, and then link them here via the image tag)
Posted by: Steve Schaffner on June 16 2009,18:54

Quote (Zachriel @ June 14 2009,08:47)
If the genome is 3e8 bases in size (or any such large number) and there is an average of one mutation per child, then we expect that ~1/3 of the children will *not* have mutations. If each mother produces 6 children, then chances are that each new generation will include many individuals
without mutations. (E.g. mice often have several litters of 4-10 pups.)

---------------------QUOTE-------------------


True, although that's probably not a good model for humans, who have something between 1 and 3 deleterious mutations (probably) per birth, and more likely close to the top end than the bottom of the range. That doesn't mean that the population has to collapse genetically. It just means that in the steady state, everyone is carrying a fair number of deleterious mutations, with those having the most being the least likely to reproduce.



---------------------QUOTE-------------------

If we use truncated selection, heritability=1, mutations=1, seed=30, all else default, this is what we see.



---------------------QUOTE-------------------


Note that the population survives only because of truncation selection, which is not a realistic process for such slightly deleterious mutations. In this model, each individual will have on average one new mutation with a negative selection coefficient of something like 10^-6 or 10^-7, but selection is nonetheless effective enough to perfectly sort the fitness of the individuals and eliminate only the least fit.
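Truncation selection in miniature (a sketch of the general technique, not MA's code): rank the population by fitness and keep only the top fraction. The idealized sort "sees" arbitrarily small fitness differences, which is exactly the unrealistic part when selection coefficients are ~10^-6.

```python
def truncation_select(fitnesses, keep_fraction=0.5):
    # rank by fitness and keep the top fraction -- a perfect sort,
    # no matter how tiny the differences between individuals
    ranked = sorted(fitnesses, reverse=True)
    keep = int(len(ranked) * keep_fraction)
    return ranked[:keep]

# differences of one part in a million are sorted flawlessly:
survivors = truncation_select([1.0, 1.0 - 1e-6, 1.0 + 1e-6, 1.0 - 5e-7])
```

Real selection is noisy, so fitness differences that small are swamped by chance; a deterministic sort grants selection far more power than it has.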



---------------------QUOTE-------------------

It's interesting to see how the deleterious mutations ride along with the beneficial mutations until fixation before being weeded out.

---------------------QUOTE-------------------


Yes. Selective sweeps in action.
Posted by: AnsgarSeraph on June 16 2009,19:00

I didn't see an option for me to add to your Photobucket page, so I registered and created my own.

Here are the files. The first two are fitness graphs:






These are the distribution graphs for JSW507 (deleterious first, beneficial second):






And 508:







I've uploaded all the pictures, as well as the Input and Output files to my Box.net account, also:

< MENDEL Run - JSW507, JSW508 >

—Sam
Posted by: midwifetoad on June 16 2009,19:12



---------------------QUOTE-------------------
It just means that in the steady state, everyone is carrying a fair number of deleterious mutations, with those having the most being the least likely to reproduce.

---------------------QUOTE-------------------


I'm having trouble coming to grips with the simple-mindedness of assuming that beneficial or deleterious mutations have any necessary effect at all on fecundity.

My family hosts a number of odd and relatively undesirable alleles. We tend to have extra sets of front teeth -- which requires surgery to prevent blocking permanent teeth. My son has Adam's missing rib, something that gets painful when playing racket sports. Most of us are nearsighted.

What we don't have are genes leading to early death or to belief in cult science. And based on anecdotal evidence, the inability to understand cumulative selection is positively correlated with having nine children.
Posted by: Zachriel on June 16 2009,19:38

Quote (Steve Schaffner @ June 16 2009,18:54)
 
Quote (Zachriel @ June 14 2009,08:47)
If the genome is 3e8 bases in size (or any such large number) and there is an average of one mutation per child, then we expect that ~1/3 of the children will *not* have mutations. If each mother produces 6 children, then chances are that each new generation will include many individuals
without mutations. (E.g. mice often have several litters of 4-10 pups.)

---------------------QUOTE-------------------


True, although that's probably not a good model for humans, who have something between 1 and 3 deleterious mutations (probably) per birth, and more likely close to the top end than the bottom of the range. That doesn't mean that the population has to collapse genetically. It just means that in the steady state, everyone is carrying a fair number of deleterious mutations, with those having the most being the least likely to reproduce.
---------------------QUOTE-------------------


It's probably not a good model for mice either. I'm trying a wide variety of parameters to determine the program's limits—or to break it. I've tried very high fecundity, high beneficial mutation rates, and various other extremes. What about additive vs. multiplicative mutational effects?

Quote (AnsgarSeraph @ June 16 2009,19:00)
Here are the files. The first two are fitness graphs:
---------------------QUOTE-------------------


Is that with maximum beneficial effect equal to one versus default? Fraction favorable is set very high.

Consider this. If we use the defaults, then no beneficial mutation becomes fixed, which is contrary to fact. Even if you believe species are ultimately doomed to genetic meltdown, it still means the default parameters or the program itself has a problem.
Posted by: AnsgarSeraph on June 16 2009,19:51

Quote (Zachriel @ June 16 2009,19:38)
Is that with maximum beneficial effect equal to one versus default? Fraction favorable is set very high.

Consider this. If we use the defaults, then no beneficial mutation becomes fixed, which is contrary to fact. Even if you believe species are ultimately doomed to genetic meltdown, it still means the default parameters or the program itself has a problem.
---------------------QUOTE-------------------


The first run (JSW507) has a maximum beneficial effect of 0.01, while the second run (JSW508) is set to the default 0.001. There's only one order of magnitude between the two runs.

Sanford seems to be willing to give on the accuracy of the default parameters for the program . . . which is odd, in that those are the parameters he defends as biologically realistic in his papers. I think there are enough wrong parameters (and possibly a completely wrong coding paradigm) that any couple of parameters could be fixed and the population would still show fitness decline.

At the moment, I'm leaning toward the idea that the program is broken . . . but hopefully we can get Dr. Sanford on TWeb soon enough and have some clear back-n-forth.

—Sam
Posted by: Zachriel on June 16 2009,21:15

Quote (AnsgarSeraph @ June 16 2009,19:51)
At the moment, I'm leaning toward the idea that the program is broken . . . but hopefully we can get Dr. Sanford on TWeb soon enough and have some clear back-n-forth.

—Sam
---------------------QUOTE-------------------


Every time they add a feature it seems to overlap another feature and reduce the effect of selection. "Probability selection" interacts with "Heritability". "Fraction of mutations beneficial" interacts with "maximum beneficial effect".

How does Mendel's Accountant handle "probability selection"? If they use Roulette Wheel selection, then it seems to negate the number of offspring setting. Nor does Roulette Wheel selection have a problem maintaining the population, as suggested in the manual.

I took a look at the source code, but it doesn't seem formatted for readability.
Posted by: mammuthus on June 17 2009,14:12

The latest update is that Sanford will be engaged in a one on one discussion at TWeb with Steve Schaffner:



---------------------QUOTE-------------------
All right; I just finished responding to Dr. Sanford. He appears to be willing to engage sfs1 in a discussion on TWeb. He agrees that a one-on-one discussion is best, since he prefers to keep the discussion from being polarized.

I told Dr. Sanford that I would initiate the process of setting up a Basketball Court thread but I'm not sure how to do that; any long-timers here know how that works? We could use this thread as a commentary thread, I think.

This has the potential to be really interesting. Dr. Sanford seems to be very genuine and polite; I think we've got a good chance to sort out some of MENDEL's problems without getting lost in the weeds.

Dr. Sanford permitted me to post our e-mail exchange; the emails are attached. This is the part that concerns MENDEL:



---------------------QUOTE-------------------
As you know, Mendel has enormous user-specificity. It is literally a genetic accounting program, and honestly takes the input parameters which the user chooses, and processes them through the biological mechanics of mutation, selection, meiosis, gamete fusion, and formation of the next generation.

   The default settings are just a starting point for research. If you put in the right parameters you can get extreme evolution. However, we argue that realistic settings always yield degeneration. This can be a point of discussion.

   In regard to your own experiments, I would like to point out that biologists realize that the distribution of good and bad mutations is not symmetrical. There are far fewer beneficials, and the range of beneficials is different - it is generally acknowledged that beneficials have a lower mean effect (it is harder to make major improvements in a highly optimized system). If you go to the mutation specifications, you can specify a high maximal beneficial effect - even up to 1.0. The default is .001 - meaning that a maximal beneficial effect is small, increasing fitness by 0.1%. A setting of 1.0 means that a single mutation can double fitness - creating as much biological functionality as the entire rest of the genome. This type of setting has many biological and logical ramifications that require quite a bit of discussion. In an accounting program, a single mega-beneficial can ALWAYS compensate for any amount of genetic damage. But is that realistic?
---------------------QUOTE-------------------




A big thanks to sfs1 for agreeing to dialog with Dr. Sanford when he comes to TWeb. This should be an illuminating discussion and a great addition to the stuff we've already figured out.

—Sam
---------------------QUOTE-------------------



I think you'll have to register at TWeb to view the debate when it comes up online.
Posted by: Zachriel on June 17 2009,15:11

Quote (JohnW @ June 16 2009,15:47)
 
Quote (sledgehammer @ June 16 2009,12:42)
I think it is even worse than that, Zach, for three reasons:

<snip>
---------------------QUOTE-------------------


I think you're right, sledgehammer.  It looks like Sanford thinks the beneficial effect of a mutation is constrained by its "size" relative to the total size of the genome - the bigger the genome, the smaller the effect of a single mutation.  

Assuming you and I are not misinterpreting, I think there are two possibilities:
(a) - he really is that dumb (perhaps he thinks giraffes have more neck genes than humans);
(b) - this is yet another silly exercise in apologetics - slosh a lot of sciency talk around, but fix it to make sure you get the answer Jesus wants you to get.

I'm strongly leaning (b).
---------------------QUOTE-------------------


He just said it again.



---------------------QUOTE-------------------
Sanford: A setting of 1.0 means that a single mutation can double fitness - creating as much biological functionality as the entire rest of the genome.
---------------------QUOTE-------------------


A doubling in fitness does not imply a doubling of "biological functionality". Resistance to plague doesn't imply a dramatic increase in the size of a genome. It may just mean that fleas think you smell bad.
Posted by: sledgehammer on June 17 2009,16:39



---------------------QUOTE-------------------
Sanford: A setting of 1.0 means that a single mutation can double fitness - creating as much biological functionality as the entire rest of the genome.
---------------------QUOTE-------------------


I don't buy that argument either.
This is from:
< J. Sanford, J. Baumgardner, W. Brewer, P. Gibson, and W. Remine. Mendel's Accountant: A biologically realistic forward-time population genetics program. SCPE. 8(2), July 2007, pp. 147-165. >
       

---------------------QUOTE-------------------
3.2. Prescribing Fitness Effects of Mutations. ...These scale factors are meaningful relative to the initial fitness value assumed for the population before we introduce new mutations. In Mendel we assume this initial fitness value to be 1.0. For deleterious mutations, since lethal mutations exist, we choose dsf_del = -1. For favorable mutations, we allow the user to specify the (positive) scale factor dsf_fav. Normally, this would be a small value (e.g., 0.01 to 0.1), since it is only in very special situations that a single beneficial mutation would have a very large effect.
---------------------QUOTE-------------------



Seems to me that if the scale factor of -1 for deleterious mutations represents lethality (i.e., no chance of reproduction), then its inverse for beneficial mutations, +1, would represent guaranteed reproductive success, not "doubling of fitness".
Posted by: Zachriel on June 17 2009,19:39

Quote (sledgehammer @ June 17 2009,16:39)
   

---------------------QUOTE-------------------
Sanford: A setting of 1.0 means that a single mutation can double fitness - creating as much biological functionality as the entire rest of the genome.
---------------------QUOTE-------------------


I don't buy that argument either.
This is from:
< J. Sanford, J. Baumgardner, W. Brewer, P. Gibson, and W. Remine. Mendel's Accountant: A biologically realistic forward-time population genetics program. SCPE. 8(2), July 2007, pp. 147-165. >
             

---------------------QUOTE-------------------
3.2. Prescribing Fitness Effects of Mutations. ...These scale factors are meaningful relative to the initial fitness value assumed for the population before we introduce new mutations. In Mendel we assume this initial fitness value to be 1.0. For deleterious mutations, since lethal mutations exist, we choose dsf_del = -1. For favorable mutations, we allow the user to specify the (positive) scale factor dsf_fav. Normally, this would be a small value (e.g., 0.01 to 0.1), since it is only in very special situations that a single beneficial mutation would have a very large effect.
---------------------QUOTE-------------------



Seems to me that if the scale factor of -1 for deleterious mutations represents lethality (i.e., no chance of reproduction), then its inverse for beneficial mutations, +1, would represent guaranteed reproductive success, not "doubling of fitness".
---------------------QUOTE-------------------


Relative fitness compares different genotypes in a population, and is defined as the ratio of the average numbers contributed to the next generation, with one genotype set arbitrarily at 1. So if genotype-A contributes 300 and normal genotype-B contributes 200, then genotype-A has a relative fitness of 1.5 compared to genotype-B. Relative fitness can be most any non-negative number.

Absolute fitness is calculated for a single genotype as simply the ratio of the numbers in the new generation to the old after selection. So if the population of the genotype increases from 100 to 200, then it has an absolute fitness of 2. Again, absolute fitness can be most any non-negative number.
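In Python terms (the genotype counts are made up, just to pin down the two definitions):

```python
# Relative fitness: average contribution to the next generation,
# with one genotype set arbitrarily at 1 (made-up counts).
contrib = {"A": 300, "B": 200}
reference = "B"
relative = {g: n / contrib[reference] for g, n in contrib.items()}

# Absolute fitness: ratio of a single genotype's numbers in the
# new generation to the old, after selection.
before, after = 100, 200
absolute = after / before

print(relative)   # {'A': 1.5, 'B': 1.0}
print(absolute)   # 2.0
```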

I've been trying to independently implement Mendel's Accountant, but keep running into such definitional problems. Heritability. Fitness. And how they're handling probability selection. I'm working with a simplified model, but Mendel's Accountant should be able to handle the simple cases with obvious results.

I'm assuming that if fitness increases by 1, then it goes from 1 to 2 (a 100% increase), or from 2 to 3 (50%), and so on. It shouldn't be additive but multiplicative, so that it scales. Sanford's complaint is that if we use multiplicative fitness, then it can never reach zero. So he is clearly assuming his conclusion.
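A quick sketch of why the distinction matters (effect sizes here are arbitrary): fifty deleterious mutations of effect 0.1 drive additive fitness below zero, which is meaningless, while multiplicative fitness shrinks but stays positive.

```python
effects = [-0.1] * 50   # fifty deleterious mutations, arbitrary effect size

w_add = 1.0
w_mult = 1.0
for s in effects:
    w_add += s           # additive: fitness can go negative
    w_mult *= 1.0 + s    # multiplicative: approaches zero, never reaches it

print(w_add)    # about -4.0
print(w_mult)   # about 0.005, small but still positive
```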

It's not easy to resolve some of these problems. If we scale fecundity with fitness, that solves the problem of very low fitness, but it introduces a new problem if fitness climbs: we may end up radically multiplying the reproductive rate.

Of course, "generation" is an abstraction, so it may represent an undefined breeding season. Frankly, the whole thing is an abstraction, so any strong claims about the specifics of biology are invalid anyway.
Posted by: Henry J on June 17 2009,20:19



---------------------QUOTE-------------------
I took a look at the source code, but it doesn't seem formatted for readability.
---------------------QUOTE-------------------



Maybe it evolved?

Henry
Posted by: Zachriel on June 17 2009,20:59

Some bits from Mendel's Accountant source code.

Offspring:

if(fitness_dependent_fertility) then
              fitness_adjusted_offspring = num_offspring*sqrt(post_sel_fitness)


Favorable Mutations:

c...  Compute mean absolute fitness effect for favorable mutations.

     sum = 0.
     d2  = 1.

     do i=1,1000000
        d1 = d2
        d2 = exp(-alpha_fav*(0.000001*i)**gamma_fav)
        sum = sum + d1 + d2
     end do

     fav_mean = 0.0000005*sum*max_fav_fitness_gain


Phenotypic Fitness:

     noise = sqrt(geno_fitness_variance*(1. - heritability) /heritability + non_scaling_noise**2)

c...  Add noise to the fitness to create a phenotypic fitness score...
do i=1,total_offspring
        pheno_fitness(i) = fitness(i) + random_normal()*noise


Unrestricted probability selection:

c...     For unrestricted probability selection, divide the phenotypic  
c...     fitness by a uniformly distributed random number prior to
c...     ranking and truncation.  This procedure allows the probability
c...     of surviving and reproducing in the next generation to be
c...     directly related to phenotypic fitness and also for the correct
c...     number of individuals to be eliminated to maintain a constant
c...     population size.

        do i=1,total_offspring
           work_fitness(i) = work_fitness(i)/(randomnum(1) + 1.d-15)
        end do


Divide by randomnum as well as add non-heritable noise to the phenotype?
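For what it's worth, a little simulation (Python, not the original Fortran; population sizes and fitness values are arbitrary) suggests the divide-by-a-uniform trick does make survival probability proportional to phenotypic fitness, at least before the heritability noise is layered on top:

```python
import random

random.seed(1)
half = 500                    # arbitrary: 500 offspring of each fitness class
fitness = [1.0] * half + [2.0] * half
keep = half                   # truncate back to half, constant-population style

survived = [0, 0]
trials = 200
for _ in range(trials):
    # Divide each phenotypic fitness by a uniform random number,
    # then rank and truncate -- as in the quoted Fortran.
    scored = [(w / (random.random() + 1e-15), w) for w in fitness]
    scored.sort(reverse=True)
    for _, w in scored[:keep]:
        survived[0 if w == 1.0 else 1] += 1

p_low = survived[0] / (trials * half)
p_high = survived[1] / (trials * half)
print(p_low, p_high)   # roughly 1/3 and 2/3: survival tracks the 1:2 fitness ratio
```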
Posted by: Bob O'H on June 18 2009,01:30

Quote (Zachriel @ June 17 2009,19:39)
 
Relative fitness compares different genotypes in a population, and is defined as the ratio of the average numbers contributed to the next generation with one genotype set arbitarily at 1. So if genotype-A contributes 300 and normal genotype-B contributes 200, then genotype-A has a relative fitness of 1.5 compared to genotype-B. Relative fitness can be most any non-negative number.

Absolute fitness is calculated for a single genotype as simply the ratio of the numbers in the new generation to the old after selection. So if the population of the genotype increases from 100 to 200, then it has an absolute fitness of 2. Again, absolute fitness can be most any non-negative number.

---------------------QUOTE-------------------


Yes, this is how it's done.  Personally, I'd prefer it if it were on the log scale: there's all sorts of statistical theory that slots nicely into evolutionary theory.



---------------------QUOTE-------------------
I've been trying to independently implement Mendel's Accountant, but keep running into such definitional problems. Heritability. Fitness. And how they're handling probability selection. I'm working with a simplified model, but Mendel's Accountant should be able to handle the simple cases with obvious results.
---------------------QUOTE-------------------


My advice: keep away from heritability.  It complicates matters, and is dependent on the genetic variation in the population.  I suspect Sanford et al. don't really understand quantitative genetics: certainly Sanford makes some mistakes because of his lack of understanding in Genetic Entropy.



---------------------QUOTE-------------------
I'm assuming that if fitness increases by 1, then it goes from 1 to 2 (100% increase), or from 2 to 3 (50%) and so on. It shouldn't be additive, but multiplicative so it scales. Sanford's complaint is that if we use multiplicative, then it can never reach zero. So he is clearly assuming his conclusion.
---------------------QUOTE-------------------



Indeed, but it can get arbitrarily close to 0, so it doesn't make any practical difference (unless you're working with continuous populations, when you end up with nano-foxes).



---------------------QUOTE-------------------
It's not easy to resolve some of these problems. If we scale fecundity with fitness, then that solves the problem of very low fitness. But introduces a problem if the fitness levels climb so that we may be radically multiplying the reproductive rate.
---------------------QUOTE-------------------



Don't you just have to invoke density dependence?  I think this Darwin chap had some thoughts along those lines, after he read Malthus.



---------------------QUOTE-------------------
Of course, "generation" is an abstraction, so it may represent an undefined breeding season. Frankly, the whole thing is an abstraction, so any strong claims about the specifics of biology are invalid anyway.
---------------------QUOTE-------------------


Depends on what species you're working on.  For things like butterflies, it's fine.  And, to be honest, the purpose of Mendel's Accountant is to make general statements about evolution, so this stuff is OK, as long as it's clear what the assumptions are.  A lot of the assumptions shouldn't have too big an effect on the robustness of the claims.
Posted by: Bob O'H on June 18 2009,01:58

Quote (Zachriel @ June 17 2009,20:59)
Some bits from Mendel's Accountant source code.

Offspring:

if(fitness_dependent_fertility) then
              fitness_adjusted_offspring = num_offspring*sqrt(post_sel_fitness)


---------------------QUOTE-------------------


Any idea why the square root of post_sel_fitness?

 

---------------------QUOTE-------------------
Favorable Mutations:

c...  Compute mean absolute fitness effect for favorable mutations.

     sum = 0.
     d2  = 1.

     do i=1,1000000
        d1 = d2
        d2 = exp(-alpha_fav*(0.000001*i)**gamma_fav)
        sum = sum + d1 + d2
     end do

     fav_mean = 0.0000005*sum*max_fav_fitness_gain

---------------------QUOTE-------------------


Ugh.  That's a horrible way to do the integration.  I recognise the density (George Box was promoting it in the 50s), and it has an analytic solution: alpha_fav*gamma_fav*Gamma(1/gamma_fav), where Gamma() is the gamma function.

(ref: Box, G. E. P. 1953. A note on regions for tests of kurtosis. Biometrika 40: 465-468)

Also, where does the 0.0000005 come from?  I'm always suspicious of constants like that.

 

---------------------QUOTE-------------------
Phenotypic Fitness:

     noise = sqrt(geno_fitness_variance*(1. - heritability) /heritability + non_scaling_noise**2)

c...  Add noise to the fitness to create a phenotypic fitness score...
do i=1,total_offspring
        pheno_fitness(i) = fitness(i) + random_normal()*noise

---------------------QUOTE-------------------



The random_normal()*noise is environmental variation, and we would typically set it to be constant.  Because MA defines heritability, they have to back-calculate the environmental variance: that's what geno_fitness_variance*(1. - heritability) /heritability is doing.  Except it's wrong, because the non_scaling_noise is added too, so the heritability isn't a heritability.

Also, note what this back-calculation means: it scales the environmental variance to the genetic variance, so as genetic variation decreases, the environment becomes more stable.  This is bollocks.

They should really set the environmental variance.  The problem is getting an easily understood scaling, but that can be done by scaling to the mean fitness effects.  

Also, check through: the calculations look OK for log(fitness): they range from -infinity to infinity and are additive.  But on what scale does pheno_fitness act?  On this scale, or the exponential scale?
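A quick numerical check of the heritability point (Python sketch, with arbitrary parameter values): back-calculating the noise from a nominal heritability of 0.5 and then adding a nonzero non_scaling_noise, as the quoted code does, gives a realized heritability below 0.5.

```python
import random
import statistics

random.seed(0)
heritability = 0.5        # nominal value, as a user would set it
non_scaling_noise = 0.05  # arbitrary nonzero value

geno = [random.gauss(0.0, 0.1) for _ in range(20000)]  # genotypic fitness values
vg = statistics.pvariance(geno)

# The quoted formula: back-calculate the noise s.d. from heritability,
# then add non_scaling_noise in quadrature.
noise = (vg * (1.0 - heritability) / heritability + non_scaling_noise**2) ** 0.5
pheno = [g + random.gauss(0.0, 1.0) * noise for g in geno]

h2_realized = vg / statistics.pvariance(pheno)
print(h2_realized)   # below the nominal 0.5 whenever non_scaling_noise > 0
```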

 

---------------------QUOTE-------------------
Unrestricted probability selection:

c...     For unrestricted probability selection, divide the phenotypic  
c...     fitness by a uniformly distributed random number prior to
c...     ranking and truncation.  This procedure allows the probability
c...     of surviving and reproducing in the next generation to be
c...     directly related to phenotypic fitness and also for the correct
c...     number of individuals to be eliminated to maintain a constant
c...     population size.

        do i=1,total_offspring
           work_fitness(i) = work_fitness(i)/(randomnum(1) + 1.d-15)
        end do


Divide by randomnum as well as add non-heritable noise to the phenotype?
---------------------QUOTE-------------------


:-)  No idea what's going on.
Posted by: Zachriel on June 18 2009,07:02

Quote (Bob O'H @ June 18 2009,01:30)
   
Quote (Zachriel @ June 17 2009,19:39)
 I've been trying to independently implement Mendel's Accountant, but keep running into such definitional problems. Heritability. Fitness. And how they're handling probability selection. I'm working with a simplified model, but Mendel's Accountant should be able to handle the simple cases with obvious results.
---------------------QUOTE-------------------


My advice: keep away from heritability.  It complicates matters, and is dependent on the genetic variation in the population.  I suspect Sanford et al. don't really understand quantitative genetics: certainly Sanford makes some mistakes because of his lack of understanding in Genetic Entropy.
---------------------QUOTE-------------------


I've been setting heritability to 1 when running tests, so I should be able to reach comparable results. I think a working heritability parameter could be included in a Mendel's Accountant-style program.  But the concept is sometimes counterintuitive.

Quote (Bob O'H @ June 18 2009,01:30)
 
Quote (Zachriel @ June 17 2009,19:39)
 
Of course, "generation" is an abstraction, so it may represent an undefined breeding season. Frankly, the whole thing is an abstraction, so any strong claims about the specifics of biology are invalid anyway.
---------------------QUOTE-------------------


Depends on what species you're working on.  For things like butterflies, it's fine.  And, to be honest, the purpose of Mendel's Accountant is to make general statements about evolution, so this stuff is OK, as long as it's clear what the assumptions are.  A lot of the assumptions shouldn't have too big an effect on the robustness of the claims.
---------------------QUOTE-------------------


I think you could make claims about particular models of evolution, but you would have to be very careful about a bold claim that evolution is falsified based on such an abstraction. Certainly some carefully devised claims can be justified.

Quote (Bob O'H @ June 18 2009,01:58)
 
Quote (Zachriel @ June 17 2009,20:59)
Some bits from Mendel's Accountant source code.

Offspring:

if(fitness_dependent_fertility) then
              fitness_adjusted_offspring = num_offspring*sqrt(post_sel_fitness)


---------------------QUOTE-------------------


Any idea why the square root of post_sel_fitness?
---------------------QUOTE-------------------


I'm guessing it's arbitrary—to keep the reproductive rate from running away.

Quote (Bob O'H @ June 18 2009,01:58)
Also, where does the 0.0000005 come from?  I'm always suspicious of constants like that.
---------------------QUOTE-------------------


I was kinda hoping you knew. It may be an artifact of the calculation, but why that particular number? I agree that having a constant embedded in the code without annotation is rather odd. Even if it comes from a standard equation, it's still best to document it.

Quote (Bob O'H @ June 18 2009,01:58)
   
Quote (Zachriel @ June 17 2009,20:59)
Phenotypic Fitness:

    noise = sqrt(geno_fitness_variance*(1. - heritability) /heritability + non_scaling_noise**2)

c...  Add noise to the fitness to create a phenotypic fitness score...
do i=1,total_offspring
        pheno_fitness(i) = fitness(i) + random_normal()*noise

---------------------QUOTE-------------------



The random_normal()*noise is environmental variation, and we would typically set it to be constant.  Because MA defines heritability, they have to back-calculate the environmental variance: that's what geno_fitness_variance*(1. - heritability) /heritability is doing.  Except it's wrong, because the non_scaling_noise is added too, so the heritability isn't a heritability.
---------------------QUOTE-------------------


Non-scaling noise is set to 0 in the parameters, so we can ignore that for now. Environmental variance scaling with genetic variance is a serious problem, though. Who could have guessed that the dinosaurs could have avoided planetary oblivion if they just weren't so darned diverse?
Posted by: Zachriel on June 18 2009,08:00

Quote (Zachriel @ June 18 2009,07:02)
Quote (Bob O'H @ June 18 2009,01:30)
   
Quote (Zachriel @ June 17 2009,19:39)
 I've been trying to independently implement Mendel's Accountant, but keep running into such definitional problems. Heritability. Fitness. And how they're handling probability selection. I'm working with a simplified model, but Mendel's Accountant should be able to handle the simple cases with obvious results.
---------------------QUOTE-------------------


My advice: keep away from heritability.  It complicates matters, and is dependent on the genetic variation in the population.  I suspect Sanford et al. don't really understand quantitative genetics: certainly Sanford makes some mistakes because of his lack of understanding in Genetic Entropy.
---------------------QUOTE-------------------


I've been setting heritability to 1 when running tests, so I should be able to reach comparable results. I think a working heritability parameter could be included in a Mendel's Accountant-style program.  But the concept is sometimes counterintuitive.
---------------------QUOTE-------------------


We only need to be concerned with ranking, not absolute phenotypic fitness, when simulating heritability. So I've been normalizing fitness (which preserves ranking), then applying the specified noise. For a child population of 25 and heritability of 50%, I get rankings like this. The number is genotypic rank, the position is phenotypic rank.

3
5
1
8
6
2
9
7
12
15
4
13
11
10
22
19
16
14
18
20
21
24
23
17
25
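A sketch of that procedure in Python (parameter choices are mine, and random draws differ, so the permutation won't match the run above):

```python
import random

random.seed(3)
n = 25
heritability = 0.5

# Genotypic fitness in rank order (rank 1 = lowest).
geno = sorted(random.gauss(0.0, 1.0) for _ in range(n))

# Noise variance chosen so Vg / (Vg + Ve) equals the nominal heritability.
vg = 1.0
ve = vg * (1.0 - heritability) / heritability
pheno = [(g + random.gauss(0.0, ve**0.5), rank) for rank, g in enumerate(geno, 1)]
pheno.sort(key=lambda t: t[0])

order = [rank for _, rank in pheno]   # genotypic rank at each phenotypic position
print(order)
```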
Posted by: sledgehammer on June 18 2009,11:05

Quote (Bob O'H @ June 17 2009,23:58)
   

---------------------QUOTE-------------------
Favorable Mutations:

c...  Compute mean absolute fitness effect for favorable mutations.

     sum = 0.
     d2  = 1.

     do i=1,1000000
        d1 = d2
        d2 = exp(-alpha_fav*(0.000001*i)**gamma_fav)
        sum = sum + d1 + d2
     end do

     fav_mean = 0.0000005*sum*max_fav_fitness_gain

---------------------QUOTE-------------------


Ugh.  That's a horrible way to do the integration.  I recognise the density (George Box was promoting it in the 50s), and it has an analytic solution: alpha_fav*gamma_fav*Gamma(1/gamma_fav), where Gamma() is the gamma function.

(ref: Box, G. E. P. 1953. A note on regions for tests of kurtosis. Biometrika 40: 465-468)

Also, where does the 0.0000005 come from?  I'm always suspicious of constants like that.
---------------------QUOTE-------------------


Well, it is (1/2)*(1/1,000,000), and since it is a density, the sum-as-integral needs to be scaled, but I thought the (0.000001*i) was supposed to do that. I think it's a boo-boo.
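Reading the loop as numerical integration (a sketch; the alpha/gamma values below are arbitrary, and I use a coarser step for speed): d1 and d2 hold the integrand at successive grid points, so the sum is a composite trapezoid rule over [0, 1] and 0.0000005 is h/2.

```python
import math

alpha_fav, gamma_fav = 10.0, 0.5   # arbitrary illustrative values

def f(x):
    return math.exp(-alpha_fav * x**gamma_fav)

n = 100_000
h = 1.0 / n

# Literal translation of the Fortran loop.
total, d2 = 0.0, 1.0               # d2 starts at f(0) = 1
for i in range(1, n + 1):
    d1 = d2
    d2 = f(h * i)
    total += d1 + d2
mendel_style = (h / 2.0) * total   # the 0.0000005 factor plays the role of h/2

# Textbook composite trapezoid rule for comparison.
trap = h * (f(0.0) / 2 + sum(f(h * i) for i in range(1, n)) + f(1.0) / 2)
print(mendel_style, trap)          # identical up to rounding
```

The two agree to rounding, so the constant is at least consistent with trapezoidal integration, though an explanatory comment in the source would have saved the detective work. For the record, the [0, infinity) version of this integral has the closed form Gamma(1 + 1/gamma)/alpha^(1/gamma); with the values above that is Gamma(3)/100 = 0.02, and the [0, 1] result lands very close to it because the tail beyond 1 is negligible at alpha = 10.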
Posted by: Zachriel on June 18 2009,13:13

Nevermind.
Posted by: Dr.GH on June 18 2009,14:28

Quote (Zachriel @ June 18 2009,11:13)
Nevermind.
---------------------QUOTE-------------------


< This could answer every thing. >
Posted by: k.e.. on June 18 2009,14:37

Quote (Dr.GH @ June 18 2009,22:28)
Quote (Zachriel @ June 18 2009,11:13)
Nevermind.
---------------------QUOTE-------------------


< This could answer every thing. >
---------------------QUOTE-------------------


phhhht

You say statistics, I say stochastics.

Mathematicians provide the sheet music; I just want the finger positions and the beat.

Actually screw that, I think I'll just bit-torrent the whole damn lot.

If those turkeys can design a better mouse tarp(sic) using that horse and buggy with square wheels, then all our moons are their bases.

It ain't going to happen, not now, not eva.

Look out for the patent. It will be up there with zero-wavelength quantum antigravity decelerators.
Selling webverts and elephnat(sic) repellant is about all they can do.
Posted by: Steve Schaffner on June 18 2009,16:59

Quote (Bob O'H @ June 18 2009,01:30)

My advice: keep away from heritability.  It complicates matters, and is dependent on the genetic variation in the population.  I suspect Sanford et al. don't really understand quantitative genetics: certainly Sanford makes some mistakes because of his lack of understanding in Genetic Entropy.

---------------------QUOTE-------------------


Modeling is easier if you simply work with the selective advantage of the genotype, rather than the selection coefficient for a partly heritable trait. Here the partly heritable trait is fitness itself, which makes my head hurt.
   

---------------------QUOTE-------------------

Indeed, but it can get arbitrarily close to 0, so it doesn't make any practical difference (unless you're working with continuous populations, when you end up with nano-foxes).

---------------------QUOTE-------------------


What kind of fitness are we talking about here, though? Since MA keeps the population constant, it is implicitly using relative fitness. In that case, introducing an arbitrary scaling factor into the fitness doesn't matter; it's only the ratio of fitnesses that matters. It seems to me that the model treats fitness as being relative until it gets very small, at which point it is treated as absolute. But there is no simple way to determine absolute fitness from relative fitness.

This seems like a basic point, but I don't understand what the program is trying to model here.
Posted by: sledgehammer on June 18 2009,17:36

Quote (Steve Schaffner @ June 18 2009,14:59)
Since MA keeps the population constant, it is implicitly using relative fitness. In that case, introducing an arbitrary scaling factor into the fitness doesn't matter; it's only the ratio of fitnesses that matters. It seems to me that the model treats fitness as being relative until it gets very small, at which point it is treated as absolute. But there is no simple way to determine absolute fitness from relative fitness.

This seems like a basic point, but I don't understand what the program is trying to model here.
---------------------QUOTE-------------------


[Cynic]
I think the program is trying to model nothing more or less than the complete, abject failure of Darwinian Evolution to produce anything other than Genetic Entropy leading inevitably to Mutational Meltdown, unless the genome is infused with Complex Specified Information from a divine source.
[/Cynic]
Posted by: deadman_932 on June 18 2009,18:01

Quote (sledgehammer @ June 18 2009,17:36)
Quote (Steve Schaffner @ June 18 2009,14:59)
Since MA keeps the population constant, it is implicitly using relative fitness. In that case, introducing an arbitrary scaling factor into the fitness doesn't matter; it's only the ratio of fitnesses that matters. It seems to me that the model treats fitness as being relative until it gets very small, at which point it is treated as absolute. But there is no simple way to determine absolute fitness from relative fitness.

This seems like a basic point, but I don't understand what the program is trying to model here.
---------------------QUOTE-------------------


[Cynic]
I think the program is trying to model nothing more or less than the complete, abject failure of Darwinian Evolution to produce anything other than Genetic Entropy leading inevitably to Mutational Meltdown, unless the genome is infused with Complex Specified Information from a divine source.
[/Cynic]
---------------------QUOTE-------------------


The answer is quite clear, materialist chance-worshipping Darweenieans:

Posted by: Zachriel on June 18 2009,18:58

Quote (Bob O'H @ June 18 2009,01:58)
   

---------------------QUOTE-------------------
Unrestricted probability selection:

c...     For unrestricted probability selection, divide the phenotypic  
c...     fitness by a uniformly distributed random number prior to
c...     ranking and truncation.  This procedure allows the probability
c...     of surviving and reproducing in the next generation to be
c...     directly related to phenotypic fitness and also for the correct
c...     number of individuals to be eliminated to maintain a constant
c...     population size.

        do i=1,total_offspring
           work_fitness(i) = work_fitness(i)/(randomnum(1) + 1.d-15)
        end do


Divide by randomnum as well as add non-heritable noise to the phenotype?
---------------------QUOTE-------------------


:-)  No idea what's going on.
---------------------QUOTE-------------------


Basically, he is applying reductions in heritability twice. The heritability function itself, and then this random procedure for selecting reproductive winners by re-ranking them before truncation and passing to the next generation. We could modify the divisor to some function(randomnum) and adjust the degree and type of randomness for picking winners {something like randomnum^N}. It's just another way of introducing random factors into the choice of winners and losers which should already have been accounted for in the heritability function.

The net result is a significant reduction in the effect of selection.
Posted by: Steve Schaffner on June 18 2009,20:53



---------------------QUOTE-------------------

Basically, he is applying reductions in heritability twice. The heritability function itself, and then this random procedure for selecting reproductive winners by re-ranking them before truncation and passing to the next generation. We could modify the divisor to some function(randomnum) and adjust the degree and type of randomness for picking winners {something like randomnum^N}. It's just another way of introducing random factors into the choice of winners and losers which should already have been accounted for in the heritability function.

The net result is a significant reduction in the effect of selection.
---------------------QUOTE-------------------


It would make more sense if it were described in terms of some other phenotype with an effect on fitness. The phenotype has a genetic component and an environmental (or random) component, i.e. has a heritability. The phenotype then confers a fitness, which is the probability of successful reproduction. The number of successful offspring is also drawn from a random distribution, which is what's being done in this bit of code (I guess treated as a binomial distribution).

As a model of selection that seems reasonable (apart from the way the noise scales), but expressing it in terms of the heritability of fitness I find hard to understand -- fitness isn't a phenotype, it's a measure of the success of a phenotype. And the whole thing is pretty convoluted, when the essence of the model could be captured simply by assigning a fitness to the genotype and then calculating the number of offspring. This is a model of evolution written by a breeder rather than by a population geneticist, I would say.
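The genetic-plus-environmental decomposition described above can be sketched in a few lines; the environmental noise variance is derived from the target heritability via the usual relation Ve = Vg(1 - h2)/h2 (function and variable names are hypothetical):

```python
import math
import random
import statistics

def phenotypic_value(genotypic_value, h2, geno_variance, rng):
    """Phenotype = genotypic value + Gaussian environmental noise, with
    the noise variance Ve = Vg*(1 - h2)/h2 chosen so that heritability
    Vg/(Vg + Ve) works out to h2."""
    env_sd = math.sqrt(geno_variance * (1.0 - h2) / h2)
    return genotypic_value + rng.gauss(0.0, env_sd)

rng = random.Random(11)
# h2 = 0.5 with Vg = 0.04 gives environmental noise with sd 0.2
phenos = [phenotypic_value(1.0, 0.5, 0.04, rng) for _ in range(2000)]
mean = statistics.fmean(phenos)
sd = statistics.pstdev(phenos)
```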
Posted by: Zachriel on June 18 2009,21:42

Quote (Steve Schaffner @ June 18 2009,20:53)


---------------------QUOTE-------------------

Basically, he is applying reductions in heritability twice. The heritability function itself, and then this random procedure for selecting reproductive winners by re-ranking them before truncation and passing to the next generation. We could modify the divisor to some function(randomnum) and adjust the degree and type of randomness for picking winners {something like randomnum^N}. It's just another way of introducing random factors into the choice of winners and losers which should already have been accounted for in the heritability function.

The net result is a significant reduction in the effect of selection.
---------------------QUOTE-------------------


It would make more sense if it were described in terms of some other phenotype with an effect on fitness. The phenotype has a genetic component and an environmental (or random) component, i.e. has a heritability. The phenotype then confers a fitness, which is the probability of successful reproduction. The number of successful offspring is also drawn from a random distribution, which is what's being done in this bit of code (I guess treated as a binomial distribution).
---------------------QUOTE-------------------


Yes, I'm okay with that. That's how my own (rather primitive) model is structured. I'm not modeling recombination at this point, because I don't think that's where the problem lies. It's more basic than that. There seems to be very little signal of selection. I'm still tinkering, but his results don't seem to jibe.
Posted by: midwifetoad on June 19 2009,00:40

< http://www.youtube.com/watch?v=SzEvcS01Cl0 >
Posted by: Zachriel on June 19 2009,07:23

Phenotypic Fitness:

   noise = sqrt(geno_fitness_variance*(1. - heritability) /heritability + non_scaling_noise**2)

c...  Add noise to the fitness to create a phenotypic fitness score...
do i=1,total_offspring
       pheno_fitness(i) = fitness(i) + random_normal()*noise


Leaving aside the non-scaling noise which defaults to zero. And leaving aside the scaling problem which can be overcome with normalization.

c...     For unrestricted probability selection, divide the phenotypic  
c...     fitness by a uniformly distributed random number prior to
c...     ranking and truncation.  This procedure allows the probability
c...     of surviving and reproducing in the next generation to be
c...     directly related to phenotypic fitness and also for the correct
c...     number of individuals to be eliminated to maintain a constant
c...     population size.

       do i=1,total_offspring
          work_fitness(i) = work_fitness(i)/(randomnum(1) + 1.d-15)
       end do


The first random factor represents non-heritable influences on the phenotype, the second the chance of successful reproduction. Is there any processing between these two functions, or can we simply reduce them to (ignoring the infinitesimal):

pheno_fitness(i) = (fitness(i) + random_normal()*noise)    /  randomnum(1)?

Working fitness being just a copy of phenotypic fitness that is only used for ranking before truncation.

c...  Copy the phenotypic fitnesses into array work_fitness.
Posted by: Bob O'H on June 19 2009,12:02

Quote (Steve Schaffner @ June 18 2009,20:53)

It would make more sense if it were described in terms of some other phenotype with an effect on fitness. The phenotype has a genetic component and an environmental (or random) component, i.e. has a heritability. The phenotype then confers a fitness, which is the probability of successful reproduction. The number of successful offspring is also drawn from a random distribution, which is what's being done in this bit of code (I guess treated as a binomial distribution).
---------------------QUOTE-------------------


The problem with describing the model in terms of a phenotype is that you then have to map the effect of the phenotype onto fitness.  It's easier just to leap in and model fitness directly.  Everything works fine if you define fitness as proportional to the expected number of offspring produced by an individual.

I haven't looked at the code, but if we assume a constant population size and discrete generations, then (ignoring recombination and mutation) the way to model this is to assume that each parent has a fitness si.  The offspring are then drawn from a multinomial distribution, with the probability for the ith parent being

si/sum(si)

(this would reduce to a binomial distribution if there were only 2 parents).  The multinomial sampling is genetic drift.

You can treat log(si) as you would any standard trait: it's additive, so you can add the genetic and environmental effects.
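The multinomial scheme can be sketched in a few lines of Python (hypothetical names; `random.choices` performs the weighted draw, and the sampling itself is the genetic drift):

```python
import random

def next_generation(parent_fitness, rng):
    """Draw N offspring from a multinomial over the parents: parent i is
    chosen with probability s_i / sum(s_j)."""
    n = len(parent_fitness)
    return rng.choices(range(n), weights=parent_fitness, k=n)

rng = random.Random(42)
s = [1.0, 1.0, 1.5, 0.5]  # hypothetical parental fitnesses s_i
counts = [0, 0, 0, 0]
for _ in range(5000):
    for parent in next_generation(s, rng):
        counts[parent] += 1
```

Over many generations the offspring counts approach the s_i/sum(s_j) proportions, while any single generation can drift well away from them.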



---------------------QUOTE-------------------
As a model of selection that seems reasonable (apart from the way the noise scales), but expressing it in terms of the heritability of fitness I find hard to understand -- fitness isn't a phenotype, it's a measure of the success of a phenotype. And the whole thing is pretty convoluted, when the essence of the model could be captured simply by assigning a fitness to the genotype and then calculating the number of offspring. This is a model of evolution written by a breeder rather than by a population geneticist, I would say.
---------------------QUOTE-------------------


Not even a breeder: I would expect them to understand heritability a bit better than this!
Posted by: Dr.GH on June 19 2009,12:50

Reading the discussion has made me reconsider Wesley's earlier suggestion that you guys build a better program. It might be a way of collecting your observations.

However, if "Mendel's Accountant Cooked the Books," Sanford et al will merely claim that your new program has predetermined its answer.
Posted by: Steve Schaffner on June 19 2009,13:05

Quote (Bob O'H @ June 19 2009,12:02)
I haven't looked at the code, but if we assume a constant population size and discrete generations, then (ignoring recombination and mutation) the way to model this is to assume that each parent has a fitness si.  The offspring are then drawn from a multinomial distribution, with the probability for the ith parent being

si/sum(si)

(this would reduce to a binomial distribution if there were only 2 parents).  The multinomial sampling is genetic drift.

---------------------QUOTE-------------------


Yes, I've written a program for that kind of model, except that I imposed selection in the differential survival of the offspring. Including options for truncation selection and a few other things, it amounted to all of 158 lines, including comments, white space and the multinomial routine.



---------------------QUOTE-------------------

You can treat log(si) as you would any standard trait: it's additive, so you can add the genetic and environmental effects.

---------------------QUOTE-------------------


I haven't thought about this before. Does adding environmental effects do anything more than reduce the effective selection coefficient?
Posted by: slpage on June 19 2009,13:09

Quote (Dr.GH @ June 19 2009,12:50)
Reading the discussion has made me reconsider Wesley's earlier suggestion that you guys build a better program. It might be a way of collecting your observations.

However, if "Mendel's Accountant Cooked the Books," Sanford et al will merely claim that your new program has predetermined its answer.
---------------------QUOTE-------------------


And from what I have seen, he would then just be projecting.

I am still wondering why they think that constraining the outcomes to a constant population size is biologically realistic.
Posted by: Steve Schaffner on June 19 2009,13:33

Quote (slpage @ June 19 2009,13:09)
I am still wondering why they think that constraining the outcomes to a constant population size is biologically realistic.
---------------------QUOTE-------------------


It's a feature of many population genetics models. It has the advantage of being simple. How accurate it is depends a lot on what organism you're looking at.

Of course, there's a big difference between using models to analyze how particular aspects of evolution work and trying to model the entire process well enough to say whether it can occur.
Posted by: Henry J on June 19 2009,16:16



---------------------QUOTE-------------------
Of course, there's a big difference between using models to analyze how particular aspects of evolution work and trying to model the entire process well enough to say whether it can occur.
---------------------QUOTE-------------------


Well, that's just being picky! :p

But yeah, if population size is allowed to go way up, then the simulation would have to deal with food shortages (i.e., fitness would drop across the board if the population gets too large for the food supply).

Henry
Posted by: Bob O'H on June 19 2009,16:34

Quote (Steve Schaffner @ June 19 2009,13:05)

 

---------------------QUOTE-------------------

You can treat log(si) as you would any standard trait: it's additive, so you can add the genetic and environmental effects.

---------------------QUOTE-------------------


I haven't thought about this before. Does adding environmental effects do anything more than reduce the effective selection coefficient?
---------------------QUOTE-------------------


No it doesn't.    There might be some subtleties in the precise effects, but I wouldn't worry about them.
Posted by: Bob O'H on June 19 2009,16:42

Quote (Henry J @ June 19 2009,16:16)
 

---------------------QUOTE-------------------
Of course, there's a big difference between using models to analyze how particular aspects of evolution work and trying to model the entire process well enough to say whether it can occur.
---------------------QUOTE-------------------


Well, that's just being picky! :p

But yeah, if population size is allowed to go way up, then the simulation would have to deal with food shortages (i.e., fitness would drop across the board if the population gets too large for the food supply).

Henry
---------------------QUOTE-------------------


That's not too difficult to implement.  You scale the fitness so that it is above 1 at low density, and then apply some density dependence to it, which affects all genotypes equally.  Then you draw the number of offspring from a Poisson distribution with that mean.

i.e. if fitness is si, and the population size is N, then the number of offspring is Poisson distributed with mean si exp(-N/K), for example (exp(-N/K) is the discount for density).

I think doing it this way is nicer because (a) it's easier to programme (there are no independence problems, or multinomials with huge N's), and (b) you're modelling absolute fitness, so it can go below 1, and you might get extinctions.
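A sketch of this density-dependent scheme under the stated assumptions (the Poisson draw uses Knuth's method, fine for small means; names are hypothetical):

```python
import math
import random

def offspring_count(s_i, pop_size, carrying_capacity, rng):
    """Offspring ~ Poisson(s_i * exp(-N/K)). The density discount
    exp(-N/K) applies to every genotype equally, and because absolute
    fitness can fall below 1, the population can shrink or go extinct."""
    mean = s_i * math.exp(-pop_size / carrying_capacity)
    # Knuth's algorithm for a Poisson variate
    limit, k, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

rng = random.Random(7)
# fitness 2.0 at N = 100, K = 200: mean offspring = 2*exp(-0.5), about 1.21
draws = [offspring_count(2.0, 100, 200, rng) for _ in range(4000)]
```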
Posted by: Zachriel on June 19 2009,17:34

Quote (Steve Schaffner @ June 18 2009,20:53)
<snipped>

And the whole thing is pretty convoluted, when the essence of the model could be captured simply by assigning a fitness to the genotype and then calculating the number of offspring.
---------------------QUOTE-------------------


From much of this discussion, it's easy to see how many different ways there are to abstract an evolutionary process.

These are the primary attributes I've found in Mendel's Accountant:

* Population of Genotypes (genotypic fitness).
* Genotype modified by heritability and noise to Phenotype (phenotypic fitness).
* Genotype further modified for chance of reproductive success to Working Fitness.
* Number of offspring proportional to sqrt(Phenotype).
* Reproduction with mutation.
* Throw in more random factors, such as random death.

> The calculation of Phenotype is not scaled.
> The calculation of Working Fitness is division by Randomnum. Not the sqrt(randomnum), not some other exponent. Or even a normal binomial to determine reproductive success.
> Number of offspring proportional to sqrt(Phenotype). Why the square root? Why not some other exponent?
> Random death? Isn't that already accounted for in phenotypic fitness (for stillbirths) or chance of reproductive success?

You may as well throw in another factor that randomizes falling off a cliff and whether she has a headache, and just make up numbers or exponents or parameters that seem right. The problem is the qualitative nature of the simulation and the arbitrariness of some of the assumptions. I just don't see Mendel's Accountant being salvageable as a quantitative model of biology.
Posted by: Bob O'H on June 20 2009,02:14

Quote (Zachriel @ June 19 2009,17:34)
 
Quote (Steve Schaffner @ June 18 2009,20:53)
<snipped>

And the whole thing is pretty convoluted, when the essence of the model could be captured simply by assigning a fitness to the genotype and then calculating the number of offspring.
---------------------QUOTE-------------------


From much of this discussion, it's easy to see how many different ways there are to abstract an evolutionary process.

These are the primary attributes I've found in Mendel's Accountant:

* Population of Genotypes (genotypic fitness).
* Genotype modified by heritability and noise to Phenotype (phenotypic fitness).
* Genotype further modified for chance of reproductive success to Working Fitness.
* Number of offspring proportional to sqrt(Phenotype).
* Reproduction with mutation.
* Throw in more random factors, such as random death.
---------------------QUOTE-------------------



No recombination?  We know about mutational meltdown, and we also know that sex can mitigate the effect.



---------------------QUOTE-------------------
You may as well throw in another factor that randomizes falling off a cliff and whether she has a headache,
---------------------QUOTE-------------------


Don't ask Gil to help with the programming for that.
Posted by: Zachriel on June 20 2009,08:21

Quote (Bob O'H @ June 20 2009,02:14)
   
Quote (Zachriel @ June 19 2009,17:34)
       
Quote (Steve Schaffner @ June 18 2009,20:53)
<snipped>

And the whole thing is pretty convoluted, when the essence of the model could be captured simply by assigning a fitness to the genotype and then calculating the number of offspring.
---------------------QUOTE-------------------


From much of this discussion, it's easy to see how many different ways there are to abstract an evolutionary process.

These are the primary attributes I've found in Mendel's Accountant:

* Population of Genotypes (genotypic fitness).
* Genotype modified by heritability and noise to Phenotype (phenotypic fitness).
* Genotype further modified for chance of reproductive success to Working Fitness.
* Number of offspring proportional to sqrt(Phenotype).
* Reproduction with mutation.
* Throw in more random factors, such as random death.
---------------------QUOTE-------------------



No recombination?  We know about mutational meltdown, and we also know that sex can mitigate the effect.
---------------------QUOTE-------------------


Oops. Good point. Haven't got that far in reconstructing the algorithm, but of course. Add it to the list of primary attributes.  

c...        Randomly mate one half of the population with members
c...        from the other half.

           dad = min(current_pop_size,
    &            1 + int(current_pop_size*randomnum(1)))

           do while(.not.available(dad))
              dad = mod(dad, current_pop_size) + 1
           end do
           available(dad) = .false.

           mom = min(current_pop_size,
    &            1 + int(current_pop_size*randomnum(1)))
           do while(.not.available(mom))
              mom = mod(mom, current_pop_size) + 1
           end do
           available(mom) = .false.


Looks like asexual recombination, i.e. two random individuals.

                 if(randomnum(1) < 0.5) then
                    parent = dad
                 else
                    parent = mom
                 end if


This looks like a bit of the actual recombination event:

if(.not. clonal_reproduction) then

<... snip ...>

do ch=1,haploid_chromosome_number

        ls0 = (ch - 1)*chr_length + 1
        ls1 = min(chr_length-1, int(chr_length*randomnum(1))) + ls0
        ls2 = min(chr_length-1, int(chr_length*randomnum(1))) + ls0

<... snip ...>

        if(dynamic_linkage)
    &      hap_id = min(2, 1 + int(2.*randomnum(1)))


The rest of the code seems a lot more complicated than need be, but that might just be because of the dynamic linkage option. The command do ch=1,haploid_chromosome_number appears twice in the code. (Indentation doesn't seem to be consistent so it's hard to know where the beginning and ending of sections are to be found, though that might be my reader.)

I still think the problem with Mendel's Accountant is more basic, and breaks before that point.
Posted by: Zachriel on June 20 2009,08:35

Quote (Bob O'H @ June 20 2009,02:14)
 

---------------------QUOTE-------------------
You may as well throw in another factor that randomizes falling off a cliff and whether she has a headache,
---------------------QUOTE-------------------


Don't ask Gil to help with the programming for that.
---------------------QUOTE-------------------


Why not? He helped me with Word Mutagenator.



---------------------QUOTE-------------------
< GilDodgen >: This is not hard: Simply print out the dictionary contained in the Zachriel program and be done with it. No search is necessary, because all the requisite information has been supplied by the programmer in advance.

---------------------QUOTE-------------------


GilDodgen has never quite grasped the concept of a model. (In Word Mutagenation, the evolving sequences have no knowledge of the word-scape. They either survive or die. It was an ID challenge, by the way, not my idea.) They never did let me respond on their blog.
Posted by: k.e.. on June 20 2009,12:33

ok here goes, if I'm off target correct me gently.

I know very little about the ins and outs of the GA problem space but I do understand modeling as a concept.

It seems to me that the simple weasel GA resultant graph over time IS the fitness landscape.


So to test the fitness landscape several things could be tried, and they probably already have been, so again I'm just an amateur.

A couple of simple additions could be to add a second sentence that competes for only a few common letters of the alphabet that are in the first sentence ...and see what happens.

Only a limited number of letters are available for all species of sentence perhaps based on the usage count in English.

Then add an in vivo predator which eats the letters from one of the first species but leaves a few letters on the savannah or Petri dish.

Then from the leftovers see if a new species of grammatically correct sentence (dare I say a Shakespearean quotation) arises.

Compare the fitness landscapes with known population statistics for a simple analog and reduce errors.
Posted by: Henry J on June 20 2009,17:54



---------------------QUOTE-------------------
whether she has a headache
---------------------QUOTE-------------------



Is there a variable for the amount of aspirin that's available?

Henry
Posted by: dvunkannon on June 21 2009,22:26

From the < MEDAL (Missouri Estimation of Distribution Algorithms Laboratory > blog



---------------------QUOTE-------------------
John H. Holland will give a keynote speech at GECCO-2009 on July 12, 2009 (Sunday), 10:40am-11:40am. The talk is entitled Genetic Algorithms: Long Ago [Past] and Far Away [Future] and the abstract of the talk follows:

It was in the mid-50’s of the 20th century when I realized that Fisher’s fundamental theorem could be extended from individual alleles to co-adapted sets of alleles, without linearization. That led to a realization that recombination, rather than mutation, was the main mechanism providing grist for the natural selection mill. There was little theory concerning recombination in those days, but now recombination is a standard explanation for biological innovations, such as swine flu.

Much later, in the early 1990’s, GA’s provided the “adaptive” part of rule-based models of complex adaptive systems (CAS), such as the artificial stock market pioneered at the Santa Fe Institute. Tag-based signal processing occurs in systems as different as biological cells, language acquisition, and ecosystems. CAS models offer a unified way to study the on-going co-evolution of boundary and tag networks in these systems

---------------------QUOTE-------------------


Posted by: slpage on June 23 2009,19:10

Quote (Steve Schaffner @ June 19 2009,13:33)
Quote (slpage @ June 19 2009,13:09)
I am still wondering why they think that constraining the outcomes to a constant population size is biologically realistic.
---------------------QUOTE-------------------


It's a feature of many population genetics models. It has the advantage of being simple. How accurate it is depends a lot on what organism you're looking at.

Of course, there's a big difference between using models to analyze how particular aspects of evolution work and trying to model the entire process well enough to say whether it can occur.
---------------------QUOTE-------------------


Which is sort of my point.  Their claim is that this is state of the art and realistic, yet they appear to have employed certain parameters for simplicity rather than realism.

Constant population size is one of my pet peeves with Haldane's model as well.
Posted by: slpage on June 23 2009,19:13

Quote (Henry J @ June 19 2009,16:16)


---------------------QUOTE-------------------
Of course, there's a big difference between using models to analyze how particular aspects of evolution work and trying to model the entire process well enough to say whether it can occur.
---------------------QUOTE-------------------


Well, that's just being picky! :p

But yeah, if population size is allowed to go way up, then the simulation would have to deal with food shortages (i.e., fitness would drop across the board if the population gets too large for the food supply).

Henry
---------------------QUOTE-------------------


I don't mean just allowing it to grow willy nilly - that is not realistic, either.  But if they want to claim 'most realistic' then it seems to me employing non-universal constraints negates that claim.

I also gather that while deleterious mutations are allowed to accumulate and not reduce, that beneficials are allowed to be lost.  Is that correct?
Posted by: Steve Schaffner on June 23 2009,19:44

Quote (slpage @ June 23 2009,19:13)
I don't mean just allowing it to grow willy nilly - that is not realistic, either.  But if they want to claim 'most realistic' then it seems to me employing non-universal constraints negates that claim.

---------------------QUOTE-------------------


True, but I doubt variation in population size would have much effect on the long-term fate of the population.


---------------------QUOTE-------------------

I also gather that while deleterious mutations are allowed to accumulate and not reduce, that beneficials are allowed to be lost.  Is that correct?
---------------------QUOTE-------------------


I haven't been running the program, but I haven't seen anything obviously wrong with how they handle beneficial and deleterious mutations. In the real world, deleterious mutations of very small effect really do accumulate, and most beneficial mutations really are lost.
Posted by: AnsgarSeraph on June 24 2009,21:24

I've been in touch with Dr. Wes Brewer, who maintains the MENDEL code. He's still working on the Linux flavors and hopes to have a production copy soon. He's not comfortable posting it on a public site, at the moment, but said that I could privately supply the link to people here who want it.

The "big deal" aspect of it is that this is version 1.4.5. It looks like the source code is packaged in "cgi-bin/cmendel" and "cgi-bin/fmendel" (but that's just from the README . . . I don't know code so I didn't open any files).

Anyone who wants this thing can email me @

whitebriar - gmail - com

—Sam
Posted by: oldmanintheskydidntdoit on June 25 2009,03:08

Quote (AnsgarSeraph @ June 24 2009,21:24)
I've been in touch with Dr. Wes Brewer, who maintains the MENDEL code. He's still working on the Linux flavors and hopes to have a production copy soon.
---------------------QUOTE-------------------


Mah, a java version would be better IMHO - install apache etc ????

Then it would work on Linux, Windows, MacOS anything that has a JVM in fact.
Posted by: Zachriel on June 25 2009,08:41

The calculation of "working fitness" is seemingly broken. From Mendel's Accountant:

      do i=1,total_offspring
          work_fitness(i) = work_fitness(i)/(randomnum(1) + 1.d-15)
       end do


We can test this by taking a series of fitnesses k from 1.001 to 2,

 For k = 1 To 1000
      Cells(k, "a") = 1+ k / 1000
      Cells(k, "b") = Cells(k, "a") / Rnd
  Next k


This is a typical result:

9 Average
31 St.Dev.
362% Relative St.Dev.
1.04 Min
533 Max

The original distribution of k has a Relative St.Dev. of 19%. It's worse for fitnesses distributed between 0.5 and 1.5 or 0.5 and 1. (Just like the phenotypic fitness, the calculation is not normalized. And why ÷Rnd^1? Why not ÷Rnd^½ or ÷Rnd^¾?) This single operation eliminates the vast majority of the signal from genetic or phenotypic fitness.

A more reasonable calculation is < Roulette Wheel selection >.
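For comparison, a minimal sketch of roulette wheel (fitness-proportionate) selection, with hypothetical names:

```python
import random
from bisect import bisect_left

def roulette_select(fitnesses, n_picks, rng):
    """Fitness-proportionate (roulette wheel) selection: build the
    cumulative fitness table once, then each spin chooses individual i
    with probability fitness_i / total, so minor fitness differences
    produce only minor differences in reproductive success."""
    cumulative, total = [], 0.0
    for f in fitnesses:
        total += f
        cumulative.append(total)
    return [bisect_left(cumulative, rng.random() * total)
            for _ in range(n_picks)]

rng = random.Random(0)
# individual 2 holds 7/10 of the total fitness, so ~70% of the picks
picks = roulette_select([1.0, 2.0, 7.0], 10000, rng)
```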

I have a working version of Gregor's Bookkeeper. I'll post on that in the next few rotations.


-
Fixed a problem.
Posted by: Zachriel on June 25 2009,10:03

Ranking is the actual measure, not the nominal working fitness. If we just look at 1000 trials, comparing the actual rank to the expected rank, a perfect fitness signal will have a St.Dev. of zero; 1,2,3,4,5 ...

Ranking of random fitness has a St.Dev of about 410 from the expected rank. Using (1 + k / 1000)÷Rnd the St.Dev between the expected rank and the actual rank is about 320-350. A very weak signal.

This is sorted by Working Fitness. What was ranked 863 is now first on the list.

Phylo      Working       Phylo
Fitness     Fitness       Rank
1.863   15,077.64      863
1.526     5,482.23      526
1.759       591.73       759
1.298       369.09       298
1.413       274.25       413
1.986       207.86       986
1.781       194.58       781
...

Of course, there are many other sources of noise that dilute the fitness signal, but this is the most egregious departure from model realism.


-
A different run than the previous. They're both typical and should be easy to verify.
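The ranking experiment is easy to replicate; here is a minimal Python version (hypothetical names, standard library only):

```python
import random
import statistics

def rank_deviation(working_fitness):
    """Population St.Dev. of (actual rank - expected rank) after the
    population is re-ranked by working fitness; 0 means the ranking
    preserves the true fitness order exactly."""
    order = sorted(range(len(working_fitness)),
                   key=lambda i: working_fitness[i])
    return statistics.pstdev(rank - true for rank, true in enumerate(order))

rng = random.Random(5)
true_fitness = [1 + k / 1000 for k in range(1, 1001)]
perfect = rank_deviation(true_fitness)           # ranking by true fitness
noisy = rank_deviation([f / (rng.random() + 1e-15) for f in true_fitness])
shuffled = list(true_fitness)
rng.shuffle(shuffled)
random_rank = rank_deviation(shuffled)           # no fitness signal at all
```

The deviation after dividing by a uniform random number lands much closer to the fully random figure (about 408 for n = 1000) than to zero.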
Posted by: dvunkannon on June 25 2009,10:07

Quote (Zachriel @ June 25 2009,09:41)
The calculation of "working fitness" is seemingly broken. From Mendel's Accountant:

      do i=1,total_offspring
          work_fitness(i) = work_fitness(i)/(randomnum(1) + 1.d-15)
       end do


We can test this by taking a series of fitnesses k from 1.001 to 2,

 For k = 1 To 1000
      Cells(k, "a") = 1+ k / 1000
      Cells(k, "b") = Cells(k, "a") / Rnd
  Next k


This is a typical result:

9 Average
31 St.Dev.
362% Relative St.Dev.
1.04 Min
533 Max

The original distribution of k has a Relative St.Dev. of 19%. It's worse for fitnesses distributed between 0.5 and 1.5 or 0.5 and 1. (Just like the phenotypic fitness, the calculation is not normalized. And why ÷Rnd^1? Why not ÷Rnd^½ or ÷Rnd^¾?) This single operation eliminates the vast majority of the signal from genetic or phenotypic fitness.

A more reasonable calculation is < Roulette Wheel selection >.

I have a working version of Gregor's Bookkeeper. I'll post on that in the next few rotations.


-
Fixed a problem.
---------------------QUOTE-------------------


I assume you mean the scaling of fitness by total population fitness used in roulette selection, correct?

I'm not following the dissection of MA, is work_fitness being used to drive a selection algorithm? If so, the division by rnd() is equivalent to assuming that all selection takes place after a night of drinking heavily.

Roulette selection assumes that details don't matter - of several equally snappily dressed men at the bar, the one with the clean fingernails will not be selected much more frequently than the rest with dirty fingernails. Is this "realistic"? What does MA assume about sexual selection?

A lot of GAs use tournament selection to maintain a more constant selection pressure. I think you could argue that tournament selection models some part of the sexual selection process.

ETA - or give users a choice of selection algorithm.
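A minimal sketch of tournament selection (hypothetical names): the least-fit individual can never win a two-way tournament, and the overall pressure is set by the tournament size rather than by the absolute scale of the fitness values.

```python
import random

def tournament_select(fitnesses, tournament_size, rng):
    """Draw tournament_size distinct contenders at random; the fittest
    contender wins."""
    contenders = rng.sample(range(len(fitnesses)), tournament_size)
    return max(contenders, key=lambda i: fitnesses[i])

rng = random.Random(3)
fits = [0.1, 0.2, 0.9, 0.4]
winners = [tournament_select(fits, 2, rng) for _ in range(1000)]
```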
Posted by: Zachriel on June 25 2009,11:00

Quote (dvunkannon @ June 25 2009,10:07)
I assume you mean the scaling of fitness by total population fitness used in roulette selection, correct?
---------------------QUOTE-------------------


Yes, the scaling determines probability of successful reproduction.  

Quote (dvunkannon @ June 25 2009,10:07)
I'm not following the dissection of MA, is work_fitness being used to drive a selection algorithm? If so, the division by rnd() is equivalent to assuming that all selection takes place after a night of drinking heavily.
---------------------QUOTE-------------------


Close, but not quite. Actual reproductive success involves more than mating selection. There is still a strong component of phenotypic health involved. Reproductively healthier individuals will tend to mate more often and produce more offspring, even when they mate randomly. (Perhaps you should spend more time in bars—for observational purposes, of course.)

Quote (dvunkannon @ June 25 2009,10:07)
Roulette selection assumes that details don't matter - of several equally snappily dressed men at the bar, the one with the clean fingernails will not be selected much more frequently than the rest with dirty fingernails. Is this "realistic"?
---------------------QUOTE-------------------


Yes. Minor differences tend to have minor effects. There is no amplification for discernment. Like the bar at 10 PM rather than 1 AM (i.e. after only moderate alcohol consumption).

Quote (dvunkannon @ June 25 2009,10:07)
What does MA assume about sexual selection?
---------------------QUOTE-------------------


It doesn't (as far as I know).
Posted by: Zachriel on June 25 2009,20:47

Gregor's Bookkeeper is now workable enough to report some results. But first the implementation.

Gregor's Bookkeeper doesn't keep track of all mutations, just their cumulative effect. The only concern at this point is average fitness and genomic decay. But it does allow us to inspect the population to see what is happening. Bells and whistles don't matter much in the face of the claim that Mendel's Accountant has overthrown 150 years of biological science. We eagerly anticipate replicating this exciting discovery.

Each member of the population is comprised of a collection of genes. Mutations can occur randomly in any member and any gene. Most mutations are very nearly neutral.  



Beneficial mutations are often dominant because they can cover for the activity of a weaker partner. But sometimes alleles can combine effects, or deleterious mutations can sometimes even be dominant. This aspect of the model is still unsatisfying, so we're leaving Dominant = 50%, Recessive = 50%.



Phylogenetic Fitness is normalized. That means Environmental Noise scales properly with fitness.  



PhyloIndex holds the sort of the Phylogenetic Fitness. We take those with the highest Phylogenetic Fitness to enter the mating round (after eliminating those with very low fitness).



Those with higher Phylogenetic Fitness are more likely to mate and produce a large number of offspring. The first step is to create an accumulation of normalized fitnesses. Then we use this table to determine winners and losers. Finally, we select a random allele from each Parent.



Other aspects include allowing normal variations in population, average offspring and mutation rates over time.
Posted by: midwifetoad on June 25 2009,21:28

I think you left out the smoke and mirror generators and the obfuscationizer.
Posted by: Zachriel on June 26 2009,07:31

Quote (midwifetoad @ June 25 2009,21:28)
I think you left out the smoke and mirror generators and the obfuscationizer.
---------------------QUOTE-------------------


Mirrors are easy, but modeling smoke. That's hard.
Posted by: Henry J on June 29 2009,13:34

Quote (Zachriel @ June 26 2009,06:31)
Mirrors are easy, but modeling smoke. That's hard.
---------------------QUOTE-------------------


And hazardous to the health.
Posted by: Zachriel on July 02 2009,09:25

Quote (Zachriel @ June 25 2009,11:00)
 
Quote (dvunkannon @ June 25 2009,10:07)
{snipped} ... equivalent to assuming that all selection takes place after a night of drinking heavily.
---------------------QUOTE-------------------


(Perhaps you should spend more time in bars—for observational purposes, of course.)
---------------------QUOTE-------------------


I trust, dvunkannon, that your field work is progressing well.

Quote (dvunkannon @ June 25 2009,10:07)
A lot of GAs use tournament selection to maintain a more constant selection pressure. I think you could argue that tournament selection models some part of the sexual selection process.
---------------------QUOTE-------------------


I've been thinking about your analogy and suggestions. Roulette seems to emulate relative fecundity. Those with higher fecundity will tend to mate with those of higher fecundity simply because they mate and produce offspring more often. (Wimps pass out in the corner, rarely mating.) But Tournament does tend to pair those with like-fecundity, so I suppose it does seem like sexual selection. I may try that next. I thought about some more explicit method, but that might be beyond what we are trying to accomplish with this model.
Posted by: Zachriel on July 02 2009,09:25

There was some question about the mutation algorithm. It's based on a mirror of this RndGamma distribution. {The lip at 1.000000 is due to cutting off the maximum effect at an arbitrary level.}



The probability distribution of mutational effect tends to be very small, with larger effects increasingly rare.



The routine usually returns a multiplier of just a little more than one (favorable), or a multiplier of just a little less than one (deleterious).

Deleterious and favorable mutational effects have an identical distribution, except that favorable mutations are generally much rarer. I'm using the Excel Application GammaDist function. It's not a perfect distribution, but should be more than sufficient for our purposes.
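A Python sketch of that mutational-effect scheme; the shape, scale, cap, and favorable-mutation probability below are my illustrative guesses, not the model's actual settings:

```python
import random

def mutation_multiplier(p_favorable=0.05, shape=0.5, scale=0.02, cap=1.0):
    """Draw a small gamma-distributed effect: most draws are tiny,
    larger ones increasingly rare. Mirror it above or below 1
    depending on whether the mutation is favorable or deleterious,
    capping the effect at an arbitrary level (the "lip" in the plot)."""
    effect = min(random.gammavariate(shape, scale), cap)
    if random.random() < p_favorable:
        return 1.0 + effect   # rare favorable: multiplier just above 1
    return 1.0 - effect       # common deleterious: just below 1
```

Multiplying a gene's fitness by successive draws gives the behavior described: mostly near-neutral nudges downward, with occasional larger effects.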
Posted by: Zachriel on July 02 2009,09:27

The qualitative results of Gregor's Bookkeeper are fairly straightforward. If the rate of reproduction is low compared to the mutation rate, that is, if many offspring are mutants rather than clones, then deleterious mutations will tend to accumulate in genomes. (They will actually eventually accumulate with any finite population.)

But every once in a while, a significant and favorable mutation will sweep through the population. For a given ratio of favorable to deleterious mutations, the larger the population, the less the fitness will drop before a beneficial mutation has a chance to sweep through the population. So for a given setting, just dial up the population (or reproductive rate) to avoid genomic meltdown.
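That accumulate-versus-sweep dynamic can be caricatured in a few lines of Python. Everything here (offspring count, mutation rate, effect sizes, truncation selection) is a toy of my own devising, not the actual model:

```python
import random

def generation(pop, n_offspring=4, mu=0.2):
    """One toy generation: every parent leaves n_offspring children;
    each child mutates with probability mu (deleterious mutations far
    more common than favorable ones), then truncation selection cuts
    back to the original population size."""
    kids = []
    for fit in pop:
        for _ in range(n_offspring):
            f = fit
            if random.random() < mu:
                if random.random() < 0.02:   # rare, significant favorable
                    f *= 1.5
                else:                        # common, slightly deleterious
                    f *= 0.99
            kids.append(f)
    return sorted(kids, reverse=True)[:len(pop)]

# With plenty of clones per generation, fitness holds or ratchets up
# when a 1.5x mutation sweeps; push mu toward 1.0 and it ebbs instead.
pop = [1.0] * 50
for _ in range(200):
    pop = generation(pop)
mean_fitness = sum(pop) / len(pop)
```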

Now consider a large population that has been divided into small isolated groups. Many will meltdown. But some, by chance, will experience favorable mutations that will sweep the population. Then if the population is allowed to grow, this subpopulation can avoid genomic meltdown and experience further gains in fitness. Lots of branches, most of which fail, but a few that then prosper. Adaptive radiation.

And we have yet to account for sexual selection, or hybridization.
Posted by: midwifetoad on July 02 2009,10:11

Forgive me for having trouble with the meltdown concept. Is there an observed instance in the real world of a reproducing population going extinct due to the accumulation of deleterious mutations?

What would that look like?
Posted by: oldmanintheskydidntdoit on July 02 2009,10:19

Quote (midwifetoad @ July 02 2009,10:11)
Forgive me for having trouble with the meltdown concept. Is there an observed instance in the real world of a reproducing population going extinct due to the accumulation of deleterious mutations?

What would that look like?
---------------------QUOTE-------------------


< http://www.uncommondescent.com/ >

ba-dum-tish!
Posted by: Zachriel on July 02 2009,10:46

Quote (midwifetoad @ July 02 2009,10:11)
Forgive me for having trouble with the meltdown concept. Is there an observed instance in the real world of a reproducing population going extinct due to the accumulation of deleterious mutations?

What would that look like?
---------------------QUOTE-------------------


In asexual populations, it's rather easy to understand. If most offspring are mutants, they are nearly always of lower fitness. Organisms such as bacteria have low individual mutation rates, and most offspring are exact clones. Even then, they often recombine genetic material.

It's a bit different in sexually reproducing species such as vertebrates. Bottleneck populations, species that have had their habitat severely reduced, or varieties near range edges, are believed to suffer mutational meltdown. Reproductive rates and numbers of viable offspring tend to decline due to inbreeding, and that can make the species vulnerable to extinction. It's a problem with many species being pushed to the brink by humans.

Hybridization can sometimes reinvigorate a species.
Posted by: midwifetoad on July 02 2009,11:22

I'm still a bit confused. It is commonly argued, and has been for many decades, that low population numbers lead to extinction, even if the remaining members are protected.

But I fail to get the relevance for TOE. Extinction seems to be a rather common event at geological time scales, so what's the problem for the theory?

I'm thinking the definition of deleterious is rather arbitrary unless it is exposed to selection. What prevents us from defining all niche specializations as deleterious at geological time scales, since specialization puts a population at risk for extinction.
Posted by: Zachriel on July 02 2009,11:52

Quote (midwifetoad @ July 02 2009,11:22)
I'm still a bit confused. It is commonly argued, and has been for many decades, that low population numbers lead to extinction, even if the remaining members are protected.

But I fail to get the relevance for TOE. Extinction seems to be a rather common event at geological time scales, so what's the problem for the theory?
---------------------QUOTE-------------------


The claim is that genetic meltdown affects even normal-size populations. Of course, Mendel's Accountant ignores many factors that militate against the claim.

 
Quote (midwifetoad @ July 02 2009,11:22)
I'm thinking the definition of deleterious is rather arbitrary unless it is exposed to selection. What prevents us from defining all niche specializations as deleterious at geological time scales, since specialization puts a population at risk for extinction.
---------------------QUOTE-------------------


You're right in general.

Presumably, Mendel's Accountant is considering absolute fitness, meaning populations can precipitously degrade if the fecundity drops below a certain level. If they were serious, they would investigate under which circumstances they would not see meltdown, and explore this boundary in more detail. But their purpose is apologetics dressed up as science. I certainly wouldn't consider any such simulation to be anything more than qualitative unless carefully matched to a biological situation. It doesn't even account for sexual selection; and without empirical verification, it is subject to simple mistakes such as the calculation of "working fitness" noted above. They see what they want to see.
Posted by: midwifetoad on July 02 2009,12:29

I'd think if you were trying to model a natural phenomenon, you would have an example in mind. Something from the real world to model. But maybe that's just me.

I haven't heard of any decline in the fecundity of asexual organisms. All the species I know of that are listed as in danger of extinction seem to be in danger due to reduction of or rapid change in habitat,  or due to introduction of diseases or predators. Neither of these causes involves genetic entropy.

I just wonder what observation led to the genetic entropy hypothesis.
Posted by: Bob O'H on July 02 2009,13:52

Quote (midwifetoad @ July 02 2009,12:29)
I'd think if you were trying to model a natural phenomenon, you would have an example in mind. Something from the real world to model. But maybe that's just me.
---------------------QUOTE-------------------


Mendel's Accountant is meant to be used to make a general point about evolution, so they don't want  to burden the model with the specifics of one system.



---------------------QUOTE-------------------
I haven't heard of any decline in the fecundity of asexual organisms. All the species I know of that are listed as in danger of extinction seem to be in danger due to reduction of or rapid change in habitat,  or due to introduction of diseases or predators. Neither of these causes involves genetic entropy.
---------------------QUOTE-------------------


Well, yes.  I'm a bit cynical about a lot of conservation genetics: far too many people doing it use species of great conservation concern, like fruit flies.

I think there has been some work showing it happens in nature, but there are so many other threats as well.



---------------------QUOTE-------------------
I just wonder what observation led to the genetic entropy hypothesis.
---------------------QUOTE-------------------


The Fall.

Sanford is a YEC, so his argument is that things have been getting worse since The Fall, and all we can do is pray.
Posted by: Henry J on July 02 2009,14:12

Quote (midwifetoad @ July 02 2009,11:29)
I just wonder what observation led to the genetic entropy hypothesis.
---------------------QUOTE-------------------


Observation? Observation? We don't need no steeenkin pathetic-level-of-detail observation!!111!!eleven1!!
Posted by: dvunkannon on July 02 2009,23:51

Quote (Zachriel @ July 02 2009,10:25)
Quote (Zachriel @ June 25 2009,11:00)
   
Quote (dvunkannon @ June 25 2009,10:07)
{snipped} ... equivalent to assuming that all selection takes place after a night of drinking heavily.
---------------------QUOTE-------------------


(Perhaps you should spend more time in bars—for observational purposes, of course.)
---------------------QUOTE-------------------


I trust, dvunkannon, that your field work is progressing well.

Quote (dvunkannon @ June 25 2009,10:07)
A lot of GAs use tournament selection to maintain a more constant selection pressure. I think you could argue that tournament selection models some part of the sexual selection process.
---------------------QUOTE-------------------


I've been thinking about your analogy and suggestions. Roulette seems to emulate relative fecundity. Those with higher fecundity will tend to mate with those of higher fecundity simply because they mate and produce offspring more often. (Wimps pass out in the corner, rarely mating.) But Tournament does tend to pair those with like-fecundity, so I suppose it does seem like sexual selection. I may try that next. I thought about some more explicit method, but that might be beyond what we are trying to accomplish with this model.
---------------------QUOTE-------------------


I'm glad you found the analogy helpful!

in re: field work and observation - piss poor at the moment. The result of living in New Jersey while dating a Czech supermodel that lives in Prague.
Posted by: Zachriel on July 03 2009,17:01

Quote (slpage @ June 23 2009,19:10)
Constant population size is one of my pet peeves with Haldane's model as well.
---------------------QUOTE-------------------


Experimenting with Gregor's Bookkeeper.

VARIABLE POPULATION and FECUNDITY: When the number of Parents and the number of Children are allowed to vary (e.g. 20% relative standard deviation), the population tends to achieve a higher fitness. This is apparently because when the population bottlenecks, it weeds out the weaklings. This is somewhat analogous to variations in climate, such as plenitude followed by drought.

TIME: The typical pattern is to watch the fitness slowly ebb away, then suddenly spring back. If you quit too soon, you would never see this. As long as the population stays large enough to avoid extinction for long enough, you will see a sawtooth pattern: a slow slide down in fitness, then a sudden increase as a significant favorable mutation sweeps through the population.

DOMINANCE: Still not happy with this feature. As might be expected, setting the fitter allele to be more dominant leads to greater fitness. This might be considered a cheat though. Setting an arbitrary allele to be dominant, it still often leads to greater fitness. An interesting test was to set dominance on a sliding scale, 1/G for G = 1 to numG (G for gene). This means that for some genes, the dominant gene is deleterious, for others favorable. Interestingly, this also leads to greater fitness.

SHAKING the BOX: It almost seems that anything that adds a bit of complex motion allows those with the highest fitness to rise to the top. Need more data.



DIFFERENCES between Gregor's Bookkeeper and Mendel's Accountant:

  * Multiplicative fitness.
  * Roulette Wheel mating, rather than the odd "divide by random" method.
  * Can handle very large populations and generations—if you're willing to wait.
  * Raised limit on the effect of favorable mutations. Adjusted some other settings.
Posted by: dvunkannon on July 06 2009,12:19

Thinking about the problem of evolving a code, I just wanted to bounce some ideas off of anyone that wants to answer...

Here is a genome design:
6*26 bits of data to build a phenotype, followed by
64*8 bits of coding table

The coding table works like this. Each entry contains an 8 bit string. Each 8 bit string maps to a list of affinities to a subset of 32 characters:

127 -> 20% A, 40% F, 40% J

This table of affinities would be salted with entries that guarantee A-Z have some entry with high affinity.

The GA works like any binary coded GA. To create a phenotype, run the data section of the genome through the code table 6 bits at a time. The 6 bits are like the three-base codons in DNA, mRNA, and tRNA. Taken as an index, they give an 8 bit value. The 8 bit value gives a set of affinities, and you spin the roulette wheel to see which letter you get. That letter is the phenotypic expression of the 6 bits you started with. Continue to loop until you've finished the data section of the genome.

To score a phenotype, compare with the string A .. Z and take the sum of the squared error at each position (e.g. target D, actual A: distance 3, squared error 9).
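A Python sketch of that decode-and-score loop. The affinity table below is a stand-in (three weighted letters per 8-bit value, without the salting described above), just to make the mechanism concrete:

```python
import random

# Stand-in affinity table: each 8-bit value maps to a few letters with
# weights, e.g. 127 -> roughly {'A': 0.2, 'F': 0.4, 'J': 0.4}.
AFFINITY = {v: {chr(ord('A') + (v + i) % 26): w
                for i, w in enumerate((0.2, 0.4, 0.4))}
            for v in range(256)}

def decode(data_bits, code_table):
    """Read the data section 6 bits at a time; each 6-bit index picks
    an 8-bit table entry; the entry's affinities are spun like a
    roulette wheel to yield one letter of the phenotype."""
    phenotype = []
    for i in range(0, len(data_bits), 6):
        index = int(data_bits[i:i + 6], 2)   # 6 bits -> 0..63
        entry = code_table[index]            # 8-bit value
        letters = AFFINITY[entry]
        phenotype.append(random.choices(list(letters),
                                        weights=letters.values())[0])
    return ''.join(phenotype)

def score(phenotype, target='ABCDEFGHIJKLMNOPQRSTUVWXYZ'):
    """Sum of squared letter distances (target D vs actual A -> 3**2)."""
    return sum((ord(a) - ord(t)) ** 2 for a, t in zip(phenotype, target))
```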

I think that if it works, you'll eventually get a population with the code table filled with at least one copy of each high affinity 8 bit string.

Comments? Is choosing 32 useful entries out of 256 too easy or too hard?
Posted by: midwifetoad on July 06 2009,13:00

My little program seeks to build words through cumulative selection. The scoring algorithm is based on the frequency of letter pairs and triplets occurring in actual words. I've built frequency tables for a number of languages.

Selecting just on the relative frequency of pairs and triplets, it builds 7-, 8-, 9- and 10-letter words in very few generations, involving just a few thousand mutations.

More interesting to me is the fact that it builds long word-like strings that look like words and are perfectly pronounceable, but aren't in the dictionary. Also interesting (to me) is the fact that it often ignores dictionary words in selecting the most fit. But despite being word blind, it builds words.
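The pair-frequency scoring can be sketched in a few lines of Python; the tiny word list here is a stand-in for the real per-language tables:

```python
from collections import Counter

def build_bigram_table(words):
    """Count letter pairs across a word list — a stand-in for the
    per-language frequency tables described above."""
    pairs = Counter()
    for w in words:
        for a, b in zip(w, w[1:]):
            pairs[a + b] += 1
    return pairs

def fitness(candidate, table):
    """Score a string by how common its letter pairs are in the
    table; no target word is ever consulted."""
    return sum(table[a + b] for a, b in zip(candidate, candidate[1:]))

table = build_bigram_table(['weasel', 'wease', 'sell', 'eagle'])
```

Because the score rewards plausible letter transitions rather than any particular string, pronounceable non-words score just as well as dictionary words — which is exactly the behavior reported above.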

I don't want to read too much biology into this, but I think it blows away Behe's claim that long strings can't be the result of a selection algorithm that is unaware of a goal or target.

My program doesn't have a target or halting condition and continues to produce unique strings for hundreds of generations. I prevent getting stuck on a high scoring mother by periodically killing off the most fit child. Literally, every fourth generation, the most fit bites it.

I don't know if this simulates anything in nature, but to me it indicates that a rather mindless algorithm can do things beyond the ken of rocket scientists.
Posted by: midwifetoad on July 06 2009,13:07



---------------------QUOTE-------------------
This table of affinities would be salted with entries that guarantee A-Z have some entry with high affinity.
---------------------QUOTE-------------------



I'd be curious to know if your concept of affinities has any similarity to my use of letter pairs.

In my limited experience, the output of a GA is limited only by the fitness scoring algorithm. I tried to build the simplest and stupidest scoring algorithm that could still produce interesting results.

My highest priority was to produce a fitness scorer that couldn't be construed as a target.
Posted by: dvunkannon on July 06 2009,13:35

Quote (midwifetoad @ July 06 2009,14:07)


---------------------QUOTE-------------------
This table of affinities would be salted with entries that guarantee A-Z have some entry with high affinity.
---------------------QUOTE-------------------



I'd be curious to know if your concept of affinities has any similarity to my use of letter pairs.

In my limited experience, the output of a GA is limited only by the fitness scoring algorithm. I tried to build the simplest and stupidest scoring algorithm that could still produce interesting results.

My highest priority was to produce a fitness scorer that couldn't be construed as a target.
---------------------QUOTE-------------------


The affinities are my way to model the situation that tRNA molecules could evolve from molecules that accepted any of several amino acids before getting more specific. Right now there is still one tRNA that will accept two different AAs, but instead of evolving a more specific acceptor, the system fixes up errors after they occur.

We're aiming to do different things, so I don't see a conceptual overlap between your bigram and trigram tables and what I was conjecturing. I think what you are doing is using a Hidden Markov Model to evolve phoneme level utterances. I think Chomsky showed that human language is more than an HMM, but as you said, Behe doesn't think evolution can even do that much!

ETA - I take your point on the scoring. I thought of taking the data portion out of the genome, and just testing it against various strings, but I think going against a fixed target might be a simple first test of the idea.
Posted by: midwifetoad on July 06 2009,14:21



---------------------QUOTE-------------------
I think Chomsky showed that human language is more than an HMM, but as you said, Behe doesn't think evolution can even do that much!

---------------------QUOTE-------------------



I certainly am not ambitious enough to attempt evolving language. My goal was simply to test Behe's edge of evolution in a way that is a notch less deterministic than Dawkins' original Weasel.

I know, for example, that selecting for phenotypes is theoretically different from selecting genotypes, but I don't see how this matters much in a model. If the phenotype doesn't reflect the genotype in a way that is visible to the selecting agent, it doesn't really matter.

As for evolving phoneme level utterances, I think that's quite appropriate for a demo program. For one thing, I can make trade names. Bactine, for example. The distinguishing features of a trade name are pronounceability and novelty.

I don't know much about Chomsky except that he said a lot of things that sound like ID. I resist his pronouncements for that reason alone.
Posted by: Henry J on July 06 2009,15:25

At a guess, computing a phenotype for each genome might make it easier to allow for things like neutral drift, or making some traits more critical than others without necessarily attaching those traits to certain genes.

Henry
Posted by: midwifetoad on July 06 2009,16:05

Quote (Henry J @ July 06 2009,15:25)
At a guess, computing a phenotype for each genome might make it easier to allow for things like neutral drift, or making some traits more critical than others without necessarily attaching those traits to certain genes.

Henry
---------------------QUOTE-------------------


That sounds like saying genes modify the effects of other genes.

But unless you have some way of modeling biological development, you are not being more "realistic" than my database of letter combinations known to have fitness.

I can quickly assign a fitness score to any arbitrary string of characters, comparing say EUPOUACCT to EUPOUAGCT, and choose one to be the parent of the next generation.

Now if you were doing a straight Weasel program you could just count the number of letters that match your target. But I am not searching for a target. I am shaping a population to look and sound like words from a specific language.

I'm not claiming to model biology, and I have doubts that we have the ability at this time to model biology. What we can do is model specific claims and specific assumptions.

All I set out to do was respond to criticisms of Dawkins asserting that his program did nothing but seek a fixed target. The demonstrable fact is that a clever selector can build unanticipated structures -- functional strings much longer than those known to the selector.
Posted by: dvunkannon on July 06 2009,16:27

Quote (midwifetoad @ July 06 2009,17:05)
Now if you were doing a straight Weasel program you could just count the number of letters that match your target. But I am not searching for a target. I am shaping a population to look and sound like words from a specific language.
---------------------QUOTE-------------------


Yes, your system would stump the level 0 critic because you don't have a target hard coded that looks exactly like a population member.

The level 1 critic would say that the bigram and trigram table is hard coded, and that your system is rewarding population members that have the same frequency distributions as the table. That is your target.
Posted by: midwifetoad on July 06 2009,16:44

Just as soon as cdesign proponentsists come up with a level one critic, I'll worry.

But the general argument is that selection doesn't work. Or that it can't make anything new.

The demonstrable fact is that a selector that works at the organism level allows novelty to arise. If you toss in an occasional asteroid, it continues to arise over many generations. That takes care of latching and virtual latching.

Regardless of how you parse the argument, my dumbass selector makes words that could be in the dictionary, but aren't. It also makes things like jargon and acronyms that are in common use but aren't allowed in my dictionary (developed for Scrabble).
Posted by: midwifetoad on July 06 2009,16:52

Selection has to be, at some level, hard coded.

At the biochemical level, most combinations are non-viable. (More ways to be dead than alive.)

At the population level, the selecting environment is pretty discriminating. Diseases and predators are a lot smarter and more versatile than my probability tables.

I use tables for two reasons. One is that they are easy, fast and practical. The second is that they reflect a history of language evolution. Unless you don't believe language evolves.

They are a shorthand for all the historical events that made some strings of letters words and some not.

I am certain that some of the children produced by my demo that are not currently in use will be. One of the cute words that popped up was "mindfly." Not surprisingly, there is a mindfly.com.

So I think my selector does, at some elementary level, model one of the ways language evolves.
Posted by: Henry J on July 06 2009,21:38



---------------------QUOTE-------------------
Selection has to be, at some level, hard coded.
---------------------QUOTE-------------------


Yeah, it seems likely (to me at least) that some aspects would have to be directly programmed.

One way to get at least partly around that (at the cost of a bunch more work) is multiple species interacting with each other - in that case, the "target" is a balance between tasting bad to the other guys, and being able to catch (and digest) enough of the other guys to get by. That might be one way to keep it from converging on any one final answer.

Henry
Posted by: midwifetoad on July 06 2009,22:40

The aspect of my simple weasel program that I find most interesting is the period after selection creates a population having high fitness. It then dithers around -- sometimes for fifty or a hundred generations -- without increasing fitness. Then, suddenly a long word will pop out, as if the program were waiting patiently for some hopeful monster.

My reading is that the plateau of high fitness is where any living population resides, and that so called neutral mutations are not really neutral, except that they have a fitness level as high as the existing average. Or perhaps they balance -- one up, one down.

It isn't really necessary to invoke any miracles of improbability. The dithering at high levels of fitness allows all kinds of things to emerge which could not emerge in one step from a low level of fitness.

I record the fitness scores of the mothers in my demo. When I see a 10 letter word emerge out of nowhere, I can trace the mothers and their fitness scores. Doing this, I see that for the most part, the history exhibits the same kind of stepwise shift toward the 10 letter word that Dawkins' Weasel shows, as if that particular word were the goal.

Because I kill off the fittest every now and then, the scores sometimes go backwards. Sometimes the total count of "correct" letters will go backwards. All the more astonishing in retrospect. Dawkins' program can't do that.

But of course I wrote the program and I know the stepwise movement toward a target is an illusion. The program merely insures that given an adequate level of fecundity, fitness will always be high. After the first twenty or thirty generations, the population is never more than a few steps away from a "breakthrough."

I think what the cdesign proponentsists are missing is the fact that stasis in level of fitness does not imply genetic stasis. There are many variations that are equivalent in fitness. The path to a breakthrough structure doesn't have to involve a continuous increase in overall fitness.

As an author, I'm biased, but I think my little program allows a person to see genetic drift. You see every mutation and every child. Of course I could be completely wacked. :p
Posted by: dvunkannon on July 07 2009,12:16



---------------------QUOTE-------------------
Sometimes the total count of "correct" letters will go backwards. All the more astonishing in retrospect. Dawkins' program can't do that.

---------------------QUOTE-------------------



Errr, no. That was what the whole latching kerfuffle was about!
Posted by: Henry J on July 07 2009,13:49

Yeah, they couldn't seem to get that offspring with a backwards mutation would be less fit, and so wouldn't be selected, and so wouldn't show up in the next generation. Direct logical consequence of the algorithm, no special coding needed.

Henry
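Henry's point is easy to demonstrate: a weasel with no latching code at all still keeps its correct letters, purely because less-fit offspring lose the selection round. A Python sketch (population size, mutation rate, and generation cap are arbitrary choices of mine):

```python
import random
import string

TARGET = 'METHINKS IT IS LIKE A WEASEL'
ALPHABET = string.ascii_uppercase + ' '

def mutate(parent, u=0.05):
    """Every position can mutate, including already-correct ones;
    nothing is ever explicitly protected or latched."""
    return ''.join(random.choice(ALPHABET) if random.random() < u else c
                   for c in parent)

def matches(s):
    return sum(a == b for a, b in zip(s, TARGET))

# Selection alone preserves correct letters: the fittest of 100
# offspring almost always keeps them, with no latching coded in.
parent = ''.join(random.choice(ALPHABET) for _ in TARGET)
for _ in range(1000):
    parent = max((mutate(parent) for _ in range(100)), key=matches)
    if parent == TARGET:
        break
```

Individual offspring do lose correct letters all the time; they simply don't win the generation, so the loss rarely shows up in the lineage.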
Posted by: midwifetoad on July 07 2009,13:57

Quote (dvunkannon @ July 07 2009,12:16)


---------------------QUOTE-------------------
Sometimes the total count of "correct" letters will go backwards. All the more astonishing in retrospect. Dawkins' program can't do that.

---------------------QUOTE-------------------



Errr, no. That was what the whole latching kerfuffle was about!
---------------------QUOTE-------------------


The latching crowd asserted that once a letter was correct it was protected from mutating. That's a separate issue from whether the total number of correct letters can decline.

At reasonable mutation rates, there will "always" be at least one perfect, unmutated child in the Dawkins program, and the total fitness will never decline. It is theoretically possible for two good mutations to offset one bad one, so it is possible for a good letter to revert.

My program pretty much guarantees that every fourth generation will decline in total fitness.

Since my fitness definition is much broader than a single target, I stir the pot to avoid getting stuck on a single individual having a high fitness score.

I don't know if I am modelling anything real. My only concern is to demonstrate that selection can do interesting things without a specific target.

I would argue, however, that my fitness database is at least partially equivalent to a real selection history. Which is to say, the distribution of letter pairs or phonemes in a real language is the result of a real selection history, and the database embodies that history.
Posted by: midwifetoad on July 07 2009,14:18

Here's what I'm thinking, right or wrong:

The Behe challenge is to evolve a flagellum from an organism that may have some parts of the structure, but no motor.

Behe and Dembski are thinking that each step must be selected, that is each step must improve fitness; otherwise the probability of building a novel, complex structure is nil.

My argument, based on my own observations, is that when a population is at a high level of fitness -- and any living populations is fit by definition -- there are many variations that neither improve nor degrade fitness. (obviously this isn't an original thought).

This means there is little cost to variation. You may or may not hit upon some new invention, but the cost of "exploring the search space" is nil.

Now I could be just another crank, but I think my program shows this happening in a reasonable time frame. It's a toy illustration, but I think it's worth looking at as a possible teaching device.
Posted by: Henry J on July 07 2009,21:58

And of course the challenge back to Behe is to explain how his ideas explain a single nested hierarchy being followed by the vast majority of major traits and DNA sequences. (If that isn't being picky.)

Henry
Posted by: midwifetoad on July 08 2009,00:28

Quote (Henry J @ July 07 2009,21:58)
And of course the challenge back to Behe is to explain how his ideas explain a single nested hierarchy being followed by the vast majority of major traits and DNA sequences. (If that isn't being picky.)

Henry
---------------------QUOTE-------------------


I haven't read Edge of Evolution, but I understand he lists many lines of evidence for evolution without mentioning ERVs.

Can't imagine why.
Posted by: Zachriel on July 08 2009,10:50

Quote (Bob O'H @ July 02 2009,13:52)
I'm a bit cynical about a lot of conservation genetics: far too many people doing it use species of great conservation concern, like fruit flies.
---------------------QUOTE-------------------


Heh. Then I suppose you are equally cavalier about the fate of isogenic < yeast >.

Quote (Bob O'H @ July 02 2009,13:52)
I think there has been some work showing it happens in nature, but there are so many other threats as well.
---------------------QUOTE-------------------


When populations fall to very low levels, then minor catastrophes can result in extinction. With Gregor's Bookkeeper, if we set the relative variance to a high level (to represent contingent variations), then it can easily result in extinction. A low reproductive rate also makes a species less robust when populations are at a low level or subject to broad environmental fluctuations. Also important in very small populations is the role of recessive genes due to inbreeding. But in nature, we often expect some hybridization with closely related populations.

Some results of Gregor's Bookkeeper. Here are the settings for this trial.



numP is number of Parents.
numT is the number of Turns (but was stopped).
numO is the average number of Offspring.
numG is the number of Genes.
Dominant of 50% has no effect.


(The limiting factor for the software seems to be the Roulette Selection algorithm. The above took a few hours. I may try to optimize it later or convert it to a C++ module—but I doubt it is worth the trouble.)

Even with the very low favorable-to-deleterious ratio of 0.00001, the result was an increase in fitness as seen here.



You can't see it well, but the fitness drops rapidly at the beginning. It's easier to see in log10.



This is because the original population has a uniform (isogenic) fitness of 1, and it takes some time for it to stabilize as a population of nearly neutral mutations.

With a population of 100, it increases in fitness with a favorable-to-deleterious ratio of 0.002. With a population of 1000, it increases in fitness with a favorable-to-deleterious ratio of 0.0002. In nature, a species of only a few thousand individuals is usually considered small.

One of the problems with the model is that {in nature} the ratio of favorable-to-deleterious tends to increase as fitness declines. But leaving that aside ... We see that with larger populations, and within the assumptions of the model, genetic meltdown is not an issue.

The "divide by random" selection algorithm in Mendel's Accountant is flawed. If we add a slight dominance to beneficial alleles or introduce sexual selection, the results would be even more lopsided. The conclusion that genetic meltdown is a problem for reasonably large populations, and that this sort of model indicates a problem for evolutionary theory, is not supported.


-
More accurate value reflects additional data.
Posted by: Zachriel on July 09 2009,21:10

This gives an idea of the distribution of fitnesses within a population. (The red horizontal line is the mean.) It's fairly uniform except at the margins, but a lot happens at the upper margin. Notice how bottlenecks, though risking extinction, increase overall fitness by weeding out the weakest.



100 numP, Beginning population (of Parents)
100 topP, Population capacity (of Parents)

6 numO, Average Offspring per Parent
20 numG, Genes
0.002 rateFav, Rate of Favorable mutations


20% varChildren, Drift in average fecundity (relative standard deviation)
20% varParents, Drift in average population (relative standard deviation)
5% varMuts, Drift in average mutation rate (relative standard deviation)


Mutational meltdown is only a plausible problem if a population is kept few in number and genetically isolated for a long period of time. Otherwise, bottlenecks can be evolutionary opportunities. The claim that even large populations suffer genetic meltdown, and therefore the world can only be a few thousand years old, is not supported by this sort of modeling.

-
Edit: Fixed graphic so it doesn't show the truncation after previously having calculated the average.
Posted by: Zachriel on July 09 2009,21:47

This has the same settings, but a population of 1000 Parents. It exhibits the same initial decline, but then quickly runs off the chart.


Posted by: Zachriel on July 09 2009,21:54

This one shows growth from an initial population of 1, with topP being 1000. A "founder" population.



Notice the first few turns, as the deleterious mutations precipitate down, and the population diversifies. Then we see fitness stabilize. Later, this population will also increase in fitness.

-
Like that word better. Quidquid latine dictum sit, altum videtur.
Posted by: Steve Schaffner on July 09 2009,22:15

I'm afraid I haven't been following your program's progress in detail. What distribution of fitness effects do your mutations have?
Posted by: midwifetoad on July 09 2009,22:39

I'm certainly glad to hear that our precious amoeboids (not to mention our lovely E. coli) are not endangered.

I was worried there for a while.
Posted by: dvunkannon on July 10 2009,07:02

Zachriel, can you post your code somewhere? It would help me understand it better to walk though it. Thanks!
Posted by: Zachriel on July 10 2009,07:10

Quote (Steve Schaffner @ July 09 2009,22:15)
I'm afraid I haven't been following your program's progress in detail. What distribution of fitness effects do your mutations have?
---------------------QUOTE-------------------


We're using the VBA GammaDist. It's not perfect, but seems more than sufficient for the purpose. We're limiting the extremes in any case. Click for a < graph >.

RndGamma = Application.GammaDist(Rnd + inf, gamMuts, 1, False)
If RndGamma > topMuts Then RndGamma = topMuts


There are other posts on the same page showing snippets of code from Gregor's Bookkeeper, differences with Mendel's, and an analysis indicating an apparent problem with their calculation of "working fitness".

Favorable and deleterious mutations are mirror image distributions—other than the RateFav multiplier such that favorable mutations are much rarer. There's a maximum effect, topMuts, usually set at 1.

inf is infinitesimal
gamMuts can vary, but the distribution shown at the link is 0.01.
RateFav usually varies from 0.01 to 0.00001, and is 0.002 in the charts shown just above.
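For readers without Excel handy, the VBA draw above can be mirrored in Python. Note that GammaDist with cumulative = False evaluates the gamma density at the uniform draw; it is not an inverse-CDF sample, so this mirrors the quirk rather than being a textbook gamma sampler (the names gam_muts, top_muts and inf follow the post):

```python
import math
import random

def rnd_gamma(gam_muts=0.01, top_muts=1.0, inf=1e-12):
    # Evaluate the Gamma(shape=gam_muts, scale=1) density at a uniform
    # random point, then cap the result at top_muts, as in the VBA line.
    x = random.random() + inf
    pdf = x ** (gam_muts - 1.0) * math.exp(-x) / math.gamma(gam_muts)
    return min(pdf, top_muts)
```

With gam_muts = 0.01 the density blows up near zero, so a small fraction of draws hit the top_muts cap while the rest are small effects, matching the intended shape of mutation effects.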

Posted by: Zachriel on July 10 2009,07:45

Quote (dvunkannon @ July 10 2009,07:02)
Zachriel, can you post your code somewhere? It would help me understand it better to walk though it. Thanks!
---------------------QUOTE-------------------


Sure. It's not set up for the general user, but it's not all that difficult. The first lines of code include the array dimensions. You may have to set those to suit. And there's not a lot of error detection. It's implemented in Excel 2007, but seems to work in somewhat older versions. You have to enable macros, of course.

Gregor's Bookkeeper
< http://www.zachriel.com/mendel/gregorsbookkeeper.xlsm >

Hmm. The file won't download in Windows Vista 64. It keeps saying it's a zip-file. It's not. And the file isn't in the zip-folder it creates. The download works in other operating systems. I think Microsoft is trying to "help". If that doesn't work, try this.

Gregor's Bookkeeper
< http://www.zachriel.com/mendel/gregorsbookkeeper.zip >
Posted by: Zachriel on July 10 2009,21:23

Some updates to Gregor's Bookkeeper. More readable subroutine structure. Also, fixed a minor bug for numO (average number of Offspring), which wasn't accepting fractional values.


Posted by: Zachriel on July 11 2009,08:44

Added a Genetic Fitness chart, similar to those above. It takes a snapshot of each turn, stored as a gif. It defaults to FALSE, so you have to turn it on in the "Vars" worksheet. (Due to limitations in Excel, it can't plot more than 32000 data-points, so that limits the number of Children it can display.)
Posted by: dvunkannon on July 11 2009,09:30

Quote (Zachriel @ July 10 2009,22:23)
Some updates to Gregor's Bookkeeper. More readable subroutine structure. Also, fixed a minor bug for numO (average number of Offspring), which wasn't accepting fractional values.


---------------------QUOTE-------------------


You could move the Extinction test up to right after generating stats, right? After Phenotypic Fitness, you know there are no more population members.

It's trivial, and this does look very clean!
Posted by: Zachriel on July 11 2009,10:12

Quote (dvunkannon @ July 11 2009,09:30)
   
Quote (Zachriel @ July 10 2009,22:23)
Some updates to Gregor's Bookkeeper. More readable subroutine structure. Also, fixed a minor bug for numO (average number of Offspring), which wasn't accepting fractional values.


---------------------QUOTE-------------------


You could move the Extinction test up to right after generating stats, right? After Phenotypic Fitness, you know there are no more population members.

It's trivial, and this does look very clean!
---------------------QUOTE-------------------


Extinction can occur in Selection if no one has positive phylogenetic fitness. If the number of Parents (numP) does drop to 0, then it will fall through the next routines safely. We want Stats, at least. Or extinction can occur in Roulette if there are no children (numC). We need to catch it there, or it errors on the next loop in PhyloFitness when it sorts. I admit I'm not quite happy with just letting it drop through from Selection as that relies on what happens with Parents (numP) in the following procedures—which could change. It probably should be checked any place extinction could occur. (A popstack command would be handy, but I don't see one.)
Posted by: Zachriel on July 11 2009,10:17

Quote (Zachriel @ July 11 2009,08:44)
Added a Genetic Fitness chart, similar to those above. It takes a snapshot of each turn, stored as a gif. It defaults FALSE, so you have to turn it on in the "Vars" worksheet. (Due to limitations in Excel, it can't plot more than 32000 data-points, so that limits the number of Children it can display.)
---------------------QUOTE-------------------


There's no way to reliably predict what the range of fitnesses will be or even the top population of children (numC). Consequently, Charts are set automatically with every turn. You can make them static (as they are in the gif animations above) by setting them in the makeChart procedure. The commands are there, just remarked-out.
Posted by: Zachriel on July 11 2009,12:31

The power of recombination.

Usual settings, with numP = 100, but each gene begins with alleles of 1.0 and 0.1 while rateMuts = 0 and no dominance effect. With 20 genes, the starting fitness is 0.0000064. By Turn 11, all deleterious alleles have been purged, i.e. minFitness = 1.  



Interestingly, it takes about the same amount of time for numP = 1000. If we set the deleterious allele to 0.9999 it still only takes about 20 turns or so to weed out the last of it. Keep in mind too that 50% of phylogenetic fitness is random noise.
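The starting fitness of 0.0000064 quoted above follows directly from multiplicative fitness: with no dominance effect, each heterozygous gene presumably contributes the mean of its two alleles, (1.0 + 0.1) / 2 = 0.55, and 0.55^20 ≈ 0.0000064. A sketch of that arithmetic (the averaging rule is inferred from the number in the post, not read from the code):

```python
# Assumes per-gene fitness is the mean of the two allele values
# (Dominant = 50%, i.e. no dominance effect) and genome fitness is
# the product over all genes.
alleles = [(1.0, 0.1)] * 20        # 20 genes, each heterozygous
fitness = 1.0
for a, b in alleles:
    fitness *= (a + b) / 2.0       # 0.55 per gene
print(fitness)                     # ~6.4e-06, i.e. the 0.0000064 above
```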

-
Added more nines.
Posted by: Zachriel on July 12 2009,16:31

Starting with an isogenic population distorts the initial behavior of the system. If you don't wait long enough, you will reach erroneous conclusions. Consider this example, defaults except numP = 1000 and favMuts = 0.0002.



We see our now well-expected decline. Continuing for 100 turns, we see a bit of an improvement, then stagnation.



Stagnation is good. Eventually, we'll see significant, favorable mutations. Here after 200 turns.



These are the same settings, only with favMuts = 0.0001 and a thousand turns.



It reaches a minimum of 0.12 after 51 turns, then rises to 0.64 after 135 turns. That's five times the minimum. It recrosses fitness of 1 at 402 turns before retreating a bit. Then takes off.

The key is that there has to be a certain amount of diversity in the population to enable the recombinatorial possibilities. If we just consider the fitness from its minimum, then the closeup of the previous image looks quite different.


Posted by: Dr.GH on July 12 2009,16:40

So, how does Mendel's Accountant compare under the same conditions? Which variables control the difference in output?
Posted by: Zachriel on July 12 2009,17:26

Quote (Dr.GH @ July 12 2009,16:40)
So, how does Mendel's Accountant compare under the same conditions? Which variables control the difference in output?
---------------------QUOTE-------------------


Leaving aside their choice of defaults, Mendel's Accountant is seemingly broken due to a faulty calculation of "working fitness". The resulting fitness signal is very feeble. I'm not even sure why they decided on the "divide by random" method, though it is a lot faster and somewhat easier to program than Roulette. Two consecutive posts from here:

< http://www.antievolution.org/cgi-bin....y147439 >


DIFFERENCES between Gregor's Bookkeeper and Mendel's Accountant:

 * Multiplicative fitness.
 * Roulette Wheel mating, rather than the odd "divide by random" method.
 * Can handle very large populations and generations—if you're willing to wait.
 * Raised limit on the effect of favorable mutations. Adjusted some other settings.
Posted by: Dr.GH on July 12 2009,17:44

So, you have basically followed Wesley's suggestion of writing your own program.

I still see the same problem as I did before: how do you reply to the creationist challenge that YOU faked the result, and not they? This needs to be explained to someone with no programming experience, nor any sophistication in genetics.

Otherwise, Sanford et al will continue to promote their "refutation" of evolution without any useful response.
Posted by: Zachriel on July 13 2009,07:16

Quote (Dr.GH @ July 12 2009,17:44)
So, you have basically followed Wesley's suggestion of writing your own program.

I still see the same problem as I did before: how do you reply to the creationist challenge that YOU faked the result, and not they? This needs to be explained to someone with no programming experience, nor any sophistication in genetics.

Otherwise, Sanford et al will continue to promote their "refutation" of evolution without any useful response.
---------------------QUOTE-------------------


Mendel's Accountant should have replicable results. Gregor's Bookkeeper is relatively simple and open source. That Gregor's Bookkeeper differs should indicate a problem with one or the other. Pointing to a single line of code that apparently breaks Mendel's Accountant should satisfy most with open minds.

work_fitness(i) = work_fitness(i)/(randomnum(1) + 1.d-15)

Most IDers, though, wallow in fallacies  (of categorization, of argument, of the scientific method; liberally sprinkled with equivocations), so there is rarely a way to convince them.
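The effect of that one line is easy to demonstrate independently: take a set of fitness values, divide each by a fresh uniform random number, then truncate to the top half by each criterion and compare the survivors. A rough sketch (stand-in fitness values, not Mendel's actual distribution; the 1e-15 guard mirrors the Fortran):

```python
import random

random.seed(42)
genetic = [random.uniform(0.5, 1.5) for _ in range(1000)]   # stand-in fitnesses

# "Working fitness" a la Mendel's Accountant: divide by a fresh
# uniform random number (guarded against division by zero).
working = [f / (random.random() + 1e-15) for f in genetic]

# Truncation selection: keep the top half by each criterion.
keep = len(genetic) // 2
by_genetic = set(sorted(range(len(genetic)), key=lambda i: genetic[i])[-keep:])
by_working = set(sorted(range(len(genetic)), key=lambda i: working[i])[-keep:])

# Overlap near 50% means the working-fitness ranking is close to random.
overlap = len(by_genetic & by_working) / keep
print(overlap)
```

The overlap typically lands only modestly above the 50% expected from a purely random ranking, which is the sense in which the fitness signal is crippled.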
Posted by: Zachriel on July 13 2009,07:24

Quote (Zachriel @ July 13 2009,07:16)
work_fitness(i) = work_fitness(i)/(randomnum(1) + 1.d-15)
---------------------QUOTE-------------------


Even if someone were to < justify > this line of code, it would merely demonstrate the somewhat arbitrary nature of such models. Why ÷Rnd^1? Why not ÷Rnd^½ or ÷Rnd^¾? These models still need to be compared to real biological systems. Any strong conclusion that they debunk evolution is not supported.
Posted by: midwifetoad on July 13 2009,07:55

The press releases for the Lenski experiment assert that the E. coli populations underwent every possible point mutation during the course of the experiment. I fail to see how this kind of scenario could lead to extinction.

Except, as noted, in a very small population. But we already know from field observations that population declines can result in extinction. But after mass extinction events, there are always fast breeders of some kind lurking in the wings. Fast breeders never seem to melt down.

Am I wrong?
Posted by: keiths on July 13 2009,11:34

Quote (Zachriel @ July 13 2009,05:16)
Quote (Dr.GH @ July 12 2009,17:44)
So, you have basically followed Wesley's suggestion of writing your own program.

I still see the same problem as I did before: how do you reply to the creationist challenge that YOU faked the result, and not they? This needs to be explained to someone with no programming experience, nor any sophistication in genetics.

Otherwise, Sanford et al will continue to promote their "refutation" of evolution without any useful response.
---------------------QUOTE-------------------


Mendel's Accountant should have replicable results. Gregor's Bookkeeper is relatively simple and open source. That Gregor's Bookkeeper differs should indicate a problem with one or the other. By pointing to a single line of code that apparently breaks Mendel's Accountant, that should satisfy most with open minds.

work_fitness(i) = work_fitness(i)/(randomnum(1) + 1.d-15)

Most IDers, though, wallow in fallacies  (of categorization, of argument, of the scientific method; liberally sprinkled with equivocations), so there is rarely a way to convince them.
---------------------QUOTE-------------------


I haven't been following this thread closely, but if that single line of code is the culprit, it seems that you could make a devastating case by:

1. Compiling Mendel's Accountant as is.
2. Running it on a small but interesting suite of test cases.
3. Changing that line of code and recompiling.
4. Running the same suite of test cases.

By doing so, you would prove:

1. That you haven't faked your results.  Anyone could reproduce them by changing just that one line of code in MA.
2. That MA depends crucially on that line of code.

Anyone (including non-programmers) should be able to see the force of this demonstration.

All that would be left for Sanford et al would be to attempt to justify their version of that single line of code against yours.
Posted by: midwifetoad on July 13 2009,11:47

My reading is that the one line of code would be replaced by a function.
Posted by: Richardthughes on July 13 2009,12:20

Would this be useful in teaching about GAs?

< http://www.mathsisfun.com/games/mastermind-game.html >

Watch out for the quasi latching.

I like the fitness function part, but sadly no RM; it's directed.
Posted by: keiths on July 13 2009,16:35

Quote (midwifetoad @ July 13 2009,09:47)
My reading is that the one line of code would be replaced by a function
---------------------QUOTE-------------------


The point is the same either way.  If you publicly specify a small set of changes to the MA code, justify those changes, and demonstrate that MA produces radically different outcomes when those changes are implemented, then you have made a persuasive case against Sanford et al while at the same time proving that you have not rigged the code to get a desired outcome.
Posted by: midwifetoad on July 13 2009,16:59

I'm not disagreeing, but perhaps the MA programmers chose a method because it was easier to program.
Posted by: Zachriel on July 13 2009,18:48

Discovered that Gregor's Bookkeeper is implementing genetic dominance consistent with Mendel's Accountant.

c...     We assume deleterious alleles behave in a recessive manner
c...     and when heterozygous have an effect given by the allele
c...     fitness effect multiplied by recessive_hetero_expression.
c...     Similarly, we assume favorable alleles behave in a dominant
c...     manner and when heterozygous have an effect given by the
c...     allele fitness effect times dominant_hetero_expression.  The
c...     full allele fitness effect is realized only when the same
c...     version of the allele occurs on both instances of its linkage
c...     block, that is, is homozygous.

Posted by: Dr.GH on July 13 2009,18:50

Quote (keiths @ July 13 2009,09:34)
Quote (Zachriel @ July 13 2009,05:16)
Quote (Dr.GH @ July 12 2009,17:44)
So, you have basically followed Wesley's suggestion of writing your own program.

I still see the same problem as I did before: how do you reply to the creationist challenge that YOU faked the result, and not they? This needs to be explained to someone with no programming experience, nor any sophistication in genetics.

Otherwise, Sanford et al will continue to promote their "refutation" of evolution without any useful response.
---------------------QUOTE-------------------


Mendel's Accountant should have replicable results. Gregor's Bookkeeper is relatively simple and open source. That Gregor's Bookkeeper differs should indicate a problem with one or the other. By pointing to a single line of code that apparently breaks Mendel's Accountant, that should satisfy most with open minds.

work_fitness(i) = work_fitness(i)/(randomnum(1) + 1.d-15)

Most IDers, though, wallow in fallacies  (of categorization, of argument, of the scientific method; liberally sprinkled with equivocations), so there is rarely a way to convince them.
---------------------QUOTE-------------------


I haven't been following this thread closely, but if that single line of code is the culprit, it seems that you could make a devastating case by:

1. Compiling Mendel's Accountant as is.
2. Running it on a small but interesting suite of test cases.
3. Changing that line of code and recompiling.
4. Running the same suite of test cases.

By doing so, you would prove:

1. That you haven't faked your results.  Anyone could reproduce them by changing just that one line of code in MA.
2. That MA depends crucially on that line of code.

Anyone (including non-programmers) should be able to see the force of this demonstration.

All that would be left for Sanford et al would be to attempt to justify their version of that single line of code against yours.
---------------------QUOTE-------------------


I am hoping Zac thinks this two, because that would make me three!
Posted by: Henry J on July 13 2009,22:08



---------------------QUOTE-------------------
Zachriel, posted 7/13/09 5:48 PM
Discovered that Gregor's Bookkeeper is implementing genetic dominance consistent with Mendel's Accountant.



---------------------QUOTE-------------------
We assume deleterious alleles behave in a recessive manner [...]
---------------------QUOTE-------------------


---------------------QUOTE-------------------



Why would that be an assumption? I'd expect that a dominant deleterious allele would be selected against, but that should be a direct consequence of the logic, no additional assumption needed.

Similarly, I see nothing to prevent a recessive advantageous allele, except that it wouldn't be spread by selection unless already in a large enough fraction of the population (unless it's only partially recessive).

Henry
Posted by: Zachriel on July 14 2009,07:02

Quote (Henry J @ July 13 2009,22:08)
           

---------------------QUOTE-------------------
Zachriel, posted 7/13/09 5:48 PM
Discovered that Gregor's Bookkeeper is implementing genetic dominance consistent with Mendel's Accountant.



---------------------QUOTE-------------------
We assume deleterious alleles behave in a recessive manner [...]
---------------------QUOTE-------------------


---------------------QUOTE-------------------



Why would that be an assumption? I'd expect that a dominant deleterious allele would be selected against, but that should be a direct consequence of the logic, no additional assumption needed.

Similarly, I see nothing to prevent a recessive advantageous allele, except that it wouldn't be spread by selection unless already in a large enough fraction of the population (unless it's only partially recessive).

Henry
---------------------QUOTE-------------------


Some deleterious mutations are dominant. More commonly, you have a pair of alleles that perform some function. If one becomes defective or just less than optimized, as long as the other functions properly, it can pick up the slack; hence the functional allele is dominant and the defect effectively recessive. But your point is well-taken. That's why I haven't been happy with the feature, and have it turned off by default (Dominant = 50%).

One alternative is just to have allele-zero be dominant by default. Or a sliding scale. The "natural" alternative is having a variety of different types of allele relationships. That may be beyond what we need here to verify Mendel's claim to have "debunked" evolution.

Quote (keiths @ July 13 2009,11:34)
All that would be left for Sanford et al would be to attempt to justify their version of that single line of code against yours.
---------------------QUOTE-------------------


They need to do that regardless of any independent implementation.

Mendel's Accountant takes genetic fitness, then adds noise, resulting in phenotypic fitness. It then divides this by a random number to produce "working fitness" before truncating. Presumably, this represents the effect of chance on mating success. As shown previously, it cripples the fitness signal and results in a ranking very close to random. An explanation from Mendel's creators would be helpful.

There are many such arbitrary assumptions built into Mendel. That's why I gave up on a direct parallel. It would help if the code was built modularly so that it was easier to read and reconstruct. This is Gregor's basic algorithm.

   For T = 1 To numT
          Mutate
          GeneticFitness
          PhyloFitness
          Selection
          Stats
          Roulette
          Extinction?
   Next T


Mendel's ÷Rnd seems to replace Gregor's Roulette.
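For comparison, the Roulette step is standard fitness-proportionate selection; a minimal sketch of the idea (illustrative Python, not the actual VBA):

```python
import random

def roulette_select(fitnesses, n_parents):
    """Fitness-proportionate (roulette-wheel) selection: each index is
    drawn with probability proportional to its fitness."""
    total = sum(fitnesses)
    chosen = []
    for _ in range(n_parents):
        spin = random.uniform(0.0, total)   # a point on the wheel
        running = 0.0
        for i, f in enumerate(fitnesses):
            running += f
            if running >= spin:             # the slice containing the point
                chosen.append(i)
                break
    return chosen
```

Under this scheme the probability of being selected tracks fitness directly, whereas dividing by a uniform random number and truncating leaves the ranking close to random.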
Posted by: dvunkannon on July 16 2009,07:27

KF is hoist on his own FCSI


---------------------QUOTE-------------------
102

Nakashima

07/16/2009

7:13 am
Mr KairosFocus,

For practical purposes, once an aspect of a system, process or object of interest has at least 500 – 1,000 bits or the equivalent of information storing capacity, and uses that capacity to specify a function that can be disrupted by moderate perturbations, then it manifests FSCI, thus CSI.

So if I write a GA system where the population members are competitors in an iterated prisoners dilemma with competitions running up to 1000 iterations, then you are satisfied that FSCI is being created by the GA? Each member is 1000 bits long, each bit stands for the action to take (cooperate=1, defect=0) in the current iteration. Fitness is the score of the individual at the end of an iterated competition with another member of the population.

---------------------QUOTE-------------------



Granted, that is a crappy way to write an IPD system, but it makes the case.
Posted by: oldmanintheskydidntdoit on July 16 2009,10:47

Quote (dvunkannon @ July 16 2009,07:27)
KF is hoist on his own FCSI
 

---------------------QUOTE-------------------
102

Nakashima

07/16/2009

7:13 am
Mr KairosFocus,

For practical purposes, once an aspect of a system, process or object of interest has at least 500 – 1,000 bits or the equivalent of information storing capacity, and uses that capacity to specify a function that can be disrupted by moderate perturbations, then it manifests FSCI, thus CSI.

So if I write a GA system where the population members are competitors in an iterated prisoners dilemma with competitions running up to 1000 iterations, then you are satisfied that FSCI is being created by the GA? Each member is 1000 bits long, each bit stands for the action to take (cooperate=1, defect=0) in the current iteration. Fitness is the score of the individual at the end of an iterated competition with another member of the population.

---------------------QUOTE-------------------



Granted, that is a crappy way to write an IPD system, but it makes the case.
---------------------QUOTE-------------------


He came back with "yeah, but the program itself was written by an intelligent designer therefore ID!"

Pathetic.
Posted by: dvunkannon on July 16 2009,11:42

Quote (oldmanintheskydidntdoit @ July 16 2009,11:47)
Quote (dvunkannon @ July 16 2009,07:27)
KF is hoist on his own FCSI
 

---------------------QUOTE-------------------
102

Nakashima

07/16/2009

7:13 am
Mr KairosFocus,

For practical purposes, once an aspect of a system, process or object of interest has at least 500 – 1,000 bits or the equivalent of information storing capacity, and uses that capacity to specify a function that can be disrupted by moderate perturbations, then it manifests FSCI, thus CSI.

So if I write a GA system where the population members are competitors in an iterated prisoners dilemma with competitions running up to 1000 iterations, then you are satisfied that FSCI is being created by the GA? Each member is 1000 bits long, each bit stands for the action to take (cooperate=1, defect=0) in the current iteration. Fitness is the score of the individual at the end of an iterated competition with another member of the population.

---------------------QUOTE-------------------



Granted, that is a crappy way to write an IPD system, but it makes the case.
---------------------QUOTE-------------------


He came back with "yeah, but the program itself was written by an intelligent designer therefore ID!"

Pathetic.
---------------------QUOTE-------------------


Which reply was duly anticipated by Nakashima! Now he is pushed into agreeing with GilD. You can tell KF is out of his comfort zone because his follow-up reply is short. As soon as he figures out how to fit this question into his system, he'll be back to multipoint posts.
Posted by: keiths on July 16 2009,14:27

Quote (dvunkannon @ July 16 2009,09:42)
Quote (oldmanintheskydidntdoit @ July 16 2009,11:47)
Quote (dvunkannon @ July 16 2009,07:27)
KF is hoist on his own FCSI


---------------------QUOTE-------------------
102

Nakashima

07/16/2009

7:13 am
Mr KairosFocus,

For practical purposes, once an aspect of a system, process or object of interest has at least 500 – 1,000 bits or the equivalent of information storing capacity, and uses that capacity to specify a function that can be disrupted by moderate perturbations, then it manifests FSCI, thus CSI.

So if I write a GA system where the population members are competitors in an iterated prisoners dilemma with competitions running up to 1000 iterations, then you are satisfied that FSCI is being created by the GA? Each member is 1000 bits long, each bit stands for the action to take (cooperate=1, defect=0) in the current iteration. Fitness is the score of the individual at the end of an iterated competition with another member of the population.

---------------------QUOTE-------------------



Granted, that is a crappy way to write an IPD system, but it makes the case.
---------------------QUOTE-------------------


He came back with "yeah, but the program itself was written by an intelligent designer therefore ID!"

Pathetic.
---------------------QUOTE-------------------


Which reply was duly anticipated by Nakashima! Now he is pushed into agreeing with GilD. You can tell KF is out of his comfort zone because his follow-up reply is short. As soon as he figures out how to fit this question into his system, he'll be back to multipoint posts.
---------------------QUOTE-------------------


Here's a < link > to the exchange.
Posted by: dvunkannon on July 17 2009,07:34

Quote (dvunkannon @ July 16 2009,08:27)
KF is hoist on his own FCSI


---------------------QUOTE-------------------
102

Nakashima

07/16/2009

7:13 am
Mr KairosFocus,

For practical purposes, once an aspect of a system, process or object of interest has at least 500 – 1,000 bits or the equivalent of information storing capacity, and uses that capacity to specify a function that can be disrupted by moderate perturbations, then it manifests FSCI, thus CSI.

So if I write a GA system where the population members are competitors in an iterated prisoner's dilemma with competitions running up to 1000 iterations, then you are satisfied that FSCI is being created by the GA? Each member is 1000 bits long, each bit stands for the action to take (cooperate=1, defect=0) in the current iteration. Fitness is the score of the individual at the end of an iterated competition with another member of the population.

---------------------QUOTE-------------------



Granted, that is a crappy way to write an IPD system, but it makes the case.
---------------------QUOTE-------------------


Just for giggles, I tried to think through what kind of solution this GA would find. Each iteration is actually independent; there is no use of memory. The population should have 50% each of Cooperate and Defect at each locus. That should be enough to sweep towards Cooperate at every locus.
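A minimal sketch of that setup, assuming the standard one-shot PD payoffs (T=5, R=3, P=1, S=0), since the original post doesn't specify them. It checks the ~50% starting frequency of Cooperate in a random population and scores one member against another:

```python
import random

random.seed(0)
N_MEMBERS, N_ITER = 100, 1000   # population size, IPD iterations per member (assumed)

# Assumed standard PD payoffs (T=5, R=3, P=1, S=0); 1 = cooperate, 0 = defect
PAYOFF = {(1, 1): 3, (1, 0): 0, (0, 1): 5, (0, 0): 1}

def ipd_score(me, other):
    """Score of bit-string strategy `me` against `other`, one bit per iteration."""
    return sum(PAYOFF[(a, b)] for a, b in zip(me, other))

pop = [[random.randint(0, 1) for _ in range(N_ITER)] for _ in range(N_MEMBERS)]

# A fresh random population should indeed hold about 50% Cooperate overall
coop_freq = sum(sum(ind) for ind in pop) / (N_MEMBERS * N_ITER)
print(round(coop_freq, 3))

# Fitness as described in the post: one member's score against another member
print(ipd_score(pop[0], pop[1]))
```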
Posted by: midwifetoad on July 22 2009,11:05



---------------------QUOTE-------------------
In abiogenesis, there is no fitness function, there are only reaction rates and products. The ‘fitness’ of a molecule depends entirely on its environment, which is why a GA like the IPD scenario I proposed is closer to that reality than silly Weasel-style functions.
---------------------QUOTE-------------------



Regardless of how complex or "realistic" the fitness function, it all boils down to its effect on the probability of reproduction.

Rather than saying there is no fitness function, it might be better to say that it is difficult to construct a function that is equivalent to chemistry, or to an ecosystem.

Edit to add link:

< http://www.uncommondescent.com/intelli....-327292 >
Posted by: Henry J on July 22 2009,14:27

Quote (midwifetoad @ July 22 2009,10:05)
Rather than saying there is no fitness function, it might be better to say that it is difficult to construct a function that is equivalent to chemistry, or to an ecosystem.
---------------------QUOTE-------------------


Yeah, when the ecosystem includes thousands of other species* that are also evolving (and changing their population sizes) all at the same time, that does "slightly" increase the number of variables the program would have to keep track of.

*Some of which are trying to eat the subject of the experiment (and each other as well), and some of which are trying to avoid being eaten by that subject (and each other).

Henry
Posted by: midwifetoad on July 22 2009,15:31

Well you could try coating the computer with honey and running your sim near an ant bed.

Alternatively, I hear squirrels are fond of wire insulation.
Posted by: dvunkannon on July 23 2009,21:53

Nakashima whips it out.



---------------------QUOTE-------------------
246

Nakashima

07/23/2009

12:45 pm
PopSize = 1000
IndSize = 1000
MaxTime = 1000
mutationRate = 0.05
allocate Pop[PopSize, IndSize], Fitness[PopSize]
for i = 1, PopSize
for j = 1, IndSize
Pop[i, j] = rnd(0, 1)
next j
Fitness[i] = evaluate(Pop[i, *])
next i

allocate NewInd[1, IndSize]
for t = 1, MaxTime * PopSize
for j = 1, IndSize
p1 = rnd(1, PopSize)
p2 = rnd(1, PopSize)
if Fitness[p1] > Fitness[p2]
then newBit = Pop[p1, j]
else newBit = Pop[p2, j]
if rnd(0,1) < mutationRate
then newBit = not(newBit)
NewInd[1, j] = newBit
next j
p3 = rnd(1, PopSize)
Pop[ p3, *] = NewInd[1, *]
Fitness[p3] = evaluate(Pop[p3, *])
next t

evaluate( Ind )
{

}
---------------------QUOTE-------------------



Not the best code in the world, but ok for lunch time.
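For anyone who wants to actually run it, here is a minimal Python transcription of the pseudocode above. Since evaluate() was left empty in the original, a simple count of 1-bits stands in for the IPD fitness; that stand-in, the scaled-down parameters, and the random seed are assumptions for illustration only:

```python
import random

# Parameters follow Nakashima's pseudocode, scaled down so the sketch runs quickly
POP_SIZE = 50      # original: 1000
IND_SIZE = 50      # original: 1000
MAX_TIME = 20      # original: 1000
MUTATION_RATE = 0.05

def evaluate(ind):
    # evaluate() was left empty in the original post; counting 1-bits
    # ("always cooperate") is a stand-in fitness for illustration only.
    return sum(ind)

random.seed(1)
pop = [[random.randint(0, 1) for _ in range(IND_SIZE)] for _ in range(POP_SIZE)]
fitness = [evaluate(ind) for ind in pop]

for t in range(MAX_TIME * POP_SIZE):
    new_ind = []
    for j in range(IND_SIZE):
        # Per-locus binary tournament: take bit j from the fitter of two random parents
        p1 = random.randrange(POP_SIZE)
        p2 = random.randrange(POP_SIZE)
        bit = pop[p1][j] if fitness[p1] > fitness[p2] else pop[p2][j]
        if random.random() < MUTATION_RATE:
            bit = 1 - bit
        new_ind.append(bit)
    # Steady-state replacement: the new individual overwrites a random member
    p3 = random.randrange(POP_SIZE)
    pop[p3] = new_ind
    fitness[p3] = evaluate(new_ind)

print(max(fitness))
```

Note that, as in the pseudocode, every bit in Pop originates either from the random initializer or from a (possibly mutated) copy of another slot in Pop.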



---------------------QUOTE-------------------
252

Nakashima

07/23/2009

9:39 pm
Mr Jerry,

I did not know that God was writing comments on this site. Did He also write the code that Nakashima gave us?

While I think you are expressing it somewhat jokingly, this question of where to assign credit (or blame) for FSCI is key. Do we give credit to the first cause or to the last cause? If the first cause, then I understand naming God as the author of the FSCI. If the last cause, then the GA itself is the author of the FSCI, not of itself, but of the Pop data array inside it. (Note that the bits in Pop come from the random function or are copied from other places in Pop. The ultimate source of every bit in Pop is random.)
---------------------QUOTE-------------------



I will be interested to see how the UD crowd react to that last observation.
Posted by: Zachriel on July 24 2009,06:44

Moved to Uncommon Thread.
Posted by: deadman_932 on July 24 2009,08:21

Between you folks and the people at Theology Web, I think you've scared off Sanford completely.

I salute you.

I'm impressed at the speed that "Mendel's Accountant" got dismembered and turned into pea-potting soil.
Posted by: Dr.GH on July 24 2009,10:07

Quote (deadman_932 @ July 24 2009,06:21)
Between you folks and the people at Theology Web, I think you've scared off Sanford completely.

I salute you.

I'm impressed at the speed that "Mendel's Accountant" got dismembered and turned into pea-potting soil.
---------------------QUOTE-------------------


I agree, congratulations. What has yet to be done is to produce a coherent reply for general readers to the claim that, “Mendel's Accountant provides overwhelming empirical evidence that all of the "fatal flaws" inherent in evolutionary genetic theory are real. This leaves evolutionary genetic theory effectively falsified--with a degree of certainty that should satisfy any reasonable and open-minded person.”

Larry Vardiman
2008. “The 'Fatal Flaws' of Darwinian Theory.” Acts & Facts 37 (7): 6. Institute for Creation Research.
< http://www.icr.org/article/fatal-flaws-darwinian-theory/ >
Posted by: JonF on July 24 2009,10:11

I agree. While I don't know of anything I can do to help, I think that a general-audience explanation is important.

I can does HTML ...
Posted by: deadman_932 on July 25 2009,08:44

The broad form that a reply to Sanford should take -- to give the fullest possible refutation of Sanford -- would be to eventually demonstrate how Mendel's Accountant is a deceptive product of Sanford's overall views.

Because Sanford believes that all life on Earth is between 5,000-100,000 years old, Mendel's Accountant essentially cooks the books to arrive at output which is intended to bolster Sanford's claims set out in his book "Genetic Entropy and the Mystery of the Genome."

I think everyone here is aware of that. Mendel's Accountant and Sanford's "Genetic Entropy" book go hand-in-hand.

Over at TalkRational, Febble, Vox Rat and others are going through Sanford's book. This can be useful in the future. See < here >, < here > and < here >, keeping in mind that the last two are currently less useful because the discussion really hasn't begun yet -- due to "AF Dave" serving as a foil for Febble. He's really putting off any in-depth discussion because (1) he's an idiot and (2) he's doing what he usually does: using a discussion for propaganda purposes rather than anything difficult like, y'know...learning. Then there's the good discussion at Theology Web. I'll look around and see what else I can find at other BBs.

Anyway, all of this has to be brought together at some point to show the pattern of pseudoscience and deception inherent in Sanford's efforts. It's a largish task, but manageable when broken down into parts.
Posted by: midwifetoad on July 25 2009,10:57

So is genetic entropy almost like a prediction of ID, or is it just debris tossed behind to impede pursuit?
Posted by: Bob O'H on July 25 2009,17:08

Shit, I really should write my review of Genetic Entropy.

Shorter version: Sandford ignores (a) multiple genes, and (b) sex.
Posted by: Steve Schaffner on July 25 2009,21:27

[I posted this on TW, and Dr. GH asked me to repost it here. It mostly addresses the genetic model used in MA, rather than the implementation.]


John Sanford wrote me several weeks ago, replying to my previous comments on his model of evolution. I have just replied to his email. Since I do not have permission to quote his words, I tried to make my mail stand on its own as much as possible; if context is not clear, please ask me for clarification. (Or reply to praise my limpid prose style, or to tell me I'm a nitwit, or whatever. I.e. the usual.)

Here is my reply:

Hi John,

Viewed from a high level, populations crash in your model because of several features in the model. First, it has a high rate of very slightly deleterious mutations, ones that have too weak an effect to be weeded out by selection. Second, the accumulation of these mutations reduces the absolute fitness of the entire population. Third, beneficial mutations (and in particular compensating mutations) are rare enough (and remain rare enough even as the fitness declines) and of weak enough effect that they do not counteract the deleterious mutations. As far as I can tell, any model of evolution that has these features will lead to eventual extinction -- the details of the simulation shouldn't matter at this level. (Indeed, Kondrashov pointed out this general problem in 1995; I wouldn't be surprised if others have made the same point earlier.)

So there is no question that if these premises of the model are correct, organisms with modest population sizes (including all mammals, for example) are doomed, and Darwinian evolution fails as an explanation for the diversity of life. If one wishes to conclude that evolution does fail, however, it is necessary to show that all of the premises are true -- not merely that they are possible, but that they reflect the real processes occurring in natural populations. From my perspective, that means you need to provide empirical evidence to support each of them, and I don't think you have done so.

Turning specifically to the issue of soft selection: it matters here because it severs the connection between relative fitness and absolute population fitness. The essence of soft selection is that the absolute fitness of the population does not change, regardless of the relative fitness effects of individual mutations that accumulate in the population. As Kimura put it, "Therefore, under soft selection, the average fitness of the population remains the same even if the genetic constitution of the population changes drastically. This type of selection does not apply to recessive lethals that unconditionally kill homozygotes. However, if we consider the fact that weak competitors could still survive if strong competitors are absent, soft selection may not be uncommon in nature." (p. 126, The Neutral Theory of Molecular Evolution).

(An unimportant point: my understanding from reading Wallace is that he introduced the term "soft selection" in the context of accumulating deleterious mutations (especially concerns about them raised by Jim Crow), not in connection with Haldane's dilemma or the rate of beneficial substitution. If you have a citation that provides evidence otherwise, I would be interested in seeing it. The basic model of soft selection actually goes back at least to Levene in 1953 (predating Haldane's work by a few years), when he was considering the maintenance of varied alleles in a mixed environment. So this is not a new idea, and it is (contra your suggestion) a well-defined concept, and one that is in fact often considered in the context of deleterious mutations and genetic load. Are there any recent published discussions of genetic load that do not consider soft selection as a possibility?)

In your reply to me, you said that the default in your program is purely soft selection. I don't know what the actual default is for deciding whether fitness affects fertility (since I have not run the program), but the online user manual says that an effect on fertility is in fact the default ("The default value is “Yes”, which means that fertility declines with fitness, especially as fitness approaches zero.") Regardless of the direct effect on fertility, the use of an additive model of fitness means that deleterious selection in your program ultimately ceases to be soft, since accumulating additive fitness always ends up at or below zero, at which point the relative fitness values no longer matter. In a model of soft selection, the magnitude of the population's fitness makes no difference at all; only the relative values of individuals have an effect. In your program, that is not the case. So in practice, your program does not seem to model long-term soft selection.

(As an aside, I'm afraid I don't understand your comments about having tested a multiplicative model of fitness. You say that in such a model, as the mean fitness falls, you see increasing numbers of individuals inherit a set of mutations that give a fitness less than or equal to zero. Under a multiplicative model, the fitness is given by f = (1-s1) * (1-s2) * (1-s3) *..., where s1, s2, s3... are the selection coefficients for the different mutations. If the various s values are less than 1.0 (as they must be if the mutations have been inherited), then f must always be greater than 0. I don't see how you can have a multiplicative model with the reported behavior. Perhaps you have a noise term that is still additive?)

The real question is whether or not soft selection is actually important and needs to be modeled. As you say, soft selection is a mental construct -- but so is hard selection. You dismiss it as a real phenomenon, but do you have any evidence to support your point here? Your populations crash because of very slightly deleterious mutations, and as far as I know, virtually nothing is known about what kind of fitness effects these mutations have. In general, there has been very little empirical work distinguishing soft from hard selection (or equivalently, quantifying the difference between absolute and relative fitness). The only recent study I know of to attempt it looked only at plant defense traits in A. thaliana (Kelley et al, Evolutionary Ecology Research, 2005, 7: 287–302), and they found soft selection effects to be more powerful than hard effects. So I do not see good empirical grounds for rejecting an important role for soft selection.

This isn't to suggest that all selection is soft, or that many mutations don't have real effects on the population fitness -- but there are good theoretical and empirical reasons to think that the net effect of many deleterious mutations is smaller when they are fixed in the population than their relative fitness would suggest. (Not that we actually know what the distribution of relative fitnesses looks like, either. You can pick a functional form for that distribution for the purpose of doing a simulation, but it is based on no real experimental evidence. Are deleterious mutations really so highly weighted toward very slight effects? There are just no data available to decide.)

If much selection actually is soft, then humans (and other mammals) could have in their genome millions of deleterious mutations already, the result of hundreds of millions of years of evolution; this is the standard evolutionary model. These mutations would have accumulated as population sizes shrank slowly (relaxing selection) and functional genome sizes grew (increasing the deleterious mutation rate). Indeed, many functional parts of the genome may never have been optimized at all: the deleterious "mutations" were there from the start. The results of this process are organisms that are imperfect compared to a platonic ideal version of the species, but perfectly functional in their own right. In your response, you cite systems biology's assessment that many organisms are highly optimized to counter this possibility. I do not find this persuasive, partly because systems biologists can also cite many features that are suboptimal, but mostly because no branch of biology has the ability to quantify the overall optimization of an organism, or to detect tiny individual imperfections in fitness.

Alternatively, beneficial mutations may be more common and of larger effect than in your default model. I pointed to one recent example of a beneficial mutation with a much larger selective advantage than your model would allow (lactase persistence in human adults). In turn you suggest that such large effects occur only in response to fatal environmental conditions, but the example I gave does not fall in that class. Do you have any empirical evidence that the selective advantage is restricted to such small values?

Michael Whitlock has a nice discussion of this kind of model in a paper from 2000 ("Fixation of new alleles and the extinction of small populations: drift load, beneficial alleles, and sexual selection." (Evolution, 54(6), 2000, pp. 1855–1861.)) His model tries to answer very similar questions to yours. With the choice of parameters that he thinks is reasonable, he finds that only a few hundred individuals are needed to prevent genetic decline.

He also discusses many of the same issues that we're discussing here. For example, on the subject of soft selection he writes, "We also have insufficient information about the relationship between the effects of alleles on relative fitness in segregating populations and their effects on absolute fitness when fixed. Whitlock and Bourguet (2000) have shown that for new mutations in Drosophila melanogaster, there is a positive correlation across alleles between the effects of alleles on productivity (a combined measure of the fecundity of adults and the survivorship of offspring) and male mating success. This productivity score should reflect effects of alleles on mean fitness, but the effects of male mating success are relative. Without choice, females will eventually mate with the males available, but given a choice the males with deleterious alleles have a low probability of mating. Other studies on the so-called good-genes hypothesis have confirmed that male mating success correlates with offspring fitness (e.g., Partridge 1980; Welch et al. 1998; see Andersson 1994)."

His conclusion about his own model strikes me as equally appropriate to yours: "We should not have great confidence in the quantitative values of the predictions made in this paper. In addition to the usual concern that the theoretical model may not include enough relevant properties of the system (e.g., this model neglects dominance and interlocus interactions, the Hill-Robertson effect, the effects of changing environments), the empirical measurements of many of the most important genetic parameters range from merely controversial to nearly nonexistent."

Using this kind of model to explore what factors might be important in evolution is fine, but I think using them to draw conclusions about the viability of evolution as a theory is quite premature.
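The contrast the letter draws between additive and multiplicative fitness models can be seen in a few lines of Python. The selection-coefficient range below is an arbitrary illustration, not Mendel's Accountant's actual distribution:

```python
import random

random.seed(0)

# Many very slightly deleterious selection coefficients; the range is an
# arbitrary illustration, not Mendel's Accountant's actual distribution
s_values = [random.uniform(0.0001, 0.001) for _ in range(5000)]

additive = 1.0 - sum(s_values)        # additive model: can reach or cross zero
multiplicative = 1.0
for s in s_values:
    multiplicative *= 1.0 - s         # multiplicative model: stays positive while each s < 1

print(additive, multiplicative)
```

With enough accumulated mutations the additive fitness goes negative, at which point relative fitness differences stop mattering, while the multiplicative product f = (1-s1)(1-s2)... remains strictly positive, exactly as the letter argues.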
Posted by: Dr.GH on July 26 2009,00:14

Thanks, Steve. Your post, the observations and graphics by Zachriel, plus the graphics by Sam (AKA Ansgar Seraph) should add up to a solid refutation.

This is not my area, so if I missed anyone else's solid contribution- I apologize now rather than later.
Posted by: MichaelJ on July 26 2009,06:54

Quote (deadman_932 @ July 25 2009,08:44)
The broad form that a reply to Sanford should take -- to give the fullest possible refutation of Sanford -- would be to eventually demonstrate how Mendel's Accountant is a deceptive product of Sanford's overall views.

Because Sanford believes that all life on Earth is between 5,000-100,000 years old, Mendel's Accountant essentially cooks the books to arrive at output which is intended to bolster Sanford's claims set out in his book "Genetic Entropy and the Mystery of the Genome."

I think everyone here is aware of that. Mendel's Accountant and Sanford's "Genetic Entropy" book go hand-in-hand.

Over at TalkRational, Febble, Vox Rat and others are going through Sanford's book. This can be useful in the future. See < here >, < here > and < here >, keeping in mind that the last two are currently less useful because the discussion really hasn't begun yet -- due to "AF Dave" serving as a foil for Febble. He's really putting off any in-depth discussion because (1) he's an idiot and (2) he's doing what he usually does: using a discussion for propaganda purposes rather than anything difficult like, y'know...learning. Then there's the good discussion at Theology Web. I'll look around and see what else I can find at other BBs.

Anyway, all of this has to be brought together at some point to show the pattern of pseudoscience and deception inherent in Sanford's efforts. It's a largish task, but manageable when broken down into parts.
---------------------QUOTE-------------------


Febble has to be one of the clearest and most logical writers I have come across. afDave hasn't changed at all.
Posted by: dvunkannon on Aug. 01 2009,14:42

I saw Nilsson and Pelger come up again on UD. Is there really no eye evolution computer simulation on the web?
Posted by: dvunkannon on Sep. 09 2009,16:37

Here's a little something related to genetic algorithms that I was thinking about during my recent vacation. I decided to write it down and share it with y'all in the hope of getting some feedback on the idea. Now that I've got the idea sketched out, I'll implement it in the little GA I've been building.

Thanks in advance for any comments.



---------------------QUOTE-------------------
The Valley of the Demes
An Island Model GA with Asymmetric Topology, Parameters, and Migration Policy

Overview

The Valley Model ("Valley of the Demes") consists of several changes to the standard Island Model for deme-structured GAs. The biological inspiration for the Valley Model is the typical alpine valley. The fertile central valley is connected to a branching network of side valleys in which conditions are harsher and more challenging.
While deme-structured GAs offer the opportunity to take advantage of multiple CPUs, there is also some evidence that the separation into sub-populations itself helps to maintain diversity and slow premature convergence. The Valley Model attempts to take advantage of, and to encourage, this diversity in a variety of ways that make sense from a general EA point of view, and are still consistent with a specific ecological metaphor.

Topology
The demes are connected in the topology of a truncated Bethe lattice. The central deme has three connections to demes in the next ring outwards. Each of these demes has two demes connected to it in the next most outward ring, and so on until the last ring. The total number of rings is a model parameter.
The number of demes in each ring is 1, 3, 6, 12, 24,... and the total number of demes is 1, 4, 10, 22, 46,...
In the valley metaphor, the side valleys are connected downwards to the central plain, but not to neighboring side valleys.

Deme Sizing
The central deme has an initial (and average, if varying) carrying capacity of half the total population. Each ring outward holds half of the remaining population. For example, a model with four rings would allocate the population carrying capacities between the rings as 1/2, 1/4, 1/8, 1/8. However, because the number of demes is growing in each ring, the per deme carrying capacities would be 1/2, 1/12, 1/48, 1/96. Obviously, it would not make sense to run this four ring model with a total population less than 192 if all spaces are intended to be occupied, and the smallest deme has a carrying capacity of two.
In the valley metaphor, the carrying capacity of each deme is reduced by altitude, less space and poorer resources.

Deme Parameters
While crossover rates are expected to be constant across demes, the mutation rate is higher in each ring outward from the center.
In the valley metaphor, this could be attributed to more cosmic rays at higher altitudes but the major reason for raising the mutation rate is to give the model a chance to generate diversity.

Migration Policy
At the end of each 'generation', the population of a deme must be reduced to its current carrying capacity. (Carrying capacity can be parametric in time, for each deme independently.) In a typical GA, losers of some selection process (or the entire old generation) die. In the Valley Model, these poor performers are instead exported to demes one ring outwards. Only in the last ring of demes do losers die.
(The above assumes that migration takes place every generation. If migration only occurs every few generations, as in many Island Models, over capacity populations in the central demes would have to cull individuals.)
Except for the central deme, demes may also attempt to reduce their population to the carrying capacity level by sending the best of population inwards, replacing some individual in the receiving deme.
While this inflow to the center is a 'best replaces worst', the outward migration is not, so the overall effect should not raise selection pressures. Also, the outward flow is based on population carrying capacity and population growth, while the inflow is based on the topology. If the generation size was small, close to a Steady State GA, these flows would balance.
In the valley metaphor, poor performers are pushed out of the fertile territory and forced to move higher upslope by overcrowding. The only hope to move back towards the lower ground is to take someone else's place.

Initialization
To take advantage of founder effects, the model can be initialized so that only the central deme contains population members at time 0. In this way, several generations will pass before any population member is actually removed from the model.
In the valley metaphor, this initialization choice parallels the colonization of the valley for the first time by population members from elsewhere.

Summary
The Valley Model is intended to explore areas of GA model design in which model asymmetries help to preserve diversity in the population. The resulting diversity may support discovery of multiple solutions (niching) or simply avoid premature convergence, thereby improving the solution quality or speed to solution metrics.
The specific choice of asymmetries in the Valley Model is inspired by the ecologies and dynamics of real world mountain valleys.

---------------------QUOTE-------------------
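As a sanity check on the deme-sizing arithmetic above, here is a short Python sketch (function name and structure are my own) that reproduces the ring counts, the 1/2, 1/12, 1/48, 1/96 per-deme capacities, and the minimum population of 192 for the four-ring example:

```python
from fractions import Fraction

def valley_layout(n_rings):
    """Deme counts and capacity fractions per ring for the Valley Model.

    Ring 0 is the central deme, ring 1 has 3 demes, and each deme branches
    into 2 thereafter, so ring r (r >= 1) has 3 * 2**(r - 1) demes.
    """
    demes = [1] + [3 * 2 ** (r - 1) for r in range(1, n_rings)]
    # Each ring takes half the remaining population; the last ring keeps the rest
    share = [Fraction(1, 2 ** (r + 1)) for r in range(n_rings - 1)]
    share.append(Fraction(1, 2 ** (n_rings - 1)))
    per_deme = [s / d for s, d in zip(share, demes)]
    return demes, share, per_deme

demes, share, per_deme = valley_layout(4)
print(demes)                 # deme counts per ring: 1, 3, 6, 12
print(share)                 # ring shares of population: 1/2, 1/4, 1/8, 1/8
print(per_deme)              # per-deme capacities: 1/2, 1/12, 1/48, 1/96
print(2 / min(per_deme))     # minimum total population if the smallest deme holds 2
```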


Posted by: Turncoat on Oct. 10 2009,14:12

DiEb and I want to get serious about identifying errors in the IEEE SMC-A article of Dembski and Marks. Do you folks feel that the discussion deserves its own thread?
Posted by: Turncoat on Oct. 10 2009,15:42

Quote (dvunkannon @ Sep. 09 2009,16:37)
Here's a little something related to genetic algorithms that I was thinking about during my recent vacation. I decided to write it down and share it with y'all in the hope of getting some feedback on the idea. Now that I've got the idea sketched out, I'll implement it in the little GA I've been building.

Thanks in advance for any comments.
---------------------QUOTE-------------------


Something few people understand about the NFL theorems is that they level algorithms in terms of how well they do with n evaluations of the fitness function, and not how well they do with t units of running time. If you do not have prior knowledge of a problem instance that leads you to prefer some algorithm over others, or you choose to ignore that knowledge, then select the algorithm that runs the fastest (i.e., completes the greatest number of fitness evaluations in available running time).

You'll generally do better letting the computer architecture drive the choice of algorithm than by choosing the algorithm and trying to get its parameters right.

Back in 1991, before I'd hit on the NFL rationale (1994), I was working with a SIMD architecture with a toroidal mesh of processing elements. While giving a course on GA's, I designed a GA to run really, really fast on that architecture, abandoning bio-mimicry when necessary, and handed off the programming to two doctoral students. One of the students -- the one who subsequently presented our work at a conference -- liked to speak of my "big, messy genetic algorithm." The phrase was not in our paper, but it evidently made its way to Goldberg. There have been a fair number of BMGA papers to come from his lab.

The size of demes on individual processing elements was determined entirely by available memory. The parameter we played with most was the number of generations between migrations of individuals between neighboring processing elements. (I can't remember how we selected the migrants.) We thought, buying into Goldberg, that migration and recombination were essential. There is actually a theoretical argument for isolated parallel runs. So nowadays I would leave out migration unless I had reason to believe it would be beneficial, even though our program did not spend much time on it.

The program blazed, in terms of wall-clock time, and this was due to exploitation of the architecture. My students also implemented and studied many existing GA's, and we discovered just how much unreported parameter tweaking underlay reported results. There is no magic in any form of evolutionary computation, and exploiting the architecture should be Job One unless you have prior knowledge of the problem instance to exploit.
Posted by: dvunkannon on Oct. 10 2009,16:12

Quote (Turncoat @ Oct. 10 2009,15:12)
DiEb and I want to get serious about identifying errors in the IEEE SMC-A article of Dembski and Marks. Do you folks feel that the discussion deserves its own thread?
---------------------QUOTE-------------------


I'd prefer for that discussion to happen here. Special purpose threads are hard to find (at least for me). I prefer general threads that pick up topics as necessary. Viz. the discussion of Mendel's Accountant and previous Weasel discussions on this thread.
Posted by: Turncoat on Oct. 10 2009,21:31

Quote (Turncoat @ Oct. 10 2009,15:42)
There is actually a theoretical argument for isolated parallel runs. So nowadays I would leave out migration unless I had reason to believe it would be beneficial, even though our program did not spend much time on it.
---------------------QUOTE-------------------


I should clarify that a bit. I would not run isolated GA's with small populations, high crossover rates, and low mutation rates in parallel. There would be a lot of wasteful reevaluation of individuals. I would run isolated algorithms with relatively high mutation rates in parallel. With high mutation rates, few individuals are evaluated more than once.
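To put numbers on that last point: the chance that a mutated copy comes out bit-for-bit identical to its parent is (1 - u)^L, so it collapses quickly as the per-site rate u rises. The genome length L = 1000 below is assumed for illustration:

```python
# Probability that a mutated copy is identical to its parent: (1 - u) ** L
L = 1000  # genome length, assumed for illustration
p_same = {u: (1 - u) ** L for u in (0.001, 0.01, 0.05)}
for u, p in p_same.items():
    print(f"u={u}: P(identical copy) = {p:.3g}")
```

At u = 0.001 roughly a third of offspring duplicate their parent, so low-mutation isolated runs re-evaluate many identical individuals; at u = 0.05 the probability is negligible.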
Posted by: dvunkannon on Oct. 10 2009,22:17

Quote (Turncoat @ Oct. 10 2009,16:42)
Quote (dvunkannon @ Sep. 09 2009,16:37)
Here's a little something related to genetic algorithms that I was thinking about during my recent vacation. I decided to write it down and share it with y'all in the hope of getting some feedback on the idea. Now that I've got the idea sketched out, I'll implement it in the little GA I've been building.

Thanks in advance for any comments.
---------------------QUOTE-------------------


Something few people understand about the NFL theorems is that they level algorithms in terms of how well they do with n evaluations of the fitness function, and not how well they do with t units of running time. If you do not have prior knowledge of a problem instance that leads you to prefer some algorithm over others, or you choose to ignore that knowledge, then select the algorithm that runs the fastest (i.e., completes the greatest number of fitness evaluations in available running time).
---------------------QUOTE-------------------


Taken to the extreme, that says just run "Generate and Test" and give up on this fancy selecting, sorting, etc. NFL is NFL.

However, I think we agree that most problems of human interest are in a class where some level of GA can help if we don't already know a closed-form solution. I realize there is a view that demes are just there to allocate hardware appropriately, but I've also seen research that avoiding panmixia is a benefit irrespective of hardware. YMMV.
Posted by: midwifetoad on Oct. 11 2009,07:14

There are only about half a dozen threads that maintain visibility here. so I vote for keeping the number down, and keeping them general.
Posted by: Turncoat on Oct. 11 2009,23:36

Anyone else notice the attributed oxymoron here?
   

---------------------QUOTE-------------------
English’s Law of Conservation of Information (COI) [15] notes “the futility of attempting to design a generally superior optimizer” without problem-specific information about the search.
---------------------QUOTE-------------------


If you're attempting to design a generally superior optimizer, you are not looking at problem-specific information.

Mighty rude to complain about being made infamous, ain't it?
Posted by: Turncoat on Oct. 11 2009,23:50

Quote (dvunkannon @ Oct. 10 2009,22:17)
Quote (Turncoat @ Oct. 10 2009,16:42)
 
Quote (dvunkannon @ Sep. 09 2009,16:37)
Here's a little something related to genetic algorithms that I was thinking about during my recent vacation. I decided to write it down and share it with y'all in the hope getting some feedback on the idea. Now that I've got the idea sketched out, I'll implement it in the little GA I've been building.

Thanks in advance for any comments.
---------------------QUOTE-------------------


Something few people understand about the NFL theorems is that they level algorithms in terms of how well they do with n evaluations of the fitness function, and not how well they do with t units of running time. If you do not have prior knowledge of a problem instance that leads you to prefer some algorithm over others, or you choose to ignore that knowledge, then select the algorithm that runs the fastest (i.e., completes the greatest number of fitness evaluations in available running time).
---------------------QUOTE-------------------


Taken to the extreme, that says just run "Generate and Test" and give up on this fancy selecting, sorting, etc. NFL is NFL.

However, I think we agree that most problems of human interest are in a class where some level of GA can help if we don't already know a closed-form solution. I realize there is a view that demes are just a way to allocate hardware appropriately, but I've also seen research suggesting that avoiding panmixia is a benefit irrespective of hardware. YMMV.
---------------------QUOTE-------------------


My main point was that hardware considerations must sometimes take precedence over algorithmics.

When we get into the Dembski and Marks article, you'll see that I believe we have learned a lot about problems, and that D&M ignore this source of information in their comments about evolutionary optimizers. I don't believe that their "search for a search" regress of probability measures models our learning through experience.

It seems to me that you have a metaheuristic, and that you would instantiate generic operations differently for different problems, presumably exploiting knowledge of the problem. It is not merely a fine theoretical point that different instantiations give different algorithms.
Posted by: Dr.GH on Oct. 12 2009,09:48

Quote (Turncoat @ Oct. 11 2009,21:36)
Anyone else notice the attributed oxymoron here?
   

---------------------QUOTE-------------------
English’s Law of Conservation of Information (COI) [15] notes “the futility of attempting to design a generally superior optimizer” without problem-specific information about the search.
---------------------QUOTE-------------------


If you're attempting to design a generally superior optimizer, you are not looking at problem-specific information.

Mighty rude to complain about being made infamous, ain't it?
---------------------QUOTE-------------------


HehHeh
Nicely put.
Posted by: DiEb on Oct. 13 2009,13:40

I'm mainly interested in the evolutionary algorithms used in W. Dembski's and R. Marks's paper Conservation of Information in Search - Measuring the Cost of Success, i.e., examples E and F in section III, Examples of Active Information in Search. I tried to gather my thoughts, at first in my < blog >, but now on < this page > at rationalwiki.com: this wiki allows for math tags, and - of course - for collaboration. I'd love to get some input/critique/reactions...
Posted by: Bob O'H on Oct. 13 2009,13:55



---------------------QUOTE-------------------
I don't believe that their "search for a search" regress of probability measures models our learning through experience.
---------------------QUOTE-------------------


I recall from a previous incarnation of this idea (when wMad assumed, for consistency's sake, that q<p and proved, 4 pages later, that log(q)<log(p)) I thought a nice metaphor would be that Dembski was saying it's easier to find a needle in a haystack than to find that huge electromagnet in the shed next to the haystack.
Posted by: midwifetoad on Oct. 13 2009,14:08

I can't evaluate the math, but I think people are being led down a garden bath with the concept of search.

I don't see any evidence that biology is modeled by a search algorithm. In biology a change might affect fitness for unknown reasons, and a subsequent change to the same position might further affect fitness. There is no single correct value for any given position in the string.

Behe and Dembski want you to believe that evolution must progress toward goals (consider Behe's obsession with the flagellum), but biology merely chugs along with whatever is adequate.
Posted by: Turncoat on Oct. 13 2009,14:15

Quote (Bob O'H @ Oct. 13 2009,13:55)


---------------------QUOTE-------------------
I don't believe that their "search for a search" regress of probability measures models our learning through experience.
---------------------QUOTE-------------------


I recall from a previous incarnation of this idea (when wMad assumed, for consistency's sake, that q<p and proved, 4 pages later, that log(q)<log(p)) I thought a nice metaphor would be that Dembski was saying it's easier to find a needle in a haystack than to find that huge electromagnet in the shed next to the haystack.
---------------------QUOTE-------------------


Neigh-h-h-h.


Posted by: Turncoat on Oct. 13 2009,14:46

Quote (midwifetoad @ Oct. 13 2009,14:08)
I can't evaluate the math, but I think people are being led down a garden bath with the concept of search.

I don't see any evidence that biology is modeled by a search algorithm. In biology a change might affect fitness for unknown reasons, and a subsequent change to the same position might further affect fitness. There is no single correct value for any given position in the string.

Behe and Dembski want you to believe that evolution must progress toward goals (consider Behe's obsession with the flagellum), but biology merely chugs along with whatever is adequate.
---------------------QUOTE-------------------


I agree entirely. I raised the issue of whether optimization was a good model of biological evolution in my first NFL paper, back in 1996. Now I am completely convinced that it is not. As Allen MacNeill rightly emphasizes, the consequence of variety, heredity, and fecundity is demography. A novel biological type can survive by virtue of its difference from the type that gave rise to it. There is not necessarily any basis for saying that the difference makes it better or worse.

Dembski and Marks have indicated that there are "implicit targets" for biological search. It's hilarious that they smuggle in teleology while accusing others of smuggling information into computational models of biological evolution. Creationists have long mistaken what did happen for what had to happen.
Posted by: midwifetoad on Oct. 13 2009,15:00



---------------------QUOTE-------------------
I agree entirely. I raised the issue of whether optimization was a good model of biological evolution in my first NFL paper, back in 1996. Now I am completely convinced that it is not.
---------------------QUOTE-------------------



I think fitness is a useful concept, as long as you don't conflate it with correctness.

BTW, bath=path, for more fitness.
Posted by: Dr.GH on Oct. 13 2009,15:17

Quote (Turncoat @ Oct. 13 2009,12:46)
Creationists have long mistaken what did happen for what had to happen.
---------------------QUOTE-------------------


I like that a lot. Coupled with evolutionary ratcheting, it makes a big punch-out to creationist arguments.

Of course, religionists can all believe that their god(s) knew how everything would turn out all the while. Except they never seem to be able to figure it out.
Posted by: Turncoat on Oct. 13 2009,15:50

In every introductory AI lecture I gave, I asked my students, "Which are more intelligent, cats or dogs?" The question is just as silly with "fit" in place of "intelligent."

I've read an interesting article by Sober on the role of reproductive fitness in biological modeling. As I recall, he indicates that its main use is in modeling at the level of a few alleles and traits (i.e., in population genetics). I am not sure that the concept is necessarily tautological at the level of whole organisms, but I definitely have seen many folks slip into tautology.

Creationist arguments that bacteria gain antibiotic resistance through a decrease in "real" fitness (i.e., the environment outside the hospital, in which the antibiotic-resistant strain fares poorly, is more real than the environment inside the hospital) are silly. And they go "poof" with emphasis on demography, as opposed to fitness.
Posted by: DiEb on Oct. 13 2009,19:39

I completed my little project on the evolutionary algorithms in Dembski's and Marks's paper < here >. Completed? Well, it's a complete draft :-)
One little insight: I suppose that one author wrote 2) Optimization by Mutation and the other 3) Optimization by Mutation With Elitism - they're virtually the same, just in different notation, so the authors didn't spot what they were doing...
I'd bet that Marks wrote 2) - it feels a little bit more rigorous.
Posted by: Dr.GH on Oct. 13 2009,22:55

Dang. I thought I had a thought, but I forgot. Glad to see you are all happy.

(Actually, I think that a GA needs to replicate the HW equilibrium data on a real population to defeat Sanford).
Posted by: DiEb on Oct. 19 2009,05:24

R. Marks and W. Dembski try to get a new paper through peer-review. In a podcast, W. Dembski < announced it > as

We have some powerful results that follow up on this paper [Conservation of Information in Search:Measuring the Cost of Success]. This is a paper called "The Search for the Search" which is coming out. It should be out now, but there is some delay in the journal's publishing vent[?]

You can find a draft < here >.
Posted by: DiEb on Oct. 21 2009,01:09

I stated my problems with Dembski's and Marks's new paper The Search for a Search < here >. Don't worry, I kept it entirely non-technical :-)
Posted by: DiEb on Oct. 22 2009,07:42

I tried to post the following at < UncommonDescent >:

The Horizontal No Free Lunch Theorem doesn't work for a search of length m > 1 for a target T ⊆ Ω:

Let Ω be a finite search space, T ⊆ Ω the target - a non-empty subset of Ω. A search is a (finite) sequence of Ω-valued random variables (φ1, φ2, ..., φm). A search is successful if φn ∈ T for at least one n, 1 ≤ n ≤ m.

I suppose we do agree here. Now, we look at a search Φ as an Ω^m-valued random variable, i.e., Φ := (φ1, φ2, ..., φm).

When is it successful? If we are still looking for a T ⊆ Ω, we can say that we found T during our search if

Φ ∈ Ω^m \ (Ω \ T)^m

Let's define Θ as the subspace of Ω^m which consists of the representations of targets in Ω, i.e.,

Θ := {Ω^m \ (Ω \ T)^m | T a non-empty subset of Ω}

Obviously, Θ is much smaller than the set of all subsets of Ω^m.

But this Θ is the space of feasible targets. And if you take an exhaustive partition of Θ instead of Ω^m in Theorem III.1 Horizontal No Free Lunch, you'll find that you can indeed have positive values for the active entropy as defined in the same theorem.

But that's not much of a surprise, as random sampling without repetition works better than random sampling with repetition.

But if you allow T to be any subset of Ω^m, your results get somewhat trivial, as you are now looking at m independent searches of length 1 for different targets.

The searches which you state as examples in this paper and the previous one all work with a fixed target, i.e., elements of Ω. You never mention the possibility that the target changes between the steps of the search (one possible interpretation of taking arbitrary subsets of Ω^m into account).

So, I'm facing two possibilities:

  1. You didn't realize the switch from stationary targets to moving ones when you introduced searching for an arbitrary subset of Ω^m.
  2. You realized this switch to a very different concept, but chose not to stress the point.
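DiEb's point is that the space of feasible targets (his Θ, built as Ω^m \ (Ω \ T)^m for fixed non-empty T ⊆ Ω) is tiny compared with the set of arbitrary subsets of Ω^m. That is easy to check numerically on a toy space; this sketch (my own code and notation, not DiEb's) enumerates Θ for |Ω| = 3 and m = 2:

```python
from itertools import product, chain, combinations

omega = {0, 1, 2}   # tiny search space
m = 2               # search length

points = set(product(omega, repeat=m))   # Omega^m: all 9 length-2 searches

def nonempty_subsets(s):
    s = list(s)
    return chain.from_iterable(combinations(s, r) for r in range(1, len(s) + 1))

# Theta: the representation Omega^m \ (Omega \ T)^m of each fixed target T
theta = set()
for T in nonempty_subsets(omega):
    miss = set(product(omega - set(T), repeat=m))  # searches that never hit T
    theta.add(frozenset(points - miss))

print(len(points))       # 9 points in Omega^m
print(len(theta))        # 7 feasible targets (one per non-empty T)
print(2 ** len(points))  # 512 arbitrary subsets of Omega^m
```

Even at this scale, 7 feasible targets versus 512 arbitrary subsets; the gap grows super-exponentially with |Ω| and m.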
Posted by: DiEb on Oct. 22 2009,07:45

Heck, in the preview, all the nice Omegas, phis, etc. appeared! So, please, insert 16 X Omega, 7 X phi, 2 X Theta and 4 X Phi...
Posted by: DiEb on Oct. 22 2009,07:46

..or look < here >.
Posted by: Dr.GH on Oct. 22 2009,10:45

Thanks for your effort and work.
Posted by: DiEb on Oct. 22 2009,11:23

I hope I can get some people interested in this paper of W. Dembski and R. Marks - before it is published and has this majesty of peer review attached to it (William Dembski in an < interview > with Casey Luskin)

BTW: As a follow-up to their previous paper, the sensible thing would be to get it published in the same journal. But, seemingly, there have been problems...
Posted by: DiEb on Nov. 09 2009,04:37

It took a while, but now I think my trashing of W. Dembski's and R. Marks's new article "< The Search for a Search >" (currently - and I suppose indefinitely -  held in peer-review) is fairly complete. < See for yourself >. Any idea of an improvement is welcome!
Posted by: deadman_932 on Nov. 09 2009,05:12

< http://marksmannet.com/RobertMarks/REPRINTS/short/S4S.pdf > is now a dead link. Surprise, wot?
Posted by: DiEb on Nov. 09 2009,05:44

Wow, it disappeared from their < list of publications > at the Evolutionary Informatics Lab, too...
Posted by: deadman_932 on Nov. 09 2009,05:50

Yep, I noticed that. Marky-Mark's been watching.
Posted by: DiEb on Nov. 09 2009,07:25

This seems to be quite typical: They still have the < announcement > for the article "A Search for a Search" on the Evolutionary Informatics Lab site, but they don't link to the actual article as they did < earlier >. Instead, they link to the article "Conservation of Information in Search" - wherein one cannot find the announced results, i.e., the vertical and the horizontal no free lunch theorems...
Posted by: Dr.GH on Nov. 09 2009,14:19

Quote (DiEb @ Nov. 09 2009,02:37)
It took a while, but now I think my trashing of W. Dembski's and R. Marks's new article "< The Search for a Search >" (currently - and I suppose indefinitely -  held in peer-review) is fairly complete. < See for yourself >. Any idea of an improvement is welcome!
---------------------QUOTE-------------------


The problem with making your work public is that Dembski and Marks will basically steal your work. Dembski had bragged that he posts his papers to take advantage of his critics.
Posted by: Erasmus, FCD on Nov. 09 2009,14:28

yeah like the pivot man takes advantage of the glomming hordes
Posted by: deadman_932 on Nov. 10 2009,16:35

Dembski seems to be studiously ignoring your comments and questions at UD, DiEb -- at least publicly.

Behind-the-scenes seems another matter. I expect he and Marks are brainstorming about how or if they can salvage what's left.
Posted by: DiEb on Nov. 11 2009,00:51

I tried it again, on the thread < ID and Science Education >:



---------------------QUOTE-------------------


2
DiEb
11/11/2009
1:45 am
Your comment is awaiting moderation.

Articles like W. Dembski's and R. Marks's "< The Search for a Search >" will  shape the view of Intelligent Design in general and help us to judge whether it is a scientific theory. So, when will it be published? Any < problems >?
---------------------QUOTE-------------------


Posted by: Richardthughes on Nov. 11 2009,01:07

Quote (DiEb @ Nov. 11 2009,00:51)
I tried it again, on the thread < ID and Science Education >:

 

---------------------QUOTE-------------------


2
DiEb
11/11/2009
1:45 am
Your comment is awaiting moderation.

Articles like W. Dembski's and R. Marks's "< The Search for a Search >" will  shape the view of Intelligent Design in general and help us to judge whether it is a scientific theory. So, when will it be published? Any < problems >?
---------------------QUOTE-------------------


---------------------QUOTE-------------------


It's been deleted.

ETA - or probably not approved yet.
Posted by: DiEb on Nov. 11 2009,02:45

I'm used to extensive periods of moderation at Uncommon Descent, but only one of my comments over the last weeks never appeared at all.

Granted, some delays were so long that they made discussion rather impossible...
Posted by: midwifetoad on Nov. 11 2009,06:18



---------------------QUOTE-------------------
Many searches are needle-in-the-haystack problems, looking for small targets in large spaces. In such cases, blind search can stand no hope of success.
---------------------QUOTE-------------------



From Dembski's abstract.


I keep wondering how this line of argument is relevant to evolution. There are some rare instances in which populations "need" to find solutions to changing conditions. The record of extinction suggests this "search" is not generally successful.

Behe and Dembski suggest that structures such as flagella are the end result of a search, but I am not aware of any biologist who thinks flagella ever constituted a goal. As with many other biological structures, they happened.

But it is a monumental intellectual fraud to suggest that specific structures are sought after by biological evolution.
Posted by: DiEb on Nov. 11 2009,07:26

Quote (midwifetoad @ Nov. 11 2009,06:18)
 

---------------------QUOTE-------------------
Many searches are needle-in-the-haystack problems, looking for small targets in large spaces. In such cases, blind search can stand no hope of success.
---------------------QUOTE-------------------



From Dembski's abstract.


I keep wondering how this line of argument is relevant to evolution. There are some rare instances in which populations "need" to find solutions to changing conditions. The record of extinction suggests this "search" is not generally successful.

Behe and Dembski suggest that structures such as flagella are the end result of a search, but I am not aware of any biologist who thinks flagella ever constituted a goal. As with many other biological structures, they happened.

But it is a monumental intellectual fraud to suggest that specific structures are sought after by biological evolution.
---------------------QUOTE-------------------


I'm only a mathematician, so I just wanted to show that Marks and Dembski are wrong according to the rules of their own world, not that their world is just a la-la-fantasy-land...

But a thought: I get the impression that (intelligent design) creationists believe in the power of the word, all words, literally. So, if someone calls some algorithm an evolutionary strategy, it has to be linked with the Theory of Evolution in general, and burning the effigy means destroying the real thing.

Or look at the information shell game: all the different kinds (Shannon, Fisher, ...) may be used as aspects of the real information, like the Father, the Son, and the Holy Ghost are aspects of the real god. Of course, an unbeliever will always get it wrong...
Posted by: Zachriel on Nov. 16 2009,10:51

This is based on a modification of < Gregor's Bookkeeper >.

We start with an isogenic population with a fitness of one. Mutation is considered damage to an allele. If both alleles of a gene are damaged, then the fitness for the entire organism is zero. Otherwise, we can adjust the effect. With a slight deleterious effect of 0.1 per gene, the population is more stable than if the damage is completely hidden.

Here is a typical scenario, each with a stable population after a hundred generations.

Population = 100
Offspring per individual = 2
Recessive = -0.1
Average Fitness = 0.62

Population = 100
Offspring per individual = 1.5
Recessive = -0.1
Average Fitness = 0.41

Population = 200
Offspring per individual = 1.5
Recessive = -0.1
Average Fitness = 0.53


No beneficial mutations, no variance, no phylogenetic noise. Doesn't change the overall result anyway.

Population = 100
Offspring per individual = 2
Recessive = -0.0
Average Fitness = 0.58


With silent recessives (i.e. Recessive = -0.0), offspring per individual has to be at least ~1.9 for a stable population.

-
Xposted to < Telic Thoughts >.
Xposted to < Uncommon Descent >. Hey! It showed up!!!!
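Gregor's Bookkeeper itself isn't reproduced here, but a minimal sketch of the kind of model described (diploid genomes, mutation as allele damage, zero fitness when both copies of a gene are damaged, a small per-gene cost otherwise) might look like the following. The mutation rate, mating scheme, and culling rule are my guesses, not Zachriel's code:

```python
import random

def child_fitness(genome, recessive_cost):
    """1.0 minus `recessive_cost` per gene carrying one damaged allele;
    0.0 as soon as any gene has both alleles damaged."""
    f = 1.0
    for a, b in genome:
        if a and b:
            return 0.0
        if a or b:
            f -= recessive_cost
    return max(f, 0.0)

def mutate(genome, mu):
    """Mutation modeled purely as damage: with probability mu per gene,
    one randomly chosen allele is marked damaged (1)."""
    out = []
    for a, b in genome:
        if random.random() < mu:
            if random.random() < 0.5:
                a = 1
            else:
                b = 1
        out.append((a, b))
    return out

def generation(pop, offspring_per, recessive_cost, mu=0.005):
    kids = []
    for _ in range(int(len(pop) * offspring_per)):
        mom, dad = random.choice(pop), random.choice(pop)
        # child gets one random allele per gene from each parent
        kid = [(random.choice(g1), random.choice(g2)) for g1, g2 in zip(mom, dad)]
        kids.append(mutate(kid, mu))
    avg = sum(child_fitness(k, recessive_cost) for k in kids) / len(kids)
    # viability selection: each child survives with probability = its fitness
    survivors = [k for k in kids if random.random() < child_fitness(k, recessive_cost)]
    return (survivors or pop)[:len(pop)], avg

pop = [[(0, 0)] * 20 for _ in range(100)]  # 100 individuals, 20 undamaged genes
for _ in range(100):
    pop, avg = generation(pop, offspring_per=2, recessive_cost=0.1)
```

With recessive_cost = 0 the heterozygote penalty vanishes and damaged alleles accumulate silently, which is the effect Zachriel's numbers illustrate.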

Posted by: Zachriel on Nov. 16 2009,14:22

Interesting comparison, after 1000 generations. (The previous examples used 20 genes. This is with 100 genes.)

Population = 100
Offspring per individual = 2.5
Recessive = 0
Average Fitness = 0.58


Average Fitness refers to average fitness of the children, so of 250 children, 58% have a fitness of one, more than enough to replenish the population of 100. The remaining 42% have a fitness of zero.

Population = 100
Offspring per individual = 2.5
Recessive = -0.001
Average Fitness = 0.98


The majority of children have a fitness between 0.993 and 0.999. Only a handful were stillborn.

Notice the large difference in average fitness due to the very slight effect of the recessive allele.

-
"Ah, I don't have to outrun the bear, dear Albert," said Niels. "I only have to outrun you."
Posted by: DiEb on Nov. 17 2009,17:28

Dembski didn't like it when I discussed Section I - III of his article The Search for a Search at < RationalWiki >. So, I just added a short paragraph on < Section IV >: IMO, Dembski's and Marks's formulation of the integral they use in their Theorem IV: Conservation of Uniformity doesn't make sense...
Posted by: carlsonjok on Nov. 17 2009,20:53

Quote (DiEb @ Nov. 17 2009,17:28)
Dembski didn't like it when I discussed Section I - III of his article The Search for a Search at < RationalWiki >. So, I just added a short paragraph on < Section IV >: IMO, Dembski's and Marks's formulation of the integral they use in their Theorem IV: Conservation of Uniformity doesn't make sense...
---------------------QUOTE-------------------


Out of curiosity, has any response to their IEEE article "Conservation of Information in Search" been submitted? I thought you were working with Tom English on one.
Posted by: DiEb on Nov. 19 2009,14:13

Quote (carlsonjok @ Nov. 17 2009,20:53)
 
Quote (DiEb @ Nov. 17 2009,17:28)
Dembski didn't like it when I discussed Section I - III of his article The Search for a Search at < RationalWiki >. So, I just added a short paragraph on < Section IV >: IMO, Dembski's and Marks's formulation of the integral they use in their Theorem IV: Conservation of Uniformity doesn't make sense...
---------------------QUOTE-------------------


Out of curiosity, has any response to their IEEE article "Conservation of Information in Search" been submitted? I thought you were working with Tom English on one.
---------------------QUOTE-------------------


I was a little bit preoccupied with the whole "The Search for a Search", but, thankfully, Tom English is doing all the heavy lifting ;-)
Posted by: midwifetoad on Nov. 21 2009,19:23

Possibly off topic, but maybe not...

< http://pi.nersc.gov/ >
Posted by: Wesley R. Elsberry on Dec. 12 2009,03:39

From the Ewert, Dembski, and Marks < paper on Avida >:

 

---------------------QUOTE-------------------

If the fitness of the organism
does not decrease, we keep the mutation and repeat the
iteration. If the fitness does increase, the mutation is
discarded and the process repeated.

---------------------QUOTE-------------------



What is this, the almost completely neutral evolution strategy?

It's the middle of the night, and maybe my eyes just are too glazed over, but can anybody else see why all the R strategy values for successes in Table III are the same, 353?
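Read charitably, the quoted passage presumably means "if the fitness does decrease, the mutation is discarded", i.e., a climber that accepts neutral as well as beneficial mutations. A minimal sketch of that strategy, with all names and the toy problem my own:

```python
import random

def neutral_climber(genome, fitness, mutate, steps):
    """Keep a mutation unless it lowers fitness; neutral changes are
    accepted, so the genome can drift across flat regions."""
    f = fitness(genome)
    for _ in range(steps):
        cand = mutate(genome)
        fc = fitness(cand)
        if fc >= f:              # not a decrease: keep it
            genome, f = cand, fc
    return genome, f

# Toy use: maximize the number of ones in a 32-bit string.
ones = lambda g: sum(g)

def flip_one(g):
    g = g[:]
    g[random.randrange(len(g))] ^= 1
    return g

final, score = neutral_climber([0] * 32, ones, flip_one, steps=500)
```

As written in the paper ("does increase ... discarded"), the loop would reject improvements and keep only neutral or deleterious changes, hence the joke about an "almost completely neutral evolution strategy".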
Posted by: Kristine on Dec. 12 2009,04:13

Quote (Wesley R. Elsberry @ Dec. 12 2009,03:39)
From the Ewert, Dembski, and Marks < paper on Avida >:

       

---------------------QUOTE-------------------

If the fitness of the organism
does not decrease, we keep the mutation and repeat the
iteration. If the fitness does increase, the mutation is
discarded and the process repeated.

---------------------QUOTE-------------------



What is this, the almost completely neutral evolution strategy?

It's the middle of the night, and maybe my eyes just are too glazed over, but can anybody else see why all the R strategy values for successes in Table III are the same, 353?
---------------------QUOTE-------------------


I don't know, but I did notice that under Table III
   

---------------------QUOTE-------------------
ALL SIMULATIONS, EXCEPT Q, HAVE 353 RUNS
---------------------QUOTE-------------------


with the same values for the random strategies - but this does not match Marks' < comparable slide >


Which now gives avida strategy values in place of the random strategy values cited in the paper. Huh? What is going on?
Posted by: Quack on Dec. 12 2009,06:02



---------------------QUOTE-------------------
But a thought: I get the impression that (intelligent design) creationists believe in the power of the word, all words, literally. So, if someone calls some algorithm a evolutionary strategy, it has to be linked with the Theory of Evolution in general, and burning the effigy means destroying the real thing.
---------------------QUOTE-------------------


At the other end of the spectrum we have a nincompoop like Ray Martinez employing a similar method. It is all about words, using his own definitions of words and terms to create a fantasy world of evidence for why evolution is false.

Ignorance and denial of reality are prerequisites for all flavors of creationism.
Posted by: Turncoat on Dec. 12 2009,17:34

I'm sorry, folks, but I have trouble attending to the minutiae when ED&M are fundamentally wrong in their orientation. From Wikipedia: "Avida is an artificial life software platform to study the evolutionary biology of self-replicating and evolving computer programs (digital organisms)." As I recall, the ability of a virtual organism to reproduce itself depends on how much work it can do, which is determined by its fitness. Different organisms must do different amounts of work to reproduce. Right?

Avida, with the fitness function addressed by ED&M, does not engage in optimization, let alone search. The virtual organisms are operating in a virtual environment offering different payoffs for different behaviors. The  population of virtual critters in fact changes to obtain greater payoffs, but this does not mean that the purpose of Avida is optimization. An outcome is not a purpose. More importantly, when a run of Avida does not yield a critter of maximum fitness, the run has not failed. There are still a bunch of "living" virtual organisms, and they always exhibit greater complexity than did the organisms in the initial population.

The objective of the Avida research under consideration was to address emergence of complex features in artificial life. ED&M analyze runs of Avida as though they were runs of an optimizer, and then argue that the ALife simulator is not a good optimizer. This is the kind of abysmal stupidity you get from pretend-scholars who are on a "mission from God" to deliver payback.
Posted by: Turncoat on Dec. 12 2009,17:54

Oops. Avida is an ALife program, not simulator.
Posted by: Kristine on Dec. 13 2009,16:36

Quote (Turncoat @ Dec. 12 2009,17:34)
I'm sorry, folks, but I have trouble attending to the minutiae when ED&M are fundamentally wrong in their orientation. From Wikipedia: "Avida is an artificial life software platform to study the evolutionary biology of self-replicating and evolving computer programs (digital organisms)." As I recall, the ability of a virtual organism to reproduce itself depends on how much work it can do, which is determined by its fitness. Different organisms must do different amounts of work to reproduce. Right?

Avida, with the fitness function addressed by ED&M, does not engage in optimization, let alone search. The virtual organisms are operating in a virtual environment offering different payoffs for different behaviors. The  population of virtual critters in fact changes to obtain greater payoffs, but this does not mean that the purpose of Avida is optimization. An outcome is not a purpose. More importantly, when a run of Avida does not yield a critter of maximum fitness, the run has not failed. There are still a bunch of "living" virtual organisms, and they always exhibit greater complexity than did the organisms in the initial population.

The objective of the Avida research under consideration was to address emergence of complex features in artificial life. ED&M analyze runs of Avida as though they were runs of an optimizer, and then argue that the ALife simulator is not a good optimizer. This is the kind of abysmal stupidity you get from pretend-scholars who are on a "mission from God" to deliver payback.
---------------------QUOTE-------------------


IMHO, this is all about the fact that Richard Dawkins talks about "optimals" (clutch size at the group or individual level, sex ratios, etc.) in The Selfish Gene. Naturally they muck about with words, exploiting the public's possible confusion about theory, optimum, etc. in various contexts, in their arrogance in believing that if they can come up with the right phrase, they'll turn the prince Dawkins into a toad in the eyes of the public and assume their rightful place as guardians of the world's science, and thus of morals (particularly those of women). They have a bee in their bonnets about Dawkins, who used "optimal" in his book (but they don't mention his discussion of Wynne-Edwards's and Fisher's use of the word, what it means in terms of different environments or strategies, etc.). This was another "gotcha" aimed specifically at Dawkins, IMO, and it fell short again.
Posted by: Bob O'H on Dec. 16 2009,06:59

Before any Turncoats turn up to show this off...

< Hahhahahaaa! >

Nice one.
Posted by: dvunkannon on Dec. 21 2009,17:37

Copied for your delectation from the GP-List.
 

---------------------QUOTE-------------------
Sorry if everyone has already seen this, but I had missed it and then Daniel Rodriguez highlighted Eureqa.
It might be interesting for you.

Distilling Free-Form Natural Laws from Experimental Data
Michael Schmidt and Hod Lipson
Science 3 April 2009: 81-85.
An algorithm has been developed to search for natural laws of physics in large data sets.
< http://dx.doi.org/doi:10.1126/science.1165893 >
4 Minute Video < http://www.sciencemag.org/content....3s1.mpg >
Bill
Dr. W. B. Langdon,Department of Computer Science,King's College London,Strand, London, WC2R 2LS, UK < http://www.dcs.kcl.ac.uk/staff/W.Langdon/ >
FOGA 2011 < http://www.sigevo.org/foga-2011/ >
CIGPU 2010 < http://www.cs.ucl.ac.uk/external/W.Langdon/cigpu >
Field Guide to Genetic Programming < http://www.gp-field-guide.org.uk/ >
RNAnet < http://bioinformatics.essex.ac.uk/users/wlangdon/rnanet >
GP EM < http://www.springer.com/10710 >
GP Bibliography < http://www.cs.bham.ac.uk/~wbl/biblio/ >
------------------------------------
---------------------QUOTE-------------------


Posted by: Wesley R. Elsberry on May 18 2010,07:05

< Latest comment linking back here > from Iam's weasel thread on PT.
Posted by: Richardthughes on May 18 2010,10:40

Quote (dvunkannon @ Dec. 21 2009,17:37)
Copied for your delectation from the GP-List.
   

---------------------QUOTE-------------------
Sorry if everyone has already seen this, but I had missed it and then Daniel Rodriguez highlighted Eureqa.
It might be interesting for you.

Distilling Free-Form Natural Laws from Experimental Data
Michael Schmidt and Hod Lipson
Science 3 April 2009: 81-85.
An algorithm has been developed to search for natural laws of physics in large data sets.
< http://dx.doi.org/doi:10.1126/science.1165893 >
4 Minute Video < http://www.sciencemag.org/content....3s1.mpg >
Bill
Dr. W. B. Langdon, Department of Computer Science, King's College London, Strand, London, WC2R 2LS, UK < http://www.dcs.kcl.ac.uk/staff/W.Langdon/ >
FOGA 2011 < http://www.sigevo.org/foga-2011/ >
CIGPU 2010 < http://www.cs.ucl.ac.uk/external/W.Langdon/cigpu >
A Field Guide to Genetic Programming < http://www.gp-field-guide.org.uk/ >
RNAnet < http://bioinformatics.essex.ac.uk/users/wlangdon/rnanet >
GP EM < http://www.springer.com/10710 >
GP Bibliography < http://www.cs.bham.ac.uk/~wbl/biblio/ >
------------------------------------
---------------------QUOTE-------------------


---------------------QUOTE-------------------


Now that's amazing. AI ramifications, surely?
Posted by: dvunkannon on May 18 2010,11:09

Quote (Richardthughes @ May 18 2010,11:40)
Quote (dvunkannon @ Dec. 21 2009,17:37)
Copied for your delectation from the GP-List.
   

---------------------QUOTE-------------------
Sorry if everyone has already seen this, but I had missed it and then Daniel Rodriguez highlighted Eureqa.
It might be interesting for you.

Distilling Free-Form Natural Laws from Experimental Data
Michael Schmidt and Hod Lipson
Science 3 April 2009: 81-85.
An algorithm has been developed to search for natural laws of physics in large data sets.
< http://dx.doi.org/doi:10.1126/science.1165893 >
4 Minute Video < http://www.sciencemag.org/content....3s1.mpg >
Bill
Dr. W. B. Langdon, Department of Computer Science, King's College London, Strand, London, WC2R 2LS, UK < http://www.dcs.kcl.ac.uk/staff/W.Langdon/ >
FOGA 2011 < http://www.sigevo.org/foga-2011/ >
CIGPU 2010 < http://www.cs.ucl.ac.uk/external/W.Langdon/cigpu >
A Field Guide to Genetic Programming < http://www.gp-field-guide.org.uk/ >
RNAnet < http://bioinformatics.essex.ac.uk/users/wlangdon/rnanet >
GP EM < http://www.springer.com/10710 >
GP Bibliography < http://www.cs.bham.ac.uk/~wbl/biblio/ >
------------------------------------
---------------------QUOTE-------------------


---------------------QUOTE-------------------


Now that's amazing. AI ramifications, surely?
---------------------QUOTE-------------------


I would sic it on the large (ahem) datasets that are going to come out of the LHC.

I see that Fermilab's Tevatron has found something interesting.

< http://www.nytimes.com/2010....science >
Posted by: dvunkannon on July 03 2010,13:16

Open access to a cool issue for July only!

   

---------------------QUOTE-------------------
Note that John's article is available for FREE under open access, as is the Langdon and Gustafson article in the same issue.

Even better, the ENTIRE ISSUE is available for FREE during the month of July, 2010. This is because the publisher would like to promote this "Tenth Anniversary Issue: Progress in Genetic Programming and Evolvable Machines," edited by Julian Miller and Riccardo Poli.

I think that this is a really nice special issue, containing some excellent summaries of the state of the field and directions in which it is expected to move in the near future. The link to the online version of the entire issue is:  http://www.springerlink.com/content/h46r77k291rn/?p=bfaf36a87f704d5cbcb66429f9c8a808&pi=0
-Lee  

On Jul 2, 2010, at 12:31 PM, John Koza wrote:
> Hello:
> The Genetic Programming and Evolvable Machines journal has just published my article (which is available in its entirety) on “Human-competitive results produced by genetic programming.” It lists 76 human-competitive results produced by genetic programming and various features common to work done in different fields by many different researchers.
> < http://www.springerlink.com/content/92n753376213655k/ >
--Lee Spector, Professor of Computer Science
School of Cognitive Science, Hampshire College
893 West Street, Amherst, MA 01002-3359
lspector@hampshire.edu, < http://hampshire.edu/lspector/ >
Phone: 413-559-5352, Fax: 413-559-5438
Check out Genetic Programming and Evolvable Machines: < http://www.springer.com/10710 > and < http://gpemjournal.blogspot.com/ >
---------------------QUOTE-------------------



Sorry for the crappity reformatting there.

Koza's article on human competitive results is useful ammunition against low grade IDC arguments that evolution can't do nothin'.

The discussion of lack of NFL applicability to GP in the article on the next 10 years in GP theory should be useful against the next level up in IDC critics.
Posted by: midwifetoad on July 04 2010,10:24

Did anyone post this:

< http://www.csicop.org/si/show/the_war_of_the_weasels >



---------------------QUOTE-------------------
Or “How an Intelligent Design Theorist was Bested in a Public Math Competition by a Genetic Algorithm!”

This Online Extra is a follow-up to the article “War of the Weasels” from the May/June 2010 issue of the Skeptical Inquirer (Volume 34.3, May/June 2010). The print article discusses the use of a genetic algorithm (GA) to solve tricky math problems and demonstrates that no specific “target” is required for such algorithms, contra the interminable creationist attacks on the “Weasel” simulation discussed in Richard Dawkins's book The Blind Watchmaker. The problem I developed the GA for is called Steiner's Problem; it involves finding the shortest straight-line-segment networks connecting an array of given fixed points. This problem provides a miniature digital playground on the very edge of complexity.



---------------------QUOTE-------------------



A follow-up to this:

< http://pandasthumb.org/archives/2010/04/war-of-the-weas.html >
Posted by: dvunkannon on Jan. 21 2011,16:48

A half year's worth of birfday threads have pushed this worthy thread down and out of sight.

Rather than just bump it, I will take this chance to share with you a design for a GA experiment. Comments please.

I've been thinking recently about the arguments of Polanyi, as filtered by Meyer, and trickling down to the level of Upright Biped and David Abel. Can we show the evolution of a genetic code from nothing (pure noise) to some level of function? I think yes, we can.

Each member of my GA population will have a different genetic code to start. There are 64 codons, and 22 amino acids, so the framework of each genome is a 64*22 array. The content of each array slot will be 10 bits that can be read as an affinity of this codon for this amino acid. (So the total genome size is 64*22*10 bits.)

Let's say we are looking at the row for AUU. It contains 22 10-bit integers. We can look at these as weights that affect the likelihood this codon will code for a particular amino acid. As an illustration, perhaps the value for AA3 (amino acid 3) is 200 and AA12 is 400, and the rest of the row is 0. In this case, the codon AUU will produce amino acid 3 about 1/3 of the time, and amino acid 12 about 2/3 of the time.

If we look at the modern genetic code as this kind of array, each of the 64 rows is full of 0 bits in 21 out of 22 slots, and in that 22nd slot the bits are all 1s. Actually, the code is not quite that strong, and sometimes a codon will produce a different amino acid (leucine or valine instead of isoleucine, for example). So, starting from randomly filled arrays, we want to see whether an array resembling the modern one comes to dominate the population. Importantly, we don't care which codon eventually comes to code for which amino acid.

The fitness function will test 640 values created by taking each codon 10 times, and choosing an amino acid based on the weights in the row in the table. This results in 640 codon-AA pairs. Now we score the fitness of the individual based on these pairs.

The first criterion is coverage: does the table produce all 22 amino acids? Score one point for each unique AA among the 640 outputs. Maximum score on this criterion = 22.

The second criterion is reliability: does the table produce the same AA each time? Calculate the score by first looking at each group of 10 trials for a single codon: for the AA produced most often, how many of the 10 trials produced it? The answer will be between 10% and 100%. Average these percentages across all 64 codons. Maximum score is 100% reliability.

The third criterion is efficiency: how often does each AA get produced? Do six codons produce one AA, while another AA is produced by only one codon? We could impose an order on the list of AAs and say the top AA is needed three times more often than the lowest, or we could simply measure whether they are all produced at about the same rate.

The fourth criterion is resilience: if there is a mutation in a codon, will the same AA still be produced?

The fifth criterion is a kind of second-order resilience: if a mutation creates a different AA, is that AA still in the same polar/nonpolar, hydrophobic/hydrophilic class?

Take a weighted average of the criteria. I would weight them so that the first one is most important.

Since the genome is about 40Kbits, I would use a large population, at least 1,000 individuals, and be prepared for some long run times before seeing convergence on a solution. However, I see no reason to expect that a single code will not eventually emerge the winner. The open question is the structure of that winner.

Yes, this is a complex design, maybe overdesigned, but I think it will work. It will also support changing the weights of the different criteria and seeing whether the winning table has any significantly different structure as a result.
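
A minimal Python sketch of how the first two criteria might be scored. All the names here, the fallback for an all-zero row, and the placeholder weights in the final sum are my own assumptions about one way to realize the design, not part of the proposal itself:

```python
import random

N_CODONS, N_AA, TRIALS = 64, 22, 10  # dimensions from the proposal

def random_genome(rng):
    # 64 x 22 table of 10-bit affinity weights (0..1023), randomly filled
    return [[rng.randrange(1024) for _ in range(N_AA)] for _ in range(N_CODONS)]

def translate(row, rng):
    # pick an amino acid with probability proportional to its weight
    total = sum(row)
    if total == 0:
        return rng.randrange(N_AA)  # assumed fallback for a dead (all-zero) row
    r = rng.uniform(0, total)
    acc = 0
    for aa, weight in enumerate(row):
        acc += weight
        if r <= acc:
            return aa
    return N_AA - 1

def fitness(genome, rng):
    # 640 codon-AA pairs: each codon translated TRIALS times
    outputs = [[translate(row, rng) for _ in range(TRIALS)] for row in genome]
    # criterion 1, coverage: one point per distinct AA seen (max 22)
    coverage = len({aa for trials in outputs for aa in trials})
    # criterion 2, reliability: modal-AA frequency per codon, averaged (max 1.0)
    reliability = sum(max(t.count(aa) for aa in set(t)) / TRIALS
                      for t in outputs) / N_CODONS
    # placeholder weighting that favors coverage, per the proposal
    return 3.0 * coverage + 22.0 * reliability

rng = random.Random(1)
print(fitness(random_genome(rng), rng))
```

The remaining criteria (efficiency, resilience) would slot in as further terms of the weighted sum.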
Posted by: Wesley R. Elsberry on Jan. 21 2011,21:55

The canonical genetic code can be represented as a degenerate code with 64 input values and 22 output values. Only 20 of the output values, though, specify an amino acid. The other two output values are for "start" and "stop".

I remember back in 1992 when some of my colleagues at Battelle PNL presented on their database for human genome results. They noted that they were scaling to be able to handle the large volume of data, and how the human genome comprised some 3 billion bases on 24 chromosomes.

24? I thought that was odd, since humans have 23 chromosomes. But they treated the sex chromosomes as two separate chromosomes, which makes sense from a programming standpoint.
Posted by: fnxtr on Jan. 21 2011,22:39

Quote (Wesley R. Elsberry @ Jan. 21 2011,19:55)
The canonical genetic code can be represented as a degenerate code with 64 input values and 22 output values. Only 20 of the output values, though, specify an amino acid. The other two output values are for "start" and "stop".

I remember back in 1992 when some of my colleagues at Battelle PNL presented on their database for human genome results. They noted that they were scaling to be able to handle the large volume of data, and how the human genome comprised some 3 billion bases on 24 chromosomes.

24? I thought that was odd, since humans have 23 chromosomes. But they treated the sex chromosomes as two separate chromosomes, which makes sense from a programming standpoint.
---------------------QUOTE-------------------


I was about to say "You could somehow model the sex chromosomes as an XOR function", then realized how naive that was. :-)
Posted by: sparc on Jan. 21 2011,23:50

For real-world numbers to start with you may want to have a look at < Goodenbour JM and Pan T (2006): Diversity of tRNA genes in eukaryotes. Nucl. Acids Res. 34 (21): 6137-6146 >.
 

---------------------QUOTE-------------------
Only 20 of the output values, though, specify an amino acid. The other two output values are for "start" and "stop"
---------------------QUOTE-------------------

The initiation codon actually results in two outputs ("start" and Met in the majority of cases where AUG is used). However, the "start" output results from positional information of the initiation codon within the mRNA. Downstream AUGs usually don't function as start sites for translation but only encode Met (BTW, the triplet of the tRNA that pairs with the codon of the mRNA is called the anticodon). The opal stop codon (UGA) can also have a different meaning: a specific tRNA-Sec will incorporate selenocysteine instead of terminating translation. However, to do so additional sequence information is required: in bacteria, selenocysteine insertion sequences (SECIS) are usually within the coding sequence, directly downstream of the Sec codon. In eukaryotes and in some bacterial genes, SECIS elements are located in the 3'-UTR of the mRNA encoding a selenoprotein.
Posted by: Wesley R. Elsberry on Jan. 22 2011,04:00

Sparc, thanks for correcting my sloppiness.

However, the canonical genetic code still only specifies 20 amino acids*, which changes some of the design of David's EC experiment. The case with some organisms managing to get additional amino acids specified is a derivation and needn't be included in the basic experimental design, IMO.

* Which is, if I'm reading it right, also confirmed in the link you provided:



---------------------QUOTE-------------------

In vivo, each tRNA also carries negative determinants to prevent mis-charging by the 19 other tRNA synthetases (31,32).

---------------------QUOTE-------------------


Posted by: dvunkannon on Jan. 22 2011,07:32

Great discussion, and I appreciate the feedback.

My thought on using 22 instead of 20:

It makes the base task harder. I don't think it would be so much harder as to be insurmountable, and I'd prefer to solve the slightly harder problem to avoid carping by IDCists.

I have a vague notion that start and stop are later additions to the functionality of the code. Early translation machinery could have taken linear strings of RNA, started at the beginning, and run until there was no more input. The idea of stringing the code for multiple proteins together on one strand of RNA, and then needing to distinguish when one stopped and another began, is a later innovation. This packaging of genes into chromosomes greatly increases the likelihood that the complete network of proteins can get created in each daughter cell, but for my purposes it is not relevant. I think that developing a code happens during the protobiont stage of abiogenesis, when molecular evolution is still the driver, not Darwinian evolution.
Posted by: sparc on Jan. 23 2011,01:40

Quote (Wesley R. Elsberry @ Jan. 22 2011,04:00)
Sparc, thanks for correcting my sloppiness.

However, the canonical genetic code still only specifies 20 amino acids*, which changes some of the design of David's EC experiment. The case with some organisms managing to get additional amino acids specified is a derivation and needn't be included in the basic experimental design, IMO.

* Which is, if I'm reading it right, also confirmed in the link you provided:

 

---------------------QUOTE-------------------

In vivo, each tRNA also carries negative determinants to prevent mis-charging by the 19 other tRNA synthetases (31,32).

---------------------QUOTE-------------------


---------------------QUOTE-------------------


I would never call you sloppy because
1. I am not sure if I would know much about selenocysteine if a former colleague had not suggested to conditionally remove the SECIS to inactivate a selenoprotein in mice.
2. your contributions here, at the Austringer and elsewhere prove the opposite.
I also agree with dvunkannon that it will be more than enough if he sticks to the canonical code.

Still, allow me a last comment: it is 20 AAs because there is no free selenocysteine in the cells. The tRNA-Sec binds free serine, which is then transformed to selenocysteine, and this is more or less the point where my knowledge about selenocysteine ends.
Posted by: dvunkannon on Jan. 23 2011,07:34

Quote (Wesley R. Elsberry @ Jan. 22 2011,05:00)
* Which is, if I'm reading it right, also confirmed in the link you provided:



---------------------QUOTE-------------------

In vivo, each tRNA also carries negative determinants to prevent mis-charging by the 19 other tRNA synthetases (31,32).

---------------------QUOTE-------------------


---------------------QUOTE-------------------


So what I am proposing to model with 10 exponentially scaled "affinity bits" is, in real life, a set of negative determinants, dis-affinities. Interesting, but I don't think it will change the design.
Posted by: dvunkannon on Jan. 31 2011,11:59

If you read Pharyngula, or are just cooler than I am, you might have seen this already:

< http://boxcar2d.com/ >

It is a simple GA that builds wheeled things to run along bumpy tracks. It works as expected (i.e., the vehicles grow better able to traverse the track over time) even though the parameters are not very helpful.

What is nice about this for sharing with non- or anti-scientific friends is that it is obvious that the vehicles are adapting to the physics of their 2D bumpy universe, not to some fixed target "smuggled in" as part of the fitness function.

The UI is cryptic and not particularly helpful. You watch each individual going over the course, which is an easy way to waste copious time. There is a graph of red and black lines that I'm guessing show best-in-generation and generation-average fitness, but it is unlabeled and gets clipped off as performance improves.

Reloading the page gets you a new course and resets the simulation.

Fun, addicting, and informative.
Posted by: midwifetoad on Jan. 31 2011,12:50

Doesn't really say much about what is mutating.
Posted by: dvunkannon on Jan. 31 2011,14:21

Quote (midwifetoad @ Jan. 31 2011,13:50)
Doesn't really say much about what is mutating.
---------------------QUOTE-------------------


Agreed. I think it is just one real number being replaced by another, with lots of bits changing at once. That's one of the reasons real-numbered alleles might be useful for optimization but don't carry a lot of biological motivation.

My current run is fun because the dominant morph has got a small wheel perched on top that makes it look like a rider on a bicycle, or TRON-esque lightcycle. It is functional, since the bumps make the morph flip over a lot.
Posted by: dheddle on Mar. 01 2011,17:18

Are ant colony algorithms examples of genetic algorithms?
Posted by: Wesley R. Elsberry on Mar. 01 2011,21:07

If something has heritable properties that are passed from parent to child instance, that is likely to be considered included broadly within evolutionary computation. "Genetic algorithm" is a specific form of evolutionary computation due to John Holland. GAs operate on instances that are fixed-length bit strings. Other evolutionary computation differs from that. For instance, the Avida program fits in "genetic programming" specifically and "artificial life" categories in evolutionary computation. It operates on variable-length, fixed cardinality symbol sets (fixed cardinality for any given run; that can be changed by instantiating an instruction set of different size).

I suspect ant colony simulations are not, strictly speaking, in evolutionary computation. I would think that they would fit in both natural computation and emergent computation categories.
Posted by: sledgehammer on Mar. 01 2011,22:59

Quote (dheddle @ Mar. 01 2011,15:18)
Are ant colony algorithms examples of genetic algorithms?
---------------------QUOTE-------------------


I would think ant behavior is best described by cellular automata.
Posted by: dheddle on Mar. 02 2011,06:29

Quote (Wesley R. Elsberry @ Mar. 01 2011,21:07)
If something has heritable properties that are passed from parent to child instance, that is likely to be considered included broadly within evolutionary computation. "Genetic algorithm" is a specific form of evolutionary computation due to John Holland. GAs operate on instances that are fixed-length bit strings. Other evolutionary computation differs from that. For instance, the Avida program fits in "genetic programming" specifically and "artificial life" categories in evolutionary computation. It operates on variable-length, fixed cardinality symbol sets (fixed cardinality for any given run; that can be changed by instantiating an instruction set of different size).

I suspect ant colony simulations are not, strictly speaking, in evolutionary computation. I would think that they would fit in both natural computation and emergent computation categories.
---------------------QUOTE-------------------


Hmm. I never made a distinction between evolutionary algorithms and GAs. Are you saying the latter is a subset of the former where the solution is necessarily represented by a bit string? None of the implementations I wrote used bits--they all used real numbers (and/or ints) to represent solutions.

The question I asked was along a "theory of algorithms" direction. That's a topic in which I have little expertise. But I recall reading (IIRC) that simulated annealing can be shown to be equivalent to a GA--and having done both (though not with bit representations) that "smelled" right.

For fun I thought I'd try a traveling salesman problem with an ant colony algorithm, and it struck me that it would just be a GA where the fitness function is maximal pheromone detection.
Posted by: BillB on Mar. 02 2011,07:04

Quote (dheddle @ Mar. 02 2011,12:29)
Quote (Wesley R. Elsberry @ Mar. 01 2011,21:07)
If something has heritable properties that are passed from parent to child instance, that is likely to be considered included broadly within evolutionary computation. "Genetic algorithm" is a specific form of evolutionary computation due to John Holland. GAs operate on instances that are fixed-length bit strings. Other evolutionary computation differs from that. For instance, the Avida program fits in "genetic programming" specifically and "artificial life" categories in evolutionary computation. It operates on variable-length, fixed cardinality symbol sets (fixed cardinality for any given run; that can be changed by instantiating an instruction set of different size).

I suspect ant colony simulations are not, strictly speaking, in evolutionary computation. I would think that they would fit in both natural computation and emergent computation categories.
---------------------QUOTE-------------------


Hmm.  I never made a distinction between evolutionary algorithms and GAs. Are you saying the latter is a subset of the former where the solution is necessarily represented by a bit string? None of the implementation I wrote used bits--they all used real numbers (and/or ints) to represent solutions.

The question I asked was along a "theory of algorithms" direction. That's a topic in which I have little expertise. But I recall reading (IIRC) that simulated annealing can be shown to be equivalent to a GA--and having done both (though not with bit representations) that "smelled" right.

For fun I thought I'd try a traveling salesman by an ant colony algorithm, and it struck me that it will just be a GA  where the fitness function is maximal pheromone detection.
---------------------QUOTE-------------------


I've not come across that categorical distinction before; from what I was taught through my MSc onwards, a GA can be implemented with variable-length genomes and with a variety of encoding formats.

The bulk of the work I was exposed to used GAs with animat simulations (using dynamic recurrent neural nets), evolutionary robotics, and evolutionary electronics (Husbands, Harvey, Thompson et al. at Sussex Uni). They never regarded or referred to a GA as requiring a bit-encoded, fixed-length genotype. I guess historically the GA was created in this form, but I think the classification has become a little broader (at least in some disciplines, like evo robotics).
Posted by: Wesley R. Elsberry on Mar. 02 2011,08:25

Quote (dheddle @ Mar. 02 2011,06:29)
Quote (Wesley R. Elsberry @ Mar. 01 2011,21:07)
If something has heritable properties that are passed from parent to child instance, that is likely to be considered included broadly within evolutionary computation. "Genetic algorithm" is a specific form of evolutionary computation due to John Holland. GAs operate on instances that are fixed-length bit strings. Other evolutionary computation differs from that. For instance, the Avida program fits in "genetic programming" specifically and "artificial life" categories in evolutionary computation. It operates on variable-length, fixed cardinality symbol sets (fixed cardinality for any given run; that can be changed by instantiating an instruction set of different size).

I suspect ant colony simulations are not, strictly speaking, in evolutionary computation. I would think that they would fit in both natural computation and emergent computation categories.
---------------------QUOTE-------------------


Hmm.  I never made a distinction between evolutionary algorithms and GAs. Are you saying the latter is a subset of the former where the solution is necessarily represented by a bit string? None of the implementation I wrote used bits--they all used real numbers (and/or ints) to represent solutions.

The question I asked was along a "theory of algorithms" direction. That's a topic in which I have little expertise. But I recall reading (IIRC) that simulated annealing can be shown to be equivalent to a GA--and having done both (though not with bit representations) that "smelled" right.

For fun I thought I'd try a traveling salesman by an ant colony algorithm, and it struck me that it will just be a GA  where the fitness function is maximal pheromone detection.
---------------------QUOTE-------------------


Yes, GAs are a very specific subset of evolutionary computation, at least in the origination of the term and how it is used by people who distinguish between genetic algorithms, evolutionary strategies, genetic programming, etc.

Computational theory does show (I've seen this in a paper by Harold Szu) that many of the models of artificial neural networks are Turing complete, that any computable function can be accomplished using them. I suspect that such is available for simulated annealing and GAs as well. That doesn't mean that the mechanisms by which each work are in any way similar.

I haven't done ant colony simulation, so I don't know what's under the covers, but if it doesn't involve inheritance, it isn't evolutionary computation.
Posted by: dvunkannon on Mar. 02 2011,10:11

I would agree with BillB and others, above, that GA is still GA even when other encodings and representations of the genome are used. For example, Franz Rothlauf has a whole dissertation on the issue.
< http://www.amazon.com/Represe....2064108 >

David Goldberg of UIUC championed the binary representation early (over high cardinality alphabets), but he certainly understood GA to include higher cardinality alphabets. You could make an argument that the defining characteristic of GA is the linear nature of the chromosome (allowing rings). Even that is stressed by GP representations of what is usually a tree as a variable length linear string. Then you have to shift to a definition of GA that stresses the idea that the genome represents parameters to a program (the fitness function) not the function itself. In GP, when you eval the population member, you are saying in effect that it is the fitness function. In the end, you are better off admitting that these are fuzzy, historical, or personal definitions.

GA for traveling salesman problems uses alleles where each allele is a token for a city. What matters is the order of the tokens, not the token values.

I agree with Heddle that ACO is very like GA for TSP; both are finding tours through graphs. A connection with ES is that ACO algorithms don't cross tour solutions with each other; they vary existing solutions in the population. This is similar to the asexual variation and reward we are all familiar with in WEASEL. A connection with EDA is that the population is not represented explicitly: the population is the set of all possible complete paths, and population members are drawn based on the pheromone strengths of the edges in the path.

If all the graph edge costs are known in advance (there is no sense of discovery during the run) then ACO can be converted to TSP by just adding one edge of zero cost between the food and the nest. Then use your favorite TSP solver, GA, whatever.
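
A toy illustration of the order-based encoding, assuming a hypothetical four-city layout and a simple swap mutation with hill-climbing acceptance (no crossover, so this is closer to the ES/ACO style of varying a single solution than to a full GA):

```python
import random

# hypothetical city coordinates; each allele is a city token, and order is what matters
CITIES = {0: (0, 0), 1: (1, 0), 2: (1, 1), 3: (0, 1)}

def tour_length(tour):
    closed = list(tour) + [tour[0]]  # return to the starting city
    return sum(((CITIES[a][0] - CITIES[b][0]) ** 2 +
                (CITIES[a][1] - CITIES[b][1]) ** 2) ** 0.5
               for a, b in zip(closed, closed[1:]))

def swap_mutation(tour, rng):
    # permutation-preserving mutation: exchange two city tokens
    i, j = rng.sample(range(len(tour)), 2)
    t = list(tour)
    t[i], t[j] = t[j], t[i]
    return t

rng = random.Random(0)
best = rng.sample(list(CITIES), len(CITIES))
for _ in range(200):
    cand = swap_mutation(best, rng)
    if tour_length(cand) < tour_length(best):
        best = cand
print(tour_length(best))  # converges to the perimeter tour, length 4.0
```

The point is that mutation here must preserve the permutation property; flipping a single token, as in a bit-string GA, would produce an invalid tour.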
Posted by: Wesley R. Elsberry on July 30 2011,19:29

Googling some terms brought up an < antievolutionary rant > from 2006 claiming that Dawkins used locking in "weasel". I entered a comment there < http://ravingconservative.blogspot.com/2006/05/dismembering-evolution-8-avida.html > that I will copy here:



---------------------QUOTE-------------------

Daniel: "He programmed it to lock in the letters when the right one fell into place."

I just ran across this blog post. There is no discussion of locking characters in Dawkins' "The Blind Watchmaker". But we don't need to quibble about inaccessible programs from 1986. One can easily code the "weasel" program without any "locking" and still have it perform just as well as what Dawkins described. In fact, the performance difference between "locking" non-weasel programs and proper weasel programs is minor at reasonable values of population size and mutation rate.

See < http://tinyurl.com/atbc-ec-thread > for more information.

Wesley R. Elsberry

---------------------QUOTE-------------------
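
For what it's worth, a weasel without any locking can be sketched in a few lines of Python; the population size and mutation rate below are arbitrary choices of mine, not values from Dawkins:

```python
import random

TARGET = "METHINKS IT IS LIKE A WEASEL"
CHARSET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "
POP, MU = 100, 0.05  # arbitrary population size and per-site mutation rate

def mutate(s, rng):
    # every position is free to mutate in every generation: no locking anywhere
    return "".join(rng.choice(CHARSET) if rng.random() < MU else c for c in s)

def score(s):
    return sum(a == b for a, b in zip(s, TARGET))

rng = random.Random(42)
parent = "".join(rng.choice(CHARSET) for _ in TARGET)
gen = 0
while parent != TARGET:
    gen += 1
    # the best of POP mutated copies becomes the next parent; correct letters
    # persist only because mutants that keep them tend to score higher
    parent = max((mutate(parent, rng) for _ in range(POP)), key=score)
print(gen, parent)
```

Run it a few times with different seeds: correct letters occasionally revert before the target is reached, which a latching implementation could never show, yet convergence is still quick at these parameter values.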


Posted by: Dr.GH on July 31 2011,01:17

Quote (Wesley R. Elsberry @ July 30 2011,17:29)
Googling some terms brought up an < antievolutionary rant >
---------------------QUOTE-------------------


Looking for more stupid, I found that "ravingconservative" had shifted to YouTube. And, the next link resulted in the message that "ravingconservative" had gotten the boot from YouTube for "multiple violations of community standards."
Posted by: Richardthughes on Aug. 02 2011,14:17

More Weasel:

< http://www.cs.laurentian.ca/badams/evolution/EvolutionApplet102.html >

You can see 'non-latching' if you look closely, Gorden E Mullings of Montserrat.
Posted by: dvunkannon on Dec. 01 2011,10:39

< http://www.lulu.com/product....8702814 >

Genetic Programming applied to playing games of various sorts, including checkers for the frilly shirted among us.

The author's work has won 5 Humie awards, these being given out for evolving something that matches or surpasses human competence in some area.
Posted by: midwifetoad on Feb. 02 2012,14:07

Dumb question:

I've just encountered the term simulated annealing and a discussion of annealing algorithms vs genetic algorithms.

< http://www.cs.helsinki.fi/u....sa....sa.html >

The difference seems to be that an annealing algorithm selects only one child per generation to produce the next generation.

That would make Weasel and its relatives annealing algorithms rather than genetic algorithms?

This came up in a discussion of Chaitin and his mathematical proof of evolution.
Posted by: dvunkannon on Feb. 05 2012,21:04

Quote (midwifetoad @ Feb. 02 2012,15:07)
Dumb question:

I've just encountered the term simulated annealing and a discussion of annealing algorithms vs genetic algorithms.

< http://www.cs.helsinki.fi/u....sa....sa.html >

The difference seems to be that an annealing algorithm selects only one child per generation to produce the next generation.

That would make Weasel and it's relatives an annealing algorithm rather than a genetic algorithm?

This came up in a discussion of Chaitin and his mathematical proof of evolution.
---------------------QUOTE-------------------


There are a large number of Evolutionary Computation (EC) algorithms that use only one candidate solution. Evolution Strategies (ES) is an example. Weasel, IMHO, fits into the ES model the best.

I don't think the reference you provided does a good job of explaining simulated annealing. The main point of SA is to avoid a local optimum by randomly perturbing away from a good solution that isn't getting any better over several iterations, in the hope of finding a better one. The analogy is to cooling down an object (finding the optimization minimum) then partially reheating and cooling down again. So it is mostly hill climbing (or valley following, depending if you are minimizing or maximizing) with a kick every once in a while to avoid the local optimum.

The population size difference is not the most important difference between SA and GA to my way of looking at things. Even if SA had multiple candidate solutions, hill climbing with an occasional kick isn't the same as a mutation-only GA. The SA algorithm uses non-local information (lack of solution improvement over some period of time), while most GA algorithms avoid non-local information. SA algorithms usually operate on the real-valued parameters directly (the phenotype), while GA algorithms usually apply mutation to the genotype (a binary string) which gets translated into the phenotype. The result of this genotype-phenotype distinction is that a mutation that is local in genotype space can move you arbitrarily far in phenotype space.

Here's an example of the difference. The problem has two real valued parameters, X and Y. If the current best candidate solution is { 0.5, 0.5 }, an SA algorithm might test
{ 0.51, 0.51 }
{ 0.51, 0.49 }
{ 0.49, 0.51 }
{ 0.49, 0.49 }
and if none of them is better, kick you off to  { 0.37, 0.19 }.

In contrast, the GA algorithm is working on a 20 bit string. Let's say the population size was 4, and they all were
10000000001000000000, which gets translated into a phenotype of { 0.5, 0.5 } just like the SA solution. If the mutation rate is 0.05, we would expect 1 bit to change in each 20 bit individual. But a 1 bit change to
00000000001000000000
has moved the phenotype to { 0.0, 0.5 }!

That's not hill climbing!  A lot of people, including our friend Kairosfocus at UD, think mutation = hill climbing. It does not. OK, it does, but only for the simplest of genotype-phenotype mappings. Real-world mapping of DNA to protein, where one nucleotide change could flip you from a hydrophobic amino acid to a hydrophilic amino acid, with consequent changes to the 3D folded structure of the protein, isn't simple.
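The jump can be checked concretely. The sketch below assumes a 10-bits-per-parameter binary-fraction encoding (my assumption, chosen so the decoding reproduces the { 0.5, 0.5 } example above):

```python
def decode(genome):
    # 20-bit genotype -> two real parameters in [0, 1):
    # first 10 bits encode X, last 10 encode Y, as binary fractions.
    assert len(genome) == 20 and set(genome) <= {"0", "1"}
    x = int(genome[:10], 2) / 1024.0
    y = int(genome[10:], 2) / 1024.0
    return x, y

parent = "10000000001000000000"
print(decode(parent))          # (0.5, 0.5)

# Flip just the first (most significant) bit of X:
mutant = "0" + parent[1:]
print(decode(mutant))          # (0.0, 0.5): one local change in genotype
                               # space, a big jump in phenotype space
```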

Bottom line - SA and GA are pretty different EC algorithms, Weasel is ES, not SA, and mutation isn't hill climbing. Test at the end of next class.
Posted by: midwifetoad on Feb. 06 2012,14:10

In my word evolver I simply kill off the highest scoring candidate, "randomly," 25 percent of the time. It seems to prevent getting stuck.
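A sketch of that selection tweak in Python; the `pick_parent` helper and the word-scoring function are placeholders I made up, not the actual word evolver's code:

```python
import random

def pick_parent(pop, score, p_kill_best=0.25):
    # Rank candidates best-first; 25% of the time, "kill off" the top
    # scorer before choosing, as a cheap way to escape getting stuck.
    ranked = sorted(pop, key=score, reverse=True)
    if len(ranked) > 1 and random.random() < p_kill_best:
        ranked = ranked[1:]
    return ranked[0]

# Toy usage: score = number of positions matching the target word "cat".
random.seed(0)
target = "cat"
match = lambda w: sum(a == b for a, b in zip(w, target))
choices = [pick_parent(["cat", "cot", "dog"], score=match) for _ in range(8)]
print(choices)   # mostly "cat", occasionally "cot" when the best is killed
```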
Posted by: dvunkannon on Feb. 06 2012,16:33

Quote (midwifetoad @ Feb. 06 2012,15:10)
In my word evolver I simply kill off the highest scoring candidate, "randomly," 25 percent of the time. It seems to prevent getting stuck.
---------------------QUOTE-------------------


How did this come up in a discussion of Chaitin and evolution? I'd be interested to hear what the arguments were.
Posted by: midwifetoad on Feb. 07 2012,08:01

Chaitin is the new darling at UD. Not sure why.
Posted by: DiEb on Mar. 13 2012,16:19

I just took another go on Dembski's and Marks's Horizontal No Free Lunch Theorem, as KairosFocus referred to it at < UncommonDescent >:

On a Wrong Remark in a Paper of Robert J. Marks II and William A Dembski

 

---------------------QUOTE-------------------
Abstract: In their 2010 paper The Search for a Search - Measuring the Information Cost of Higher Level Search, the authors William A. Dembski and Robert J. Marks II present as one of two results their so-called Horizontal No Free Lunch Theorem. One of the consequences of this theorem is their remark: If no information about a search exists, so that the underlying measure is uniform, then, on average, any other assumed measure will result in negative active information, thereby rendering the search performance worse than random search. This is quite surprising, as one would expect in the tradition of the No Free Lunch theorem that the performances are equally good (or bad). Using only very basic elements of probability theory, this essay shows that their remark is wrong - as is their theorem.
---------------------QUOTE-------------------


The whole essay can be found < here >.
Posted by: Henry J on Mar. 13 2012,22:55

Well of course there's no free lunch. Even a slice of pizza costs something. ;)

Henry
Posted by: DiEb on May 24 2012,03:13

Quote (DiEb @ Mar. 13 2012,22:19)
I just took another go on Dembski's and Marks's Horizontal No Free Lunch Theorem, as KairosFocus referred to it at < UncommonDescent >:

On a Wrong Remark in a Paper of Robert J. Marks II and William A Dembski

   

---------------------QUOTE-------------------
Abstract: In their 2010 paper The Search for a Search - Measuring the Information Cost of Higher Level Search, the authors William A. Dembski and Robert J. Marks II present as one of two results their so-called Horizontal No Free Lunch Theorem. One of the consequences of this theorem is their remark: If no information about a search exists, so that the underlying measure is uniform, then, on average, any other assumed measure will result in negative active information, thereby rendering the search performance worse than random search. This is quite surprising, as one would expect in the tradition of the No Free Lunch theorem that the performances are equally good (or bad). Using only very basic elements of probability theory, this essay shows that their remark is wrong - as is their theorem.
---------------------QUOTE-------------------


The whole essay can be found < here >.
---------------------QUOTE-------------------


I was just informed by Winston Ewert that there is a new erratum at the paper A Search for a Search which should address (some of) my points. < Here is my first reaction >. And does the Journal of Advanced Computational Intelligence and Intelligent Informatics know?
Posted by: The whole truth on May 24 2012,04:35

Quote (DiEb @ May 24 2012,01:13)
Quote (DiEb @ Mar. 13 2012,22:19)
I just took another go on Dembski's and Marks's Horizontal No Free Lunch Theorem, as KairosFocus referred to it at < UncommonDescent >:

On a Wrong Remark in a Paper of Robert J. Marks II and William A Dembski

     

---------------------QUOTE-------------------
Abstract: In their 2010 paper The Search for a Search - Measuring the Information Cost of Higher Level Search, the authors William A. Dembski and Robert J. Marks II present as one of two results their so-called Horizontal No Free Lunch Theorem. One of the consequences of this theorem is their remark: If no information about a search exists, so that the underlying measure is uniform, then, on average, any other assumed measure will result in negative active information, thereby rendering the search performance worse than random search. This is quite surprising, as one would expect in the tradition of the No Free Lunch theorem that the performances are equally good (or bad). Using only very basic elements of probability theory, this essay shows that their remark is wrong - as is their theorem.
---------------------QUOTE-------------------


The whole essay can be found < here >.
---------------------QUOTE-------------------


I was just informed by Winston Ewert that there is a new erratum at the paper A Search for a Search which should address (some of) my points. < Here is my first reaction >. And does the Journal of Advanced Computational Intelligence and Intelligent Informatics know?
---------------------QUOTE-------------------


Regarding ear stoppers, it wouldn't surprise me if one of these days IDiots are found to have evolved a flap inside their ears that automatically and quickly closes at the first sign of any sort of reality trying to get in. Of course if such a flap were found the IDiots would claim that it's the result of intelligent design by their designer/creator, who did it so that they won't be plagued with hearing realistic challenges to their unsupported beliefs and assertions.  :)
Posted by: DiEb on May 24 2012,08:12

Quote (The whole truth @ May 24 2012,10:35)
Quote (DiEb @ May 24 2012,01:13)
 
Quote (DiEb @ Mar. 13 2012,22:19)
I just took another go on Dembski's and Marks's Horizontal No Free Lunch Theorem, as KairosFocus referred to it at < UncommonDescent >:

On a Wrong Remark in a Paper of Robert J. Marks II and William A Dembski

       

---------------------QUOTE-------------------
Abstract: In their 2010 paper The Search for a Search - Measuring the Information Cost of Higher Level Search, the authors William A. Dembski and Robert J. Marks II present as one of two results their so-called Horizontal No Free Lunch Theorem. One of the consequences of this theorem is their remark: If no information about a search exists, so that the underlying measure is uniform, then, on average, any other assumed measure will result in negative active information, thereby rendering the search performance worse than random search. This is quite surprising, as one would expect in the tradition of the No Free Lunch theorem that the performances are equally good (or bad). Using only very basic elements of probability theory, this essay shows that their remark is wrong - as is their theorem.
---------------------QUOTE-------------------


The whole essay can be found < here >.
---------------------QUOTE-------------------


I was just informed by Winston Ewert that there is a new erratum at the paper A Search for a Search which should address (some of) my points. < Here is my first reaction >. And does the Journal of Advanced Computational Intelligence and Intelligent Informatics know?
---------------------QUOTE-------------------


Regarding ear stoppers, it wouldn't surprise me if one of these days IDiots are found to have evolved a flap inside their ears that automatically and quickly closes at the first sign of any sort of reality trying to get in. Of course if such a flap were found the IDiots would claim that it's the result of intelligent design by their designer/creator, who did it so that they won't be plagued with hearing realistic challenges to their unsupported beliefs and assertions.  :)
---------------------QUOTE-------------------


I exchanged emails on this subject with Bob Marks back in 2010! Even before the paper was published in the first place, I had pointed out this problem - in private and in public. In Sep 2010, Bob Marks informed me that he has a policy not to engage in correspondence with anyone publicly critical of him or his work, as independent of the validity or invalidity of the details of the exchange, these things are best discussed thoroughly before any public pronouncements. So he willfully chose to ignore every unpleasant critic, at his own peril.
Posted by: The whole truth on May 25 2012,01:10

Quote (DiEb @ May 24 2012,06:12)
Quote (The whole truth @ May 24 2012,10:35)
 
Quote (DiEb @ May 24 2012,01:13)
   
Quote (DiEb @ Mar. 13 2012,22:19)
I just took another go on Dembski's and Marks's Horizontal No Free Lunch Theorem, as KairosFocus referred to it at < UncommonDescent >:

On a Wrong Remark in a Paper of Robert J. Marks II and William A Dembski

         

---------------------QUOTE-------------------
Abstract: In their 2010 paper The Search for a Search - Measuring the Information Cost of Higher Level Search, the authors William A. Dembski and Robert J. Marks II present as one of two results their so-called Horizontal No Free Lunch Theorem. One of the consequences of this theorem is their remark: If no information about a search exists, so that the underlying measure is uniform, then, on average, any other assumed measure will result in negative active information, thereby rendering the search performance worse than random search. This is quite surprising, as one would expect in the tradition of the No Free Lunch theorem that the performances are equally good (or bad). Using only very basic elements of probability theory, this essay shows that their remark is wrong - as is their theorem.
---------------------QUOTE-------------------


The whole essay can be found < here >.
---------------------QUOTE-------------------


I was just informed by Winston Ewert that there is a new erratum at the paper A Search for a Search which should address (some of) my points. < Here is my first reaction >. And does the Journal of Advanced Computational Intelligence and Intelligent Informatics know?
---------------------QUOTE-------------------


Regarding ear stoppers, it wouldn't surprise me if one of these days IDiots are found to have evolved a flap inside their ears that automatically and quickly closes at the first sign of any sort of reality trying to get in. Of course if such a flap were found the IDiots would claim that it's the result of intelligent design by their designer/creator, who did it so that they won't be plagued with hearing realistic challenges to their unsupported beliefs and assertions.  :)
---------------------QUOTE-------------------


I exchanged emails on this subject with Bob Marks back in 2010! Even before the paper was published in the first place, I had pointed out this problem - in private and in public. In Sep 2010, Bob Marks informed me that he has a policy not to engage in correspondence with anyone publicly critical of him or his work, as independent of the validity or invalidity of the details of the exchange, these things are best discussed thoroughly before any public pronouncements. So he willfully chose to ignore every unpleasant critic, at his own peril.
---------------------QUOTE-------------------


Ignoring critics, whether in public or in private, is a skill that IDiots have thoroughly mastered.

I'm sure that Marks and the other IDiots never even consider that their assertions are or could be perilous, because to them being wrong just doesn't compute. They want to dictate and preach, not listen, discuss, learn, or be corrected.

From what I've seen Marks seems to be one of the most isolated IDiots (and willingly so).
Posted by: Wesley R. Elsberry on Jan. 16 2014,04:04

While I was looking up use of PyQtGraph in PySide, I ran across an example program in an unfamiliar language. A little probing led me to < the website for Julia >, a new (2013 FOSS release) programming language for scientific and technical computing. The language is dynamic, but uses just-in-time (JIT) compiler technology to run benchmark tests within 2x of the time C code takes. There's an IDE, < Julia Studio >, that I am finding useful. Emacs and the command line work just fine as well.

As an initial exercise, I ported my "minimal weasel" program from Python to Julia. The result is a 46-line program. This is slightly longer than the Python version because I'm using a line each for the Julia convention of closing a block with "end". Python uses indentation to indicate block closure.



---------------------CODE SAMPLE-------------------

# Minimum Weasel in Julia  -- Wesley R. Elsberry
t = "METHINKS IT IS LIKE A WEASEL"    # Target phrase
b = " ABCDEFGHIJKLMNOPQRSTUVWXYZ"     # Base pool
n = 178                          # Population size
u = (1.0 / length(t))              # Mutation rate
@printf("Popsize=%d, Mutrate=%f, Bases=%s, Target=%s\n", n,u,b,t)
p = ""                        # Parent string
for ii in [1:length(t)]            # Compose random parent string
   p = p * string(b[rand(1:length(b))])
end
@printf("                        Parent=%s\n",p)
done = false                    # Loop control variable
g = 0                          # Generation counter
bmcnt = 0                        # Base match count (max. in pop.)
bc = ""                        # Best candidate variable
while (done == false)
     pop = ASCIIString[]            # Population of strings
     bmcnt = 0                  # Start with no bases matched
     bcindex = 1                  # Point to first candidate in pop. by default
     for ii in [1:n]              # For size of population
         push!(pop,"")            # Add an empty candidate
         mcnt = 0                # Initialize candidate base match count to zero
         for jj in [1:length(t)]     # Compose a new candidate possibly mutated from the parent
             if u >= rand()        # We have a mutated base
                pop[ii] = pop[ii][1:jj-1] * string(b[rand(1:length(b))])
             else                # No mutation, just copy this base
                pop[ii] = pop[ii][1:jj-1] * string(p[jj])
             end
             if pop[ii][jj] == t[jj] # Candidate matches target at this base
                mcnt += 1          # Increment candidate base match count
             end
             if mcnt > bmcnt        # Candidate is better than current best match
                bmcnt = mcnt        # Change best match count
                bcindex = ii        # Store index of best candidate
             end
             if mcnt >= (length(t) - 0)    # Do enough bases match the target?
                done = true        # Yes, so set loop control for exit
             end
         end
     end
     bc = pop[bcindex]            # Set best candidate as candidate at index
     g += 1                    # Increment generation count
     @printf("Gen=%05d, %02d/%d matched, Best=%s, Total=%06d\n", g, bmcnt, length(t), bc, g*n)
     p = bc                    # Parent for next generation is the best candidate from this one
end
println("weasel done.")

---------------------CODE SAMPLE-------------------



There are a few noteworthy differences from Python. First, Julia is a base-1 language: arrays start with index 1. Second, Julia's dynamic variable creation requires that a type be provided when declaring an empty array. Julia can figure out the type itself for an array that is assigned at least one element, but the programmer has to give a type in order to start with no elements at all. The type system in Julia is apparently one big reason why the developers are able to obtain the good benchmark results. Third, while I have not taken advantage of it here, Julia's syntax allows expressions to be closer to mathematical notation: "1 + 2x" is legal in Julia, where in Python one would have to write "1 + 2*x".

I'm planning on porting a more complex program to Julia to see how it performs compared to PyPy.
Posted by: Quack on Jan. 16 2014,07:14

Looks like something I'd have enjoyed using sometime in my past. Alas.
Posted by: Wesley R. Elsberry on Jan. 22 2014,09:05

My MacBook Pro suffered a mainboard failure some time ago, and I just bit the bullet to get it fixed. That was the development machine for my Avida work. I thought that too much bit rot might have rendered things uncompilable on Linux, but I'm happy to say my initial pessimism was unfounded, and I now have a running Linux executable.

I should note just what I'm talking about here. Back in 2009, I presented a paper with my colleagues at Michigan State University titled < Cockroaches, drunkards, and climbers: Modeling the evolution of simple movement strategies using digital organisms >. I'm working toward taking the next steps in this research. To do that, I needed to get my tools up to date. The computing power I have now is an 8-core desktop running Ubuntu Linux, so I was aiming to get my code compiled there. That's happened just this week. I've been running tests to make sure I'm getting results as I did back in 2009. And the newly-compiled system is, I'm pleased to say, showing the evolution of gradient-ascent effective methods just as the runs in 2009 did.

As a recap, I'm using a version of Avida that I extended to permit movement of organisms in the Avida world grid. Avida's normal mode of operation puts an organism in a specific cell in the world grid, and there it stays for its entire life. I wrote Avida CPU instructions for "tumble", "move", and "sense-diff-facing". The "tumble" instruction simply rotates the organism to a new random facing in its grid cell. A facing is always toward another adjacent grid cell, so for an interior grid cell there are eight legal facings, five legal facings on an edge, and three at each corner grid cell. The "move" instruction puts the organism executing it into the cell that it currently faces. (If there is another organism in that cell, they swap cells.) The "sense-diff-facing" instruction puts the difference in the amount of a specified resource between the current grid cell and the faced grid cell into one of the Avida registers. The run is seeded with the default classic Avida organism. This is an organism whose only functionality is to make a copy of itself. None of the codes associated with movement is included in the initial organism. Mutation is the only way those instructions can enter the genome at the outset.

The environment is defined with a positively rewarding resource, with a peak in the resource set off-center in the world grid. This was done so that the peak resource would not be on an edge or diagonal of the world grid.

The run also includes a 2% cost to the organism for each move instruction that it executes.

The updates are set to permit about three Avida instructions to be executed per-organism per-update. The runs go on for two million updates. The total population is capped at 200 Avidians, so the world grid has about 2% of its grid cells filled with Avidians at any time.
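The setup above can be mimicked in a toy Python sketch. This is not Avida code: the grid size, peak location, resource function, and the hand-written "climber" strategy are all assumptions for illustration. In the actual runs the climber strategy evolves rather than being written in; writing it out directly just shows what the three primitives make possible.

```python
import random

# Toy grid world: a resource gradient with an off-center peak.
SIZE, PEAK = 20, (13, 7)

def resource(cell):
    # Resource falls off with Manhattan distance from the peak.
    return -abs(cell[0] - PEAK[0]) - abs(cell[1] - PEAK[1])

# Eight facings toward adjacent grid cells.
FACINGS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
           (0, 1), (1, -1), (1, 0), (1, 1)]

def legal(cell):
    return 0 <= cell[0] < SIZE and 0 <= cell[1] < SIZE

def climber(start, steps=500):
    # Hand-coded gradient-ascent strategy built from the three primitives:
    # sense-diff-facing (is the faced cell richer?), move (step into it),
    # and tumble (pick a new random facing).
    cell, facing = start, random.choice(FACINGS)
    for _ in range(steps):
        faced = (cell[0] + facing[0], cell[1] + facing[1])
        if legal(faced) and resource(faced) > resource(cell):
            cell = faced                      # move
        else:
            facing = random.choice(FACINGS)   # tumble
    return cell

random.seed(0)
print(climber((0, 0)))   # reaches the peak at (13, 7)
```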

During the run, each grid cell has a count of the number of visits it receives from Avidians. I output these visit counts every 5000 updates. I then plot a surface map of the difference in visits between each update and the one prior, which shows in aggregate the movement of the population for 5000 updates. It becomes very clear when a gradient-ascent effective method, or "climber", becomes the dominant class of organism in the population. I have a few plots to show the transition from "drunkard" dominant to "climber" dominant, and from "climber" to a more efficient "climber".

The results show the evolution of a useful algorithm from scratch. Part of what I did in the work at MSU was in collaboration with Jacob Walker to use the evolved organisms as robotic controllers, which we did with both a Roomba Create robot and a Lego Mindstorms robot. We used a point light source to create a "resource peak" that the robot displayed phototropic behavior toward with the "climber" organisms loaded.

This isn't about adjusting weights of some existing model. This is about evolution creating algorithms that did not exist before, based on nothing more than having a resource to exploit, the ability to take a step, to change direction randomly, and to sense differences in the local environment. (Very local, just to the extent of where a move instruction would take the organism were it executed, and only so far as to give a relative difference, not an absolute number.) There's no reward system other than "organisms do better if they go where resources are more abundant". There's nothing in the system to prefer inclusion of the new instructions, and there's actually a cost associated with executing the "move" instruction. And yet, time and again, this system can produce effective methods in the provably optimal class of gradient ascent algorithms.
Posted by: OgreMkV on Jan. 22 2014,09:25

Wes,

This is very cool stuff. Are you going to publish this?

I ask, because I'd like to put it on my blog.  If you're going to publish, I'll wait.  But if you don't mind, I can write something up or if you want to write something up, I'd like to post this.

It's directly relevant to a discussion I'm having with another Meyer worshiper who doesn't realize that mutations can cause huge changes in populations.
Posted by: fnxtr on Jan. 22 2014,10:04

A little something for Gary, from B. Kliban:


Posted by: Wesley R. Elsberry on Jan. 22 2014,11:34

Quote (OgreMkV @ Jan. 22 2014,09:25)
Wes,

This is very cool stuff. Are you going to publish this?

I ask, because I'd like to put it on my blog.  If you're going to publish, I'll wait.  But if you don't mind, I can write something up or if you want to write something up, I'd like to post this.

It's directly relevant to a discussion I'm having with another Meyer worshiper who doesn't realize that mutations can cause huge changes in populations.
---------------------QUOTE-------------------


The linked paper is as published as that will be getting.

     

---------------------QUOTE-------------------

Elsberry, W.R.; Grabowski, L.M.; Ofria, C.; Pennock, R.T.  2009.  Cockroaches, drunkards, and climbers: Modeling the evolution of simple movement strategies using digital organisms.  SSCI: IEEE Symposium on Artificial Life, 2009: 92--99.

---------------------QUOTE-------------------



So anything out of that is fair game for comment.

There's a section on "future work" in the paper. What I'm hoping to do next is to move on to building environments with both a positive and a negative "resource", so that the organisms will need to evolve both appetitive and aversive responses. If that comes together quickly enough, a colleague has pointed out a conference submission deadline at the end of March.
Posted by: OgreMkV on Jan. 22 2014,23:10

I posted something. If I messed anything up, please let me know.  It's been a heck of a week.

I've never heard anyone puking on a conference call... until today.  Geez.
Posted by: Wesley R. Elsberry on Jan. 23 2014,03:17

While I did stress that the genomic content of the initial organism, and thus the Avidian population, could only acquire the new instructions via mutation, once an ancestral organism had one or more of those, they would be passed down to offspring with the usual frequency. And any effects they had on the organism could yield a difference in fitness, driving the usual selective processes. I think saying mutation was the only operative process goes too far. Not including the instructions in any way in the initial organism simply eliminates the possibility that I as experimenter set up a particular outcome by whatever arrangement of movement-relevant instructions might be set in that initial organism.

One question I was asked at SSCI in 2009 was why use Avida and not something like Echo. And while the efficient answer is that when one is at the Devolab, one is usually going to be using Avida, I did survey the available software at the time for applicability to the question I was looking at. The software systems allowing for agent movement all treated movement as a primitive property, often requiring some fixed movement strategy be defined for the agents a priori. I was interested in looking at what evolution could do given just the sort of capabilities underlying movement as seen in organisms like E. coli, but without specifying how those capabilities were used. And that kind of question was not what the other software packages could address.
Posted by: BillB on Jan. 23 2014,04:32

Quote (Wesley R. Elsberry @ Jan. 23 2014,09:17)
While I did stress that the genomic content of the initial organism, and thus the Avidian population, could only acquire the new instructions via mutation, once an ancestral organism had one or more of those, they would be passed down to offspring with the usual frequency. And any effects they had on the organism could yield a difference in fitness, driving the usual selective processes. I think saying mutation was the only operative process goes too far. Not including the instructions in any way in the initial organism simply eliminates the possibility that I as experimenter set up a particular outcome by whatever arrangement of movement-relevant instructions might be set in that initial organism.

One question I was asked at SSCI in 2009 was why use Avida and not something like Echo. And while the efficient answer is that when one is at the Devolab, one is usually going to be using Avida, I did survey the available software at the time for applicability to the question I was looking at. The software systems allowing for agent movement all treated movement as a primitive property, often requiring some fixed movement strategy be defined for the agents a priori. I was interested in looking at what evolution could do given just the sort of capabilities underlying movement as seen in organisms like E. coli, but without specifying how those capabilities were used. And that kind of question was not what the other software packages could address.
---------------------QUOTE-------------------


Excellent stuff, and something I'm really interested in despite having no time to work on any more ...

I'm not intimately familiar with Avida but a few things jumped to mind whilst reading the description:
     

---------------------QUOTE-------------------
A facing is always toward another adjacent grid cell, so for an interior grid cell there are eight legal facings, five legal facings on an edge, and three at each corner grid cell.
---------------------QUOTE-------------------



I would say that there should be no illegal facings, just an inability to move when facing an edge; this would prevent a bias towards movement back to the centre. A bit like breeding E. coli in a jar: they cannot pass through the glass container, but they could repeatedly bump against it until they die. By having illegal facings you are, in one sense, providing them with obstacle avoidance behaviour for free.

     

---------------------QUOTE-------------------
The "sense-diff-facing" instruction puts the difference in the amount of a specified resource between the current grid cell and the faced grid cell into one of the Avida registers.
---------------------QUOTE-------------------



What if this was expanded to be a “sense X,Y diff” instruction where X and Y can be any of the surrounding cells, or your own cell? The values for X and Y would be heritable. (And I don't know what you do about sensing the cell in front of you when facing the edge of the world)

Perhaps if you wanted to add an interesting twist you could turn that into something like "Z=F(X,Y)" where X and Y are as described above but the function F is a heritable operator (Add, Subtract, Multiply, Divide, or Modulo) - you might even include bit shifting as a possible operator? Z=X<<Y or Z=X>>Y

The point would be to provide multiple pathways for this sensory apparatus to work - and for it to fail to work.

Expanding on this a bit more (if it is worth doing) you could allow for more distal sensing - maybe a Z=F((A,B)(X,Y)) instruction where A and B, and X and Y, are relative cell co-ordinates, perhaps capped to a maximum range of +/- 5. If you did this then I would be tempted to add a cost for longer range sensing (You need more energy to grow those longer whiskers!)
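One possible encoding of that heritable sensing gene, sketched in Python; the tuple layout, operator table, and cost schedule are inventions here, not anything from Avida:

```python
import operator

# A heritable sensing gene: two relative cell offsets plus an inherited
# operator index. Division and modulo guard against zero to give the
# apparatus a way to "fail to work" gracefully.
OPS = [operator.add, operator.sub, operator.mul,
       lambda a, b: a / b if b else 0.0,
       lambda a, b: a % b if b else 0.0]

def sense(grid, cell, gene):
    # Z = F(X, Y): read two cells relative to the agent, combine with
    # the gene's operator. Missing cells read as 0.0.
    (ax, ay), (bx, by), op_idx = gene
    a = grid.get((cell[0] + ax, cell[1] + ay), 0.0)
    b = grid.get((cell[0] + bx, cell[1] + by), 0.0)
    return OPS[op_idx](a, b)

def sensing_cost(gene):
    # Longer-range sensing costs more ("longer whiskers"), capped at +/-5.
    (ax, ay), (bx, by), _ = gene
    reach = max(abs(ax), abs(ay), abs(bx), abs(by))
    assert reach <= 5
    return 0.02 * reach   # invented cost schedule

grid = {(0, 0): 1.0, (0, 1): 4.0}
gene = ((0, 1), (0, 0), 1)          # faced cell minus own cell: Z = X - Y
print(sense(grid, (0, 0), gene))    # 3.0
```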

     

---------------------QUOTE-------------------
The environment is defined with a positively rewarding resource, with a peak in the resource set off-center in the world grid.
---------------------QUOTE-------------------



Can you make this more complex and dynamic? Perhaps try something more akin to a simple hydrothermal vent model:

A source (of the resource) pops up at a random location and begins churning out the ‘resource’, creating a gradient. Eventually the source is exhausted and the gradient disappears. You can have a maximum of x sources in the world at any time and when the number of sources is less than x a new source has some probability of appearing at a new random location.

It would also be nice to have a negative resource – something that causes harm but which is not simply a lack of positive resource – using the same hydrothermal vent model you could have a second resource whose intensity costs or harms an agent. This should result in a much more interesting and dynamic resource landscape for the agents to navigate.

I'm not sure if this should be a sense-able resource (something the agent can sense) or if it just causes harm without the agent realising -- Something I'm not clear on with Avida: can the agent sense its own 'energy' and as a result tell if it is being rewarded or harmed?

I am tempted to suggest actually defining a spectrum of resources (some good, some bad) but this would require many more methods for the agent to sense them (and makes for a much more complex research project). What I am thinking of here (and it is a vague thought without any of the important details) is to include potential routes by which an agent can gain an advantage by combining certain resources in certain ratios – it can create a more potent energy source than the ones it absorbs passively – This would, of course, be balanced by the potential for agents to combine resources into fatal concoctions.
Posted by: Wesley R. Elsberry on Jan. 23 2014,09:41

Quote (BillB @ Jan. 23 2014,04:32)
     
Quote (Wesley R. Elsberry @ Jan. 23 2014,09:17)
While I did stress that the genomic content of the initial organism, and thus the Avidian population, could only acquire the new instructions via mutation, once an ancestral organism had one or more of those, they would be passed down to offspring with the usual frequency. And any effects they had on the organism could yield a difference in fitness, driving the usual selective processes. I think saying mutation was the only operative process goes too far. Not including the instructions in any way in the initial organism simply eliminates the possibility that I as experimenter set up a particular outcome by whatever arrangement of movement-relevant instructions might be set in that initial organism.

One question I was asked at SSCI in 2009 was why use Avida and not something like Echo. And while the efficient answer is that when one is at the Devolab, one is usually going to be using Avida, I did survey the available software at the time for applicability to the question I was looking at. The software systems allowing for agent movement all treated movement as a primitive property, often requiring some fixed movement strategy be defined for the agents a priori. I was interested in looking at what evolution could do given just the sort of capabilities underlying movement as seen in organisms like E. coli, but without specifying how those capabilities were used. And that kind of question was not what the other software packages could address.
---------------------QUOTE-------------------


Excellent stuff, and something I'm really interested in despite having no time to work on any more ...

I'm not intimately familiar with Avida but a few things jumped to mind whilst reading the description:
             

---------------------QUOTE-------------------
A facing is always toward another adjacent grid cell, so for an interior grid cell there are eight legal facings, five legal facings on an edge, and three at each corner grid cell.
---------------------QUOTE-------------------



I would say that there should be no illegal facings, just an inability to move when facing an edge – this would prevent a bias towards movement back to the centre – A bit like breeding E. coli in a jar: They cannot pass through the glass container but they could repeatedly bump against it until they die. By having illegal facings you are, in one sense, providing them with obstacle avoidance behaviour for free.

---------------------QUOTE-------------------



Avida giveth, and Avida taketh away. Facing is very basic to the software. Illegal facings, when exercised, terminate the program with an ugly "bus error" message.

On the other hand, the world geometry options are (or I should say "were", I haven't checked the latest code) grid, torus, and clique. I have no idea what clique does. Torus, though, wraps the edges of the world grid. Using torus would solve the illegal facing issue, since every cell would then be an interior cell. However, I also thought of torus as giving Avidians something for free, since on a grid whose dimensions are relatively prime I think movement on the diagonal will give the organism access to a lot of the grid, if not all of it.
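The point about diagonal movement on a torus is easy to check with a small sketch (not Avida code; the grid sizes are illustrative): the orbit of a repeated (+1, +1) step on a width x height torus has length lcm(width, height), so relatively prime dimensions give the organism the whole grid.

```python
from math import gcd

def diagonal_cells_visited(width, height):
    """Distinct cells reached from (0, 0) by repeated (+1, +1) steps
    on a width x height torus."""
    seen = set()
    x = y = 0
    while (x, y) not in seen:
        seen.add((x, y))
        x, y = (x + 1) % width, (y + 1) % height
    return len(seen)

# Coprime dimensions: the diagonal walk covers every cell of the grid.
assert diagonal_cells_visited(5, 7) == 5 * 7
# Otherwise it covers only lcm(width, height) of them.
assert diagonal_cells_visited(6, 9) == (6 * 9) // gcd(6, 9)
```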

   
Quote (BillB @ Jan. 23 2014,04:32)

             

---------------------QUOTE-------------------
The "sense-diff-facing" instruction puts the difference in the amount of a specified resource between the current grid cell and the faced grid cell into one of the Avida registers.
---------------------QUOTE-------------------



What if this was expanded to be a “sense X,Y diff” instruction where X and Y can be any of the surrounding cells, or your own cell? The values for X and Y would be heritable. (And I don't know what you do about sensing the cell in front of you when facing the edge of the world)

---------------------QUOTE-------------------



As I recall it, access to adjoining cells is entirely defined by facing. It would be nice to have X,Y addressable during the run, but as I recall, it doesn't work that way.

I think this issue, among others, led a colleague of mine to give up on modifying the Avida grid system entirely, and instead implemented a separate arena-style system that was instantiated on a per-organism basis, what she referred to in the planning stages as "dream-a-grid". Many of the things that I am describing as constraints would not be in her codebase. (Her Avidians evolved such things as perfect maze-running, but she had a complex system of markers that when correctly sensed and acted upon would lead to that.) The tradeoff, though, is that her movement experiments were all about individual performances, and no interaction between members of the population would be possible. I'm thinking in terms of future experiments possibly having a larger role for competition.

   
Quote (BillB @ Jan. 23 2014,04:32)

Perhaps if you wanted to add an interesting twist you could turn that into something like "Z=F(X,Y)" where X and Y are as described above but the function F is a heritable operand (Add, Subtract, Multiply, Divide, or Modulo) - you might even include bit shifting as a possible operand? Z=X<<Y or Z=X>>Y

The point would be to provide multiple pathways for this sensory apparatus to work - and for it to fail to work.

Expanding on this a bit more (if it is worth doing) you could allow for more distal sensing - maybe a Z=F((A,B)(X,Y)) instruction where A and B, and X and Y, are relative cell co-ordinates, perhaps capped to a maximum range of +/- 5. If you did this then I would be tempted to add a cost for longer range sensing (You need more energy to grow those longer whiskers!)

---------------------QUOTE-------------------



There was already code in Avida for distinguishing resources. This was based on a label system, where several bases in the genome get interpreted as a label, so what the organism gets when it processes a sensory instruction is heritable. All the sensory instruction does is put a value into an Avidian CPU register. What happens to it after that has to evolve, too.
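The label idea can be illustrated with a toy interpreter. To be clear, everything below is hypothetical -- the instruction name, the two-site labels, and the register name are my inventions for illustration, not Avida's real encoding -- but it shows the mechanism: the genome sites following a sense instruction are read as a label that selects a resource, and the sensed value simply lands in a register.

```python
# Toy interpreter for label-based sensing. The instruction names,
# two-site labels, register names, and resource values are all
# hypothetical, illustrating the idea rather than Avida's actual code.

RESOURCES = {"ab": 3.5, "ba": 0.2}   # hypothetical resource labels

def execute(genome, registers):
    i = 0
    while i < len(genome):
        if genome[i] == "sense":
            label = "".join(genome[i + 1:i + 3])   # next two sites form a label
            registers["BX"] = RESOURCES.get(label, 0.0)
            i += 3
        else:
            i += 1                                 # other instructions ignored here
    return registers

regs = execute(["nop", "sense", "a", "b", "nop"], {"BX": 0.0})
# A point mutation in a label site ("a" -> "b") would change which
# resource gets sensed: what the organism senses is heritable.
```

All the sensing does is deposit a number in a register; anything done with that number afterward has to evolve separately, as described above.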

Like I said above, I don't know that distant sensing has an obvious implementation pathway.

   
Quote (BillB @ Jan. 23 2014,04:32)

             

---------------------QUOTE-------------------
The environment is defined with a positively rewarding resource, with a peak in the resource set off-center in the world grid.
---------------------QUOTE-------------------



Can you make this more complex and dynamic? Perhaps try something more akin to a simple hydrothermal vent model:

A source (of the resource) pops up at a random location and begins churning out the ‘resource’, creating a gradient. Eventually the source is exhausted and the gradient disappears. You can have a maximum of x sources in the world at any time and when the number of sources is less than x a new source has some probability of appearing at a new random location.

---------------------QUOTE-------------------



The current way I define a resource gradient is quite cumbersome. I have a Perl script that sets up CELL declarations in the environment config for every cell in the grid. I do have code for a method to establish a resource gradient at runtime, but that's not yet tested. Yes, I'd like to have a moving resource at some point. I don't think it will be the first thing out the gate.

   
Quote (BillB @ Jan. 23 2014,04:32)

It would also be nice to have a negative resource – something that causes harm but which is not simply a lack of positive resource – using the same hydrothermal vent model you could have a second resource whose intensity costs or harms an agent. This should result in a much more interesting and dynamic resource landscape for the agents to navigate.

I'm not sure if this should be a sense-able resource (something the agent can sense) or if it just causes harm without the agent realising -- Something I'm not clear on with Avida: can the agent sense its own 'energy' and as a result tell if it is being rewarded or harmed?

---------------------QUOTE-------------------



The "detrimental resource" is likely the first thing out the gate. There are some issues with how this gets implemented, but I think I see a way forward that won't impact what I've already done too much.

As far as the Avidians sensing whether they are doing well or not, I think the answer is "no". The system scheduler assigns cycles based on merit, so poorly performing Avidians are also slowly performing Avidians. As far as I know, permitting an Avidian to have access to some transformation of its own merit would require setting up an instruction to do just that. Plus, an absolute value for merit wouldn't be terribly useful. In the first hundred updates, a merit of 0.29 would be excellent, but it would be pretty miserable not much further into the run. What would be useful to the Avidian is some relative number related to its ranking in the population. I don't know of any biological correlate to that, though.
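The merit-proportional idea can be sketched in a few lines (a guess at the flavor of the scheduler, not Avida's actual code; the merit values are invented): each organism gets CPU cycles in proportion to its merit, which is why an absolute merit value means nothing on its own.

```python
def allocate_cycles(merits, total_cycles):
    """Hand out CPU cycles in proportion to merit. A sketch of the idea
    only, not Avida's real scheduler."""
    total_merit = sum(merits)
    return [round(total_cycles * m / total_merit) for m in merits]

# Early in a run, a merit of 0.29 might dominate the population...
early = allocate_cycles([0.29, 0.01, 0.01], 310)
# ...but the same absolute merit is starved once the population improves.
late = allocate_cycles([0.29, 2.9, 5.8], 310)
```

The organism with merit 0.29 goes from receiving nearly all the cycles to receiving a small fraction of them, without its own merit changing at all.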

   
Quote (BillB @ Jan. 23 2014,04:32)

I am tempted to suggest actually defining a spectrum of resources (some good, some bad) but this would require many more methods for the agent to sense them (and makes for a much more complex research project). What I am thinking of here (and it is a vague thought without any of the important details) is to include potential routes by which an agent can gain an advantage by combining certain resources in certain ratios – it can create a more potent energy source than the ones it absorbs passively – This would, of course, be balanced by the potential for agents to combine resources into fatal concoctions.
---------------------QUOTE-------------------



Actually, the sensing system is already label-based, so multiplying the resources could be done without any particular hassle for the programmer. What it would do to the Avidians... that's an experiment.

The first experiment was pretty much a stab in the dark. We set up something that hadn't been tried, and we didn't know whether we were posing a challenge outside the scope of what could be evolved in Avida. Now that we know that Avidians can evolve movement strategies, including ones in an optimal class of strategies, we can raise the bar some.
Posted by: BillB on Jan. 23 2014,10:51

Thanks for the detailed reply. I think I might try and find some time to familiarise myself with Avida ... I have, in the past, sketched out a framework for doing experiments like this myself, but based on what you have written I think my system would not be possible to implement in Avida.

I might go back over my early notes and see if there is anything sensible I can summarise here.

A thought occurred after my last post - would it be possible to have the resource inflict a penalty when it is above a certain level? Think of it like a nutrient-rich gradient coming from a hydrothermal vent - if you get too close to the source you literally start to cook. There would be an optimal distance (a habitable zone?), and I would expect the resulting pattern of activity to appear as a ring rather than a point (referring to your plots above).
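The habitable-zone intuition checks out numerically. All the constants below are illustrative assumptions (nothing from Avida): a resource level decays with distance from the vent, anything above a harm threshold is penalized, and the net payoff then peaks at an intermediate distance, which in two dimensions traces a ring around the source.

```python
import math

# Illustrative constants for a vent gradient with a "cooking" penalty.
PEAK, DECAY, HARM_THRESHOLD, HARM_FACTOR = 10.0, 0.5, 6.0, 3.0

def payoff(distance):
    """Net reward at a given distance from the vent: the resource level,
    minus a penalty for any amount above the harm threshold."""
    level = PEAK * math.exp(-DECAY * distance)
    penalty = HARM_FACTOR * max(0.0, level - HARM_THRESHOLD)
    return level - penalty

# The best place to sit is not at the vent itself (distance 0) but one
# step out, where the resource is high but below the cooking threshold.
best = max(range(11), key=payoff)
```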
Posted by: BillB on Jan. 24 2014,11:43

< this >caught my eye - a thermodynamic theory of the origin of life. Reminds me of the Maximum Entropy Production Principle.
Posted by: Henry J on Jan. 24 2014,14:21

Quote (BillB @ Jan. 24 2014,10:43)
< this >caught my eye - a thermodynamic theory of the origin of life. Reminds me of the Maximum Entropy Production Principle.
---------------------QUOTE-------------------


So he thinks that abiogenesis is caused by some sort of positive feedback loop among systems that redistribute incoming energy?
Posted by: Wesley R. Elsberry on April 18 2014,10:02

Physicist Sean Devine critiques Dembski's "design inference" < in Zygon >.

Jeff Shallit and I are cited extensively.
Posted by: midwifetoad on April 18 2014,11:27

Quote (Wesley R. Elsberry @ April 18 2014,10:02)
Physicist Sean Devine critiques Dembski's "design inference" < in Zygon >.

Jeff Shallit and I are cited extensively.
---------------------QUOTE-------------------




---------------------QUOTE-------------------
The fundamental choice to be made, given the available information, is not whether chance provides a better explanation than design, but whether natural laws provide a better explanation than a design.
---------------------QUOTE-------------------



Sig worthy
Posted by: Quack on April 19 2014,02:54



---------------------QUOTE-------------------
Sig worthy
---------------------QUOTE-------------------


Indeed, but who am I quoting?
Posted by: midwifetoad on April 19 2014,14:48

Quote (Quack @ April 19 2014,02:54)


---------------------QUOTE-------------------
Sig worthy
---------------------QUOTE-------------------


Indeed, but who am I quoting?
---------------------QUOTE-------------------


Physicist Sean Devine critiques Dembski's "design inference" in Zygon.
Posted by: DiEb on Sep. 28 2014,13:14

William Dembski gave a talk at the University of Chicago in August 2014. There is a < youtube video > of this one-hour talk, which the Discovery Institute provided.

At the moment, I'm trying to transcribe this video in a series of posts on my blog: < William Dembski's talk at the University of Chicago >.

At 45', I had to stop for a moment, as there is such an amusing elementary mistake on the slides which the eminent Dr. Dr. uses... And, as 1/2 != 3/5, Dr. Dr. Dembski gets a wrong result - which is, as he says, "typical for these search-for-a-search situations". I couldn't agree more.

Please, take a look at: < Conservation of Information in Evolutionary Search - Talk by William Dembski - part 4 >. Thanks :-)


Posted by: Henry J on Sep. 28 2014,18:30

Yeah, but he doesn't have to stoop to your pathetic level of detail!

Or something.
Posted by: fusilier on Sep. 29 2014,06:44

Quote (DiEb @ Sep. 28 2014,14:14)
William Dembski held a talk at the University of Chicago in August 2014. There is a < youtube video > of this one hour long talk which the Discovery Institute provided.

At the moment, I'm trying to transcribe this video in a series of posts on my blog: < William Dembski's talk at the University of Chicago >.

At 45', I had to stop for a moment, as there is such an amusing elementary mistake on the slides which the eminent Dr. Dr. uses... And, as 1/2 != 3/5, Dr. Dr. Dembski gets as wrong result - which is, as he says - " typical for these search-for-a-search situations". I couldn't agree more.

Please, take a look at: < Conservation of Information in Evolutionary Search - Talk by William Dembski - part 4 >. Thanks :-)
---------------------QUOTE-------------------


Is that "one over two-factorial equals three over five"  or "one over two does not equal three over five?"

Either way it's not sensible, but ....

=B^0
Posted by: k.e.. on Sep. 30 2014,09:51

Quote (fusilier @ Sep. 29 2014,14:44)
Quote (DiEb @ Sep. 28 2014,14:14)
William Dembski held a talk at the University of Chicago in August 2014. There is a < youtube video > of this one hour long talk which the Discovery Institute provided.

At the moment, I'm trying to transcribe this video in a series of posts on my blog: < William Dembski's talk at the University of Chicago >.

At 45', I had to stop for a moment, as there is such an amusing elementary mistake on the slides which the eminent Dr. Dr. uses... And, as 1/2 != 3/5, Dr. Dr. Dembski gets as wrong result - which is, as he says - " typical for these search-for-a-search situations". I couldn't agree more.

Please, take a look at: < Conservation of Information in Evolutionary Search - Talk by William Dembski - part 4 >. Thanks :-)
---------------------QUOTE-------------------


Is that "one over two-factorial equals three over five"  or "one over two does not equal three over five?"

Either way it's not sensible, but ....

=B^0
---------------------QUOTE-------------------


Didn't you mean =BS^0 ?
Posted by: fusilier on Sep. 30 2014,20:14

Quote (k.e.. @ Sep. 30 2014,10:51)
Quote (fusilier @ Sep. 29 2014,14:44)
Quote (DiEb @ Sep. 28 2014,14:14)
William Dembski held a talk at the University of Chicago in August 2014. There is a < youtube video > of this one hour long talk which the Discovery Institute provided.

At the moment, I'm trying to transcribe this video in a series of posts on my blog: < William Dembski's talk at the University of Chicago >.

At 45', I had to stop for a moment, as there is such an amusing elementary mistake on the slides which the eminent Dr. Dr. uses... And, as 1/2 != 3/5, Dr. Dr. Dembski gets as wrong result - which is, as he says - " typical for these search-for-a-search situations". I couldn't agree more.

Please, take a look at: < Conservation of Information in Evolutionary Search - Talk by William Dembski - part 4 >. Thanks :-)
---------------------QUOTE-------------------


Is that "one over two-factorial equals three over five"  or "one over two does not equal three over five?"

Either way it's not sensible, but ....

=B^0
---------------------QUOTE-------------------


Didn't you mean =BS^0 ?
---------------------QUOTE-------------------


touche!
Posted by: Wesley R. Elsberry on Mar. 06 2015,12:32

Is there such a thing as an open-source Javascript webapp and GUI framework with RAD visual designer (doing for Javascript what Delphi does for Object Pascal or Visual Studio does for C#)?

My sole foray into Javascript so far has been a simple DOM thing to implement Weasel and display results in an HTML form. Now there looks to be a plethora of libraries, but little of the info is oriented toward answering my question.
Posted by: k.e.. on Mar. 06 2015,21:45

Quote (Wesley R. Elsberry @ Mar. 06 2015,20:32)
Is there such a thing as an open-source Javascript webapp and GUI framework with RAD visual designer (doing for Javascript what Delphi does for Object Pascal or Visual Studio does for C#)?

My sole foray into Javascript so far has been a simple DOM thing to implement Weasel and display results in an HTML form. Now there looks to be a plethora of libraries, but little of the info oriented along the lines of interest in my question.
---------------------QUOTE-------------------


I've played with Netbeans and Eclipse but you might be interested in Processing (processing.org) check this out

< http://www.openprocessing.org/sketch....255 >
Posted by: Wesley R. Elsberry on April 24 2016,23:07

Avida-ED is pretty close to becoming a web application. Currently in alpha test, the new version presents the familiar user interface in your browser. This is made possible by applying the Emscripten compiler to turn the Avida core into a Javascript library. It should be generally available sometime in mid-June.
Posted by: Wesley R. Elsberry on June 19 2016,00:17

Avida-ED Web 3.0 is out for beta release now. Rob Pennock gave a formal announcement at the Evolution 2016 meeting.

< Avida-ED Web access page >

< Mirror >
Posted by: midwifetoad on June 19 2016,05:49

Doesn't seem to work on iPad Chrome.
Posted by: Wesley R. Elsberry on June 19 2016,07:58

Tablet compatibility wasn't the goal for this phase, but will be addressed later. The Avida-ED team is planning to get a ChromeBook and tablets for testing.

I know it works on Chrome on Android 5.1, which is what my phone is. That's not a great use case, but being able to show it at all there is pretty cool for a demo.
Posted by: Wesley R. Elsberry on June 19 2016,08:04

Diane's MacBook Pro that she does the UI development on had issues. When we were at the Apple Store in Tampa for repairs to it, we brought up Avida-ED on an iPad and an iPod there. Probably in Safari, which is not a good fit now because Safari refuses to allow saving blob data to local storage. Fixing that has been in the Safari developers issue list for a long time now.
Posted by: midwifetoad on Sep. 20 2016,19:26

Anyone have any info about the whereabouts of Zach, or where one could get a copy of Gregor's Bookkeeper?
Posted by: midwifetoad on Sep. 20 2016,20:59

Found it on archive.com
Posted by: WebHopper on Feb. 10 2017,11:32

Hello,

I am wondering if there is an analytical solution to the Weasel algorithm. I am thinking of a probability distribution of the number of trials necessary to achieve a target sequence, with the mean and variance involving the following parameters:

length of the alphabet
length of the target sequence
mutation rate
population size

Have you seen a site where this is done? On evoinfo.org Dembski proposes a solution for what he calls "partitioned search". But I am looking for a solution for what he names "proximity reward search", which apparently is the Weasel algorithm. To be clear, I am not interested in a numerical but an analytical solution for the mean and variance as functions of the above-mentioned parameters.
Posted by: Wesley R. Elsberry on Feb. 10 2017,18:30

Quote (WebHopper @ Feb. 10 2017,11:32)
Hello,

I am wondering if there is an analytical solution to the Weasel algorithm. I think of a probability distribution of the number of trials necessary to achieve a target sequence, the mean and variance implying the following parameters:

length of the alphabet
length of the target sequence
mutation rate
population size

Have you seen a site where this is done? On evoinfo.org Dembski proposes a solution for what he calls "partitioned search". But I am looking for a solution of what he names "proximity reward search", which apparently is the Weasel algorithm. To be clear, I am not interested in a numerical  but analytical solution of the mean and variance as a functions of the above-mentioned parameters.
---------------------QUOTE-------------------


Further up the thread, I pointed out the difference between partitioned search and "weasel".


< http://www.antievolution.org/cgi-bin....y140986 >

     

---------------------QUOTE-------------------


"Locking" or "latching" is the same as removing the term that allows for correct bases to mutate to incorrect ones. What remains is an expectation that the number of correct bases can only monotonically increase.  


---------------------QUOTE-------------------



If you have the analytical form you like for partitioned search, then modify it to add in the additional element I note for "weasel", plus any other adjustments. I left off that project before fully working up the population component for probabilities.

I've had some issues with hosted images going stale. I need to look up some of my graphs in this thread and restore them.

< http://austringer.net/wp....-....-part-1 >

< http://austringer.net/wp....-....-part-2 >

And somewhere, sometime, I know I did a numerical scan of parameter space to show the likely range of parameters for Dawkins' original runs given his reported generation times for results. I'm not finding where I shared that, though.
Posted by: DiEb on Feb. 11 2017,03:02

Quote (WebHopper @ Feb. 10 2017,17:32)
Hello,

I am wondering if there is an analytical solution to the Weasel algorithm. I think of a probability distribution of the number of trials necessary to achieve a target sequence, the mean and variance implying the following parameters:

length of the alphabet
length of the target sequence
mutation rate
population size

Have you seen a site where this is done? On evoinfo.org Dembski proposes a solution for what he calls "partitioned search". But I am looking for a solution of what he names "proximity reward search", which apparently is the Weasel algorithm. To be clear, I am not interested in a numerical  but analytical solution of the mean and variance as a functions of the above-mentioned parameters.
---------------------QUOTE-------------------


Try  < "The probability of formation of a nucleotide sequence by random mutations" > by Ulrich Utiger...
Posted by: WebHopper on Feb. 11 2017,05:27

Wesley, if I understand you well the expected correct bases after mutation is

E = C + (u * (L - C) / K) - (u * C * (K - 1) / K)

where
C = number of correct bases
K = alphabet length
L = number of base pairs
u = mutation rate

I guess with "expected" E you refer to the mean. So if C=L we have

E = L (K + u - K u)/K

This is a linear law with respect to L if the other parameters are held constant. However, this does not match a Monte Carlo simulation, which shows that E is exponential with respect to L. So there must be some error in your calculation or I don't understand you well.

DiEb: Thanks for the link, I will look into this. Seems to be complicated however...
Posted by: DiEb on Feb. 11 2017,06:26

Quote (WebHopper @ Feb. 11 2017,11:27)
DiEb: Thanks for the link, I will look into this. Seems to be complicated however...
---------------------QUOTE-------------------


Well, there are three different approaches:

1) Simulation

2) Numerical Modelling

3) Analytical Modelling

Simulation is the easiest way, but doesn't provide much insight. Unless there are some clever simplifications/estimations/approximations, the analytical way gets just too complicated for me...

I've preferred to model the weasel as a Markov chain: given the mutation rate, the alphabet, the size of the population, and the length of the target phrase, \mu and \sigma can be calculated in a straightforward way - and one gets some nice pictures:

< Which Parameters should Dembski have used? >
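The Markov-chain approach can be sketched along these lines. This is my reconstruction of the idea, not DiEb's actual code, and the parameters (K=27, L=28, u=0.05, N=100) are illustrative; it needs NumPy. The state is the number of correct letters in the current parent, one generation takes the best of N independently mutated offspring, and the expected generations to reach the full match come from the standard linear system for absorption times.

```python
from math import comb
import numpy as np

K, L, u, N = 27, 28, 0.05, 100   # alphabet, target length, mutation rate, population

def binom_pmf(n, k, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def offspring_dist(i):
    """P(offspring has j correct letters | parent has i correct)."""
    keep, gain = 1 - u * (K - 1) / K, u / K   # per-site retain / acquire probabilities
    d = [0.0] * (L + 1)
    for kept in range(i + 1):
        for gained in range(L - i + 1):
            d[kept + gained] += binom_pmf(i, kept, keep) * binom_pmf(L - i, gained, gain)
    return d

def generation_dist(i):
    """P(best of N offspring has j correct | parent has i correct)."""
    cdf = np.cumsum(offspring_dist(i))   # P(one offspring <= j)
    best_cdf = cdf ** N                  # P(all N offspring <= j)
    return np.diff(best_cdf, prepend=0.0)

P = np.array([generation_dist(i) for i in range(L + 1)])
Q = P[:L, :L]                            # transitions among the non-absorbed states
t = np.linalg.solve(np.eye(L) - Q, np.ones(L))
# t[i] = expected generations to reach the full match from i correct letters
```

Summing the per-state expectations this way captures the fact that correct letters can be lost again; multiplying the expected generations by N gives the expected number of queries that the graph plots.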


Posted by: WebHopper on Feb. 11 2017,09:51

The next configuration of a Markov chain only depends on the current configuration and on neighboring sites, right? But the probability of a nucleotide sequence achieving a target depends on all sites, that is, all sites must be correct. For example, the third site of the sequence AAT is not correct if the target is AAA even though the first and second sites are correct. So whether the target is achieved or not depends on all three sites. But maybe I don't understand exactly what a Markov chain is. Can you explain?

Can you also explain what is on the abscissa and ordinate in your graphic? And what do the lines mean? Some constant parameters?
Posted by: Wesley R. Elsberry on Feb. 11 2017,16:30

Quote (WebHopper @ Feb. 11 2017,05:27)
Wesley, if I understand you well the expected correct bases after mutation is

E = C + (u * (L - C) / K) - (u * C * (K - 1) / K)

where
C = number of correct bases
K = alphabet length
L = number of base pairs
u = mutation rate

I guess with "expected" E you refer to the mean. So if C=L we have

E = L (K + u - K u)/K

This is a linear law with respect to L if the other parameters are held constant. However, this does not match a Monte Carlo simulation, which shows that E is exponential with respect to L. So there must be some error in your calculation or I don't understand you well.

DiEb: Thanks for the link, I will look into this. Seems to be complicated however...
---------------------QUOTE-------------------


Edit: I'll look this over again. I derived it with Monte Carlo ground-truthing, so I'm not sure where we are diverging in expectation.


Posted by: Wesley R. Elsberry on Feb. 12 2017,14:28

Quote (Wesley R. Elsberry @ Feb. 11 2017,16:30)
 
Quote (WebHopper @ Feb. 11 2017,05:27)
Wesley, if I understand you well the expected correct bases after mutation is

E = C + (u * (L - C) / K) - (u * C * (K - 1) / K)

where
C = number of correct bases
K = alphabet length
L = number of base pairs
u = mutation rate

I guess with "expected" E you refer to the mean. So if C=L we have

E = L (K + u - K u)/K

This is a linear law with respect to L if the other parameters are held constant. However, this does not match a Monte Carlo simulation, which shows that E is exponential with respect to L. So there must be some error in your calculation or I don't understand you well.

DiEb: Thanks for the link, I will look into this. Seems to be complicated however...
---------------------QUOTE-------------------


Edit: I'll look this over again. I derived it with Monte Carlo ground-truthing, so I'm not sure where we are diverging in expectation.
---------------------QUOTE-------------------


I think the issue is that the expectation I derived in the equation is the expected number of correct bases in one possibly-mutated offspring. I have rechecked the equation with a new Monte Carlo analysis, and it checks out.



---------------------CODE SAMPLE-------------------

u 0.0357142857143
# correct, MC result, equation result, partitioned search result
0 0.0368 0.0370 0.0370
1 1.0028 1.0013 1.0357
2 1.9652 1.9656 2.0344
3 2.9311 2.9299 3.0331
4 3.8931 3.8942 4.0317
5 4.8590 4.8585 5.0304
6 5.8222 5.8228 6.0291
7 6.7859 6.7870 7.0278
8 7.7504 7.7513 8.0265
9 8.7139 8.7156 9.0251
10 9.6783 9.6799 10.0238
11 10.6419 10.6442 11.0225
12 11.6067 11.6085 12.0212
13 12.5706 12.5728 13.0198
14 13.5344 13.5370 14.0185
15 14.5018 14.5013 15.0172
16 15.4639 15.4656 16.0159
17 16.4264 16.4299 17.0146
18 17.3926 17.3942 18.0132
19 18.3574 18.3585 19.0119
20 19.3230 19.3228 20.0106
21 20.2867 20.2870 21.0093
22 21.2499 21.2513 22.0079
23 22.2203 22.2156 23.0066
24 23.1779 23.1799 24.0053
25 24.1475 24.1442 25.0040
26 25.1056 25.1085 26.0026
27 26.0730 26.0728 27.0013
28 27.0409 27.0370 28.0000

---------------------CODE SAMPLE-------------------



Note that by the point we are mutating a string with 2 correct bases, we *expect* fewer than 2 correct bases afterward.

Partitioned search mutation expectation is always greater than or equal to the starting correct number of bases.

When we start talking about the expected number of generations to increment the best organism in the population to having another correct base, yes, that ends up in an exponentially increasing series with increasing number of correct bases. But the mutation expectation per offspring is a simpler calculation than that.
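The per-offspring expectation can be ground-truthed with a few lines of Monte Carlo. This is a sketch of the kind of check described above, using the same parameters as the table (K = 27, L = 28, u = 1/28); the trial count is arbitrary.

```python
import random

random.seed(0)                 # for reproducibility of the check
K, L, u = 27, 28, 1 / 28

def expectation(C):
    """Expected correct bases in one mutated offspring of a parent with C correct."""
    return C + u * (L - C) / K - u * C * (K - 1) / K

def simulate(C, trials=100_000):
    total = 0
    for _ in range(trials):
        correct = C
        for _ in range(C):          # correct sites: mutate and miss w.p. u*(K-1)/K
            if random.random() < u and random.random() < (K - 1) / K:
                correct -= 1
        for _ in range(L - C):      # incorrect sites: mutate and hit w.p. u/K
            if random.random() < u and random.random() < 1 / K:
                correct += 1
        total += correct
    return total / trials
```

For example, the C = 2 row of the table: expectation(2) is already below 2, which is exactly the term that "latching" removes, and which partitioned search never has.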


Posted by: WebHopper on Feb. 12 2017,16:50

Sorry Wesley, but I don't have a clue what you are talking about...

I started to read Utiger's paper, he explains it quite well. I mean what we need is an equation for the mean number of generations necessary to achieve a target. For instance, for Dawkins' weasel sentence this number is around 60 or so for a population size of 100 and a mutation rate of 0.05 as explained on Wiki.

Utiger found a distribution like that of throwing dice:

P(v) = q^(v-1) p

where v is the number of generations and p = 1-q is the probability that the die shows the correct number. When several nucleotides and a population size greater than one are involved, p and q become matrices with the same dimension as the length of the sequence. The mean is calculated in the same manner as for dice. This way, Utiger found that the mean follows a logarithmic law with respect to the sequence length if the population size is greater than one, otherwise it is exponential. He checks this with Monte Carlo simulations, and the analytical and numerical results fit perfectly.
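For the simplest case, one site and a population of one, the waiting time is the ordinary geometric distribution P(v) = q^(v-1) p with mean 1/p, just as for a die. A quick numerical check of that base case (my summary of the single-site situation, not Utiger's matrix version):

```python
p = 1 / 6                   # chance a single throw shows the target face
q = 1 - p

# Mean of the geometric distribution P(v) = q**(v - 1) * p, summed far
# enough into the tail that the truncation error is negligible.
mean = sum(v * q ** (v - 1) * p for v in range(1, 2000))
# mean matches the closed form 1/p = 6: about six throws per success
```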
Posted by: DiEb on Feb. 12 2017,17:52

Quote (WebHopper @ Feb. 11 2017,15:51)
Can you also explain what is on the abscissa and ordinate in your graphic? And what mean the lines? Some constant parameters?
---------------------QUOTE-------------------


The graph answers the question "What is the expected number of queries for Dawkins's original weasel?"

That is, the size of the alphabet (27) and the length of the target (28) are fixed.

On the x-axis, there is the mutation rate \mu, on the y-axis, the number of queries, i.e., the size of the population times the number of expected generations.

The coloured lines show the relation between \mu and the number of queries for certain sizes of population. The dot displays the minimal expected number of queries for a certain population size - the black text lists the relevant information:

1) size of population (e.g., 2 for the red line)
2) most efficient rate of mutation (e.g., 0.000 049 for the red line)
3) number of expected queries for this combination of size and rate (e.g., 21,213)

The black line connects the minimal points and allows for extrapolation - though this is quite difficult in this log-log diagram.

When I calculated the values nearly eight years ago, I concluded that the most efficient size of population would be 9 with a mutation rate of 0.00901: this would result in 1576 queries on average (or some 175 generations).


Posted by: WebHopper on Feb. 13 2017,03:53

Quote (DiEb @ Feb. 12 2017,17:52)
When I calculated the values nearly eight years ago, I concluded that the most efficient size of population would be 9 with a mutation rate of 0.00901: this would result in 1576 queries on average (or some 175 generations).
---------------------QUOTE-------------------


If I understand you correctly, a pop. size of 9 and a mut. rate of 0.00901 yield a minimal average number of queries?

There is an optimal mut. rate. But there is no optimal pop. size yielding a minimal number of trials. So the higher the pop. size, the lower the number of trials.

Maybe it would be more useful if you only had the expected number of trials on the y-axis. Why take the number of queries?
Posted by: DiEb on Feb. 13 2017,05:40

Quote (WebHopper @ Feb. 13 2017,09:53)
Quote (DiEb @ Feb. 12 2017,17:52)
When I calculated the values nearly eight years ago, I concluded that the most efficient size of population would be 9 with a mutation rate of 0.00901: this would result in 1576 queries on average (or some 175 generations).
---------------------QUOTE-------------------


If I understand you well, a pop. size of 9 and mut. rate of 0.00901 yields a minimal average number of queries?  

There is an optimal mut. rate. But there is no optimal pop. size yielding a minimal number of trials. So the higher the pop. size, the lower the number of trials.

Maybe it would be more useful if you only had the expected number of trials on the y-axis. Why take the number of queries?
---------------------QUOTE-------------------


Frankly, I'm not sure what you mean by "trial"...

The general idea is that one wishes to reduce the costs of a simulation: you can define the idea of costs in various ways - maybe there is a restriction to the size of a generation and / or the number of generations - but it is standard to define the number of calls to the oracle / evaluations of the fitness function as the cost of a program. That's why I displayed the number of queries, i.e., the number of individuals created for which the fitness function has to be evaluated.

The optimal mutation rate depends on the size of the population - there is no overall optimum for all sizes!
Posted by: WebHopper on Feb. 14 2017,06:45

Quote (DiEb @ Feb. 13 2017,05:40)
Frankly, I'm not sure what you mean by "trial"...
---------------------QUOTE-------------------


The number of trials is the number of tries. When you roll dice, for instance, a trial is a single throw of the die. For the Weasel algorithm, it's the number of generations.

I calculated the probability distribution for the Weasel algorithm with a mutation rate of 0.05 and a population size of 100 according to Utiger's paper:

P(v)=|H.F^(v-2).A|

where v is the number of trials (or generations), H and F are matrices, A is a vector and |.| is the 1-norm. This yields



As you can see, the numerical blue points and the analytical red curve fit perfectly. The mean calculated analytically is 79.19 and the standard deviation is 24.64. So the number of queries is 100*79.19 = 7919, following your definition. On your graphic above, however, the intersection of the vertical line at 5e-02=0.05 with the green curve passing through the point 100... is about 2e+05=2*10^5. That is the number of queries, if I understand you correctly. So the two results disagree...
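As a sanity check on that analytical mean, here is a minimal Monte Carlo weasel in Python. This is my own sketch, not Utiger's code: it assumes one common variant of the algorithm (a single parent, 100 mutated children per generation, the best child becoming the next parent, and per-site mutation drawing a replacement uniformly from the 27-character alphabet, possibly repeating the old character), so its sample mean will only approximate the 79.19 above.

```python
import random
import string

ALPHABET = string.ascii_uppercase + " "   # K = 27 characters
TARGET = "METHINKS IT IS LIKE A WEASEL"   # L = 28

def score(s):
    # Number of positions matching the target.
    return sum(a == b for a, b in zip(s, TARGET))

def mutate(parent, u):
    # Each site is, with probability u, replaced by a character drawn
    # uniformly from the alphabet (the draw may repeat the old character).
    return "".join(random.choice(ALPHABET) if random.random() < u else c
                   for c in parent)

def generations_to_target(pop_size=100, u=0.05):
    parent = "".join(random.choice(ALPHABET) for _ in TARGET)
    gens = 0
    while score(parent) < len(TARGET):
        gens += 1
        parent = max((mutate(parent, u) for _ in range(pop_size)), key=score)
    return gens

runs = [generations_to_target() for _ in range(5)]
print(sum(runs) / len(runs))   # compare with the analytical mean of 79.19 quoted above
```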
Posted by: DiEb on Feb. 14 2017,12:16

Quote (WebHopper @ Feb. 14 2017,12:45)
Quote (DiEb @ Feb. 13 2017,05:40)
Frankly, I'm not sure what you mean by "trial"...
---------------------QUOTE-------------------


The number of trials is the number of tries. When you roll dices for instance, a trial is when you throw the dice once. Or for the Weasel algorithm, it's the number of generations.

I calculated the probability distribution for the Weasel algorithm with a mutation rate of 0.05 and a population size of 100 according to Utiger's paper:

P(v)=|H.F^(v-2).A|

where v is the number of trials (or generations), H and F are matrices, A is a vector and |.| is the 1-norm. This yields



As you can see, the numerical blue points and the analytical red curve perfectly fit. The mean calculated analytically is 79.19 and the standard deviation is 24.64. So the number of queries is 100*79.19 = 7919 according to your indications. On your graphic above however, the intersection point of the vertical line at 5e-02=0.05 with the green curve passing through the point 100... is about 2e+05=2*10^5. This is the number of queries if I understand you well. So there is disagreement with both results...
---------------------QUOTE-------------------


Great - I will try to find out where my error lay...
Posted by: WebHopper on Feb. 14 2017,12:55

Quote (WebHopper @ Feb. 12 2017,16:50)
Utiger found a distribution like that of throwing dices:

P(v) = q^v-1 p^v

where v is the number of generations and p = 1-q is the probability that the dice got the correct number.
---------------------QUOTE-------------------



Sorry for this error. The probability distribution for throwing a die is of course

P(v) = q^(v-1) p

with p=1/6.
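For this corrected geometric distribution, the mean number of throws works out to 1/p = 6, which a few lines of Python confirm numerically (nothing is assumed beyond the formula above):

```python
# Geometric distribution for repeated die throws: P(v) = q**(v - 1) * p,
# with p = 1/6 the chance of the correct face on any single throw.
p = 1 / 6
q = 1 - p

# The mean number of throws, E[v] = sum over v >= 1 of v * P(v),
# telescopes to 1/p = 6; truncating the sum at v = 2000 is more than
# enough for machine precision.
mean = sum(v * q ** (v - 1) * p for v in range(1, 2000))
print(mean)
```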

You should definitely read Utiger's article even though his conclusion is that natural selection does not work and you guys believe the exact opposite... But he makes his case with honesty and clarity. Furthermore, he does not draw any conclusions in favor of intelligent design or any other creation beliefs. He just says that there is disagreement between the empirical estimate that the split between the Pan and Homo genera occurred some 13 million years ago and the timescale furnished by the Weasel algorithm, that is, billions of years...

< The paper can also be found on ResearchGate. >
Posted by: k.e.. on Feb. 14 2017,21:16

If you really want to get a grip on the power and limitations of statistics I suggest Nate Silver's "The Signal and the Noise". My favorite quote: "Economists have predicted nine of the last six recessions". Berlinski famously conflated the mathematical meaning of the word limit with the limit of actions available in the physical world. Personal incredulity, unfortunately, is a limit even for mathematicians. If his work were truly useful he could see if the strategy worked in poker. I predict the results would see him lose. I have yet to see any mathematician change the outcome of a die rolled in the past.
Posted by: Wesley R. Elsberry on Feb. 14 2017,21:42

I haven't gone through all of Utiger's math, but I can speak some to the biology interactions.

Utiger attributes all change to natural selection. His paper is devoid of references to drift, and his only mention of Kimura is an offhand reference to mathematics. Thus, you can readily dismiss any conclusions he makes about the plausibility of genetic change; he is not dealing with the full model of how genomes change.

Referencing Meyer as an authority on genetics is always a bad sign.
Posted by: Wesley R. Elsberry on Feb. 15 2017,12:06

Quote (DiEb @ Feb. 14 2017,12:16)
 
Quote (WebHopper @ Feb. 14 2017,12:45)
 
Quote (DiEb @ Feb. 13 2017,05:40)
Frankly, I'm not sure what you mean by "trial"...
---------------------QUOTE-------------------


The number of trials is the number of tries. When you roll dices for instance, a trial is when you throw the dice once. Or for the Weasel algorithm, it's the number of generations.

I calculated the probability distribution for the Weasel algorithm with a mutation rate of 0.05 and a population size of 100 according to Utiger's paper:

P(v)=|H.F^(v-2).A|

where v is the number of trials (or generations), H and F are matrices, A is a vector and |.| is the 1-norm. This yields



As you can see, the numerical blue points and the analytical red curve perfectly fit. The mean calculated analytically is 79.19 and the standard deviation is 24.64. So the number of queries is 100*79.19 = 7919 according to your indications. On your graphic above however, the intersection point of the vertical line at 5e-02=0.05 with the green curve passing through the point 100... is about 2e+05=2*10^5. This is the number of queries if I understand you well. So there is disagreement with both results...
---------------------QUOTE-------------------


Great - I will try to find out where my error laid...
---------------------QUOTE-------------------


Dieb, I don't think it is your error. The plot WebHopper has is for population size 100, and should have been run for population size 9 to speak to your numbers.
Posted by: DiEb on Feb. 15 2017,17:26

Quote (Wesley R. Elsberry @ Feb. 15 2017,18:06)
Quote (DiEb @ Feb. 14 2017,12:16)
   
Quote (WebHopper @ Feb. 14 2017,12:45)
   
Quote (DiEb @ Feb. 13 2017,05:40)
Frankly, I'm not sure what you mean by "trial"...
---------------------QUOTE-------------------


The number of trials is the number of tries. When you roll dices for instance, a trial is when you throw the dice once. Or for the Weasel algorithm, it's the number of generations.

I calculated the probability distribution for the Weasel algorithm with a mutation rate of 0.05 and a population size of 100 according to Utiger's paper:

P(v)=|H.F^(v-2).A|

where v is the number of trials (or generations), H and F are matrices, A is a vector and |.| is the 1-norm. This yields



As you can see, the numerical blue points and the analytical red curve perfectly fit. The mean calculated analytically is 79.19 and the standard deviation is 24.64. So the number of queries is 100*79.19 = 7919 according to your indications. On your graphic above however, the intersection point of the vertical line at 5e-02=0.05 with the green curve passing through the point 100... is about 2e+05=2*10^5. This is the number of queries if I understand you well. So there is disagreement with both results...
---------------------QUOTE-------------------


Great - I will try to find out where my error laid...
---------------------QUOTE-------------------


Dieb, I don't think it is your error. The plot WebHopper has is for population size 100, and should have been run for population size 9 to speak to your numbers.
---------------------QUOTE-------------------


Nevertheless, I'll look into it over the next week. Heck, I liked the picture, but something seems to be off.

I ran a few simulations: WebHopper seems to be correct! Now I have to go over eight-year-old code - from when I was young and pretty ;-)
Posted by: DiEb on Feb. 16 2017,10:04

Quote (DiEb @ Feb. 15 2017,23:26)
 
Quote (Wesley R. Elsberry @ Feb. 15 2017,18:06)
   
Quote (DiEb @ Feb. 14 2017,12:16)
       
Quote (WebHopper @ Feb. 14 2017,12:45)
       
Quote (DiEb @ Feb. 13 2017,05:40)
Frankly, I'm not sure what you mean by "trial"...
---------------------QUOTE-------------------


The number of trials is the number of tries. When you roll dices for instance, a trial is when you throw the dice once. Or for the Weasel algorithm, it's the number of generations.

I calculated the probability distribution for the Weasel algorithm with a mutation rate of 0.05 and a population size of 100 according to Utiger's paper:

P(v)=|H.F^(v-2).A|

where v is the number of trials (or generations), H and F are matrices, A is a vector and |.| is the 1-norm. This yields



As you can see, the numerical blue points and the analytical red curve perfectly fit. The mean calculated analytically is 79.19 and the standard deviation is 24.64. So the number of queries is 100*79.19 = 7919 according to your indications. On your graphic above however, the intersection point of the vertical line at 5e-02=0.05 with the green curve passing through the point 100... is about 2e+05=2*10^5. This is the number of queries if I understand you well. So there is disagreement with both results...
---------------------QUOTE-------------------


Great - I will try to find out where my error laid...
---------------------QUOTE-------------------


Dieb, I don't think it is your error. The plot WebHopper has is for population size 100, and should have been run for population size 9 to speak to your numbers.
---------------------QUOTE-------------------


Nevertheless, I'll look into it over the next week. Heck, I liked the picture, but something seems to be off.

I run a few simulations: WebHopper seems to be correct! Now, I have to go over eight year old code - from when I was young and pretty ;-)
---------------------QUOTE-------------------


ARRRRGH! I don't know how I have managed it, but I uploaded two wrong pictures! Those belong to a blogpost from Oct 3, 2009 on < Dembski's "Random Search" > which had a target length of 100, and an alphabet of size 2 only!

The day before I wrote about < Dawkin's weasel > - here are the correct pictures for Dawkins:

The number of queries:


The number of generations:


i) for a population size of 100 and a mutation rate of 0.044, I get an expected number of generations of 78.47

ii) I still think that the number of individuals created best reflects the cost of the algorithm - at least mathematically: the onerous task is to create/mutate a child, using random numbers - in this sense, 10 generations of 10 children are as expensive as one generation of 100 children.


Posted by: Wesley R. Elsberry on Feb. 16 2017,11:32

Dieb:



---------------------QUOTE-------------------

ii) I still think that the number of individuals created best reflects the cost of the algorithm - a least mathematically: the onerous task is to create/mutate a child, using random numbers - in this sense, 10 generations of 10 children are equally expensive as one generation of 100 children.

---------------------QUOTE-------------------



Yes, that is absolutely standard.
Posted by: Wesley R. Elsberry on Feb. 21 2017,22:37

Dieb,

I did the empirical approximation thing some time ago with "weasel" to plot a measure of average number of candidates across a swath of parameter space for population size and mutation rate:

< http://www.antievolution.org/cgi-bin....y142675 >
Posted by: Wesley R. Elsberry on Feb. 21 2017,22:45

And a bit of forensics on the unknown parameters Richard Dawkins used for his 1986 book and his 1987 video featuring 'weasel':

< http://www.antievolution.org/cgi-bin....y142694 >
Posted by: DiEb on Feb. 28 2017,02:32

Quote (Wesley R. Elsberry @ Feb. 22 2017,04:45)
And a bit of forensics on the unknown parameters Richard Dawkins used for his 1986 book and his 1987 video featuring 'weasel':

< http://www.antievolution.org/cgi-bin....y142694 >
---------------------QUOTE-------------------


Thanks for the links - it was fun to revisit your posts! Wow, time flies....
Posted by: Wesley R. Elsberry on May 02 2017,20:10

Sometime back, I had another look at a "weasel" post on my blog, and checked out an antievolutionist's blog where I had left a comment. I found he had < responded > about two months after I left my message. It took years for me to see it.

But it also goaded me into doing some more to describe "weasel" math right on up to incorporating population-level probabilities. I'll be putting this up as a blog post soon, but I'll put some of the goodies here first.

So...

If you generate a population of organisms randomly, what are the odds that at least one organism will have at least one match to the target?

C = # of correct bases
L = # of bases in organism genome
K = # of separate characters in the pool
N = population size
u = mutation rate (may be expressed as a fraction with L)

We need p(org_random(C>=1)), the probability that an organism with a randomly assigned genome will have one or more matches to the target. We can get that by taking the probability that a randomly assigned base is not a match, raising it to the power of the number of bases, and taking the complement.

p(org_random(C>=1)) = (1.0 - (1.0 - 1/K)**L)

For "weasel", K = 27 and L = 28 and p(org_random(C>=1)) = 0.652

Getting the population probability is straightforward.

p(pop_random(C>=1)) = 1.0 - ((1.0 - 1/K)**L)**N

So how does this go, over, say, populations of size from 1 to 10? Like this:

p(pop_random(C>=1)) N= 1 p = 0.652
p(pop_random(C>=1)) N= 2 p = 0.879
p(pop_random(C>=1)) N= 3 p = 0.958
p(pop_random(C>=1)) N= 4 p = 0.985
p(pop_random(C>=1)) N= 5 p = 0.995
p(pop_random(C>=1)) N= 6 p = 0.998
p(pop_random(C>=1)) N= 7 p = 0.999
p(pop_random(C>=1)) N= 8 p = 0.9998
p(pop_random(C>=1)) N= 9 p = 0.9999
p(pop_random(C>=1)) N= 10 p = 0.99997
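The table can be reproduced directly from the formula above (a short Python sketch using the post's values K = 27, L = 28):

```python
K, L = 27, 28   # alphabet size and target length for "weasel"

def p_pop_random(N):
    # Complement of "every base of every one of the N random genomes
    # fails to match the target at its position".
    return 1.0 - ((1.0 - 1.0 / K) ** L) ** N

for N in range(1, 11):
    print(f"N={N:2d}  p={p_pop_random(N):.5f}")
```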

And there is a graph.



Some more graphs...

This one is the graph of the probability that at least one daughter organism will have more correct bases than its parent given population size, mutation rate (at u=1/L), and number of correct bases.



This shows how increasing the population size makes a huge difference in the probability of progress in each generation.

And there is the ever-popular notion of "latching". I should note that what cumulative selection is about is showing that you can have retention of beneficial changes without a system that privileges any particular portion of the genome. It's that privileging of specific positions that makes claims of latching so wildly misleading.

I have two graphs for this. Both are with u=1/L, that is, an expectation of one mutated base per replication. Points are plotted in red for my direct calculation of the probability, and in blue for the Monte Carlo (MC) estimate of the probability. For runs with smaller population sizes, there are ten thousand MC trials per point; for the runs including larger population sizes, there are a thousand MC trials per point.

These probability graphs show the probability that one would see a stepback in fitness for a particular population size, mutation rate, and number of correct bases in the parent. They demonstrate that, for a reasonable mutation rate, even tiny population sizes reduce the expectation of observing a stepback in fitness to very slim probabilities. As population sizes reach anything near a reasonable size, those probabilities fall very close to zero.
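The stepback probabilities described here can be estimated with a rough Monte Carlo sketch. This is my own minimal model, not the code behind the graphs: it assumes per-site mutation at u = 1/L with replacements drawn uniformly from the K characters, and counts a stepback whenever the best of N children has fewer correct bases than its parent.

```python
import random

K, L = 27, 28
u = 1.0 / L   # expectation of one mutated base per replication

def child_correct(C):
    # Correct bases in one child of a parent that has C correct bases.
    correct = C
    for _ in range(C):         # a mutated correct site stays correct 1/K of the time
        if random.random() < u and random.random() >= 1.0 / K:
            correct -= 1
    for _ in range(L - C):     # a mutated incorrect site becomes correct 1/K of the time
        if random.random() < u and random.random() < 1.0 / K:
            correct += 1
    return correct

def p_stepback(C, N, trials=5000):
    # Fraction of trials in which the best of N children has fewer
    # correct bases than the parent (a fitness stepback).
    hits = sum(max(child_correct(C) for _ in range(N)) < C
               for _ in range(trials))
    return hits / trials

print(p_stepback(C=14, N=1))    # a lone child steps back fairly often
print(p_stepback(C=14, N=10))   # even N=10 makes stepback very rare
```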




Posted by: Dr.GH on May 04 2017,00:09

I should never forget population size as a massive factor. But I seem to do that.
Posted by: Wesley R. Elsberry on Feb. 21 2018,20:37

Disclosure: I did my postdoc working with Avida, I worked on the Avida-ED project, and my spouse is the architect and programmer on the current web-based Avida-ED.

< Science Daily had an article > discussing a research paper on the educational effectiveness of using < Avida-ED >, the < award-winning software > putting a user-friendly interface on the Avida research platform, to teach students basic concepts in evolutionary biology. Other recent notice came in a page at the < IEEE 360Electronics site >.

From the Results section of the research paper:

         

---------------------QUOTE-------------------

Student Acceptance of Evolution

Average student acceptance scores across all ten cases ranged from 73.95 to 90.04 on the pre-test and 76.28 to 91.06 on the post-test, or moderate to very high acceptance for both pre- and post-tests, using Rutledge’s (1996) guide to interpretation (Table 8). Average acceptance score increased significantly from pre- to post-test in four of the ten cases (Fig. 5). These four cases were all lower-division courses that also had statistically significant gains in average content score. Students in two of the three upper-division evolution courses, B_300Evo and F_400Evo, had very high acceptance on both the pre- and post-tests, with no significant change. These were the same two upper-division courses in which the highest pre-test content scores were observed. Students in the remaining upper-division course, C_400Evo, also did not show a significant change in acceptance from pre- to post-test, with the lowest average acceptance score on the post-test (76.28). Thus, case C_400Evo showed the lowest average scores for both content and acceptance on the post-test, despite being a senior-level evolution course (discussed below).

Understanding and Acceptance of Evolution

Most of the students in lower-division courses had significant increases in both average content and average acceptance scores, suggesting a relationship between the two. Again, we accounted for differences in levels of student prior knowledge by using normalized gains (g-avg; Hake, 2002), calculated for each student’s pre- and post-assessment scores, which were then averaged for each case. The Pearson correlation confirmed a significant, positive association between the change in average normalized content score and in average normalized acceptance score across the ten cases (r = 0.60, p < 0.05; Fig. 6).

---------------------QUOTE-------------------



Of course, whenever evolutionary computation gets favorable notice, you can count on the < Discovery Institute to say "Pffff-ff-ff-fft" to that >.

         

---------------------QUOTE-------------------

It’s time for a bit of honesty in evolution education! Avida shows that evolutionary processes require intelligent design to hit predetermined targets. That’s the candid takeaway from a lesson about this software. Since we don’t recommend trying to bring ID into public school classrooms, there are undoubtedly more effective uses of class time than playing with Avida-ED.

---------------------QUOTE-------------------



Well, predetermined targets are one thing, and actually "cheating", as Dembski has routinely called it, is quite another. There is a finite set of logic operations that can operate on one or two inputs. Avida has all of them implemented such that they could be specified and recognized if a program accomplished any of them. The Avida results that Dembski and others concern themselves with are a small sample, nine tasks, of that larger number. The 2003 Lenski et al. paper is based on the "logic-nine" definition of an environment.

Other environments are possible. By other, I mean something like 8.511301e10 of them or more when considered with nine selected tasks per environment. Choose other numbers of tasks and you'll get different numbers, but mostly larger ones. Avida doesn't care which ones you specify, which makes it difficult to credit that the information of "intelligently designed" results for each of them is somehow crammed into the Avida executable, which the last time I checked weighed in at about 2e6 bytes.

During my postdoc, I used Avida, but here is the number of the existing predefined logic tasks I used in < my research >: 0. I added three instructions to the Avida instruction set, one to move the Avidian from its cell to an adjacent cell, one to randomly change which way it faced, and one that would load the difference in a resource between the current cell and the faced cell into a register. A task placeholder, "don't-care", got no merit bonus of its own; instead, Avidians received merit modulated by the concentration of resource in the cell the Avidian moved into, and only in the update it moved into it. What I was looking for was the evolution of movement strategies, of which there were this many defined in Avida: 0. What came out were a plethora of programs, hardly any alike, that moved Avidians around the grid in about eight general classes of movement. One of those classes corresponded to implementations of gradient ascent programs, the optimal movement strategy for the single-peaked resource environment I evolved them in. I certainly coded no "target" (where "target" would be a program or template for a program for moving) into Avida, and the Avida codebase didn't even have anything to move Avidians around before I worked on that.

Other researchers modified Avida to accomplish real-world things unrelated to the "logic-nine" environment Dembski and his comrades are fixated upon. There have been successful projects to code robotic controller programs, wirelessly networked sensor firmware that handles consensus-finding, and evolving feature sets specifying program UML. None of those things had a pre-specified endpoint in mind other than meeting a functional definition.

The DI screed also complains that "information sources" exist in Avida. Well, yeah, the things that are considered analogous to the environment are information sources. Just like environments are information sources for organisms. But information sources corresponding to what an Avidian genome is supposed to look like in the end? Nope.

The DI encourages people to look at the Evolutionary Informatics "Lab"-(not-affiliated-with-Baylor-despite-trying-really-hard) web dingus, Minivida, to see their version of what they imagine Avida is doing. I had a look. Avida is premised on self-replicating digital organisms. Minivida is not. Replication is something every viable Avidian does. Minivida just does some copying with mutation. The Minivida folks do offhandedly announce this departure from the source:

         

---------------------QUOTE-------------------

An attempt has been made to maintain as much compatilibity with Avida as possible, so most Avida programs should run on this simulator with the same results. However, all instructions relating to copying instructions are either ignored or only partially implemented.

---------------------QUOTE-------------------



Here's the code for reproduction:



---------------------CODE SAMPLE-------------------

// Make the next generation
AvidaSim.prototype.make_babies = function()
{
   // create an array with fitness(x) elements for every x in the current population
   var parents = new Array();
   for( var idx in this.population )
   {
       var parent = this.population[idx];
       for( var pos = 0; pos < parent.score*2 + 1;pos ++)
       {
           parents.push( parent.program );
       }
   }

   // create babies
   // select from above array so probability of being selected is correlated with fitness
   var babies = new Array();
   for( var child = 0; child < this.config.population_size; child++)
   {
       var parent_idx = Math.floor( Math.random() * parents.length );
       babies.push( new ChildGenome( this, parents[parent_idx] ) );
   }

   this.population = babies;
}

---------------------CODE SAMPLE-------------------



There's a call in there, so let's have a look:



---------------------CODE SAMPLE-------------------

function Genome(simulation)
{
   this.program = "";
   for(var i = 0; i < 85;i++)
   {
       this.program += simulation.random_letter();
   }
   this.score = undefined;
}

function ChildGenome(simulation, parent)
{

   var idx = Math.floor( Math.random() * parent.length);
   var letter = simulation.random_letter();

   this.program = parent.substr(0, idx).concat( letter, parent.substr(idx+1) );
   this.score = undefined;
}

---------------------CODE SAMPLE-------------------



Mutation in Minivida is always triggered for exactly one location in a genome. The child genome is the parent genome with one instruction substituted randomly.

Minivida forces a fixed genome size (85) and a fixed mutation rate (0.9615/85). Why the odd number? Their mutation routine does not check that the mutated instruction actually differs from the one being replaced, so 1/26th of the time it will be the same. Avida can be set to have a fixed genome size or an unconstrained genome size. Avida-ED uses a fixed size of 50. Avida and Avida-ED allow setting the mutation rate, which is a per-site rate, and mutation happens probabilistically. A standard initial exercise with Avida-ED is to have a class set a particular mutation rate, have all students put an organism in the organism viewer, run it, and count the number of mutations in the offspring. The numbers are collected, and they show a Gaussian distribution centered close to the set mutation rate. That sort of demonstration is impossible in Minivida, because looking even a little like what happens in biology isn't even on their radar.
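The arithmetic behind that odd rate is easy to check (plain Python; 26 is the size of the letter alphabet Minivida draws replacement instructions from):

```python
# Minivida's reproduction step (quoted above) rewrites exactly one site
# with a letter drawn uniformly from a 26-letter alphabet.
ALPHABET_SIZE = 26
GENOME_LENGTH = 85

# The draw matches the old letter 1/26 of the time, so the chance that
# the child actually differs from its parent is 25/26.
p_effective = 1 - 1 / ALPHABET_SIZE
print(p_effective)                  # 0.9615384615384616

# Spread evenly over the genome, that is the per-site rate 0.9615/85
# cited above.
print(p_effective / GENOME_LENGTH)
```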

Minivida only provides the current best genome for display, showing a diagram of a circuit, often with several levels; I haven't found any explanation of what they are showing or how it is supposed to relate to Avida. The code looks to be picking out the math and logical operators from the organism program and showing transformations and sources going back to inputs. As the Minivida program runs, the graphics change continuously. In Avida, small mutation rates mean that offspring are pretty likely to be accurate copies of their parent. So the best organism in an Avida population may well be the parent of an identical best Avidian in a later generation; one can see some stability in the best Avidian. This does not appear to be the case for Minivida, where getting an accurate copy of a Minivida program will only happen 1/26th of the time. In both Avida-ED and Minivida, one can pause the program at any point in a run. In Avida-ED, one can then examine any Avidian in the grid, see its genome, watch how its execution runs, see a possible offspring from its replication, and examine various other statistics about the population, plus charts of several different population-level properties. In Minivida, only the best genome and a circuit representation of it can be seen.

The sorts of things one can use Avida-ED to demonstrate in a classroom on first introduction can be seen in the < Lab Manual >, in < curriculum guides >, and in various < YouTube video tutorials >. Assessments of the utility of Avida-ED are presented in peer-reviewed research, such as the example the DI got hurt feelings over. As far as I can tell, the notion that Minivida has something to show people relative to Avida-ED is unfounded. You can go through the Minivida code and note all the "TODO" lick-and-promise points in its incomplete approach to mimicking Avida. You can look for, and fail to find, documentation that would show any useful educational purpose for Minivida.

Now, I'll give the antievolutionists pushing Minivida and misguided critiques of Avida and Avida-ED one prop, which is that they haven't minified their Javascript or otherwise obfuscated the source to make it harder to see just how dreadful it is. (And I hope that observation doesn't lead them to do just that.)

But the DI did give Diane and I something else to have in common: we both now have the public disapproval of the DI.
Posted by: Henry J on Feb. 21 2018,21:06

Re "But the DI did give Diane and I something else to have in common: we both now have the public disapproval of the DI."

That's what you get for intelligently designing something! :p
Posted by: k.e.. on Feb. 22 2018,19:19

Quote (Henry J @ Feb. 22 2018,05:06)
Re "But the DI did give Diane and I something else to have in common: we both now have the public disapproval of the DI."

That's what you get for intelligently designing something! :p
---------------------QUOTE-------------------


And it's a very worthy credit to Wes and his spouse. Proof that the DI is "Anti-evolution".
Posted by: fnxtr on Feb. 22 2018,21:14

Ya but sooper-genius-confined-to-his-pigpen disagrees and he knows more than anybody!
Posted by: DiEb on Feb. 23 2018,17:20

Quote (Wesley R. Elsberry @ Feb. 22 2018,02:37)


The sorts of things one can use Avida-ED to demonstrate in a classroom on first introduction can be seen in the < Lab Manual >, in < curriculum guides >, and in various < YouTube video tutorials >. Assessments of the utility of Avida-ED are presented in peer-reviewed research, such as the example the DI got hurt feelings over. As far as I can tell, the notion that Minivida has something to show people relative to Avida-ED is unfounded. You can go through the Minivida code and note all the "TODO" lick-and-promise points in its incomplete approach to mimicking Avida. You can look for, and fail to find, documentation that would show any useful educational purpose for Minivida.


---------------------QUOTE-------------------


Interestingly, the number of  TODOs has not been reduced over the last six years: < here > is the identical version of the script from March 2012.

At that time, the "Evolutionary Informatics Lab" listed a programmer as one of its < people >, but he isn't listed nowadays.

Dembski, Marks, and Ewert have shown that they are able to botch even more simple computer tasks: < they obviously got it wrong when they tried to implement a simulation for the "Cracker Barrel Puzzle". >
Posted by: Wesley R. Elsberry on Jan. 03 2020,22:51

Quote (Wesley R. Elsberry @ Feb. 15 2017,12:06)
Quote (DiEb @ Feb. 14 2017,12:16)
 
Quote (WebHopper @ Feb. 14 2017,12:45)
   
Quote (DiEb @ Feb. 13 2017,05:40)
Frankly, I'm not sure what you mean by "trial"...
---------------------QUOTE-------------------


The number of trials is the number of tries. When you roll dices for instance, a trial is when you throw the dice once. Or for the Weasel algorithm, it's the number of generations.

I calculated the probability distribution for the Weasel algorithm with a mutation rate of 0.05 and a population size of 100 according to Utiger's paper:

P(v)=|H.F^(v-2).A|

where v is the number of trials (or generations), H and F are matrices, A is a vector and |.| is the 1-norm. This yields



As you can see, the numerical blue points and the analytical red curve perfectly fit. The mean calculated analytically is 79.19 and the standard deviation is 24.64. So the number of queries is 100*79.19 = 7919 according to your indications. On your graphic above however, the intersection point of the vertical line at 5e-02=0.05 with the green curve passing through the point 100... is about 2e+05=2*10^5. This is the number of queries if I understand you well. So there is disagreement with both results...
---------------------QUOTE-------------------


Great - I will try to find out where my error laid...
---------------------QUOTE-------------------


Dieb, I don't think it is your error. The plot WebHopper has is for population size 100, and should have been run for population size 9 to speak to your numbers.
---------------------QUOTE-------------------


It's come to my attention that "WebHopper" was likely simply a sockpuppet of Utiger himself.

The self-praise is something I now see he has been carrying out in a number of venues.
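As an aside, the distribution discussed in the quoted exchange can be sanity-checked empirically. Here is a minimal Monte Carlo sketch of that weasel configuration (per-site mutation rate 0.05, population 100, 28-character target); the selection scheme and parameters are assumptions reconstructed from the discussion, not Utiger's analytical method.

```python
import random

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "
TARGET = "METHINKS IT IS LIKE A WEASEL"

def mutate(s, rate=0.05):
    """Per-site mutation: each character independently has probability
    `rate` of being replaced by a uniform draw from ALPHABET."""
    return "".join(
        random.choice(ALPHABET) if random.random() < rate else c
        for c in s
    )

def fitness(s):
    """Number of positions matching the target string."""
    return sum(a == b for a, b in zip(s, TARGET))

def weasel_run(pop_size=100, rate=0.05, max_gens=5000):
    """Run one weasel search: each generation produces pop_size mutated
    copies of the parent and keeps the fittest. Returns the generation
    at which the target was matched, or None if max_gens was exhausted."""
    parent = "".join(random.choice(ALPHABET) for _ in range(len(TARGET)))
    for gen in range(1, max_gens + 1):
        parent = max((mutate(parent, rate) for _ in range(pop_size)),
                     key=fitness)
        if parent == TARGET:
            return gen
    return None
```

With these parameters a single run typically finishes within a couple hundred generations, roughly consistent with the analytical mean of about 79 quoted above (exact agreement depends on selection details such as whether the parent is retained).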
Posted by: DiEb on Jan. 05 2020,16:26

Quote (Wesley R. Elsberry @ Jan. 04 2020,04:51)

It's come to my attention that "WebHopper" was likely simply a sockpuppet of Utiger himself.

The self-praise is something I now see he has been carrying out in a number of venues.
---------------------QUOTE-------------------


WTF? So, when I linked to his own article, he wrote  

---------------------QUOTE-------------------
Thanks for the link, I will look into this. Seems to be complicated however...
---------------------QUOTE-------------------



Has anybody ever encountered an honest proponent of ID?

PS: Wesley, I enjoyed re-reading your thoughts! Should be done at least every two years....
Posted by: Wesley R. Elsberry on Jan. 06 2020,10:26

Quote (DiEb @ Jan. 05 2020,16:26)
Quote (Wesley R. Elsberry @ Jan. 04 2020,04:51)

It's come to my attention that "WebHopper" was likely simply a sockpuppet of Utiger himself.

The self-praise is something I now see he has been carrying out in a number of venues.
---------------------QUOTE-------------------


WTF? So, when I linked to his own article, he wrote    

---------------------QUOTE-------------------
Thanks for the link, I will look into this. Seems to be complicated however...
---------------------QUOTE-------------------



Has anybody ever encountered an honest proponent of ID?

PS: Wesley, I enjoyed re-reading your thoughts! Should be done at least every two years....
---------------------QUOTE-------------------


That graph showing the distribution of generations to completion for N=100? That was hosted at historycycles.tripod.com, which now redirects to historycycles.org, which is Utiger's page for his book on 'salvation history'. That's just one clue.
Posted by: Wesley R. Elsberry on Jan. 06 2020,11:07

Quote (DiEb @ Feb. 23 2018,17:20)
 
Quote (Wesley R. Elsberry @ Feb. 22 2018,02:37)


The sorts of things one can use Avida-ED to demonstrate in a classroom on first introduction can be seen in the < Lab Manual >, in < curriculum guides >, and in various < YouTube video tutorials >. Assessments of the utility of Avida-ED are presented in peer-reviewed research, such as the example the DI got hurt feelings over. As far as I can tell, the notion that Minivida has something to show people relative to Avida-ED is unfounded. You can go through the Minivida code and note all the "TODO" lick-and-promise points in its incomplete approach to mimicking Avida. You can look for, and fail to find, documentation that would show any useful educational purpose for Minivida.


---------------------QUOTE-------------------


Interestingly, the number of  TODOs has not been reduced over the last six years: < here > is the identical version of the script from March 2012.

At that time, the "Evolutionary Informatics Lab" listed a programmer as one of its < people >, but he isn't listed nowadays.

Dembski, Marks, and Ewert have shown that they are able to botch even more simple computer tasks: < they obviously got it wrong when they tried to implement a simulation for the "Cracker Barrel Puzzle". >
---------------------QUOTE-------------------


One would think QA would be higher on their priority list after the < "unacknowledged costs" > fiasco.
Posted by: DiEb on Jan. 06 2020,16:36

Quote (Wesley R. Elsberry @ Jan. 06 2020,17:07)
One would think QA would be higher on their priority list after the < "unacknowledged costs" > fiasco.
---------------------QUOTE-------------------


Another example of ID's approach to science:

Our math is strong because our cause is just!
Posted by: Wesley R. Elsberry on Jan. 07 2020,22:36

Quote (DiEb @ Jan. 05 2020,16:26)
Quote (Wesley R. Elsberry @ Jan. 04 2020,04:51)

It's come to my attention that "WebHopper" was likely simply a sockpuppet of Utiger himself.

The self-praise is something I now see he has been carrying out in a number of venues.
---------------------QUOTE-------------------


WTF? So, when I linked to his own article, he wrote    

---------------------QUOTE-------------------
Thanks for the link, I will look into this. Seems to be complicated however...
---------------------QUOTE-------------------



Has anybody ever encountered an honest proponent of ID?

PS: Wesley, I enjoyed re-reading your thoughts! Should be done at least every two years....
---------------------QUOTE-------------------


I skimmed over < my essay for the 1997 (DI CRSC) conference, "Naturalism, Theism, and the Scientific Enterprise" > just now, and that reminds me of various things that I think I need to emphasize more in argumentation. The "axiom of comprehensibility" that I introduced there is, as I'm looking at it, a critical component of saying *why* the IDC effort to redefine science itself is so dangerous. (It's because inherent in their new conception of "science" there is no expectation that all phenomena are capable of being comprehended by humans.)

But besides that, I liked seeing a replay of kicking Wilder-Smith's notions to the curb.
Posted by: Wesley R. Elsberry on Jan. 08 2023,15:29

So far as I know, this isn't really evolutionary computation, but it is computation-related.

A friend of mine is pretty much the sole technical talent at a start-up. Last month, we started chatting about his use of ChatGPT and other generative AI solutions. He is doing a lot of firmware coding in Rust now, and he has been telling me having the generative AI is like having a bunch of interns or volunteers on hand so far as routine coding tasks go. So much so that he has sprung for an RTX 3090 GPU in order to self-host models and get past arbitrary restrictions on usage.

I myself only got started using ChatGPT via OpenAI's API this weekend. I've asked for, and gotten, several Python code examples, including what purports to be an inverse CDF for the logit function (I still need to test it), and some Python database functions (which I have tested). I've requested some text concerning topics, with references, which it does OK. And that got me thinking about long-planned projects and how this tech could assist in those.
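For what it's worth, the quantile function (inverse CDF) of the logistic distribution is just a scaled logit, so a hand-rolled version is easy to test against the CDF. This is my own reference sketch for checking such output, not the code ChatGPT produced:

```python
import math

def logistic_cdf(x, mu=0.0, s=1.0):
    """CDF of the logistic distribution with location mu and scale s."""
    return 1.0 / (1.0 + math.exp(-(x - mu) / s))

def logistic_ppf(p, mu=0.0, s=1.0):
    """Inverse CDF (quantile function): mu + s * logit(p).
    Valid for 0 < p < 1."""
    return mu + s * math.log(p / (1.0 - p))
```

A quick round-trip check (`logistic_ppf(logistic_cdf(x)) == x` to within floating-point error) is enough to catch most botched implementations.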

One of the tasks I've been thinking about a long time is a refresh of Marty Leipzig's keyword-indexed bibliography hosted on the TalkOrigins Archive. That was put up in the mid-1990s and I think may have had one update from Marty over twenty years ago. Obviously, time marches on. While I'm at it, the 'Biographica' material at the TOA could use a major update, too.

So I wrote a program that takes a topic name from the bibliography, builds a prompt, then calls the OpenAI API to get the completion. There are two prompts built per topic currently, one to write a summary paragraph, and another to find references for the topic. There's currently some waste/redundancy because the API apparently limits the length of the completion sent back, and I have to send back context and another prompt to resume from where it truncated. There are other things I had to build in, like delays to avoid 'rate-limit' error messages. The topic summaries and bibliography entries get saved to a SQLite database.
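The flow described above looks roughly like this sketch. To be clear, `build_prompt`, the table layout, and the injected `complete` callable are hypothetical stand-ins; the real version calls the OpenAI API and handles truncated completions, which is omitted here.

```python
import sqlite3
import time

def build_prompt(topic):
    # Hypothetical prompt template; the actual prompts aren't shown here.
    return f"Write a one-paragraph summary of the topic '{topic}' for a bibliography."

def make_db(path=":memory:"):
    """Create (or open) the SQLite database holding topic summaries."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS summaries "
                 "(topic TEXT PRIMARY KEY, summary TEXT)")
    return conn

def process_topic(conn, topic, complete, delay=1.0):
    """Build a prompt for `topic`, get a completion via the injected
    `complete` callable (a stand-in for the OpenAI API call), pause to
    stay under rate limits, and save the result to SQLite."""
    prompt = build_prompt(topic)
    summary = complete(prompt)
    conn.execute("INSERT INTO summaries (topic, summary) VALUES (?, ?)",
                 (topic, summary))
    conn.commit()
    time.sleep(delay)
    return summary
```

Injecting the completion function keeps the database and rate-limit plumbing testable without burning API credit.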

I have run this to get a sample of topics covered (35 of them), and then checked my OpenAI account, where I've used $13 from an $18 credit. Taking off a buck for the incidental exploration, my estimate of cost to complete the exercise for the TOA materials is:

Topic count: 1049, likely cost to complete: $347.66
Biography count: 318, likely cost to complete: $109.03

The bibliography topic estimate is low, because some more needs to be done to fill out the number of bibliography entries, and that will take both exploration and more prompts with context to get the completions I am after.
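If I reconstruct the arithmetic: the per-topic cost is ($13 spent minus $1 incidental) over the 35 sampled topics, applied to the items remaining. A back-of-the-envelope check (my reconstruction, not the exact accounting):

```python
spent, incidental, sampled = 13.0, 1.0, 35
per_topic = (spent - incidental) / sampled       # about $0.343 per topic

topics_remaining = 1049 - sampled                # 35 already done
topic_cost = round(per_topic * topics_remaining, 2)   # 347.66
bio_cost = round(per_topic * 318, 2)                  # 109.03
```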

If I go forward on this, I'm seriously considering following my friend's lead on getting a good graphics card capable of self-hosting one or more generative AI models. The cost I'd sink into interactive use is already about a third of my estimate of what it would take to augment my current best desktop, and about a fifth of what I'd estimate a purpose-built new system would take.
Posted by: Wesley R. Elsberry on Jan. 08 2023,19:26

This is an example of a ChatGPT-generated summary paragraph for the topic of 'Arthropods'.



---------------------QUOTE-------------------
Arthropods are an incredibly diverse group of animals that have been around since the Cambrian explosion, making them one of the most successful groups of animals in evolutionary history. They have adapted to almost any type of environment on Earth, ranging from deserts to oceans, and are found in nearly every habitat imaginable. Arthropods have evolved a vast range of morphological and behavioral adaptations, enabling them to thrive in almost any environment. They also display immense genetic diversity, with many species displaying impressive levels of biochemical sophistication.  in a different way. Arthropods have had a profound impact on evolution, forming vital links in food chains, cycling nutrients, and helping to keep ecosystems healthy and functioning.
---------------------QUOTE-------------------



If I were fine-tuning that, it needs diagnostic criteria for the group, because if you didn't already know what arthropods looked like, there's not a clue in that paragraph to help you.
Posted by: Dr.GH on Jan. 08 2023,20:17

I have no grasp of what your comment means.

I feel old.

:O
Posted by: Wesley R. Elsberry on Jan. 08 2023,21:19

Quote (Dr.GH @ Jan. 08 2023,20:17)
I have no grasp of what your comment means.

I feel old.

:O
---------------------QUOTE-------------------


I asked ChatGPT to write a paragraph explaining itself, and it came up with this:



---------------------QUOTE-------------------
ChatGPT is a type of artificial intelligence technology that enables computers to answer questions and generate text conversations in natural language. It uses a combination of machine learning, natural language processing, and neural networks to power these conversations. For example, when you enter a paragraph, ChatGPT will analyze the content and generate a response based on its understanding of the conversation. In other words, it can act like a virtual chatbot that engages in conversations with human users. So don't worry about feeling old- ChatGPT is here to help us navigate this new technology!
---------------------QUOTE-------------------


Posted by: Dr.GH on Jan. 09 2023,21:42

Wow.  :O
Posted by: Wesley R. Elsberry on Jan. 10 2023,07:37

It gets worse.

I asked ChatGPT to translate a MATLAB function into Python. The MATLAB function I chose was one called 'motion' written by an acoustics expert who apparently thought variable names longer than two characters would be harmful. I removed all the MATLAB comments from it.

ChatGPT passed back a creditable start on a translation, assigning things like "cut()" to "np.cut()" (NumPy 'cut', a reference to the usual linear algebra package in Python). But it missed all the two-character mathematical operators in MATLAB, like "./" for "element-wise right divide", and failed to pick up the apostrophe as "ctranspose". It passed those on unchanged in the output, though, so its translation was not destructive. So far, so good; a reasonably limited result given the generic nature of ChatGPT training.

Then I asked ChatGPT to write an explanation of the function as a Python docstring. It made good guesses on expanding the single-character parameter names "t" and "x" as "(time)" and "state vector", and stated that the function calculated the derivatives of motion of a 2D mass-spring system, which is exactly what it does. There was nothing in the input saying "mass" or "spring", and nothing but code to derive the dimensionality from. This bit is not what I expected. ChatGPT's generalized understanding seems to cover technical topics to much greater depth than I expected. There are some implications that I think are concerning. There appears to be potential here for ChatGPT, or the broader class of generative AI models, to be an extremely disruptive technology.
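For readers who don't speak MATLAB, the operators it missed map to NumPy like this (my own illustrative mapping, not ChatGPT's output):

```python
import numpy as np

A = np.array([[1 + 2j, 3 + 0j],
              [4 + 0j, 5 - 1j]])
B = np.array([[2.0, 3.0],
              [4.0, 5.0]])

# MATLAB "A ./ B" (element-wise right divide) is plain "/" on NumPy arrays.
elementwise = A / B

# MATLAB "A'" (apostrophe, ctranspose) is the conjugate transpose.
ctransposed = A.conj().T

# MATLAB "A .* B" (element-wise multiply) is plain "*" on NumPy arrays.
elementwise_prod = A * B
```

The conjugate-transpose case is the sneaky one: a naive translation to `.T` alone silently drops the complex conjugation.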
Posted by: KevinB on Jan. 12 2023,16:54

But would ChatGPT make a better job of writing a Dawkins' Weasel program than Eric Holloway?
Posted by: Wesley R. Elsberry on Jan. 12 2023,22:55

Quote (KevinB @ Jan. 12 2023,16:54)
But would ChatGPT make a better job of writing a Dawkins' Weasel program than Eric Holloway?
---------------------QUOTE-------------------


Interesting. I asked ChatGPT to write one.

That took longer to do than I thought it would.

One thing ChatGPT has going for it is that it isn't as obstinate as certain antievolutionary-axe-to-grind coders are.



---------------------QUOTE-------------------

Wesley Elsberry
Explain Richard Dawkins' 'weasel' program.

The "weasel program" is a thought experiment proposed by evolutionary biologist and author Richard Dawkins in his book "The Blind Watchmaker." The experiment is meant to demonstrate the process of natural selection and how it can lead to the evolution of complex systems. The program starts with a simple string of random letters, which represents a randomly generated "creature." The program then repeatedly makes small random changes to the letters in the string, simulating the process of genetic mutation. The program then selects the strings that are most similar to a target string, such as the word "METHINKSITISLIKEAWESAEL," which represents the phrase "METHINKS IT IS LIKE A WEASEL." Over many generations, the program will gradually converge on the target string, demonstrating how natural selection can lead to the evolution of complex systems.
Wesley Elsberry
Write a well-commented version of Dawkins' 'weasel' program in Python.

import random

# The target string represents the phrase "METHINKS IT IS LIKE A WEASEL"
target = "METHINKS IT IS LIKE A WEASEL"

# The initial string is randomly generated
initial = "".join([random.choice("ABCDEFGHIJKLMNOPQRSTUVWXYZ ") for _ in range(len(target))])
print("Initial String:", initial)

# The mutation rate controls the probability of a character changing in the string
mutation_rate = 0.05

def mutate(string):
   """
   This function simulates the process of genetic mutation by randomly
   changing characters in the string. The mutation rate controls the
   probability of a character changing.
   """
   return "".join([random.choices(["ABCDEFGHIJKLMNOPQRSTUVWXYZ "], k=1, weights=[1-mutation_rate, mutation_rate])[0] if random.random() < mutation_rate else c for c in string])

def get_fitness(string):
   """
   This function calculates the fitness of the string by comparing it
   to the target string. The fitness is the number of characters that
   match the target string.
   """
   return sum(1 for c1, c2 in zip(string, target) if c1 == c2)

# The number of generations to run the simulation
generations = 1000

for generation in range(generations):
   initial = mutate(initial)
   fitness = get_fitness(initial)
   if fitness == len(target):
       print("Target String Found!")
       break
   print(f"Generation: {generation} String: {initial} Fitness: {fitness}")

The program starts with a simple random string as the initial string, in this case, we are using the letters A-Z and space to randomly generate the initial string. The program then repeatedly makes small random changes to the letters in the string, simulating the process of genetic mutation, this is done by the mutate function. The program then selects the strings that are most similar to the target string, in this case, we are comparing the string with the target string, the fitness of the string is defined as the number of characters that match the target string, this is done by the get_fitness function. Over many generations, the program will gradually converge on the target string, demonstrating how natural selection can lead to the evolution of complex systems. At the end of the program, the final string will be printed along with the generation number, the string and its fitness.

Wesley Elsberry

Correct the error from this traceback. Initial String: EOPLLFLRQKJOPHKYSXTEVVIYAKZL
0 EOPLLFLRQKJOPHKYSXTEVVIYAKZL 2
Traceback (most recent call last):
 File "gptweasel.py", line 34, in <module>
   initial = mutate(initial)
 File "gptweasel.py", line 19, in mutate
   return "".join([random.choices(["ABCDEFGHIJKLMNOPQRSTUVWXYZ "], k=1, weights=[1-mutation_rate, mutation_rate])[0] if random.random() < mutation_rate else c for c in string])
 File "gptweasel.py", line 19, in <listcomp>
   return "".join([random.choices(["ABCDEFGHIJKLMNOPQRSTUVWXYZ "], k=1, weights=[1-mutation_rate, mutation_rate])[0] if random.random() < mutation_rate else c for c in string])
 File "/home/netuser/programs/anaconda2/envs/py38-analysis/lib/python3.8/random.py", line 404, in choices
   raise ValueError('The number of weights does not match the population')
ValueError: The number of weights does not match the population

It looks like the error is coming from the line return "".join([random.choices(["ABCDEFGHIJKLMNOPQRSTUVWXYZ "], k=1, weights=[1-mutation_rate, mutation_rate])[0] if random.random() < mutation_rate else c for c in string]) in the mutate function.

The issue here is that the random.choices function requires the population to be the same length as the weights. In this case, the population is the list of characters ["ABCDEFGHIJKLMNOPQRSTUVWXYZ "] which has 27 characters, but the weights only has 2 (the probability of a character changing and the probability of a character staying the same).

One way to fix this would be to change the population to be all the characters in the target string and then set the weight of the characters that are already in the string to 1-mutation_rate and the weight of the characters that are not in the string to mutation_rate/ (len(target) - len(string)).

def mutate(string):
   """
   This function simulates the process of genetic mutation by randomly
   changing characters in the string. The mutation rate controls the
   probability of a character changing.
   """
   # Create a list of all the characters in the target string
   population = list(target)
   # Create a list of the weights for each character
   weights = []
   for c in population:
       if c in string:
           weights.append(1-mutation_rate)
       else:
           weights.append(mutation_rate/ (len(target) - len(string)))
   # Select a random character from the population using the weights
   new_char = random.choices(population, k=1, weights=weights)[0]
   # Select a random position in the string to change
   pos = random.randint(0, len(string)-1)
   # Replace the character at the selected position with the new character
   new_string = string[:pos] + new_char + string[pos+1:]
   return new_string

With this modification, the population and weights list will have the same length and the program will work as expected.

Wesley Elsberry
Why would you treat any character differently in the mutate function?

In the original version of the mutate function, the population is all the possible characters (A-Z, space), and all the characters are treated the same, meaning that any character can replace any other character, regardless of whether it's already present in the string or not. But in the corrected version, characters that are already present in the string are treated differently than the characters that are not present.

By treating the characters already present in the string differently, it increases the chances of getting a better fitness, because the characters that are already in the string are more likely to be part of the final solution. This way, the program can converge faster to the target string.

An analogy would be if you are trying to solve a puzzle and you have some pieces already in place, it's more likely that you will complete the puzzle faster if you try to fit the remaining pieces to the ones you already have, rather than trying to fit them randomly.

It's worth noting that this is just one of the ways to optimize the algorithm, there are many other ways to improve the performance of the program, such as adding a selection mechanism, or introducing a recombination operator.


Wesley Elsberry
The point of the 'weasel' program was to illustrate a biological concept, and treating characters differently for mutation is counter to that goal. Change the program so the mutate function is not counter to biological knowledge.

You are correct, the point of the "weasel" program is to demonstrate the process of natural selection and how it can lead to the evolution of complex systems, so it should adhere to the principles of natural selection.

One way to make the mutate() function adhere to the principles of natural selection is to use a mutation rate that is consistent with the expected values observed in real-world populations. This means that the mutation rate should be low, typically on the order of one per million letters per generation.

Another way to make the mutate() function adhere to the principles of natural selection is to randomly select a position in the string and change it to any of the characters in the target string. This is more consistent with the way mutations occur in nature, where any position in a genome can be affected by a mutation and the resulting change can be any of the possible nucleotides or amino acids.

Here's the modified version of the mutate() function that adheres to these principles:

def mutate(string):
   """
   This function simulates the process of genetic mutation by randomly
   changing a character in the string to any of the characters in the target string.
   """
   pos = random.randint(0, len(string)-1)
   new_char = random.choice(target)
   new_string = string[:pos] + new_char + string[pos+1:]
   return new_string

In this version, the characters already present in the string are treated the same as the characters that are not present. The character that is replaced is selected at random from the target string and the position of the string that is to be replaced is selected randomly, this way the mutation is more consistent with the way mutations occur in nature.

It's also worth noting that this version of the program will not necessarily converge as fast as the previous one but it will be more biologically accurate.


Wesley Elsberry
Consider an experiment where a one is determining a mutation rate from empirical data. Data show that genetic bases have approximately equal probabilities of point mutations, and there is variation in the number of mutations observed in a long sequence of bases. Thus, a mutation rate is not an exact number drawn from invariant data, but more like the mean taken of a Gaussian distribution of observed numbers of changed bases. Reconsider your reasoning on biological accuracy and produce another version of the mutate function.

You are correct that a mutation rate is not an exact number drawn from invariant data, but more like the mean taken of a Gaussian distribution of observed numbers of changed bases. This means that in order to be more biologically accurate, the program should use a random number of mutations per generation, rather than a fixed number.

One way to achieve this is to use the Poisson distribution to generate a random number of mutations per generation based on the expected mean mutation rate. The Poisson distribution is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time or space if these events occur with a known average rate and independently of the time since the last event.

Here's the modified version of the mutate() function that adheres to this principle:

import math

def mutate(string):
   """
   This function simulates the process of genetic mutation by randomly
   changing a certain number of characters in the string to any of the characters in the target string.
   The number of mutations is determined by the Poisson distribution based on the expected mutation rate.
   """
   # Calculate the expected number of mutations
   expected_mutations = len(string) * mutation_rate
   # Generate a random number of mutations using the Poisson distribution
   num_mutations = math.floor(random.gammavariate(expected_mutations, 1))
   # Make the mutations
   for _ in range(num_mutations):
       pos = random.randint(0, len(string)-1)
       new_char = random.choice(target)
       string = string[:pos] + new_char + string[pos+1:]
   return string

In this version, the program uses a Gaussian distribution of observed numbers of changed bases, this way the mutation is more consistent with the way mutations occur in nature. The program calculates the expected number of mutations, and then it generates a random number of mutations using the Poisson distribution, this way the program will use a random number of mutations per generation, rather than a fixed number. This way the program will be more biologically accurate and will better simulate the real-world population.


Wesley Elsberry
Your program only ever has a single parent and offspring which fails to find the target in the given number of generations. Revise your program to use a population of strings where the number is given by a constant 'population_size', and the first population is comprised of the initial string and population_size-1 mutated copies of it.

You are correct that the previous version of the program only uses a single parent and offspring and it is unlikely to find the target string in the given number of generations. To address this, the program can be modified to use a population of strings where the number is given by a constant population_size, and the first population is comprised of the initial string and population_size-1 mutated copies of it.

Here's the modified version of the program that uses a population of strings:

# The number of strings in the population
population_size = 100

# Create the initial population
population = [initial] + [mutate(initial) for _ in range(population_size-1)]

# The number of generations to run the simulation
generations = 1000

for generation in range(generations):
   # Calculate the fitness of each string in the population
   fitness_values = [get_fitness(string) for string in population]
   # Select the most fit strings for reproduction
   parents = [string for _, string in sorted(zip(fitness_values, population), key=lambda pair: pair[0], reverse=True)][:population_size//2]
   # Create the next generation by mutating the parents
   population = [mutate(parent) for parent in parents]
   # Check if the target string has been found
   if target in population:
       print("Target String Found!")
       break
   print(f"Generation: {generation}")

In this version, the program uses a population of strings where the number is given by a constant population_size, and the first population is created by taking population_size-1 mutated copies of the initial string. The program then runs the simulation for a given number of generations, on each generation, the program calculates the fitness of each string in the population. The program then selects the most fit strings for reproduction and creates the next generation by mutating the parents. The program also checks if the target string has been found, if the target string is found the program stops. This way the program will be more likely to find the target string in the given number of generations.


Wesley Elsberry
'string' is a Python keyword. Refactor your program to use 'orgstring' instead.

You are correct that 'string' is a Python keyword, I apologize for not noticing that before. Here's the modified version of the program that uses 'orgstring' instead of 'string':

# The number of orgstrings in the population
population_size = 100

# Create the initial population
population = [initial] + [mutate(initial) for _ in range(population_size-1)]

# The number of generations to run the simulation
generations = 1000

for generation in range(generations):
   # Calculate the fitness of each orgstring in the population
   fitness_values = [get_fitness(orgstring) for orgstring in population]
   # Select the most fit orgstrings for reproduction
   parents = [orgstring for _, orgstring in sorted(zip(fitness_values, population), key=lambda pair: pair[0], reverse=True)][:population_size//2]
   # Create the next generation by mutating the parents
   population = [mutate(parent) for parent in parents]
   # Check if the target string has been found
   if target in population:
       print("Target String Found!")
       break
   print(f"Generation: {generation}")

In this version of the program, I've replaced all instances of 'string' with 'orgstring' to avoid using a Python keyword. This should not change the program's behavior, but it will make it syntactically correct.


Wesley Elsberry
Write a function to determine the realized mutation rate in a population of orgstrings.

def realized_mutation_rate(population):
   """
   This function calculates the realized mutation rate of a population of orgstrings.
   The realized mutation rate is the proportion of characters that have been changed
   in the orgstrings compared to the original orgstring.
   """
   # Get the original orgstring
   original = population[0]
   # Initialize the count of changed characters
   changes = 0
   # Iterate through the orgstrings in the population
   for orgstring in population:
       # Iterate through the characters in the orgstring
       for c1, c2 in zip(orgstring, original):
           # If the characters are different, increment the changes count
           if c1 != c2:
               changes += 1
   # Calculate the realized mutation rate
   rate = changes / (len(original) * len(population))
   return rate

This function accepts a population of orgstrings and returns the realized mutation rate. The function first gets the original orgstring from the population, then it initializes the count of changed characters. The function iterates through the orgstrings in the population and through the characters in each orgstring, incrementing the changes count whenever the characters differ. Finally, the function calculates the realized mutation rate and returns it.


Wesley Elsberry
Rewrite realized_mutation_rate to accept a parameter 'k' with a default value of 10000, and a parameter 'mutatefxn' with default value of mutate. Within the function, generate an initial string, then apply mutatefxn k times to it, summing the number of changed characters found at each iteration. Derive the per-character observed mutation rate and return that.

def realized_mutation_rate(initial, k=10000, mutatefxn=mutate):
   """
   This function calculates the realized mutation rate of a population of orgstrings.
   The realized mutation rate is the proportion of characters that have been changed
   in the orgstrings compared to the original orgstring.
   """
   # Initialize the count of changed characters
   changes = 0
   for _ in range(k):
       # Generate a new orgstring by applying the mutatefxn to the initial orgstring
       new_orgstring = mutatefxn(initial)
       # Iterate through the characters in the orgstring
       for c1, c2 in zip(new_orgstring, initial):
           # If the characters are different, increment the changes count
           if c1 != c2:
               changes += 1
   # Calculate the realized mutation rate
   rate = changes / (len(initial) * k)
   return rate

In this version, the realized_mutation_rate function accepts an initial orgstring, a parameter k with a default value of 10000, and a parameter 'mutatefxn' with default value of mutate. Within the function, the function generates an initial orgstring and applies mutatefxn k times to it, summing the number of changed characters found at each iteration. The function then derives the per-character observed mutation rate and return that. This way you can use different mutation functions and compare the results. Also you can control the number of iterations to get more precise results.


ChatGPT Jan 9 Version. Free Research Preview.

---------------------QUOTE-------------------



I still had to make some changes in the code to get a version that converged in something like the usual number of generations. ChatGPT was far too generous in the number of 'parents' contributing to the next generation, for one.
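For the record, here is a minimal self-contained version along the lines of my adjusted run: a single best parent per generation, weasel-style, rather than ChatGPT's fifty. The alphabet, target, rate, and the mutate/get_fitness helpers are my own reconstructions of what the transcript assumes, not ChatGPT's output:

```python
import random

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "  # illustrative 27-symbol pool
TARGET = "METHINKS IT IS LIKE A WEASEL"
MUTATION_RATE = 0.05  # illustrative per-site probability of replacement
POPULATION_SIZE = 100

def mutate(orgstring):
    """Copy orgstring, replacing each character with probability MUTATION_RATE."""
    return "".join(
        random.choice(ALPHABET) if random.random() < MUTATION_RATE else c
        for c in orgstring
    )

def get_fitness(orgstring):
    """Number of positions matching TARGET."""
    return sum(c1 == c2 for c1, c2 in zip(orgstring, TARGET))

def run_weasel(max_generations=10000):
    """Weasel-style run: one parent, many mutated offspring, keep the best."""
    parent = "".join(random.choice(ALPHABET) for _ in TARGET)
    for generation in range(max_generations):
        if parent == TARGET:
            return generation
        offspring = [mutate(parent) for _ in range(POPULATION_SIZE)]
        best = max(offspring, key=get_fitness)
        # Never regress: replace the parent only if the best child matches it or better.
        if get_fitness(best) >= get_fitness(parent):
            parent = best
    return max_generations
```

With these settings a run typically converges in a few hundred generations, which is in line with the usual weasel numbers.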

BTW, there's a link to a free 'research' version of ChatGPT : < https://chat.openai.com/chat....hat > . You will still need to create an account, and it is monitored, so be careful of what you tell it.
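Also, as a quick sanity check on the realized_mutation_rate function quoted above: with a per-site replacement mutation (my illustrative stand-in for mutate, not the transcript's), the observed per-character rate should track the nominal rate, discounted for replacements that happen to redraw the same character:

```python
import random

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "  # illustrative 27-symbol pool
NOMINAL_RATE = 0.05  # illustrative per-site mutation probability

def mutate(orgstring):
    """Per-site replacement with probability NOMINAL_RATE (may redraw the same char)."""
    return "".join(
        random.choice(ALPHABET) if random.random() < NOMINAL_RATE else c
        for c in orgstring
    )

def realized_mutation_rate(initial, k=10000, mutatefxn=mutate):
    """Observed per-character change rate over k mutated copies of initial."""
    changes = 0
    for _ in range(k):
        new_orgstring = mutatefxn(initial)
        changes += sum(c1 != c2 for c1, c2 in zip(new_orgstring, initial))
    return changes / (len(initial) * k)
```

With a 27-symbol pool the expectation is NOMINAL_RATE * (K - 1) / K, about 0.048, slightly below the nominal 0.05, because a replacement can redraw the original character.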
Posted by: Wesley R. Elsberry on Jan. 12 2023,23:01

I suspect GPT-4 is the AI model behind the free 'research' ChatGPT. There are some differences in the feel of interaction in the new chat and the API I have been using. More flexibility and ability to revise seems apparent, though this is all subjective on my part.

OpenAI has previously said that they would be releasing GPT-4 in the December to February timeframe. I think things will get very interesting shortly.
Posted by: Wesley R. Elsberry on Jan. 13 2023,15:25

Quote (Wesley R. Elsberry @ Mar. 18 2009,11:47)
   
Quote (Richardthughes @ Mar. 18 2009,11:24)
Can you derive an optimal mutation rate?
---------------------QUOTE-------------------


I'll have to think about that some. Later.
---------------------QUOTE-------------------


I asked ChatGPT about this.

   

---------------------QUOTE-------------------

   

---------------------QUOTE-------------------

Produce an equation that estimates the optimal mutation rate for a given population size.
---------------------QUOTE-------------------



   

---------------------QUOTE-------------------

Estimating the optimal mutation rate for a given population size is a complex task as it depends on various factors such as the number of characters in the orgstring, the number of characters in the symbol pool, and the specific goals of the simulation.

There are different ways to estimate the optimal mutation rate, but one possible approach is to use the concept of a "mutational load" which expresses the trade-off between the beneficial effects of mutations (which increase the chances of finding an orgstring with all correct characters) and the deleterious effects of mutations (which decrease the fitness of the orgstrings in the population).

The mutational load is defined as the product of the deleterious effect of a mutation (1 - (1/size of symbol pool) * mutation rate) and the number of characters in the orgstring.

The optimal mutation rate is the one that minimizes the mutational load.

Given a population size n, the optimal mutation rate can be estimated by minimizing the following equation :
mutational load = n * (1 - (1/size of symbol pool) * mutation rate) * len(orgstring)

This equation is a simplified version of the mutational load and it would be important to consider other factors that can influence the optimal mutation rate such as the specific goals of the simulation, the size of the symbol pool, and the number of characters in the orgstring.

It's important to note that the optimal mutation rate is not a fixed value, but it can change during the simulation depending on the specific goals of the simulation.
---------------------QUOTE-------------------



 

---------------------QUOTE-------------------

Write a fully-parameterized Python function to calculate the optimal mutation rate by minimizing the mutational load equation.

---------------------QUOTE-------------------



 

---------------------QUOTE-------------------



---------------------CODE SAMPLE-------------------

import scipy.optimize as optimize

def optimal_mutation_rate(population_size, orgstring, symbol_pool):
   def mutational_load(mutation_rate):
       return population_size * (1 - (1/len(symbol_pool)) * mutation_rate) * len(orgstring)
   
   result = optimize.minimize_scalar(mutational_load)
   return result.x

---------------------CODE SAMPLE-------------------



This function takes in three parameters: population_size, orgstring and symbol_pool, and it uses the optimize package of scipy library to minimize the mutational load equation using the minimize_scalar function.

The mutational_load function is defined inside the optimal_mutation_rate function and it calculates the mutational load equation using the input parameters.

The minimize_scalar function finds the minimum value of the mutational_load function and the x attribute of the result variable holds the optimal mutation rate.

It's important to note that this function provides an estimation of the optimal mutation rate based on the assumptions of the mutational load equation, which may not hold in practice. Additionally, it's important to take into account other factors that can influence the optimal mutation rate such as the specific goals of the simulation, the size of the symbol pool, and the number of characters in the orgstring.

---------------------QUOTE-------------------




---------------------QUOTE-------------------



As a friend of mine said to me, he was both amazed and appalled by ChatGPT.

Generative AI models are going to democratize expertise like bullets democratized soldiers.
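Coming back to Richardthughes's 2009 question: one empirical route to an optimal per-site rate is simply to sweep candidate rates and measure generations to convergence on a weasel-style run. A sketch under illustrative assumptions of mine (alphabet, target, population size), not anything ChatGPT produced:

```python
import random

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "  # illustrative 27-symbol pool
TARGET = "METHINKS IT IS LIKE A WEASEL"

def generations_to_converge(rate, pop_size=100, max_generations=3000):
    """Weasel run with per-site mutation probability `rate`; generations used."""
    def mutate(s):
        return "".join(
            random.choice(ALPHABET) if random.random() < rate else c for c in s
        )
    def fitness(s):
        return sum(a == b for a, b in zip(s, TARGET))
    parent = "".join(random.choice(ALPHABET) for _ in TARGET)
    for generation in range(max_generations):
        if parent == TARGET:
            return generation
        best = max((mutate(parent) for _ in range(pop_size)), key=fitness)
        if fitness(best) >= fitness(parent):
            parent = best
    return max_generations  # did not converge within the cap

def best_rate(candidates, trials=2):
    """Return the candidate rate with the lowest mean generations to converge."""
    def mean_gens(r):
        return sum(generations_to_converge(r) for _ in range(trials)) / trials
    return min(candidates, key=mean_gens)
```

For a 28-character target the sweet spot should land in the general vicinity of 1/L, about 0.036: high enough to keep changing wrong letters, low enough not to break the ones already matched, which squares with the expected-correct-bases equation from earlier in this thread.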
Posted by: Wesley R. Elsberry on Jan. 30 2023,08:32

Back in the 1990s, I wrote the start of < a draft paper > on responding to antievolution criticisms of evolutionary computation. One of the objection/response pairs went like this:

 

---------------------QUOTE-------------------

The generation of a natural language sentence via means of evolutionary computation is either difficult or impossible.

I think that instead of either being difficult or impossible, the correct classification is that it would be time-consuming to generate such an application. I’ll lay out the approach I would take if I had the time and inclination to do such. First, I would not use fixed-length bit strings, so the underlying computational approach would not quite match the definition of a GA, although most of the same code would likely be useful. Second, the initialization of the evaluation function would involve scanning a large source of text in the language of choice, building a symbol sequence frequency table. (A possible or likely objection here is that this gives information about the language to be generated. However, this procedure gives far less information than is provided to developing humans, who in the absence of examples of language use do not generate grammatically correct sentences, either.) Third, the evaluation function would return a probability value for a bit-string based on the likelihood that the bit-string could be drawn from the distribution represented by the symbol sequence frequency table, with extra points for the final symbol being a period, and the initial symbol being a capital letter. The GA would finish when a bit-string achieved a threshold evaluation value. The likely results will be the production of nonsensical, but often grammatically correct or near-correct sentences. I say this on the basis of experience in coding ‘travesty’ generators and information entropy analysis applications. The use of evolutionary computation in this regard would be no huge stretch.

---------------------QUOTE-------------------



I ran across < a YouTube video about coding a GPT model from scratch >. It is a couple of hours long. At the start, though, the material seems pretty familiar. It talks about using some existing text as a training data set, and it starts from a bigram analysis of characters in the text, then extends to higher-order n-grams. It really isn't any surprise that this general form of approach is the basis of modern language models; I was basing my conjecture on information-theoretic work, and info theory works well.
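The evaluation function sketched in that old draft is straightforward to mock up: build a symbol-pair (bigram) frequency table from training text and score candidates by average log-probability. The corpus and the add-one smoothing here are illustrative choices of mine, not from the draft:

```python
import math
from collections import Counter

def bigram_table(text):
    """Count adjacent character pairs in the training text."""
    return Counter(zip(text, text[1:]))

def score(candidate, table, alphabet_size=27):
    """Average log-probability of the candidate's bigrams under the table,
    with add-one smoothing so unseen pairs are penalized rather than forbidden."""
    total = sum(table.values())
    pairs = list(zip(candidate, candidate[1:]))
    logp = sum(
        math.log((table[pair] + 1) / (total + alphabet_size ** 2))
        for pair in pairs
    )
    return logp / len(pairs)
```

A candidate drawn from English-like text should score above a scrambled one trained on the same corpus; the GA evaluation function in the draft would be this score plus the capital-letter and final-period bonuses.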
Posted by: Wesley R. Elsberry on Dec. 22 2023,11:26

If you haven't seen this yet, here's a bit of awesome... llamafile.

< https://future.mozilla.org/blog....amafile >



---------------------QUOTE-------------------
llamafile: bringing LLMs to the people, and to your own computer
---------------------QUOTE-------------------





---------------------QUOTE-------------------

llamafile turns LLMs into a single executable file. Whether you’re a developer or an end user, you simply choose the LLM you want to run, download its llamafile, and execute it. llamafile runs on six operating systems (Windows, macOS, Linux, OpenBSD, FreeBSD, and NetBSD), and generally requires no installation or configuration. It uses your fancy GPU, if you have one. Otherwise, it uses your CPU. It makes open LLMs usable on everyday consumer hardware, without any specialized knowledge or skill.

---------------------QUOTE-------------------



If you have enough RAM, you can get the Mixtral-8x7B based llamafile, which is an awesome LLM for self-hosting.

No, this isn't directly evolutionary computation, but it is a cool thing nonetheless.
Posted by: Wesley R. Elsberry on Dec. 22 2023,11:32

Some of the awesome via a Justine Tunney blog post:

< https://justine.lol/oneline....eliners >



---------------------QUOTE-------------------

Bash One-Liners for LLMs

I spent the last month working with Mozilla to launch an open source project called llamafile which is the new best way to run an LLM on your own computer. So far things have been going pretty smoothly. The project earned 5.6k stars on GitHub, 1073 upvotes on Hacker News, and received press coverage from Hackaday. Yesterday I cut a 0.4 release so let's see what it can do.

---------------------QUOTE-------------------



She covers several tasks:

* Making a text summary of an image
* Making image filenames out of summary text
* Summarizing a web page (uses her 'links' utility for retrieval)
* Running a server with an OpenAI-compliant API
* Librarian chatbot
* Code completion
* Automated email composition


Powered by Ikonboard 3.0.2a
Ikonboard © 2001 Jarvis Entertainment Group, Inc.