Antievolution.org Discussion Board - Topic: Could ID be "science"? (From PT)

WayneFrancis

Posts: 4
Joined: Nov. 2005

Posted: Nov. 17 2005,04:09

GCAGGTTAACAAGGAGTTTGCTAGAT

is one way to code PI to 16 digits. For this example, 16 digits or 1600 digits really doesn't matter.
Say you find that sequence in organism X.
Organism Y is thought to be closely related to organism X.
Looking at organism Y, we find the sequence

GCTGGTTAACAAGGAGTTTGCTAGAT

What does this mean? This sequence no longer equals PI to any number of digits. In practical terms, how does it differ from the sequence above that equals PI?
It doesn't! In organism Y we don't find PI. We find a sequence that becomes PI after a single point mutation. But both sequences will function exactly the same because they code for the same amino acids.
So there you have a mutation that could occur from organism Y to organism X, and the end result is that it produces one of many sequences that can be interpreted as PI.
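Wayne doesn't say which code he used, but working backwards, his sequence is exactly the integer 3141592653589793 written in base 4 under the code A=0, T=1, G=2, C=3. That code is our reconstruction, not something stated in the post. A minimal sketch decoding both organisms' sequences under that assumed code:

```python
# Assumed code, reconstructed from the example (not stated in the post): A=0, T=1, G=2, C=3.
DIGIT = {"A": 0, "T": 1, "G": 2, "C": 3}


def decode_base4(seq: str) -> int:
    """Read a nucleotide string as a base-4 integer, most significant base first."""
    value = 0
    for base in seq:
        value = value * 4 + DIGIT[base]
    return value


x = "GCAGGTTAACAAGGAGTTTGCTAGAT"  # organism X
y = "GCTGGTTAACAAGGAGTTTGCTAGAT"  # organism Y

print(decode_base4(x))  # 3141592653589793
print(decode_base4(y))  # 3211961397767457
print([i for i, (a, b) in enumerate(zip(x, y)) if a != b])  # [2]: a single point mutation
```

The one A-to-T change at the third base shifts the decoded value by 4^23. Read from the first base, that position is the last letter of the codon GCA/GCT, which codes for alanine either way.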

But I've said it once and I'll say it again: unless you have the Intelligent Designer's standards for encoding floats into nucleotides, it doesn't matter. You can settle on an IEEE standard before going to look for PI to 1 million digits in any organism, but it doesn't matter. What are the odds that you'll pick the same standard for encoding a float in a digital format as the one an Intelligent Designer decided to use?

PaulC

Posts: 18
Joined: Nov. 2005

Posted: Nov. 17 2005,06:11

I haven't read the other ABC comments, but I want to add something I think is pertinent from my late replies on the Panda's Thumb thread.

First, in response to the subject: No, ID is claptrap. However, the mathematically rigorous field of decryption could be viewed as a set of methods for discovering human agency in data, and this strikes me as a better framework in which to cast the "finding pi" thought experiment.

The reason I favor this view is that it gives a simple counterexample to the notion that "finding pi" is always dismissible as cherry picking. This is easier to follow than my previous points about restricting the family of encodings or detecting string compressibility, so I present an improved thought experiment. It's better than the original because it doesn't depend on ridiculous assumptions about finding things in DNA that you will almost certainly never find there.

Suppose you're employed in a counter-terrorism agency to analyze communications for suspect content. A message comes to you to analyze. You know nothing about the senders, but roughly, it is a short message "Attached is the bacterial gene sequence you requested. If you need any more help, feel free to contact our lab any time. Sincerely, [etc.]" And there is an attached file that looks like a DNA sequence.

Assume we've already ruled out that this is a bacterium of interest to putative terrorists. You're basically a math geek who knows no biology, employed to find encrypted messages.

One thing to note here is that it's not always feasible to decrypt a single message in isolation. Decryption experts are most successful when they have a long series of communications to analyze. But in this case, you have some time to spare and you get lucky.

After running the usual kinds of frequency analyses and finding nothing that suggests the string is not random, you're about ready to put it aside and go to the next problem.

But, being a math geek, you decide to see what happens if you try each of the 24 different ways of interpreting the bases A,C,T,G as bit pairs 00,01,10,11 and convert the string into bits according to that code. For each of these 24 strings, you take a prefix of the binary digit expansion of pi, XOR it against the string at every starting position, and interpret the resulting string as ASCII encoded text. Note: the likelihood that you did all of this is pretty low, but it is a plausible hypothetical scenario.

On a modern computer, all the tests should run in a matter of seconds. You do a frequency analysis on the resulting strings, and one of them shows a substring that looks statistically like English plaintext written in ASCII. You zoom in for a closer look and it is a paragraph of about 1k of grammatical English text that appears as a result of decoding the gene sequence as A=10, C=11, G=01, T=00 and XORing the resulting string with a prefix of the bit expansion of pi starting at the beginning of the text. Thus, the text has been XORed with about 8k bits of pi starting from the beginning.
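Here is a minimal sketch of that search, written under our own assumptions rather than PaulC's: pi's bits are taken from the leading "11" onward, windows start on base boundaries, and the "frequency analysis" is reduced to the share of bytes that are letters, spaces or basic punctuation. All function names are hypothetical, and a short planted message stands in for the 1k paragraph.

```python
import itertools
import random
import string


def arccot(x: int, unity: int) -> int:
    """arctan(1/x) scaled by `unity`, summed from the alternating Taylor series."""
    total = power = unity // x
    n, sign = 3, -1
    while True:
        power //= x * x
        term = power // n
        if term == 0:
            return total
        total += sign * term
        sign, n = -sign, n + 2


def pi_bits(nbits: int) -> str:
    """First nbits of pi's binary expansion, leading '11' included (Machin's formula)."""
    guard = 32  # extra low-order bits that absorb the series' truncation error
    unity = 1 << (nbits - 2 + guard)
    pi_fixed = 4 * (4 * arccot(5, unity) - arccot(239, unity))
    return bin(pi_fixed >> guard)[2:]


# All 24 ways of assigning the bit pairs 00, 01, 10, 11 to the four bases.
MAPPINGS = [dict(zip(p, ("00", "01", "10", "11"))) for p in itertools.permutations("ACGT")]
TEXTY = set((string.ascii_letters + " .,'").encode())


def score(text: bytes) -> float:
    """Crude stand-in for frequency analysis: share of bytes that look like English text."""
    return sum(b in TEXTY for b in text) / len(text)


def search(seq: str, nbytes: int, threshold: float = 0.9):
    """XOR an nbytes-long prefix of pi against every window, under every mapping."""
    pad = pi_bits(8 * nbytes)
    hits = []
    for mapping in MAPPINGS:
        bits = "".join(mapping[b] for b in seq)
        for start in range(0, len(bits) - len(pad) + 1, 2):  # one base (2 bits) per step
            window = int(bits[start:start + len(pad)], 2) ^ int(pad, 2)
            text = window.to_bytes(nbytes, "big")
            s = score(text)
            if s >= threshold:
                hits.append((s, start // 2, mapping, text))
    return sorted(hits, key=lambda h: h[0], reverse=True)


# Plant a test message using the code from the post: A=10, C=11, G=01, T=00.
message = b"Meet at noon."
base_of = {"10": "A", "11": "C", "01": "G", "00": "T"}
pad = pi_bits(8 * len(message))
cipher = format(int.from_bytes(message, "big") ^ int(pad, 2), f"0{len(pad)}b")
hidden = "".join(base_of[cipher[i:i + 2]] for i in range(0, len(cipher), 2))
rng = random.Random(2005)
left = "".join(rng.choice("ACGT") for _ in range(20))
right = "".join(rng.choice("ACGT") for _ in range(20))
seq = left + hidden + right

best = search(seq, len(message))[0]
print(best[1], best[3])  # 20 b'Meet at noon.'
```

Machin's formula (pi = 16 arctan(1/5) - 4 arctan(1/239)) is used here only because it gives exact bits with integer arithmetic; any trustworthy source of pi's digits would do.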

Question 1: At this point do you dismiss your find as cherry picking or do you consider it more likely that the English text is an encrypted message put there by human agency and include this in a report as a significant finding?

Question 2: If your answer to the above is the former (dismiss as cherry picking), then do you believe that this "XOR with pi" cipher is a strong encryption method suitable for sending encrypted messages that will not be cracked by suitably motivated decryption experts?

PaulC

Posted: Nov. 17 2005,07:00

Let me add one comment to the previous post to counter objections that I have not "found pi" in the string but only hypothesized that it was used as an encryption pad.

Consider the same scenario, but instead of finding 1k of English text after the transformation, you find a 1k block of zero byte values. In that case, it would literally be the case that the string contained 8k bits of the prefix of pi encoded as one of the 24 substitution codes from nucleotide bases to base-4 digits.
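A zero block after XORing means the sequence itself spells pi's bits under one of the 24 substitutions, which can be checked directly. A minimal sketch, assuming we only need the first 64 base-4 digits of pi, hard-coded below as 128 bits written in hex (binary 11.001001000011...); the function name is hypothetical.

```python
import itertools

# 128 bits of pi's binary expansion (11.001001000011...), written as one hex integer.
PI_HEX = "c90fdaa22168c234c4c6628b80dc1cd1"
PI_BITS = bin(int(PI_HEX, 16))[2:]
# Base-4 digits of pi: 3, 0, 2, 1, 0, 0, 3, 3, 3, 1, ... (64 of them).
PI_BASE4 = [int(PI_BITS[i:i + 2], 2) for i in range(0, len(PI_BITS), 2)]


def pi_prefix_mappings(seq: str) -> list:
    """Every base->digit substitution under which seq spells a prefix of pi in base 4."""
    if len(seq) > len(PI_BASE4):
        raise ValueError("sequence longer than the hard-coded digits of pi")
    found = []
    for perm in itertools.permutations("ACGT"):
        digit = {base: d for d, base in enumerate(perm)}
        if all(digit[b] == p for b, p in zip(seq, PI_BASE4)):
            found.append(digit)
    return found


# Usage: spell 40 digits of pi with the code A=10, C=11, G=01, T=00 (A=2, C=3, G=1, T=0).
planted = "".join("TGAC"[d] for d in PI_BASE4[:40])
print(pi_prefix_mappings(planted))      # [{'T': 0, 'G': 1, 'A': 2, 'C': 3}]
print(pi_prefix_mappings("ACGT" * 10))  # []
```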

That would be less interesting for counter-terrorism purposes than finding what looked like a message in human natural language, but I can ask the same question. Did I "find pi" because I engaged in cherry picking, or can I reasonably conclude that the inclusion of pi in the file is significant and refutes the null hypothesis that the string is merely a sequence taken from ordinary wild-type bacterial DNA?

And if you still explain it as cherry picking, do you think that the presence of pi in the sequence would escape the notice of decryption experts who put sufficient effort into finding a pattern in the string?

To go off on a little bit of a tangent, suppose it wasn't just an email file, but something that had in fact resulted from sequencing actual DNA. Wouldn't there be some grounds to suspect that human agency was involved in inserting this sequence into the bacteria itself?

The reasonable alternatives are that it arose by random chance, that it contributes to fitness, or that it can be explained in terms of the DNA copying process. Given the difficulties with each of these explanations, and given that a motivated human actually could insert encrypted messages in bacterial DNA, human agency still strikes me as the best starting hypothesis.

PaulC

Posted: Nov. 17 2005,07:05

"given that a motivated human actually could insert encrypted messages in bacterial DNA"

And I would add: given that there are some common sense reasons to speculate that a human would be motivated to do so.

None of this, by the way, would establish the truth of the hypothesis. It would just establish the plausibility of the hypothesis of human agency.

Bulman

Posts: 8
Joined: Nov. 2005

Posted: Nov. 17 2005,09:08

If I were a counter-terrorism agent, I would consider it a lead worthy of follow-up. Likewise, as a scientist I would consider it an observation worthy of follow-up.

Either way, I would not consider it meaningful by itself.

Question 1 - I submit it as cherry picking.

Question 2 - I don't know enough about cryptology to answer. But it sounds "neat".

PaulC

Posted: Nov. 17 2005,09:25

OK. Well, the correct answers are:

(1) the data string failed a rigorously defined test of randomness that can be formalized in terms of Kolmogorov complexity. It is therefore a significant finding.

(2) XORing with a prefix of pi is a very weak encryption method. Don't try it if you really want to keep your data private.
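Why it is weak: the pad is a public constant, and it is the same for every message. A minimal sketch, assuming the pad starts at pi's leading bit (as in the scenario above) and hard-coding 128 bits of it; `xor_pi` is a hypothetical name.

```python
# 128 bits of pi's binary expansion (11.001001000011...), as bytes. Anyone can compute these.
PI_PAD = bytes.fromhex("c90fdaa22168c234c4c6628b80dc1cd1")


def xor_pi(data: bytes) -> bytes:
    """Encrypt or decrypt (XOR is its own inverse) with a prefix of pi's bits."""
    if len(data) > len(PI_PAD):
        raise ValueError("message longer than the hard-coded pad")
    return bytes(d ^ p for d, p in zip(data, PI_PAD))


ciphertext = xor_pi(b"Meet at noon.")

# Attack 1: the key is a famous constant, so guessing "pi" decrypts everything.
print(xor_pi(ciphertext))  # b'Meet at noon.'

# Attack 2 (known plaintext): a guessed opening word exposes the pad bytes,
# which an analyst can recognize as the start of pi in hex.
guess = b"Meet"
recovered = bytes(c ^ g for c, g in zip(ciphertext, guess))
print(recovered.hex())  # c90fdaa2
```

A genuine one-time pad uses the same XOR step; its strength comes entirely from a key that is secret, random, and never reused, which is exactly what a prefix of pi is not.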

Bulman

Posted: Nov. 17 2005,09:59

Quote

After running the usual kinds of frequency analyses and finding nothing that suggests the string is not random, you're about ready to put it aside and go to the next problem.

I'm confused. I guess my answer to Question 2 is more accurate than I thought. Are you saying that finding pi is paltry, significant, or sufficient evidence for Design with the big "D"?

PaulC

Posted: Nov. 17 2005,10:45

Quote

I'm confused. I guess my answer to Question 2 is more accurate than I thought.

I can understand that it's a little confusing because people usually don't approach biology this way--and with good reason, since it is unlikely to be a fruitful approach. It is a fruitful approach to finding secret messages in places where you have some reason to suspect secret messages, and it is a rigorous methodology.

First off, my reference to "the usual kinds of frequency analyses" is to make the point that XORing with pi is not, as far as I know, part of any conventional randomness tests. Actually the digits of pi are sometimes used as a pseudorandom source (i.e. not actually random but mimicking many statistical properties of random digits).

I'm not a counter-terrorism agent and have only an amateur's interest in cryptology, so I cannot say if it would routinely be applied in decryption attempts. It seems a little too exotic to me, but what do I know?

Unfortunately, there are no foolproof tests of randomness. You can refute the hypothesis of randomness by showing that a string is compressible (a sufficiently long prefix of the expansion of pi is compressible into a much shorter computer program). However, there is no test that proves for certain that the string is incompressible.
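A rough illustration, using `zlib` as our stand-in compressor (any lossless compressor bounds description length from above, up to the size of the decompressor). It shows both halves of the point: zlib shrinks an obvious pattern dramatically, yet finds nothing to squeeze out of pi's bits, even though a few lines of code (Machin's formula, below) regenerate them. A compressor that fails proves nothing about randomness.

```python
import random
import zlib


def arccot(x: int, unity: int) -> int:
    """arctan(1/x) scaled by `unity`, summed from the alternating Taylor series."""
    total = power = unity // x
    n, sign = 3, -1
    while True:
        power //= x * x
        term = power // n
        if term == 0:
            return total
        total += sign * term
        sign, n = -sign, n + 2


def pi_bytes(nbytes: int) -> bytes:
    """First 8*nbytes bits of pi's binary expansion (leading '11' included)."""
    guard = 32  # extra low-order bits that absorb the series' truncation error
    unity = 1 << (8 * nbytes - 2 + guard)
    pi_fixed = 4 * (4 * arccot(5, unity) - arccot(239, unity))
    return (pi_fixed >> guard).to_bytes(nbytes, "big")


def ratio(data: bytes) -> float:
    """zlib-compressed size over original size; well below 1 means a pattern was found."""
    return len(zlib.compress(data, 9)) / len(data)


n = 4096
rng = random.Random(0)
samples = {
    "coin flips": bytes(rng.getrandbits(8) for _ in range(n)),
    "0101... pattern": b"\x55" * n,
    "bits of pi": pi_bytes(n),
}
for name, data in samples.items():
    print(f"{name:16} {ratio(data):.3f}")
```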

However, a long string of the letters A,C,G,T, each encoding a digit 0-3 in a uniform way so that the string was a prefix of the base-4 expansion of pi, would indeed be a highly compressible string. So it would fail the test of randomness based on Kolmogorov complexity and would require some other explanation.

Quote

Are you saying that finding pi is paltry, significant, or sufficient evidence for Design with the big "D"?

I'm not sure what you mean by big D. (Seriously. Is a human designer a small or a big D?)

I'm saying that "finding pi" in the precise sense I have defined it (taking great care to rule out cherry picking the encoding, but without actually assuming a particular encoding) is significant in the sense that it refutes the hypothesis that the string is random.

If the string were found in bacteria, its non-randomness would be sufficient evidence of some mechanism not covered by our current understanding of the chemistry of DNA copying (we can get repeats, but we have not seen DNA performing numerical calculations that would produce pi) or by natural selection (how could secret encodings of pi improve the fitness?).

As to the particular mechanism, my first guess would be human agency. That would be a guess and not a conclusion unless I could demonstrate this some other way. The long-shot runner-up would be that somehow DNA can act as a numerical calculator to generate pi. If that could be refuted, then I think I'd just leave the observation open as unexplained until someone thinks of another testable hypothesis. However, the low Kolmogorov complexity would suffice to rule out chance as a reasonable explanation.

The reason ID is claptrap is that it doesn't stop by saying "we can't explain it yet."

Russell

Posts: 1082
Joined: April 2005

Posted: Nov. 17 2005,10:51

PaulC - coupla questions from someone who would like to have been a math geek but just wasn't smart enough:

I would have guessed that it was a significant find, as the odds against finding such a thing by chance with the smallish number of algorithms I could imagine trying (probably fewer than a million!) strike me as astronomical. But I don't know how to formally apply the test you mention. Can you go through it step by step, or give a reference?

My second question is what exactly does "prefix of pi" mean? Is that just the first (however many) digits?

--------------
Must... not... scratch... mosquito bite.

Bulman

Posted: Nov. 17 2005,11:53

By Design with a big "D" I am redundantly implying Intelligent Design by emphasizing that I capitalized the word 'design'. Yes, saying it once is better than implying it twice.

Quote

Unfortunately, there are no foolproof tests of randomness. You can refute the hypothesis of randomness by showing that a string is compressible (a sufficiently long prefix of the expansion of pi is compressible into a much shorter computer program). However, there is no test that proves for certain that the string is incompressible.

Quote

I'm saying that "finding pi" in the precise sense I have defined it (taking great care to rule out cherry picking the encoding, but without actually assuming a particular encoding) is significant in the sense that it refutes the hypothesis that the string is random.

Compressible strings never happen by chance? Other than pi and phi, what other numbers never occur by chance?

Again, numerology is fun, but should only be used for entertainment. I'm off to Google Kolmogorov.

PaulC

Posted: Nov. 17 2005,12:47

First off, you can find a lot about Kolmogorov complexity on the web. Here's a wiki page to start: http://en.wikipedia.org/wiki/Kolmogorov_complexity
It's completely rigorous and has nothing in common with numerology.

Quote

Compressible strings never happen by chance? Other than pi and phi, what other numbers never occur by chance?

Obviously "never" is not the word to use when asking questions about probability.

Compressible strings occur with very low probability in one-element samples taken from uniform distributions of strings. Examples of compressible strings include the first n bits of the expansions of pi, phi, e, etc., as well as the bit representation of the first n primes (or Fibonacci numbers, Catalan numbers, etc.) concatenated in increasing order. They also include strings with a lot of repetitions, ones that are compressible in the sense that a compression algorithm like gzip will reduce the size of files containing them. There are lots of other examples, with Kolmogorov complexity serving as the most general definition of compressibility.

To make this rigorous, you can think of an experiment in which I am able to flip an unbiased coin n times to produce a string of n bits (H=1, T=0). In addition, I have a universal computer of some kind (the details only affect some constants). I can speculate on the probability of finding any n-bit string for which I can write a program on my computer using k bits or less.

You might reasonably set k to about 1000 and n to about 1100 for purposes of illustration. 1000 bits is probably enough space for code to calculate most of the well-known transcendental constants. It might not be the most readable or fastest code, but with some ingenuity you can probably fit some iterative method in this space.

Now I run the experiment. My assistant flips 1100 coins one by one and writes down the result without showing it to me. Before looking at the string, I ask "What is the probability that this string will be identical to the output of some program written in 1000 bits or less of code on the computer I have here in front of me?" Note that output means the final result written by the program after it halts and includes all 1100 bits written out.

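There are no more than 2^1000 strings that can be output using 1000 bits or less of code (assuming self-delimiting programs; otherwise the count is 2^1001 - 1, and the extra factor of 2 changes nothing of substance), and 2^1100 possible strings that my assistant may have just generated, each occurring with equal probability.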

Therefore, I can claim that the probability of it being one of these strings is no more than 2^1000/2^1100, or 2^(-100), which is a very small probability (about 10^-30).
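The counting bound can be checked with exact rational arithmetic. A small sketch (variable names ours) that also shows how little it matters whether programs are required to be self-delimiting:

```python
from fractions import Fraction

n, k = 1100, 1000  # coin flips, and the program-length budget in bits

# Plain (not self-delimiting) programs of 0..k bits number 2^0 + ... + 2^k = 2^(k+1) - 1,
# and each one outputs at most one string.
plain_bound = Fraction(2 ** (k + 1) - 1, 2 ** n)
# With self-delimiting programs, Kraft's inequality caps the count at 2^k.
prefix_free_bound = Fraction(2 ** k, 2 ** n)

print(float(prefix_free_bound))  # 2^-100, about 7.9e-31
print(float(plain_bound))        # just under 2^-99, about 1.6e-30
```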

Now I look at the string, which happens to be the prefix of the expansion of pi. I think "Oh, this is one of the very low probability events that I just defined." I then write a 1000-bit or less program to generate pi just to make sure I'm correct in thinking so.

Now, suppose it's possible my assistant could play a joke on me. Which is more reasonable, that flipping coins gave me one of the fewer than 2^1000 compressible strings (less than 10^-30 probability) or that my assistant is, in fact, playing a joke on me? I would say the latter, but you could disagree. I wouldn't draw hasty conclusions, but would just try to rule out the possibility in the future.

Which step in the above do you consider "numerology"? It is only elementary probability.

PaulC

Posted: Nov. 17 2005,14:13

For that matter, let's consider a more routine example of statistical inference similar to the previous example of flipping a coin.

In the new example, I will flip the coin 100 times and simply count the number of heads and tails that come up. Beforehand, I ask myself "What is the probability that flipping a fair coin in this fashion will come up heads 10 times or fewer?"

I treat this question much like the previous example. There are N=2^100 possible results of fair coin flips, each equally probable. There are far fewer that fit my a priori constraints of having 10 heads or less, namely M=(100 choose 0) + (100 choose 1) + ... +(100 choose 10) (i.e. the sum of the first 11 degree-100 binomial coefficients). The probability that one of these comes up is M/N, which is quite small (left as an exercise).
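The "exercise" worked out with exact integer arithmetic (a sketch; `math.comb` needs Python 3.8 or later). The ratio M/N is the one-sided p-value for seeing 10 or fewer heads:

```python
from fractions import Fraction
from math import comb

flips, max_heads = 100, 10

N = 2 ** flips                                          # equally likely outcomes
M = sum(comb(flips, h) for h in range(max_heads + 1))  # outcomes with 10 or fewer heads
p = Fraction(M, N)

print(M)         # 19415908147836
print(float(p))  # about 1.5e-17
```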

If I do this experiment and find 10 or fewer heads, I may have witnessed a very low probability event occurring with a fair coin. But I may reasonably infer that in fact it's not a fair coin after all. In fact, there's a way to quantify that inference using something called a p-value, based on this probability. The lower the probability, the more confidence I have that the coin is not fair.

This is all pretty routine stuff, routinely accepted. No casino would use dice that failed this kind of statistical inference test.

Now if I base my statistical inference on Kolmogorov complexity, defining the low probability events as the subset having low Kolmogorov complexity, then how is that any different from defining it based on grouping results by number of heads? Both are simply ways to group results, and both are valid for statistical inference.

I'd like to add that none of this supports ID. I am merely discussing the scientific significance of highly compressible data. When data is compressible, it demands some explanation other than chance. In the case of living things, evolutionary theory provides that explanation.

PaulC

Posted: Nov. 17 2005,18:10

Russell: Prefix of pi means what you think it means.

The way I would formalize my reasoning here would be to define a p-value (a standard tool of statistical inference) for strings of nucleotide bases in terms of P(n, L), the probability that the Kolmogorov complexity of a uniformly chosen random nucleotide string of length n is L or less. For values of L significantly less than n, P(n, L) is vanishingly small (in other words, almost all strings are incompressible; this is a well-known theorem).

Then for any given string s, you can take its Kolmogorov complexity L_s (unfortunately this cannot be calculated exactly but you can get an upper bound by measuring the length of a computer program to calculate it) and define its p-value as P(n, L_s). For any given string, a low p-value gives you confidence of its significance as follows:

Suppose you find a string s and note that its Kolmogorov complexity is L_s<<n. You might think that's significant, but you need to consider the possibility that it was produced by chance. So you ask, what's the probability that a string with such low Kolmogorov complexity could be produced by chance? That's your p-value. If the p-value is very low (which it will be for L_s<<n) then you have a measure of your confidence that it was not produced by chance.
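A computable sketch of this test, with one substitution we should flag: Kolmogorov complexity cannot be computed, so we use the zlib-compressed length as the test statistic instead. That is a different, weaker statistic than P(n, L) as defined above, but the same counting argument bounds its p-value. The function name is hypothetical.

```python
import random
import zlib


def log2_p_bound(seq: str) -> int:
    """Upper bound on log2(p-value) for a nucleotide string, with zlib size standing in
    for Kolmogorov complexity.

    zlib output is uniquely decodable, so fewer than 2^(b+1) inputs compress to b bits
    or fewer; dividing by the 4^n equally likely length-n strings gives
    p <= 2^(b + 1 - 2n). The bound is capped at p <= 1, i.e. log2 <= 0.
    """
    n = len(seq)
    b = 8 * len(zlib.compress(seq.encode("ascii"), 9))
    return min(0, b + 1 - 2 * n)


rng = random.Random(1)
random_seq = "".join(rng.choice("ACGT") for _ in range(4000))
repeat_seq = ("GATTACA" * 600)[:4000]

print(log2_p_bound(random_seq))  # at or near 0: no evidence against chance
print(log2_p_bound(repeat_seq))  # thousands below zero: chance is untenable
```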

Note that a much more conventional use of p-values would allow you to find a p-value for a sequence of coin flips based on the number of heads and conclude with some measurable degree of confidence that it was not produced by flipping an unbiased coin (e.g. because way too many flips came up heads). That is just very run-of-the-mill statistical inference, nothing esoteric about it, and this is merely an extension taking Kolmogorov complexity into account.

In practice, nobody uses Kolmogorov complexity like this for scientific purposes, but it is the basis of a rigorous version of Occam's razor called Solomonoff induction. (Some of this is a little new to me from doing recent searches--it's not exactly my field--but there is a lot of information out there.)

Unlike ID, this is sound methodology. Some of Dembski's arguments are superficially similar to this, but he's blowing smoke because all you can do with these arguments is show that the string was not produced by chance alone. This is not sufficient to conclude anything about a "designer."

Evolution is a process of self-organization. It is influenced by chance events, but not identical to chance. There is also a wealth of evidence supporting it. So when you find pattern (compressibility) in nature, this in no way contradicts the idea that it was produced by a natural process. It only contradicts the assumption that it can be explained by uniform chance.

WayneFrancis

Posted: Nov. 17 2005,23:32

Getting a message that says "here is the DNA sequence you asked for [GCAGGTTAACAAGGAGTTTGCTAGAT]", then working with that sequence and finding it can represent PI, is one thing.

Looking at real DNA and finding the sequence GCAGGTTAACAAGGAGTTTGCTAGAT in the DNA means another.

There is nothing saying you can't have a sequence of DNA that can be translated into PI occur naturally.

There is no invisible mechanism that stops said sequence from occurring.

All this is really hypothetical. So what if you analyse organism X's DNA and don't find PI in any one of the 24 forms it could be held in? Also remember I said my example of PI was simplistic. It doesn't really represent PI, only 3141592653589793. To really find PI you not only have to know which of the 24 base-to-bit conversions was used, you also need to know the standard that was used to store floating-point numbers in a digital format. Is it using an IEEE standard? How many nucleotides are used to store your number PI? How many nucleotides represent the precision of PI? What nucleotide/partial nucleotide represents positive and negative? At what position is the bias information held?
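The point about standards can be made concrete. A sketch with our own choice of examples, reusing the A=0, T=1, G=2, C=3 code that reproduces the 16-digit sequence from the first post: the same number, stored two different ways, gives two unrelated nucleotide strings.

```python
import math
import struct

# Base code that reproduces the 16-digit example above: 00=A, 01=T, 10=G, 11=C.
BASES = "ATGC"


def bits_to_dna(bits: str) -> str:
    """Two bits per base, most significant pair first."""
    return "".join(BASES[int(bits[i:i + 2], 2)] for i in range(0, len(bits), 2))


# Standard 1: pi as an IEEE 754 double, big-endian: 1 sign bit, 11 exponent bits, 52 fraction bits.
bits = format(int.from_bytes(struct.pack(">d", math.pi), "big"), "064b")
sign, exponent = bits[0], int(bits[1:12], 2)
print(sign, exponent - 1023)  # 0 1  (the exponent field is stored with a bias of 1023)
ieee_dna = bits_to_dna(bits)

# Standard 2: the digits 3141592653589793 as a plain 52-bit integer.
int_dna = bits_to_dna(format(3141592653589793, "052b"))

print(ieee_dna)  # TAAAAAGTAGATCCGCTTTATATAAGCTATGA
print(int_dna)   # GCAGGTTAACAAGGAGTTTGCTAGAT
```

Every other choice listed above (byte order, precision, where the sign and exponent live, which of the 24 base codes) multiplies the number of equally plausible encodings.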

These are all questions you CAN NOT ANSWER. Give me any 1-million-bit piece of data and, given enough time, I can come up with some encoding method under which I get PI, or get a string that looks non-random under some other encoding of PI.

As for PI in naturally occurring DNA, don't even talk about it in relation to an "Intelligent Designer" unless you have said designer's data storage standards.

Simple as that. The scenario you mention, with the message with a sequence in it, only means 2 things:
1) the person that encoded the message is an idiot for using said encoding method.
2) you were lucky picking the key.

Finding it in real DNA means NOTHING. There is no biological mechanism that stops DNA from having a string of nucleotides that can be translated into PI by some digital manipulations. None, nada, zilch.

PaulC

Posted: Nov. 18 2005,03:50

Quote

Finding it in real DNA means NOTHING. There is no biological mechanism that stops DNA from having a string of nucleotides that can be translated into PI by some digital manipulations. None, nada, zilch.

Of course there's no mechanism that completely stops it from being explained by chance. However, using a rigorous technique from statistical inference (see my posting above) you can show that the chance explanation is not compelling, because the probability of selecting a sufficiently long string of such low Kolmogorov complexity from a uniform distribution of strings of that length is vanishingly small. If you substitute "uniform distribution" with our best statistical model of junk DNA, the statement still holds, although it would be more difficult to analyze.

This is elementary statistical hypothesis testing. The only difference is that I'm using Kolmogorov complexity rather than some more conventional statistics to group the set of possible outcomes.

If I flipped a coin 100 times and it came up heads only 10 of those times, would you simply accept that there is no "mechanism that stops" this from happening, or would you refer to a huge body of standard statistical techniques that would allow you to state your high confidence that this is not an unbiased coin?

If I'm looking at a sequence purporting to be DNA and I think it contains a sufficiently simple code for a bit prefix of pi, that means something. It probably means that it's not really DNA or that, in the unlikely event it is, it was put there by a human being. But to simply claim its presence is dismissible as cherry picking is no more rational than rejecting the statistical inference techniques used to determine if various randomizers (like coins and dice) are actually unbiased.

PaulC

Posted: Nov. 18 2005,04:15

I think I'm beginning to understand some of the resistance here and on PT. Some of the worst obfuscated ID pseudoscience (yes, I mean Dembski) looks superficially like the legitimate, peer-reviewed discipline of randomness testing. http://www.ciphersbyritter.com/RES/RANDTEST.HTM This makes it difficult to bring up such arguments without being suspected of trying to push ID.

It's kind of an old joke that you ask someone to make some random choices. Then, they look at those choices and say, "Oh, that's not random enough." and choose something else. It's a JOKE because the choices taken from a uniform distribution are all equally probable, so, for instance, having my dog's birthday come up as the winning lottery ticket is no less probable than any other sequence.

All well and good, but there is a huge body of modern techniques that can actually take INDIVIDUAL strings and declare one to be more "random" than the other in a rigorous sense. The rigorous sense is always based on some notion of compressibility, the most general statement of which is Kolmogorov complexity. So if I ask you to write random 1s and 0s and you write 0101010101010101, there is a rigorous sense in which this is considered not "random enough." Of course it is as probable as any other uniformly chosen string but its compressibility is a salient feature and the probability of picking a string with that much compressibility is indeed vanishingly small.

Anyway, there is a huge body of rigorous peer-reviewed work aimed at testing individual strings and determining if they came from a random source. This is literally impossible, because no matter what the string, it COULD have come from the random source, as WayneFrancis points out. But when one is trying to design a good pseudorandom generator for sampling, or trying to resolve signal from noise, it's just not good enough to stop and say "Well, it could be due to chance." Fortunately, we do not have to stop there, as many researchers have demonstrated.

PaulK

Posts: 37
Joined: June 2004

Posted: Nov. 18 2005,06:17

Paul, I agree with the general point that if you specify a sufficiently unlikely target in advance - say 100 digits of pi in a fixed point base-4 notation in the human genome - then finding it would be evidence that something odd is going on.

Going back to the subject of the thread, though, one anomaly does not make a scientific discipline. It might be a starting point for a scientific form of ID, but it could never make ID scientific in itself.

Going further, if ID is to be scientific it is up to the supporters of ID to actually BE scientific.

PaulC

Posted: Nov. 18 2005,06:39

Quote

Paul, I agree with the general point that if you specify a sufficiently unlikely target in advance - say 100 digits of pi in a fixed point base-4 notation in the human genome - then finding it would be evidence that something odd is going on.

Yes, but I'm pre-specifying a much more general unlikely target, namely any member of the set of length n strings of Kolmogorov complexity k<<n. This is why we're justified in taking anything that looks like a pattern (in a rigorous sense of Kolmogorov complexity and Solomonoff induction), and not just one prespecified string, as evidence that something odd is going on. Solomonoff induction is a very important result, since it formalizes scientific induction and really does provide a rigorous formalization (if not an epistemic justification) of the human tendency to spot patterns in data and a way to measure the explanatory value of these patterns (turns out, the shortest explanation is the best one just as in Occam's razor).

I've been a little bowled over by the controversy, since I believed that most rational people would think that a pair of dice that came up snake eyes 50 out of 100 times was compelling evidence of "something odd going on"--i.e., the dice being biased in some way, even though clearly the outcome could with small probability be due to chance.

The statistical inference in both cases is identical, and actually both are explicable in terms of compressibility. The sequence of rolls from a pair of fair dice is incompressible, whereas the sequence from a biased pair is compressible (e.g. using a Huffman code to represent the more frequent outcomes with shorter bit sequences).
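A sketch of the Huffman point, assuming (our choice, matching the 50-of-100 example) that the biased pair shows snake eyes half the time and the other 35 ordered outcomes equally often. It uses the standard fact that a Huffman code's expected length equals the sum of the weights of all merged nodes.

```python
import heapq
from fractions import Fraction


def huffman_avg_bits(probs):
    """Expected Huffman codeword length: the sum of the weights of all merged nodes."""
    heap = list(probs)
    heapq.heapify(heap)
    total = Fraction(0)
    while len(heap) > 1:
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        total += merged
        heapq.heappush(heap, merged)
    return total


fair = [Fraction(1, 36)] * 36                       # 36 equally likely ordered outcomes
biased = [Fraction(1, 2)] + [Fraction(1, 70)] * 35  # snake eyes half the time (assumed bias)

for name, probs in (("fair", fair), ("biased", biased)):
    avg = huffman_avg_bits(probs)
    print(f"{name}: {float(avg):.3f} bits/roll, about {float(100 * avg):.0f} bits per 100 rolls")
```

The fair pair needs 47/9, about 5.222 bits per roll; the biased pair needs 251/70, about 3.586, so a record of 100 biased rolls shrinks by roughly a third.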

Quote

Going back to the subject of the thread, though, one anomaly does not make a scientific discipline. It might be a starting point for a scientific form of ID, but it could never make ID scientific in itself.

Yes, I agree with that entirely. But this is not the same as saying that every anomaly is dismissible as cherry picking. An anomaly is an anomaly and there are rigorous means for identifying anomalies using statistical inference.

PaulC

Posted: Nov. 18 2005,06:55

Quote

Going further, if ID is to be scientific it is up to the supporters of ID to actually BE scientific.

ID is the political arm of a religious movement, so don't hold your breath. In case there's any question, I just want to repeat (I've said it twice now): ID is claptrap.

Randomness testing is, however, a legitimate and fascinating field with many peer-reviewed results and implications about inductive reasoning. Unfortunately for ID, randomness testing cannot prove design, only a failure of uniform random processes to explain observations.

Given that evolution is no more random than water crystallization, galaxy formation, the arrangement of cracks in a field of drying mud, etc., etc., then we should not be very surprised to find that most biological data sets would tend to fail randomness tests.

Bulman

Posted: Nov. 18 2005,09:46

Adami, Christoph (2002) What is complexity?, BioEssays, Volume 24, Issue 12

Quote

One of the measures most often put forward as a candidate, the Kolmogorov complexity (see, e.g., Ref. 2), turns out to be a measure of the regularity, rather than complexity, of a sequence. This implies that a random sequence is accorded maximum Kolmogorov complexity, clearly not anything we would be interested in as biologists, because random sequences do not give rise to organisms.

I understand this is a thought experiment and you are not fronting Intelligent Design a la Dembski. I assume your arguments are good; however, the water is too deep for me. I'm heading back to the gene pool, but before I go I must ask a few questions:

Shouldn't we expect a certain degree of regularity in complex symmetrical organisms?

Is the lottery sequence 12-34-42-9-3 PowerBall 12 less likely to occur because I pre-specify and buy a ticket with those numbers than if I never bought a ticket?

Is k<<n a typo for k<n or do I need to get a math degree?

Didn't Einstein say, "You do not really understand something unless you can explain it to your grandmother."? (Maybe you can simplify your argument for me. I have some statistical/calculus background, but don't bet on me understanding an argument requiring calculus.)

To return to the original topic: I think that evidence of a recognizable number in nucleotide organization is more likely evidence of an underlying organizational principle rather than Intelligent Design, much the same way 9.8 m/s^2 is evidence for local gravitational forces rather than Intelligent Falling.

Russell

Posts: 1082
Joined: April 2005

(Permalink)

Posted: Nov. 18 2005,09:53

Disclaimer: I don't think anyone participating in this discussion thinks ID is anything but a sham and a scam. Nothing I write should be interpreted otherwise!

Quote

...then we should not be very surprised to find that most biological data sets would tend to fail randomness tests

But do they? I think Dembski's "specification" (equivalent to stating beforehand that you're looking for pi, or for a string of greater than such-and-such Kolmogorov complexity, in the above discussion) amounts to nothing more than "a sequence corresponding to something biologically functional".

If you take a random chunk of DNA from, say, a human genome, you're likely to get a pattern that won't raise any eyebrows under mathematical analysis, but that corresponds to the gene for some essential enzyme. Does that qualify as failing a randomness test?

--------------
Must... not... scratch... mosquito bite.

PaulC

Posts: 18
Joined: Nov. 2005

(Permalink)

Posted: Nov. 18 2005,10:29

Quote

I understand this is a thought experiment and you are not fronting Intelligent Design a la Dembski.

Glad to hear it. The quote from Adami is correct and a pretty damning criticism of Dembski's entire "research program." Note that I am not proposing a test for high complexity here but for low complexity. The reason a bit-prefix of pi is worthy of note is that it is a LOW complexity object.

Quote

Shouldn't we expect a certain degree of regularity in complex symmetrical organisms?

Yes indeed! And we find it there. The order goes way beyond bilateral or radial symmetry. In fact, every cell is statistically similar to other cells of the same kind. The statistical frequencies of various macromolecules are far from uniform. Living things are brimming with regularity and are therefore more compressible than uniformly random arrangements of matter.

In the sense of Kolmogorov complexity, a live frog is LESS complex than a frog that's been in a high speed blender for a minute. That is, the set of configurations of molecules that could be a live frog is a tiny subset of the set of configurations of molecules that could result after blending.

High Kolmogorov complexity is in no way a proxy for organization, and this is where Dembski has to start scrambling for new definitions, which he generally pulls out of a hat (to put it gently). Actually, low Kolmogorov complexity isn't a good proxy either, since living things are both less orderly than low-complexity objects like crystals and more orderly than uniformly random objects. This is why it's a difficult thing indeed to pin down life in terms of Kolmogorov complexity, and it's not even clear that the question has any meaning.

It is the relatively LOW Kolmogorov complexity of a frog that makes it more interesting than a frogshake. It is the relatively HIGH Kolmogorov complexity of a frog that makes it more interesting than a salt crystal of equal weight. So where does this leave us? Maybe Kolmogorov complexity is not a great tool for understanding why we find frogs interesting.
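A rough way to see the crystal/frog/frogshake ordering is to let a general-purpose compressor stand in for Kolmogorov complexity. Compressed size is only an upper bound on it, so this is an illustration, not a measurement. The sketch assumes toy data throughout: a repeated "NaCl" unit for the crystal, a hypothetical "frog" built from a handful of reused random "cell types", and a "frogshake" made by shuffling exactly the same letters. `zlib` is Python's standard-library DEFLATE binding.

```python
import random
import zlib

def compressed_size(text):
    """zlib output size in bytes: a crude upper bound standing in for Kolmogorov complexity."""
    return len(zlib.compress(text.encode("ascii"), 9))

rng = random.Random(2005)  # fixed seed so the comparison is reproducible
LETTERS = "abcdefghijklmnopqrstuvwxyz"

# Hypothetical "frog": five random 30-letter "cell types", reused 200 times.
cell_types = ["".join(rng.choice(LETTERS) for _ in range(30)) for _ in range(5)]
frog = "".join(rng.choice(cell_types) for _ in range(200))

# "Salt crystal" of equal size: one short unit repeated.
crystal = ("NaCl" * len(frog))[:len(frog)]

# "Frogshake": exactly the same letters as the frog, in random order.
blended = list(frog)
rng.shuffle(blended)
frogshake = "".join(blended)

for name, s in [("crystal", crystal), ("frog", frog), ("frogshake", frogshake)]:
    print(f"{name:9s} {len(s)} chars -> {compressed_size(s)} bytes")
```

Shuffling keeps the ingredients but destroys the repeats, which is the sense in which the blended frog is the more "complex" object.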

Quote

Is the lottery sequence 12-34-42-9-3 PowerBall 12 less likely to occur because I pre-specify and buy a ticket with those numbers than if I never bought a ticket?

Clearly not. Not sure where you're going with that. In a lottery any individual outcome is equiprobable. To make any interesting statistical inference, we need to group the values in some way before asking a question.
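A short sketch of the grouping point, using exact fractions. The pool size is an assumption (five distinct numbers drawn from 1 to 55, with the PowerBall ignored), and the grouped event "all five numbers are 31 or lower" is just an illustrative choice. Every specific ticket, including the one quoted above, has the same probability; only a question about a group of outcomes yields a probability worth comparing.

```python
from fractions import Fraction
from math import comb

POOL = 55   # assumed: numbers 1..55 (hypothetical pool; PowerBall ignored)
PICKS = 5   # five distinct numbers per ticket, order irrelevant

TOTAL = comb(POOL, PICKS)  # number of equally likely draws

def p_ticket(ticket):
    """Probability that one specific, pre-chosen ticket wins."""
    numbers = set(ticket)
    if len(numbers) != PICKS or not all(1 <= n <= POOL for n in numbers):
        raise ValueError(f"a ticket is {PICKS} distinct numbers from 1..{POOL}")
    return Fraction(1, TOTAL)  # the same for every valid ticket

def p_all_at_most(limit):
    """A grouped question: probability that every drawn number is <= limit."""
    return Fraction(comb(limit, PICKS), TOTAL)

print(p_ticket([12, 34, 42, 9, 3]), p_ticket([1, 2, 3, 4, 5]))
print(p_all_at_most(31), float(p_all_at_most(31)))
```

Buying the ticket changes nothing about the draw; it only decides which single outcome you care about.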

Quote

Is k<<n a typo for k<n or do I need to get a math degree?

Informally, it is used to mean "much much less". Obviously that's not defined rigorously, but I wanted to emphasize that I don't mean k=n-1. It would be sufficient to take k<n/10 to make the same point I was making.
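The reason a big gap matters is the standard counting argument. Assuming n is the length in bits of the string in question and K(x) is the length of its shortest description (its Kolmogorov complexity), there are simply fewer short descriptions than strings, and each description produces at most one string:

```latex
\#\{\, x \in \{0,1\}^n : K(x) < k \,\}
  \;\le\; \sum_{i=0}^{k-1} 2^i \;=\; 2^k - 1,
\qquad\text{so}\qquad
\Pr_{x \sim \mathrm{Unif}(\{0,1\}^n)}\bigl[\, K(x) < k \,\bigr] \;<\; 2^{k-n}.
```

With k = n/10 and n = 1000, fewer than a 2^{-900} fraction of all strings qualify, whereas k = n - 1 only gives a bound of 1/2, which is why k = n - 1 would not make the point.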

Quote

Didn't Einstein say, "You do not really understand something unless you can explain it to your grandmother."? (Maybe you can simplify your argument for me. I have some statistical/calculus background, but don't bet on me understanding an argument requiring calculus.)

He might have, but I have been trying to make this both rigorous and understandable, and this is close to the limit of my expository skills.

Quote

To return to the original topic: I think that evidence of a recognizable number in nucleotide organization is more likely evidence of an underlying organizational principle rather than Intelligent Design, much the same way 9.8m/s^2 is evidence for local gravitational forces rather than Intelligent Falling.

Well, as I said, my first guess is that it would be evidence of a human either faking the data or (a long shot) actually inserting the artifact into DNA. Beyond that, I don't care to speculate. My point is just that it would be an identifiable anomaly that could not obviously be dismissed as cherry picking.

PaulC

Posts: 18
Joined: Nov. 2005

(Permalink)

Posted: Nov. 18 2005,10:41

Quote

If you take a random chunk of DNA from, say, a human genome, you're likely to get a pattern that won't trigger any raised eyebrows by any mathematical analysis, but that corresponds to the gene for some essential enzyme. Does that qualify as failing a randomness test?

Actually, the encoding part of DNA has to specify functional proteins composed of structures such as alpha helices and beta sheets. A uniformly generated sequence of nucleotides would have a very low probability of encoding a biologically functional protein in any organism (at least any that looks like the ones we see in nature). Its 3D conformation would not have the degree of order observed in biologically important proteins.

I agree, though, that you won't necessarily see that in the Kolmogorov complexity of an isolated gene. However, given the set of genes taken in aggregate over an organism, repeated "motifs" (similar functional sections) are often observed. So the DNA as a whole, while not as compressible as a bit prefix of pi, is still more compressible than a randomly generated sequence of nucleotides.
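The motif point can be illustrated with a toy genome, under loudly stated assumptions: the "genome" here is just eight hypothetical 50-nucleotide motifs, reused 160 times with a 1% point-substitution rate, compared against a uniformly random nucleotide string of the same length, with `zlib` standing in (as an upper bound only) for Kolmogorov complexity. It is not a model of any real genome.

```python
import random
import zlib

def compressed_size(seq):
    """zlib output size in bytes, used only as a rough upper bound on complexity."""
    return len(zlib.compress(seq.encode("ascii"), 9))

def point_mutate(seq, rate, rng):
    """Substitute each base independently with probability `rate`."""
    out = []
    for base in seq:
        if rng.random() < rate:
            base = rng.choice([b for b in "ACGT" if b != base])
        out.append(base)
    return "".join(out)

rng = random.Random(42)  # fixed seed for reproducibility

# Hypothetical motif library: eight random 50-nt segments.
motifs = ["".join(rng.choice("ACGT") for _ in range(50)) for _ in range(8)]

# Toy "genome": 160 imperfect copies of the motifs (1% substitution rate).
genome = "".join(point_mutate(rng.choice(motifs), 0.01, rng) for _ in range(160))

# Null comparison: uniformly random nucleotides of the same length.
random_seq = "".join(rng.choice("ACGT") for _ in range(len(genome)))

print("toy genome:", compressed_size(genome), "bytes")
print("random seq:", compressed_size(random_seq), "bytes")
```

No single motif copy looks special on its own; the saving only appears once the copies are taken in aggregate.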

PaulC

Posts: 18
Joined: Nov. 2005

(Permalink)

Posted: Nov. 18 2005,10:49

I would like to add that failure to demonstrate the compressibility of some data does not imply that there is no order to it. It may simply mean that you haven't figured out the sense in which it is orderly. Supposing that the orderliness of life comes out in the phenotype, this need not be reflected in any obvious way in a gene sequence, and vice versa.

Bulman

Posts: 8
Joined: Nov. 2005

(Permalink)

Posted: Nov. 18 2005,11:02

Great discussion; it just took me a while to figure out what you were saying. Don't get me wrong, I'm not totally incompetent when it comes to math, but I can more easily understand a frog-blender analogy than Russian mathematicians.

I have tried to rephrase my lottery response in this comment and have failed numerous times. Instead of digging a deeper hole, let me just say two things: my wife often accuses me of using "bad" metaphors, and behavioral observations made by close associates tend to be more accurate than one's own.

PaulC

Posts: 18
Joined: Nov. 2005

(Permalink)

Posted: Nov. 18 2005,11:17

Russell: Actually I think I can do a little better on claiming that a sufficiently long encoding region of DNA is highly compressible.

We need to assume a few things.
(1) You need to have solved protein folding computationally. There has to be a computer program that can predict the conformation of a protein sequence encoded by some DNA.
(2) You need to have an effective method of ruling out proteins that are not even plausibly functional based on 3D structure.
(3) The computer code for these methods needs to fit into space less than that needed to store the genome you want to compress.
(4) The set of plausibly functional proteins of a certain length is much much smaller than the set of all nucleotide sequences that could encode proteins of that length.

The compression/decompression scheme would unfortunately not be feasible to run in any practical time frame (but it doesn't need to be). It would begin by enumerating all possible DNA sequences up to the maximum length needed (you can see this is not feasible). Then it would test each by finding its conformation and checking if it could possibly be functional (e.g. an enzyme). It gives each of these an integer index number starting at 0 and stores it in a table (for purposes here it is not an obstacle that this code would use up more memory than realistically available in the entire universe).

Now the compressed copy of the DNA consists of the computer program followed by the DNA sequence with the encoding regions replaced by the highly compressed index number (OK, you need extra information to deal with the fact that encoding regions may not be contiguous, but this is a broad proof of concept that the genome is highly compressible, not a practical plan.) Since the index numbers come from a smaller domain, it follows that they are shorter than the original nucleotide sequences. So the compressed version as a whole is smaller than the original even if there is no obvious pattern to the encoding regions other than the fact that they have to encode proteins.

To decompress, you "merely" run the program to build the table of proteins, look up the indices in the table, and output the resulting string.
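Here is the scheme in miniature, as a Python sketch. Assumptions (1) and (2) are replaced by a hypothetical toy filter, `is_plausibly_functional`, which just requires a leading ATG and no in-frame stop codon; nothing here predicts folding. Sequence lengths are assumed to be multiples of 3. Compressor and decompressor enumerate sequences in the same fixed order, so an index into the table is all that needs to be stored.

```python
import functools
import itertools
import math

BASES = "ACGT"  # enumeration order shared by compressor and decompressor
STOP_CODONS = {"TAA", "TAG", "TGA"}

def is_plausibly_functional(seq):
    """Hypothetical toy stand-in for assumptions (1) and (2): starts with ATG
    and has no in-frame stop codon. No folding is predicted here."""
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    return codons[0] == "ATG" and not any(c in STOP_CODONS for c in codons)

@functools.lru_cache(maxsize=None)
def build_table(length):
    """Enumerate all 4**length sequences in a fixed order and keep the accepted
    ones; a sequence's position in this tuple is its compressed code."""
    all_seqs = ("".join(letters) for letters in itertools.product(BASES, repeat=length))
    return tuple(seq for seq in all_seqs if is_plausibly_functional(seq))

def compress(seq):
    return build_table(len(seq)).index(seq)  # ValueError if seq is rejected

def decompress(index, length):
    return build_table(length)[index]

def index_bits(length):
    """Bits needed to store any index, versus 2 * length bits for raw bases."""
    return math.ceil(math.log2(len(build_table(length))))

gene = "ATGGCCAAA"
code = compress(gene)
print(code, decompress(code, len(gene)), index_bits(len(gene)), 2 * len(gene))
```

For 9-nucleotide sequences the toy filter keeps 3,721 of 262,144 candidates, so an index fits in 12 bits instead of the 18 bits a raw 2-bits-per-base encoding needs. The exponential enumeration is exactly the infeasible part described above; the cache only avoids rebuilding the table within one run.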

claw

Posts: 3
Joined: Nov. 2005

(Permalink)

Posted: Nov. 19 2005,02:25

This has turned into a fascinating discussion, but we seem to have lost lutsko on the way.

If you're still reading this, lutsko, the point I was trying to make at the start of all this is that "designedness" has no definitive test. All you were doing was quoting probabilities (i.e., the probability of a DNA sequence that encodes pi to 100 places), but that is not a good test of designedness, for a number of reasons that have already been elucidated in other posts here.

Anyway, I hope this thread has helped to pique your interest.

regards,
Chris

PaulC

Posts: 18
Joined: Nov. 2005

(Permalink)

Posted: Nov. 19 2005,05:39

Quote

If you're still reading this, lutsko, the point I was trying to make at the start of all this is that "designedness" has no definitive test.

I agree with this statement (in case it was not clear).

All randomness testing can do is give you a probabilistic measure of your confidence that the data was not generated by the random distribution you are using as your null hypothesis. It says absolutely nothing about where the data came from. To say anything else, you need some other hypothesis about your data, and it needs to be testable. The notion that some mysterious intelligent agency put it there is not testable. ID insists on leaving the "designer" ill-defined and therefore by definition cannot be a science.
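As a concrete instance of such a test, here is a Python sketch of a chi-square goodness-of-fit test of nucleotide counts against the null hypothesis that each base is equally likely (the choice of test and null is an illustrative assumption). With four categories there are 3 degrees of freedom, for which the upper-tail probability has a closed form using `math.erfc`. A tiny p-value rejects the uniform model; it says nothing about where the sequence came from.

```python
import math
from collections import Counter

def chi2_sf_3df(x):
    """Upper-tail probability P(X >= x) for a chi-square variable with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

def uniform_base_test(seq):
    """Chi-square goodness of fit of base counts against 'A, C, G, T equally likely'.
    Returns (statistic, p_value)."""
    counts = Counter(seq)
    if not seq or set(counts) - set("ACGT"):
        raise ValueError("expected a non-empty string of A, C, G, T")
    expected = len(seq) / 4
    stat = sum((counts[b] - expected) ** 2 / expected for b in "ACGT")
    return stat, chi2_sf_3df(stat)  # 4 categories -> 3 degrees of freedom

print(uniform_base_test("ACGT" * 25))
print(uniform_base_test("A" * 70 + "C" * 10 + "G" * 10 + "T" * 10))
```

Note that a perfectly periodic "ACGT" repeat passes this particular test with p = 1, which is a reminder that each test only checks the one kind of non-randomness it was built for.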

Sometimes you can make a testable hypothesis of a human cause, because we can come up with a model for what humans do and don't do. Then, conceivably, you could make predictions that would be supported or refuted by later evidence. This happens in archeology, for instance, in cases where it might initially be unclear whether a chipped stone is a human-made tool: you make predictions about what else you might find at the site and at related ones.

I would still say that finding pi in some "reasonable" (sufficiently parsimonious in a rigorous sense) encoding would be reason to hypothesize human agency. That wouldn't apply to any compressible string, but just one that we know humans find interesting like pi. If it happened just once, though, your hypothesis would be untestable and therefore (as I agreed earlier) not a basis for a scientific theory.

lutsko

Posts: 8
Joined: Nov. 2005

(Permalink)

Posted: Nov. 19 2005,21:37

To claw, PaulC, et al,
I have been following the discussion with interest but have not felt compelled to add anything, since you, especially PaulC, are doing an admirable job of fleshing out the thought experiment. I think the point stands that the program of ID could be legitimate science, although I hasten to add, as one must, that ID as pushed by Dembski and friends is anything but scientific. I also think that our friends fighting the political fight need to be careful not to make categorical statements to the contrary, which they might find hard to defend.

claw

Posts: 3
Joined: Nov. 2005

(Permalink)

Posted: Nov. 20 2005,05:05

Dear lutsko,

With due respect, I don't think you do understand what has been going on in this forum.

You say "the point stands that the program of ID could be legitimate science" but you fail to understand that this is exactly what everyone else has been arguing *against*. If by your statement you mean that the existence of design in biology could be scientific, then that is not much of a "point" because everyone agreed long before you formulated it. We even gave examples (e.g., bacteria being designed in the lab, which already happens). But that's not what you said. You said "the program of ID could be legitimate science." Now what program is that? There is only one program of ID, and that's "to reverse the stifling dominance of the materialist worldview, and to replace it with a science consonant with Christian and theistic convictions." Quote unquote.

Now I understand that you repudiate Behe and Dembski et al, and I think that's wonderful. BUT, you have to start thinking more about what you are saying and the way you phrase it. ID is not a cover-all term for any form of design in science. It very specifically refers to the program running out of the Discovery Institute. Even the Catholic Church, which believes very strongly in divine intervention in the evolution of humanity, refuses to identify itself with ID. Please choose your words more carefully or else we will continue to waste time pinning down exactly what you mean before we can respond meaningfully.

Also, we have *not* been helping you "flesh out the thought experiment" and the fact that you would say so is actually rather insulting. We came across to this forum because we assumed you wanted to discuss the matter and maybe learn something, not pontificate about the good job everyone else is doing on developing your pet project. I think your thought experiment is fundamentally flawed and I have been trying to explain why it won't work, not "flesh it out." I know you don't support the Discovery Institute view of ID, and I applaud that, but rejecting Dembski et al doesn't automatically make your thought experiment any better than theirs. There is more than one way to be wrong about evolution.

I am also discouraged by your failure to contribute to a thread *you* started to pursue a question *you* asked. Right at the moment I'd like to read a substantive post before I make any further effort in this thread. You could start by attempting an answer to the counter-questions I posed earlier.

regards,
Chris
