stevestory
Quote (dnmlthr @ Aug. 26 2008,17:34) | Does compressibility play into this? That is, how much do you need to know to recreate a document*?
Consider a document containing the complete works of Shakespeare: I'd expect it to be reasonably compressible, and likewise a document of the same size containing only the same letter over and over again. But what about a document of the same size that contains completely random characters? |
To test this in a first-order, amateurish way, I got the full text of Hamlet and saved it in a text file. It was 197 kb. When I zipped it, the zip file was 72 kb, about 36% of the original size. I also made a text file of one letter repeated about 180,000 times, roughly the same length as Hamlet; it was around 176 kb. When I zipped that, the file was 4 kb, around 2% of the original. In fact it's probably even less than that, and 4 kb is just some kind of minimum file size on my machine.

A file that was completely, truly random would be basically incompressible: the zipped file would be about 100% of the size of the original, and in practice slightly larger, since the zip format adds its own overhead.
(Various caveats go here: zipping isn't perfect compression, I am not an information theorist, etc.)
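For anyone who wants to reproduce this without hunting for a zip utility, here's a minimal Python sketch using zlib, which is the same DEFLATE compression that zip files use. The "hamlet.txt" path is a placeholder; substitute any large English text file you have handy.

```
import os
import zlib

def ratio(data: bytes) -> float:
    """Compressed size as a fraction of the original (zlib uses DEFLATE, same as zip)."""
    return len(zlib.compress(data, 9)) / len(data)

# "hamlet.txt" is a placeholder path -- any large English text file will do.
with open("hamlet.txt", "rb") as f:
    hamlet = f.read()

n = len(hamlet)
repeated = b"a" * n      # the same letter over and over, same length as the play
noise = os.urandom(n)    # random bytes, same length again

print(f"Hamlet:   {ratio(hamlet):7.1%}")
print(f"repeated: {ratio(repeated):7.1%}")
print(f"random:   {ratio(noise):7.1%}")
```

Run against a real copy of Hamlet, this should show the same pattern as above: English prose lands somewhere around a third of its original size, the repeated letter compresses to a tiny fraction of a percent, and the random bytes come out at just over 100% because of the compressor's fixed overhead.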