Topic: Creating CSI with NS, H T T H H H T H T T H H H H T T T
RumraketR



Posts: 3
Joined: Nov. 2012

Posted: Nov. 20 2012,14:32

Quote (Jerry Don Bauer @ Nov. 19 2012,16:37)
Comparing the genome to computer data storage. In order to represent a DNA sequence on a computer, we need to be able to represent all 4 base pair possibilities in a binary format (0 and 1). These 0 and 1 bits are usually grouped together to form a larger unit, with the smallest being a “byte” that represents 8 bits. We can denote each base pair using a minimum of 2 bits, which yields 4 different bit combinations (00, 01, 10, and 11).  Each 2-bit combination would represent one DNA base pair.  A single byte (or 8 bits) can represent 4 DNA base pairs.  In order to represent the entire diploid human genome in terms of bytes, we can perform the following calculations:

6×10^9 base pairs/diploid genome x 1 byte/4 base pairs = 1.5×10^9 bytes or 1.5 Gigabytes, about 2 CDs worth of space!


http://bitesizebio.com/article....-genome

is 1.5 Gigabytes more than 500 bits? Then why would we want to go any further than this as you already have the answer before you start.

ANY organism will be over 500 bits.[/quote]
Hello everyone, I've been a lurker here for a few years now and I just have to respond because this could be historic stuff.

I want to make sure I understand you correctly here, Jerry Don Bauer. According to what I have quoted, you seem to be saying that the quantity of information in a string of symbols is equal to the length of the string divided by the number of possible symbols at each locus? As in, the information content is measured in bits and is thus proportional to the length of the sequence?

You refer to the example of a 6 billion base-pair diploid genome, divided by the number of possibilities per site (4):

6×10^9 base pairs/diploid genome x 1 byte/4 base pairs = 1.5×10^9 bytes or 1.5 Gigabytes, about 2 CDs worth of space!

In other words, the information content of a DNA sequence 12 base pairs in length, for example ATGAATATGTTA, is equal to 12 base pairs x 1 byte/4 base pairs = 3 bytes.
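To make sure we are talking about the same arithmetic, here is a minimal Python sketch of the 2-bits-per-base packing described in the quoted article, as I read it. The particular bit assignment (A=00, C=01, G=10, T=11) and the function names are my own assumptions for illustration; the article only says that each base pair gets one of the four 2-bit combinations. The 12-base example is written with T throughout.

```python
# Hypothetical 2-bit code per base; which base gets which code is arbitrary.
BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}


def storage_bytes(n_bases: int) -> int:
    """Bytes needed at 2 bits per base, rounded up to whole bytes."""
    return (2 * n_bases + 7) // 8


def pack(seq: str) -> bytes:
    """Pack a DNA string into bytes, 4 bases per byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | BASE_TO_BITS[base]
        # Zero-pad a short final chunk so it still fills one byte.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


packed = pack("ATGAATATGTTA")
print(len(packed))                # 3
print(storage_bytes(6 * 10**9))   # 1500000000
```

Note that this only counts storage space at a fixed 2 bits per base, so any sequence of the same length comes out the same size no matter what it says. Whether that number is what you mean by the "information content" is exactly what I am asking.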

Am I correct in my understanding here?

  
128 replies since Oct. 06 2012,18:57
