I decided to start this thread to collect references about protein evolution specifically as it relates to folding. First I'll repost an essay that I posted to the ISCID forum, skipping the irrelevant parts:
Now for some more general stuff about protein folding and evolution. Josh gives us a very educational post about some of the complexities of protein folding and what it means to biology. I'm going to skip most of the stuff about the dynamics of protein folding, because I think I've addressed that above, at least as it relates to the suboptimality argument, and also because I don't have the necessary background in physics to know that much about it. I think Josh can appreciate that, because he notes that there's tons of stuff that we don't know. The literature on the dynamics of protein folding is very large, but it's also difficult to read (for me anyway). But I have reviewed some of the literature as it pertains to the evolution of protein folds, and I'll present some of that.
First of all, it’s a misconception even among many biochemists that all proteins need to fold to be functional. In fact, the importance of disordered proteins and those with long disordered regions is now becoming clearer. Try searching the lit for “intrinsically disordered proteins” and you’ll come up with a number of hits. These proteins (or certain domains) are unfolded and yet are perfectly functional, and in many cases are just as highly conserved as folded protein domains, though often of lower sequence complexity (and hence, easier to evolve via random generation). In fact, there is evidence that disordered proteins outnumber ordered proteins, but that the ordered ones account for more of the resolved structures in the PDB simply because (big shock) they’re easier to crystallize. So one possible way for folded proteins to come about is by evolving from functional yet disordered proteins, and in this case there would never be a period of time when there was not a selectable function. And of course it’s not like there are only two kinds of proteins, folded and unfolded. There is also the intermediate molten globule state.
However, most protein folds are thought to evolve from other folds. This can be seen in the way protein folds arrange into scale-free networks. Two recent papers on this are relevant. The first one is Proc Natl Acad Sci U S A 2002 Oct 29;99(22):14132-6, Expanding protein universe and its origin from the biological Big Bang. I posted a number of excerpts from this paper here, so I won’t bother reproducing them. The point, however, is that the scale-free network in which protein folds fit is highly indicative of a duplication / divergence process from one or a few initial folds. The second paper, which came out about the same time, is Nature 2002 Nov 14;420(6912):218-23, The structure of the protein universe and genome evolution. Here’s the abstract:
| Despite the practically unlimited number of possible protein sequences, the number of basic shapes in which proteins fold seems not only to be finite, but also to be relatively small, with probably no more than 10,000 folds in existence. Moreover, the distribution of proteins among these folds is highly non-homogeneous -- some folds and superfamilies are extremely abundant, but most are rare. Protein folds and families encoded in diverse genomes show similar size distributions with notable mathematical properties, which also extend to the number of connections between domains in multidomain proteins. All these distributions follow asymptotic power laws, such as have been identified in a wide variety of biological and physical systems, and which are typically associated with scale-free networks. These findings suggest that genome evolution is driven by extremely general mechanisms based on the preferential attachment principle.|
So both of these papers support the idea not only that an evolutionary process can account for the emergence of protein folds, but also that the observed distribution of folds is a predicted consequence of evolution.
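To make the "preferential attachment" idea a bit more concrete, here is a minimal toy simulation. It is not the model from either paper, just a sketch under my own assumptions: at each step a new domain either founds a new family (with a made-up probability `p_new`) or is a duplicate of a randomly chosen existing domain. The function names and parameter values are all hypothetical.

```python
import random
from collections import Counter


def simulate_family_sizes(n_steps, p_new=0.1, seed=0):
    """Toy 'rich get richer' model of fold family growth.

    Returns a Counter mapping family id -> number of member domains.
    p_new is an assumed innovation rate, not a measured value.
    """
    rng = random.Random(seed)
    members = [0]          # one entry per domain, holding its family id
    n_families = 1
    for _ in range(n_steps):
        if rng.random() < p_new:
            # Innovation: a domain diverges far enough to found a new family.
            members.append(n_families)
            n_families += 1
        else:
            # Duplication: copy a uniformly chosen existing domain. Picking a
            # domain at random picks its family with probability proportional
            # to that family's current size (preferential attachment).
            members.append(rng.choice(members))
    return Counter(members)


def size_histogram(family_sizes):
    """Map family size -> number of families of that size."""
    return Counter(family_sizes.values())


if __name__ == "__main__":
    sizes = simulate_family_sizes(20000)
    hist = size_histogram(sizes)
    # Print the low end of the distribution and the single largest family.
    for size in sorted(hist)[:10]:
        print(size, hist[size])
    print("largest family:", max(sizes.values()))
```

Even a crude model like this should give the lopsided picture the abstract describes: a few very large families and lots of tiny ones. Plotting the histogram on log-log axes is the usual way to eyeball whether the tail looks like a power law.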
Now finally, if you want a “beginning to end” account for protein evolution, there is this recent review article (and there are others out there):
FEBS Lett 2002 Sep 11;527(1-3):1-4, Molecular evolution from abiotic scratch.
I’ll see if I can reproduce what they’ve got listed as the five stages of protein evolution, though I’ll have to skip the discussion:
1. Homopeptides of Ala and Gly encoded by (GCC)-(GGC) duplexes.
2. Mixed peptides of two alphabet types.
3. Chains of optimal length close the ends by interactions between two amino acid residues.
4. The loops are joined in linear arrays and form folds (domains).
5. Modern, multidomain proteins are formed.
I don’t know if this model will last, or even if it can stand up to serious scrutiny right now, but it’s the kind of thing that needs to be explored in detail before we can really say anything about the likelihood of protein evolution de novo. Keep in mind that this particular model is trying to account for the evolution of proteins from the origin of life, which is necessarily tricky because it’s difficult to learn about this just from looking at modern life. However, once you have a functioning cell, I don’t see any problem with the ability of current theory to account for protein evolution.
[This last reference is at the end here for no real good reason]
1. Proteins 2001 Jan 1;42(1):38-48, Sequence complexity of disordered protein.
| Intrinsic disorder refers to segments or to whole proteins that fail to self-fold into fixed 3D structure, with such disorder sometimes existing in the native state. Here we report data on the relationships among intrinsic disorder, sequence complexity as measured by Shannon's entropy, and amino acid composition. Intrinsic disorder identified in protein crystal structures, and by nuclear magnetic resonance, circular dichroism, and prediction from amino acid sequence, all exhibit similar complexity distributions that are shifted to lower values compared to, but significantly overlapping with, the distribution for ordered proteins.|
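For anyone wondering what "sequence complexity as measured by Shannon's entropy" looks like in practice, here is a minimal sketch. I'm assuming the simplest version, entropy of the residue composition in bits per residue, plus a sliding-window variant. The window size of 45 is my own placeholder, and the paper's exact procedure may differ.

```python
import math
from collections import Counter


def shannon_entropy(sequence):
    """Shannon entropy (bits per residue) of a sequence's composition."""
    counts = Counter(sequence.upper())
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # H = sum over residues of p * log2(1 / p), where p = count / total
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def windowed_entropy(sequence, window=45):
    """Entropy of each full-length window slid along the sequence.

    The window size is an illustrative assumption, not a value taken
    from the paper. Sequences shorter than the window give an empty list.
    """
    return [
        shannon_entropy(sequence[i:i + window])
        for i in range(len(sequence) - window + 1)
    ]
```

A homopolymer such as poly-Ala scores 0 bits, while a sequence that uses all 20 amino acids equally often scores log2(20), a little over 4.3 bits. Lower-complexity segments, like the disordered regions discussed above, fall toward the low end of that range.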