SCI:ALIFE: the things of shape to come

Eugene Leitl (Eugene.Leitl@lrz.uni-muenchen.de)
Mon, 13 Jan 1997 15:05:41 +0100 (MET)


"Extended Molecular Evolutionary Biology: Artificial Life Bridging the Gap
Between Chemistry and Biology", P. Schuster, in "Artificial Life, an
Overview", Christopher G. Langton (ed.), MIT Press (1995), pp. 39-60.

[...] Evolutionary phenomena, in particular selection and adaptation to
changes in the environment, occur only at conditions far away from the
thermodynamic equilibrium. Spiegelman studied RNA molecules from small
bacteriophages and created nonequilibrium conditions by means of the
serial transfer technique (Figure 1). Material consumed by multiplication
of RNA molecules is renewed, and the degradation products are removed at
the end of constant time intervals by transfer of small samples into an
excess of fresh stock solution. Continuous renewal and removal can be
achieved in elaborate flow reactors [3,4]. [...] As far as the principle
of polynucleotide replication is concerned, there seems to be no reason
why molecular replication should need a protein catalyst. Extensive
studies by Orgel and coworkers [...] have indeed shown that
template-induced synthesis of complementary strands of RNA can be
achieved under suitable conditions without an enzyme. [...] It seems
necessary to stress a fact that is often overlooked or even ignored by
theorists and epistemologists. Molecular replication is anything but a
trivially occurring function of the molecules. [...] Everybody who has
experience with primitive computing machines knows that the copy
instruction is a very simple function. Chemistry and early biological
evolution are radically different from computer science in this respect:
Replication has to find a simultaneous solution to all requirements,
which is generally in conflict with common physical chemistry. Working
compromises between contradicting demands are rare, and, hence, only
highly elaborate structures might be able to replicate efficiently
without specific help. [...] The stationary mutant distribution is
characterized as _quasispecies_ [...] because it represents the genetic
reservoir of asexually replicating populations. An increase in error rate
in the replication on a given fitness landscape leads to a broader
spectrum of mutants and, thus, makes evolutionary optimization faster and
more efficient in the sense that populations are less likely caught in
local fitness optima. There is, however, a critical error threshold
[...]: If the error rate exceeds the critical limit, heredity breaks
down, populations are drifting in the sense that new RNA sequences are
formed steadily, old ones disappear, and no evolutionary optimization
according to Darwin's principle is possible [...]. Experimental analysis
of several RNA virus populations has shown that almost all chain lengths
are adjusted to yield error rates close to the threshold value. Thus, RNA
viruses appear to adapt to their environment by driving optimization
efficiency towards the maximum. [...] Evolutionary optimization is viewed
appropriately as an adaptive walk on a rugged fitness landscape [...].
Population dynamics on realistic landscapes based on RNA folding has been
studied by computer simulations [...]. Error thresholds were detected on
these very rugged landscapes, too. On completely flat fitness
landscapes, a new phenomenon was observed [...]: At sufficiently high
replication accuracy populations move as coherent peaks in sequence
space. There is, however, again a critical error rate. If it is exceeded,
the population loses its coherence in sequence space and becomes
disperse. It is suggestive, therefore, to call this second critical error
rate the _dispersion threshold_. [...]
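
The error threshold described above can be illustrated numerically. What
follows is a minimal sketch and not the model used in the chapter: it
assumes binary sequences (\kappa=2), a single-peak fitness landscape in
which one master sequence replicates sigma times faster than every other
sequence, and a uniform per-site error rate p. The function names, sigma,
and the parameter values are illustrative assumptions. Sequences are
lumped into classes by their Hamming distance from the master, and the
replication-mutation step is iterated until the distribution settles.

```python
from math import comb


def class_mutation_matrix(n, p):
    """Q[k][j]: probability that copying a binary sequence carrying k
    errors (relative to the master) yields a copy carrying j errors."""
    Q = [[0.0] * (n + 1) for _ in range(n + 1)]
    for k in range(n + 1):
        for back in range(k + 1):            # wrong positions flipped back
            for fwd in range(n - k + 1):     # correct positions newly flipped
                prob = (comb(k, back) * p**back * (1 - p)**(k - back)
                        * comb(n - k, fwd) * p**fwd * (1 - p)**(n - k - fwd))
                Q[k][k - back + fwd] += prob
    return Q


def stationary_master_fraction(n, p, sigma, generations=500):
    """Iterate replication plus mutation on a single-peak landscape
    (assumed: master fitness sigma, all other fitness 1) and return the
    final relative frequency of the master sequence."""
    Q = class_mutation_matrix(n, p)
    fitness = [sigma] + [1.0] * n
    x = [1.0] + [0.0] * n                    # start with master only
    for _ in range(generations):
        offspring = [0.0] * (n + 1)
        for k in range(n + 1):
            weight = fitness[k] * x[k]
            if weight == 0.0:
                continue
            for j in range(n + 1):
                offspring[j] += weight * Q[k][j]
        total = sum(offspring)
        x = [v / total for v in offspring]   # keep total population at 1
    return x[0]


if __name__ == "__main__":
    for p in (0.01, 0.05, 0.10, 0.15, 0.20):
        print(p, stationary_master_fraction(20, p, 10.0))
```

With n = 20 and sigma = 10 the master sequence dominates at p = 0.01 and
all but vanishes at p = 0.2. For this particular landscape the threshold
is commonly approximated by p_max ~ ln(sigma)/n; that approximation is a
standard textbook result and is not stated in the excerpt.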

RNA secondary structures are first approximations to the spatial
structures of RNA molecules. They are understood as listings of the
Watson-Crick-type base pairs in the actual structure and may be
represented as planar graphs [...]. We consider RNA secondary structures
as elements of an abstract _shape space_. As in the case of sequences
(where the Hamming distance d_h represents a metric for the sequence
space), a measure of relationship of RNA structures can be found that
induces a metric on the shape space. We derived this distance measure
from trees that are equivalent to the structure graphs, and accordingly
it is called a tree distance, d_t. Thus, RNA folding can be understood as
a mapping from one metric space into another, in particular, from
sequence space into shape space. A path in sequence space corresponds
uniquely to a path in shape space. (The inversion of this statement,
however, is not true, as we shall mention in section 4.) [...]
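
The two metrics in the paragraph above can be made concrete with a small
sketch. The Hamming distance d_h is implemented as defined. The tree
distance d_t is not reproduced here; as a simpler, plainly different
stand-in the sketch uses the base-pair distance (the number of base pairs
present in one structure but not in the other), which is also a metric on
secondary structures. The dot-bracket strings used to write down the
"listing of base pairs" are an assumed notation that does not appear in
the excerpt, and all function names are illustrative.

```python
def hamming(a, b):
    """Hamming distance d_h: number of positions at which two sequences
    of equal length differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def base_pairs(structure):
    """Turn a dot-bracket string (assumed notation) into the set of
    base pairs (i, j) with i < j, using 0-based positions."""
    stack, pairs = [], set()
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.add((stack.pop(), i))
        elif ch != ".":
            raise ValueError("unexpected character: " + ch)
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def base_pair_distance(s1, s2):
    """Stand-in for d_t (NOT the tree distance): size of the symmetric
    difference of the two base-pair sets."""
    return len(base_pairs(s1) ^ base_pairs(s2))


if __name__ == "__main__":
    print(hamming("GGGAAACCC", "GGCAAAGCC"))
    print(sorted(base_pairs("((...))")))
    print(base_pair_distance("((...))", "(.....)"))
```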

The sequence space is a bizarre object: It is of very high dimension
(because every nucleotide can be mutated independently, its dimension
coincides with the chain length of RNA: 25 < n < 500 for RNA in test tube
experiments, 250 < n < 400 for viroids, and 3500 < n < 20000 for (most)
RNA viruses), but there are only a few points on each coordinate axis
(\kappa points; \kappa is the number of digits in the alphabet: \kappa=2
for AU and GC, \kappa=4 for AUCG). The number of secondary structures
that are acceptable as minimum free energy structures of RNA molecules is
much smaller than the number of different sequences and can be estimated
by means of proper combinatorics [...]: In case of natural (AUCG)
molecules we have about 1.485 x n^{-3/2} (1.849)^n structures for 4^n
sequences. The mapping from sequence space into shape space is not
invertible: Many sequences fold into the same secondary structure. We
cannot expect that our intuition, which is well trained with mostly
invertible maps in three-dimensional space, will guide us well through
sequence and shape spaces. [...] As indicated [...], we search for
_neutral paths_ through sequence space. The Hamming distance from the
reference increases monotonically along such a neutral path, but the
structure remains unchanged. A neutral path ends when no further neutral
sequence is found in the neighbourhood of the last sequence. [...]
Clearly, a neutral path cannot be longer than the chain length [...]. It
is remarkable that about 20% of the neutral paths have the maximum length
and lead through the whole sequence space to one of the sequences that
differ in all positions from the reference, but have the same structure.
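
To get a feeling for how many sequences must share a structure, the
estimate quoted above (about 1.485 x n^{-3/2} (1.849)^n structures for
4^n AUCG sequences) can simply be evaluated. The sketch below works with
base-10 logarithms so that long chains do not overflow floating-point
numbers; the function names are illustrative, and the structure count is
only the asymptotic estimate from the text, not an exact enumeration.

```python
from math import log10


def log10_structures(n):
    """log10 of the estimated number of secondary structures for AUCG
    chains of length n: 1.485 * n**(-3/2) * 1.849**n."""
    return log10(1.485) - 1.5 * log10(n) + n * log10(1.849)


def log10_sequences(n, kappa=4):
    """log10 of the number of sequences, kappa**n."""
    return n * log10(kappa)


if __name__ == "__main__":
    for n in (25, 100, 400):
        seqs = log10_sequences(n)
        shapes = log10_structures(n)
        print("n = %d: 10^%.1f sequences, 10^%.1f structures, "
              "about 10^%.1f sequences per structure on average"
              % (n, seqs, shapes, seqs - shapes))
```

The average hides the sharply uneven distribution: the figure printed is
only a mean over all structures.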

Combination of information derived from [...] provides insight into the
structure of the shape space of RNA secondary structures, which is basic
to optimization of RNA molecules already in an RNA world. Our results can
be summarised in four statements:

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

1. Sequences folding into one and the same structure are distributed
randomly in sequence space.

2. The frequency distribution of structures is sharply peaked. (There are
comparatively few common structures and many rare ones.)

3. Sequences folding into all common structures are found within
(relatively) small neighbourhoods of any random sequence.

4. The shape space contains extended neutral networks joining sequences
with identical structures. (A large fraction of neutral paths leads
from the initial sequence through the entire sequence space to a final
sequence on the opposite side -- there are (\kappa - 1)^n sequences
that differ in all positions from an initial sequence).
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
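
The neutral paths of statement 4 can be sketched as a simple walk: at
every step pick a one-point mutant that folds into the same structure
and lies one Hamming step further from the reference, and stop when no
such neighbour exists. The sketch below is an assumption-laden toy. In
particular toy_fold is a hypothetical stand-in for a real RNA folding
routine (it merely zips up a single stem from the two chain ends using
Watson-Crick pairs), so it cannot reproduce the 20% figure quoted above;
it only shows the bookkeeping of the walk.

```python
import random

WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def toy_fold(seq, min_loop=3):
    """Hypothetical toy folding: pair the ends inwards while they are
    complementary and at least min_loop unpaired bases remain between."""
    shape = ["."] * len(seq)
    i, j = 0, len(seq) - 1
    while j - i > min_loop and (seq[i], seq[j]) in WATSON_CRICK:
        shape[i], shape[j] = "(", ")"
        i, j = i + 1, j - 1
    return "".join(shape)


def neutral_path(reference, fold=toy_fold, alphabet="AUCG", rng=None):
    """Walk away from the reference one point mutation at a time while
    the folded structure stays unchanged. Each position is mutated at
    most once, so the Hamming distance grows by one at every step."""
    rng = rng or random.Random(0)
    target = fold(reference)
    current = reference
    untouched = set(range(len(reference)))
    path = [reference]
    while True:
        candidates = []
        for pos in sorted(untouched):
            for base in alphabet:
                if base == reference[pos]:
                    continue
                mutant = current[:pos] + base + current[pos + 1:]
                if fold(mutant) == target:
                    candidates.append((pos, mutant))
        if not candidates:                   # no neutral neighbour left
            return path
        pos, current = rng.choice(candidates)
        untouched.discard(pos)
        path.append(current)


if __name__ == "__main__":
    for step, seq in enumerate(neutral_path("GGGAAAACCC")):
        print(step, seq, toy_fold(seq))
```

With this toy folding only the unpaired loop positions can mutate
neutrally, so the walk stops well short of the chain length. Any
function that maps a sequence to a structure string can be passed as the
fold argument in its place.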

[...] These results suggest straightforward strategies in the search for
new RNA structures. It provides little advantage to start from natural or
other preselected sequences because any random sequence would do equally
well as the starting molecules for the selection cycles of evolutionary
biotechnology shown in [...]. Any common secondary structure with optimal
functions is accessible in a few selection cycles. [...] If no RNA
molecule with satisfactory properties is found, a change to high error
rate is adequate. Then the population spreads along the neutral network
to other regions in sequence space, which can be explored in detail after
tuning the error rate low again.

The structure of shape space is highly relevant for evolutionary
optimization in nature too. Because long neutral paths are common,
populations drift readily through sequence space whenever selection
constraints are absent. This is precisely what is predicted for higher
organisms by the neutral theory of evolution [...] and what is observed
in molecular phylogeny by sequence comparisons of different species. The
structure of shape space provides also a rigorous answer to the old
probability argument against the possibility of successful adaptive
evolution [...].

'gene: