[Background] Since thermodynamic stability is a global property of proteins that has to be conserved
during evolution, the selective pressure at a given site of a protein sequence depends on the amino acids
present at other sites. However, models of molecular evolution that aim to reconstruct the
evolutionary history of macromolecules become computationally intractable if such correlations between
sites are explicitly taken into account.
[Results] We introduce an evolutionary model with sites evolving independently under a global constraint
on the conservation of structural stability. This model consists of a selection process, which depends on
two hydrophobicity parameters that can be computed from protein sequences without any fit, and a
mutation process for which we consider various models. It reproduces quantitatively the results of
Structurally Constrained Neutral (SCN) simulations of protein evolution in which the stability of the native
state is explicitly computed and conserved. We then compare the predicted site-specific amino acid
distributions with those sampled from the Protein Data Bank (PDB). The parameters of the mutation
model, whose number varies between zero and five, are fitted to the data. The mean correlation
coefficient between predicted and observed site-specific amino acid distributions exceeds <r> = 0.70
for a mutation model with no free parameters and no genetic code. In contrast, considering only the
mutation process with no selection yields a mean correlation coefficient of <r> = 0.56 with three fitted
parameters. The mutation model that best fits the data takes into account the increased mutation rate at
CpG dinucleotides, yielding <r> = 0.90 with five parameters.
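As an illustration of the comparison metric quoted above, the following minimal sketch computes a mean correlation coefficient <r> by taking the Pearson correlation between predicted and observed amino acid frequencies at each site and averaging over sites. The function names, the toy four-letter alphabet, and the frequency values are assumptions made for illustration only; how sites are grouped and averaged in the actual study is not specified here.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def mean_site_correlation(predicted, observed):
    """Average the per-site correlation over all sites (written <r> in the text)."""
    rs = [pearson(p, o) for p, o in zip(predicted, observed)]
    return sum(rs) / len(rs)

# Hypothetical toy data: 2 sites and a 4-letter alphabet instead of 20 amino acids.
predicted = [[0.4, 0.3, 0.2, 0.1], [0.1, 0.2, 0.3, 0.4]]
observed = [[0.4, 0.3, 0.2, 0.1], [0.4, 0.3, 0.2, 0.1]]

mean_r = mean_site_correlation(predicted, observed)
```

In this toy case the first site is predicted perfectly and the second is perfectly anti-correlated, so the two per-site coefficients cancel in the mean; real data would use 20-component frequency vectors per site.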
[Conclusion] The effective selection process that we propose reproduces well the amino acid distributions
observed in the protein sequences in the PDB. Its simplicity makes it very promising for likelihood
calculations in phylogenetic studies. Interestingly, in this approach the mutation process influences the
effective selection process, i.e., selection and mutation must be entangled in order to obtain effectively
independent sites. This interdependence between mutation and selection reflects the deep influence that
mutation has on the evolutionary process: the mutation bias influences the thermodynamic
properties of the evolving proteins, in agreement with comparative studies of bacterial proteomes, and it
also influences the rate of accepted mutations.
Peer reviewed