Most computational approaches to transcriptional regulation use sequence-based methods that aim to discover regulatory motifs in genomic segments. Here we argue that the current content of the Protein Data Bank (PDB) can provide invaluable data to drive the prediction of regulatory interactions within genomes. First, we dissect protein-DNA interfaces and identify the atomic interactions that contribute to sequence-specific recognition, mainly hydrogen bonds and van der Waals contacts. These specificity determinants can be expressed as atomic weight matrices, which prove robust in bootstrap experiments and yield scores that correlate with approximate measures of binding specificity. Second, using example transcription factors from Escherichia coli, we find that some protein-DNA interfaces have sequence-dependent DNA geometries that constitute indirect readout mechanisms, in agreement with previous reports. Third, we build structure-based position weight matrices that capture both types of recognition mechanisms (direct atomic contacts and indirect readout) and test them in genomic experiments, with results comparable to those of sequence-based methods. We conclude that the PDB can be further exploited to explore transcriptional regulation and other biological processes mediated by protein-DNA interactions.
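To make the genome-scanning step concrete, the following is a minimal sketch of how a position weight matrix (PWM) can be applied to a DNA sequence, assuming a generic log-odds formulation: per-position base counts, a uniform background, a pseudocount of 0.5, and an arbitrary score threshold. It does not reproduce the structure-based derivation from PDB interfaces described above; the counts would come from that analysis in practice. The function names (`build_pwm`, `score_site`, `scan`) and all parameter values are hypothetical, chosen for illustration only.

```python
import math

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


def build_pwm(counts, background=None, pseudocount=0.5):
    """Turn per-position base counts into log2-odds scores.

    counts: list of dicts, one per motif position, mapping base -> count
    (assumed input format). Missing bases count as zero.
    """
    if background is None:
        background = {b: 0.25 for b in BASES}  # assumed uniform background
    pwm = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + pseudocount * len(BASES)
        pwm.append({
            b: math.log2((column.get(b, 0) + pseudocount) / total / background[b])
            for b in BASES
        })
    return pwm


def score_site(pwm, site):
    """Additive score: sum of per-position log-odds for the bases in site."""
    return sum(col[base] for col, base in zip(pwm, site))


def scan(pwm, sequence, threshold):
    """Slide the matrix along both strands; report windows scoring >= threshold.

    Returns (start, strand, score) tuples, where start is the 0-based
    offset on the forward strand.
    """
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        site = sequence[start:start + width].upper()
        if any(base not in BASES for base in site):
            continue  # skip windows with N or other ambiguity codes
        fwd = score_site(pwm, site)
        rev = score_site(pwm, revcomp(site))
        best, strand = (fwd, "+") if fwd >= rev else (rev, "-")
        if best >= threshold:
            hits.append((start, strand, best))
    return hits
```

For example, a toy three-position matrix built from `[{"T": 10}, {"G": 10}, {"A": 10}]` and scanned over `"CCTGACC"` with a threshold of 3.0 reports a single forward-strand hit at offset 2. Scoring the reverse complement of each window is a common design choice because transcription factors can bind either strand.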
Funded by CSIC grant number 200720I038
Peer reviewed