Thanks to Google Scholar, I recently became aware of the article by Mohammed AlQuraishi & Harley McAdams (2012), “Three enhancements to the inference of statistical protein-DNA potentials”, in Proteins: Structure, Function, and Bioinformatics. Reading through the text, I like it quite a bit. The abstract summarizes the work well:
The energetics of protein-DNA interactions are often modeled using so-called statistical potentials, that is, energy models derived from the atomic structures of protein-DNA complexes. Many statistical protein-DNA potentials based on differing theoretical assumptions have been investigated, but little attention has been paid to the types of data and the parameter estimation process used in deriving the statistical potentials. We describe three enhancements to statistical potential inference that significantly improve the accuracy of predicted protein-DNA interactions: (i) incorporation of binding energy data of protein-DNA complexes, in conjunction with their X-ray crystal structures, (ii) use of spatially-aware parameter fitting, and (iii) use of ensemble-based parameter fitting. We apply these enhancements to three widely-used statistical potentials and use the resulting enhanced potentials in a structure-based prediction of the DNA binding sites of proteins. These enhancements are directly applicable to all statistical potentials used in protein-DNA modeling, and we show that they can improve the accuracy of predicted DNA binding sites by up to 21%.
I’m glad to find that the 3DNA mutate_bases program was used in deriving the statistical potentials of protein-DNA interactions:
The relative binding affinity of a protein to two different DNA sequences can be evaluated by computing the binding energy of the protein to those two sequences. This is done by mutating the DNA sequence in silico while keeping the protein fixed. We used the 3DNA software package for mutating DNA [23,24], which maintains the backbone atoms of the DNA molecule but replaces the basepair atoms in a way that is consistent with the backbone orientation of the DNA.
For each base position, in silico structural mutants are generated using 3DNA [23,24] to mutate the basepair to include all four possibilities.
This is exactly one of the use cases I had in mind when creating the program:
Overall, mutate_bases has been designed to solve the in silico base mutation problem in a practical sense: robust and efficient, getting its job done and then out of the way. The program can have many possible applications: in addition to performing base-pair mutations in DNA-protein complexes, it should also prove handy in RNA modeling and in providing initial structures for QM/MM/MD energy calculations, and in DNA/RNA modeling studies.
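To make the per-position scan quoted from the paper more concrete, here is a minimal Python sketch of the bookkeeping only: for each position of one strand, it lists all four base possibilities together with the Watson-Crick partner on the complementary strand. The function name `single_site_variants` and the tuple layout are hypothetical, chosen for illustration; the sketch does not call 3DNA and is not the authors' pipeline. Each listed variant would still need to be built as an actual structure with mutate_bases.

```python
# Illustrative only: enumerate single base-pair variants of a DNA duplex.
# The helper name and output layout are assumptions, not part of 3DNA.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def single_site_variants(strand):
    """Yield (position, base, partner, sequence) for all four bases at each site.

    position is 1-based. The wild-type base is included, so every
    position gets all four possibilities.
    """
    strand = strand.upper()
    for i in range(len(strand)):
        for base in "ACGT":
            variant = strand[:i] + base + strand[i + 1:]
            yield i + 1, base, COMPLEMENT[base], variant


variants = list(single_site_variants("GAT"))
for position, base, partner, sequence in variants:
    print(position, f"{base}-{partner}", sequence)
```

A sequence of length N gives 4N entries, with the wild-type sequence appearing once per position; comparing the computed binding energy of each variant against the wild type is what yields the relative affinities described in the quoted passage.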
With the recent refinement that allows 3-letter nucleotide names in the standard base-reference frame file, mutate_bases now makes it exceedingly easy to mutate cytosine to 5-methylcytosine.
As more people get to know this 3DNA functionality, I am confident that mutate_bases will be more widely used.