I recently read a short communication by Pavel Afonine, titled phenix.hbond: a new tool for annotation hydrogen bonds in the July 2019 issue of the Computational Crystallography Newsletter (CCN). It appears that every bioinformatics tool (e.g., PyMOL or Jmol) has its own implementation of an algorithm on calculating H-bonds, one of the fundamental stabilizing forces of proteins and DNA/RNA structures. So does 3DNA/DSSR, as noted in my 2014-04-11 blogpost Get hydrogen bonds with DSSR.
Both DSSR and SNAP have the --get-hbond
option, and they use the same underlying algorithm. However, the default output from the two programs differs: DSSR reports the H-bonds within nucleic acids, and SNAP covers only those at the DNA/RNA-protein interface. Using the PDB entry 1oct as an example, Running DSSR on it with the --get-hbond
option gives 33 H-bonds in the DNA duplex, while SNAP reports 38 H-bonds at the DNA-protein interface. By design, the default output caters for the most-common use case of each program.
Under the scene, however, there exist variations in the seemingly simple --get-hbond
option. One can attach text ‘nucleic’ (or ‘nuc’, ‘nt’), as in --get-hbond-nucleic
, to output H-bonds within nucleic acids. Similarly, --get-hbond-protein
(or ‘amino’, ‘aa’) would output H-bonds within proteins. Not surprisingly, the --get-hbond-nt-aa
option would list H-bonds in nucleic acids and proteins, including those at their interface. These variations apply to both DSSR and SNAP, even though some are redundant with the default.
Notably, in combination with --json
, the --get-hbond
option by default would output all H-bonds, as if --get-hbond-nt-aa
has been set. For PDB entry 1oct, DSSR or SNAP would report 208 H-bonds. Moreover, the JSON output has a residue_pair
field for each identified H-bond, with values like "nt:nt"
, "nt:aa"
, or "aa:aa"
. Using 1oct as an example,
# x3dna-dssr -i=1oct.pdb --get-hbond --json | jq '.hbonds[0]' { "index": 1, "atom1_serNum": 34, "atom2_serNum": 608, "donAcc_type": "standard", "distance": 3.304, "atom1_id": "O6@A.DG202", "atom2_id": "N4@B.DC230", "atom_pair": "O:N", "residue_pair": "nt:nt" } # x3dna-dssr -i=1oct.pdb --get-hbond --json | jq '.hbonds[60]' { "index": 61, "atom1_serNum": 462, "atom2_serNum": 1187, "donAcc_type": "standard", "distance": 3.692, "atom1_id": "O2@B.DT223", "atom2_id": "NH2@C.ARG102", "atom_pair": "O:N", "residue_pair": "nt:aa" } # x3dna-dssr -i=1oct.pdb --get-hbond --json | jq '.hbonds[100]' { "index": 101, "atom1_serNum": 791, "atom2_serNum": 818, "donAcc_type": "standard", "distance": 2.871, "atom1_id": "N@C.THR26", "atom2_id": "OD2@C.ASP29", "atom_pair": "N:O", "residue_pair": "aa:aa" }
In the above three cases, using SNAP instead of DSSR would give the same results.
Also, one can take advantage of the residue_pair
value to filter H-bonds by type. For example, the following command would extract only H-bonds at the DNA-protein interface (38 occurrences, same as the number noted above):
x3dna-snap -i=1oct.pdb --get-hbond --json | jq '.hbonds[] | select(.residue_pair=="nt:aa")'
Back to the phenix.hbond
tool, the author noted that:
Running phenix.hbond requires atomic model in PDB or mmCIF format with all hydrogen atoms added, as well as ligand restraint files if the model contains unknown to the library items.
While there is no particular reason why this should not work for all bio-macromolecules, currently phenix.hbond is only optimized and tested to work with proteins, which is the limitation that will be removed in future.
In contrast, the H-bond identification algorithm in DSSR/SNAP does not require hydrogen atoms. In fact, hydrogen atoms are simply ignored if they exist. As shown above, the H-bond method as implemented in DSSR/SNAP works for DNA, RNA, protein, or their complexes. This does not necessarily mean that the 3DNA way is superior to other similar tools. It just works well in my hand, and it may serve as a pragmatic choice for other users.