DSSR --symmetry/--nmr options and MODEL/ENDMDL ensemble

Over the past couple of weeks, I’ve added two more DSSR options, --symmetry and --nmr, that are closely related to an ensemble of MODEL/ENDMDL-delineated structures in PDB files. However, there exist subtle differences between the two cases, and the usage of the same MODEL/ENDMDL ensemble format can be ambiguous to the uninitiated. This blog post aims to clarify the issues, using concrete examples.

The --symmetry options applies to X-ray crystal structures where an asymmetric unit represents only part of the whole biological assembly. In standard PDB format, the asymmetric unit contains instructions to produce crystallographic symmetry related molecules.. Nevertheless, the biological assembly are also provided by the PDB (or NDB), with coordinate files ending with .pdb1 or such. For example, the PDB entry 2d94 has the single-stranded sequence GGGCGCCC in its asymmetric unit (2d94.pdb). It is the biological assembly in file 2d94.pdb1 that contains the DNA double helix.

x3dna-dssr -i=2d94.pdb # no pairs found
x3dna-dssr -i=2d94.pdb1 # still no pairs found
x3dna-dssr -i=2d94.pdb1 --symm # 8 pairs found
x3dna-dssr -i=2d94.pdb --symm # no pairs found

As shown by the above examples, DSSR by default reads only the first model even given the biological assemble file 2d94.pdb1. It is with --symmetry (abbreviated to --symm) explicitly specified that DSSR takes all models in the input biological assemble file into consideration. The last case also illustrates that DSSR does not generate crystallographic symmetry related molecules. The --symm simply informs DSSR to take all models, which already exist in the input file, into consideration.

On the other hand, the --nmr option is for auto-processing an ensemble of structures solved by solution NMR method (or trajectories of molecular dynamics simulations). The key point here is that each of the MODEL/ENDMDL-delinated structures is independent and thus can be processed separately, even though they are obviously closely related. Using the PDB entry 2n2d as an example, here are some sample usages:

x3dna-dssr -i=2n2d.pdb -o= 2n2d-first.out # only the first structure is processed
x3dna-dssr -i=2n2d.pdb --nmr -o=2n2d-all.out # all 10 structures are processed
x3dna-dssr -i=2n2d.pdb --nmr --json -o=2n2d-all.json # ibid., with output in JSON

Note that the NMR file is named 2n2d.pdb, and it contains 10 structures.

Interesting mixes show up when an X-ray biological assembly with multiple MODEL/ENDMDL entries is analyzed with --nmr, or an NMR entry is handled with --symmetry. Here are two such examples:

x3dna-dssr -i=2d94.pdb1 --nmr -o=temp # models 1 and 2 are handled sepatately
x3dna-dssr -i=2n2d.pdb --symm -o=temp # wrong -- does not make sense!

In summary, the --symmetry option is intended to treat symmetry-related molecules as a whole, as in a biological assembly of X-ray crystal structures. In contrast, the --nmr option aims to automate the analysis of each structure in a MODEL/ENDMDL-delineated ensemble, as in NMR structures or trajectories of MD simulations. The distinction between the two MODEL/ENDMDL usages is most clearly seen via a molecular visualization program: for example, check the figure below for 2d94.pdb1 (left) and 2n2d.pdb (right) when all frames are selected using Jmol.

2d94 (2 models) 2n2d (10 models)
biological assembly of a DNA duplex (2d94) solution structure of a DNA quadruplex (2n2d)




Thank you for printing this article from http://x3dna.org/. Please do not forget to visit back for more 3DNA-related information. — Xiang-Jun Lu