Announcing wDSSR: The Next-Generation Web Interface to X3DNA-DSSR
Dear 3DNA/DSSR Community,
We are thrilled to announce the official launch of wDSSR (https://web.x3dna-dssr.org/), the powerful new web interface to the X3DNA-DSSR analytical engine.
Developed by Drs. Shuxiang Li and Xiang-Jun Lu and supported by NIH grant R24GM153869, wDSSR represents a major leap forward from our highly popular 2019 Web 3DNA 2.0 framework. While Web 3DNA 2.0 has faithfully served the community for the analysis, visualization, and modeling of 3D nucleic acid structures, wDSSR was built from the ground up to take full advantage of modern web technologies and the latest DSSR backend capabilities.
A Modern, Streamlined Scientific Workflow
We have completely overhauled the user interface to provide a clean, intuitive, and task-driven experience. The core modeling and analysis tools are now seamlessly organized into a logical, single-word scientific workflow: Analyze, Rebuild, Model, Circularize, Mutate, Assemble, and Visualize.
Spotlight Feature: The "Assemble" Module
One of the most exciting upgrades is the newly renamed Assemble tab (formerly "Composite"). This advanced composite model builder allows you to effortlessly construct complex, higher-order models by linking any combination of nucleic acid duplexes or protein-DNA/RNA complexes. You can quickly connect up to six distinct target structures, ranging from simple linked A-DNA and B-DNA duplexes to large, protein-decorated structural assemblies.
Immediate Global Adoption
Although wDSSR has just launched, we are incredibly humbled to share that it is already seeing rapid worldwide adoption! According to recent network infrastructure data, the new interface is actively being used by researchers across North America, South America, Europe, and Asia. Within just a few days, we have recorded active sessions from prestigious institutions around the globe, including:
- The Weizmann Institute of Science in Israel
- Katholieke Universiteit Leuven in Belgium
- Queen's University in Canada
- Universidad Nacional Autonoma de Mexico (UNAM) in Mexico
- Emory University and the Wadsworth Centers Laboratories and Research in the United States
- Jawaharlal Nehru University and the China Education and Research Network in Asia
How to Cite
While a dedicated paper for wDSSR is currently in preparation, researchers should cite the server using its URL (https://web.x3dna-dssr.org/) alongside the 2019 Web 3DNA 2.0 paper and the foundational 2015 DSSR paper. Full details and funding acknowledgements can be found on our newly consolidated About page.
We invite you all to try out the new wDSSR platform! As always, your feedback is invaluable to us, and we encourage you to share your thoughts, questions, and structural models via the newly updated Questions & Feedback link in the wDSSR footer.
Happy modeling!
With the foundation laid by the previous two posts on Fitting of base reference frame and Automatic identification of nucleotides, we can now get into the details on how the ‘simple’ base-pair (bp) parameters are derived. To make the point clear, I am using two concrete examples from the yeast phenylalanine tRNA (PDB id: 1ehz): the first pair is 2MG10+G45, of type M+N (shortened to g+G) in 3DNA/DSSR; and the second example is a Watson-Crick pair U6–A67, of type M–N (shortened to U–A).
Pair 2MG10+G45 (g+G, of type M+N, see Fig. 1)
Base reference frames

Fig. 1: Base pair 2MG10+G45 (g+G) of type M+N in yeast phenylalanine tRNA 1ehz
In the original coordinate system (as in 1ehz.pdb downloaded from the RCSB PDB), the base-reference frames for 2MG10 and G45 are:
# base reference frame of 2MG10
{
"rsmd": 0.018218,
"origin": [65.696016, 45.134944, 18.125044], # o1
"x_axis": [0.690346, 0.713907, -0.117302], # x1
"y_axis": [-0.706849, 0.700116, 0.101003], # y1
"z_axis": [0.154232, 0.013188, 0.987947] # z1
}
# base reference frame of G45
{
"rsmd": 0.025865,
"origin": [70.584399, 50.526567, 17.229626], # o2
"x_axis": [0.818521, 0.49914, -0.284399], # x2
"y_axis": [-0.574112, 0.728382, -0.373973], # y2
"z_axis": [0.020486, 0.469381, 0.882758] # z2
}
The base-pair reference frame
Since dot(z1, z2) = 0.88 (positive), this pair is of type M+N in 3DNA/DSSR. The ‘mean’ z-axis of the pair is the average of z1 and z2, which is z = [0.090069, 0.248769, 0.964366] (normalized). This is the z-axis of the bp frame, as in 3DNA/DSSR.
The ‘long’ axis employs RC8 (purines) and YC6 (pyrimidines) base atoms. Here 2MG10 and G45 are all purines, so the following two C8 atoms are used:
# C8 atoms of 2MG10 and G45 in 1ehz
HETATM 208 C8 2MG A 10 62.199 48.621 18.635 1.00 40.38 C
ATOM 987 C8 G A 45 67.772 54.149 15.386 1.00 40.45 C
The vector from C8 of G45 to C8 of 2MG10 is:
y0 = [62.199 48.621 18.635] - [67.772 54.149 15.386]
= [-5.573 -5.528 3.249]
Normally, y0 and z-axis are not orthogonal. Here they have an angle of ~81º. The orthogonal component of y0 with reference to the z-axis, when normalized, is the y-axis:
y = [-0.676751, -0.695120, 0.242520]
The x-axis is defined by the right-handed rule:
x = [-0.730682, 0.674479, -0.105746]
Overall, the orthonormal x-, y- and z-axes of the pair defined thus far are:
x = [-0.730682, 0.674479, -0.105746]
y = [-0.676751, -0.695120, 0.242520]
z = [0.090069, 0.248769, 0.964366]
Derivation of the six ‘simple’ base-pair parameters (Fig. 2)

Fig. 2: Schematic diagram of six rigid-body base-pair parameters
Propeller is the ‘torsion’ angle of z2 to z1 with reference to the y-axis, and is calculated using the method detailed in the blog post How to calculate torsion angle?. Here Propeller is: -24.24º. Similarly, Buckle is defined as the ‘torsion’ angle of z2 to z1 with reference to the x-axis, and is -14.81º. Opening is defined as the angle from y2 to y1 with reference to the z-axis, and is: 13.32º.
The corresponding translational parameters are simply projects of the o2 to o1 vector onto the x-, y- and z-axis, respectively. Here, they have values:
d = o1 - o2 = [-4.888383, -5.391623, 0.895418]
Shear = dot(d, x) = -0.16
Stretch = dot(d, y) = 7.27
Stagger = dot(d, z) = -0.92
‘Corrections’ of Buckle and Propeller
Base-pair non-planarity is due to the following three parameters: Buckle, Propeller, and Stagger. In particular, Buckle and Propeller cause the two bases to be non-parallel, the most noticeable characteristic of a pair. These two angular parameters are well-documented in literature, even among the canonical Watson-Crick base pairs. In 3DNA/DSSR, the angle between the two base normal vectors (in range [0, 90º]) is related to Buckle and Propeller with the formula:
interBase-angle = sqrt(Buckle^2+Propeller^2)
For the 2MG10+G45 pair, the angle between z1 and z2 is 28.18º, and sqrt(Buckle^2+Propeller^2) = 28.405º. So the following ‘corrections’ are made:
Buckle = -14.81 * 28.18 / 28.405 = -14.69
Propeller = -24.24 * 28.18 / 28.405 = -24.05
Overall, the ‘corrections’ have only small influence on the numerical values of the reported Buckle and Propeller parameters. It is ‘sensible’ that the ‘simple’ parameters have the property interBase-angle = sqrt(Buckle^2+Propeller^2), just as the original 3DNA/DSSR bp parameters.
Now, the six ‘simple’ bp parameters for 2MG10+G45, reported in 3DNA analyze program as of v2.3-2016jan01 are:
Simple base-pair parameters based on YC6-RC8 vectors
bp Shear Stretch Stagger Buckle Propeller Opening angle
* 1 g+G -0.16 7.27 -0.92 -14.69 -24.05 13.32 28.2
The corresponding local bp parameters as originally reported by 3DNA/DSSR are as follows. Note the significant differences in Shear vs. Stretch, and Buckle vs. Propeller in the two sets of bp parameters. On the other hand, Stagger is identical and Opening should be quite close, by definition. Due to the similarity in Stagger and Opening, DSSR only reports four ‘simple’ parameters (i.e., Shear, Stretch, Buckle, and Propeller).
Local base-pair parameters
bp Shear Stretch Stagger Buckle Propeller Opening
1 g+G -7.21 -0.97 -0.92 25.58 -11.83 13.07
Base-pair U6–A67 (Watson-Crick U–A, of type M–N, see Fig. 3)

Fig. 3: Base pair U6–A67 (U–A) of type M–N in yeast phenylalanine tRNA 1ehz
Base reference frames
In the original coordinate system (as in 1ehz.pdb downloaded from the RCSB PDB), the base-reference frames for U6 and A67 are:
# base reference frame of U6 (white in Fig. 3)
{
"rsmd": 0.010835,
"origin": [60.441988, 48.83479, 41.242523], # o1
"x_axis": [0.28491, 0.503019, 0.815965], # x1
"y_axis": [0.887155, -0.460753, -0.025726], # y1
"z_axis": [0.363018, 0.731217, -0.577529] # z1
}
# base reference frame of A67 (colored yellow in Fig. 3)
{
"rsmd": 0.01992,
"origin": [60.578326, 48.823104, 41.154211], # o2
"x_axis": [0.034097, 0.205538, 0.978055], # x2
"y_axis": [-0.90687, 0.417653, -0.056155], # y2
"z_axis": [-0.420029, -0.885054, 0.200637] # z2
}
The base-pair reference frame
Since dot(z1, z2) = -0.92 (negative), this pair is of type M–N in 3DNA/DSSR. The y- and z-axis are thus reversed (corresponding to a 180º rotation around the x-axis) to align z2 with z1.
# base reference frame of A67, with y- and z-axes reversed
{
"origin": [60.578326, 48.823104, 41.154211], # o2
"x_axis": [0.034097, 0.205538, 0.978055], # x2
"y_axis": [0.90687, -0.417653, 0.056155], # y2 -- reversed
"z_axis": [0.420029, 0.885054, -0.200637] # z2 -- reversed
}
Thereafter, the procedure is similar to the one for the M+N type above. Note here U6 is a pyrimidine, so its C6 atom is used. The final results are:
# C6 atom of U6 and C8 atom A67 in 1ehz
ATOM 132 C6 U A 6 64.926 46.497 41.084 1.00 35.72 C
ATOM 1457 C8 A A 67 56.129 50.866 40.893 1.00 40.04 C
#---------
y0 = [64.926 46.497 41.084] - [56.129 50.866 40.893]
= [8.797 -4.369 0.191]
x = [0.160777, 0.363836, 0.917482]
y = [0.902274, -0.430972, 0.012793]
z = [0.400064, 0.825764, -0.397570]
The six ‘simple’ and original base-pair parameters
Simple base-pair parameters based on YC6-RC8 vectors
bp Shear Stretch Stagger Buckle Propeller Opening angle
1 U-A 0.06 -0.13 -0.08 -0.59 -23.71 5.39 23.7
# ------------
Local base-pair parameters
bp Shear Stretch Stagger Buckle Propeller Opening
1 U-A 0.06 -0.13 -0.08 -0.63 -23.71 5.50
As can be seen, for Watson-Crick pairs, the ‘simple’ and the original bp parameters are very similar.
Special notes on the ‘simple’ base-pair parameters

- For the most common Watson-Crick pairs, the newly introduced ‘simple’ bp parameters match those of the original 3DNA/DSSR parameters very well (as shown by the U6–A67 pair). For non-canonical pairs, significant differences in Shear, Stretch, Buckle and Propeller are expected (as illustrated by the 2MG10+G45 pair). The differences come from the divergent definitions of the bp reference frame, which is distinct for each type of non-canonical pairs.
- Only the original 3DNA/DSSR six bp parameters can be used for exact reconstruction (with the 3DNA
rebuild program) of the corresponding bp geometry. The ‘simple’ bp parameters are for description only, and they could be more intuitive than the original 3DNA/DSSR counterparts. They complement, buy by no means replace, the classic “local” bp parameters. The term ‘simple’ is used to distinguish the new from the original closely related, yet quite different bp parameters.
- As details for the 2MG10+G45 pair, several ad hoc decisions are made in deriving the ‘simple’ bp parameters. For example, instead of using RC8–YC6 to define the y-axis, one can also use RN9–YN1 (as did by Richardson). Each such choice will lead (slightly) different numerical values, depending on the type of the non-canonical pairs. In some cases, Buckle and Propeller could differ by several degrees. Since RC8 and YC6 atoms lie near the ‘center’ of purines and pyrimidines, they are used to define the y-axis (by default). DSSR has provisions of selecting RN9–YN1, as well as a couple of other choices, for the definition of the y-axis.
- When the M+N pair is counted as N+M, Shear, Stretch, Buckle, and Propeller remain the same, but Stagger and Opening reverse their signs. For example, here are the results of 2MG10+G45 vs. G45+2MG10:
# 2MG10+G45
Simple base-pair parameters based on YC6-RC8 vectors
bp Shear Stretch Stagger Buckle Propeller Opening angle
* 1 G+g -0.16 7.27 0.92 -14.69 -24.05 -13.32 28.2
# Reverse the order: treated as G45+2MG10
Simple base-pair parameters based on YC6-RC8 vectors
bp Shear Stretch Stagger Buckle Propeller Opening angle
* 1 g+G -0.16 7.27 -0.92 -14.69 -24.05 13.32 28.2
- When the M–N pair is counted as N–M, Stretch, Stagger, Propeller, and Opening remain the same, but Shear and Buckle reverse their signs. For example, here are the results of U6–A67 vs. A67–U6:
# U6–A67
Simple base-pair parameters based on YC6-RC8 vectors
bp Shear Stretch Stagger Buckle Propeller Opening angle
1 U-A 0.06 -0.13 -0.08 -0.59 -23.71 5.39 23.7
# Reverse the order: treated as A67–U6
Simple base-pair parameters based on YC6-RC8 vectors
bp Shear Stretch Stagger Buckle Propeller Opening angle
1 A-U -0.06 -0.13 -0.08 0.59 -23.71 5.39 23.7
Related posts

Once a nucleotide (nt) is identified, and matched to A (C, G, T, U) for the standard case or a (c, g, t, u) for a modified one, 3DNA/DSSR performs a least-squares fitting procedure to locate the base reference frame in three-dimensional space. The basic idea is very simple and widely applicable. The algorithm constitutes one of the key components of 3DNA/DSSR. As always, the details can be most effectively illustrated with a worked example. Using G1 in the yeast phenylalanine tRNA (PDB id: 1ehz) as an example, the atomic coordinates of its nine base-ring atoms are:
# G1, nine base-ring atoms for ls-fitting
ATOM 14 N9 G A 1 51.628 45.992 53.798 1.00 93.67 N
ATOM 15 C8 G A 1 51.064 46.007 52.547 1.00 92.60 C
ATOM 16 N7 G A 1 51.379 44.966 51.831 1.00 91.19 N
ATOM 17 C5 G A 1 52.197 44.218 52.658 1.00 91.47 C
ATOM 18 C6 G A 1 52.848 42.992 52.425 1.00 90.68 C
ATOM 20 N1 G A 1 53.588 42.588 53.534 1.00 90.71 N
ATOM 21 C2 G A 1 53.685 43.282 54.716 1.00 91.21 C
ATOM 23 N3 G A 1 53.077 44.429 54.946 1.00 91.92 N
ATOM 24 C4 G A 1 52.356 44.836 53.879 1.00 92.62 C
The corresponding nine base-ring atoms of G in its standard base reference frame are listed below. See Table 1 of the report A Standard Reference Frame for the Description of Nucleic Acid Base-pair Geometry, and file Atomic_G.pdb distributed with 3DNA ($X3DNA/config/Atomic_G.pdb). In DSSR, the content has been integrated into the source code to make the program self-contained.
# G in standard base reference frame
ATOM 2 N9 G A 1 -1.289 4.551 0.000
ATOM 3 C8 G A 1 0.023 4.962 0.000
ATOM 4 N7 G A 1 0.870 3.969 0.000
ATOM 5 C5 G A 1 0.071 2.833 0.000
ATOM 6 C6 G A 1 0.424 1.460 0.000
ATOM 8 N1 G A 1 -0.700 0.641 0.000
ATOM 9 C2 G A 1 -1.999 1.087 0.000
ATOM 11 N3 G A 1 -2.342 2.364 0.001
ATOM 12 C4 G A 1 -1.265 3.177 0.000
A least-squares fitting of the standard onto the experimental set of base-ring atoms defines the base reference frame (Fig. 1). The information is available via the following commands:
# find_pair -s 1ehz.pdb # in file 'ref_frames.dat'
... 1 G # A:...1_:[..G]G
53.7571 41.8678 52.9303 # origin
-0.2589 -0.2496 -0.9331 # x-axis
-0.5430 0.8365 -0.0731 # y-axis
0.7988 0.4878 -0.3521 # z-axis
# --------
# x3dna-dssr -i=1ehz.pdb --json | jq .nts[0].frame
{
rsmd: 0.008,
origin: [53.757, 41.868, 52.93],
x_axis: [-0.259, -0.25, -0.933],
y_axis: [-0.543, 0.837, -0.073],
z_axis: [0.799, 0.488, -0.352]
}

Fig. 1: G1 in tRNA 1ehz, with base reference frame attached
Please note the following subtle points:
- The standard base (
Atomic_G.pdb) is already set in its reference frame: the _z_-coordinates are virtually zeros, _y_-coordinates are positive, the atoms along the minor-groove edge have negative _x_-coordinates, as can be visualized clearly from the attached coordinate frame. In 3DNA, the five standard standard bases are in stored in files Atomic_[ACGTU].pdb, and the corresponding modified ones are in Atomic_[acgtu].pdb. For simplicity, Atomic_A.pdb and Atomic_a.pdb are the same by default, as are the other four cases.
- The translation and rotation of the least-squares fitting process define the experimental base reference frame (for G1 in the above example), and its three axes are orthonormal by definition.
- By design, the base rings of
Atomic_A.pdb and Atomic_G.pdb match each other closely (see below), as are the pyrimidines bases. The least-square fitted root-mean-square deviation (rmsd) of the nine base-ring atoms between standard A and G is only 0.04 Å. Fitting the standard A (instead of G) onto G1 of 1ehz leads to a base reference frame that is essentially indistinguishable from the one above (see below). This feature shows that any ambiguity in assigning modified purines to A or G, or pyrimidines to C, T, or U causes no notable differences in 3DNA/DSSR results.
Comparison of base-ring atomic coordinates in standard G and A
Atomic_G.pdb Atomic_A.pdb
N9 G -1.289 4.551 0.000 | N9 A -1.291 4.498 0.000
C8 G 0.023 4.962 0.000 | C8 A 0.024 4.897 0.000
N7 G 0.870 3.969 0.000 | N7 A 0.877 3.902 0.000
C5 G 0.071 2.833 0.000 | C5 A 0.071 2.771 0.000
C6 G 0.424 1.460 0.000 | C6 A 0.369 1.398 0.000
N1 G -0.700 0.641 0.000 | N1 A -0.668 0.532 0.000
C2 G -1.999 1.087 0.000 | C2 A -1.912 1.023 0.000
N3 G -2.342 2.364 0.001 | N3 A -2.320 2.290 0.000
C4 G -1.265 3.177 0.000 | C4 A -1.267 3.124 0.000
Comparison of G1 (1ehz) base reference frame derived using standard G or A
Atomic_G.pdb | Atomic_A.pdb
53.7571 41.8678 52.9303 # origin | 53.7286 41.9276 52.9482 # origin
-0.2589 -0.2496 -0.9331 # x-axis | -0.2562 -0.2540 -0.9327 # x-axis
-0.5430 0.8365 -0.0731 # y-axis | -0.5444 0.8352 -0.0780 # y-axis
0.7988 0.4878 -0.3521 # z-axis | 0.7988 0.4878 -0.3522 # z-axis
Related topics:

Any analysis of nucleic acid structures start with the identification of nucleotides (nts), the basic building unit. As per the PDB convention, each nt (like any other ligands) is specified by a three-letter identifier. For example, the four standard RNA nts are ..A, ..C, ..G, and ..U, respectively. The four corresponding standard DNA nts are .DA, .DC, .DG, and .DT, respectively. Note that here, for visualization purpose, each space is represented by a dot (.). In practice, the following codes for the five standard DNA/RNA nts — ADE, CYT, GUA, THY, and URA — are also commonly encountered, among other variants.
On top of the standard nts, there are numerous modified ones, each assigned a unique three-letter code. In the classic yeast phenylalanine tRNA (PDB id: 1ehz), 14 out of the 76 nts are modified, as shown in Fig. 1 below.

Fig. 1: Modified nucleotides in yeast phenylalanine tRNA 1ehz
It is challenging to maintain a comprehensive and updated list of ever-inceasing nts encountered in the PDB and molecular dynamics (MD) simulation packages (e.g., AMBER, GROMACS, and CHARMM). Thus, as of today, some well-known DNA/RNA structural bioinformatics tools can handle only standard nts or a limited list of modified ones.
From early on in the development of 3DNA, I observed that all recognized nts have a core six-membered ring, with atoms named N1,C2,N3,C4,C5,C6 consecutively (see Fig. 2 below). Purines have three additional atoms, named N7,C8,N9. So it is feasible to automatically identify nts, and classify them as pyrimidines and purines, based on the common core skeleton shared by all of them. Moreover, the ‘skeleton’ is not effected by any possible tautomeric or protonation state.

Fig. 2: Identification of nts in 3DNA/DSSR based on atomic names and planar geometry
Early versions of 3DNA employed only three atoms (N1, C2 and C6) and three distances to decide a nt. Purines were further discriminated by the N9 atom, and the N1–N9 distance. While developing DSSR, I revised the nt-identification algorithm by using a least-squares fitting procedure that makes use of all available base ring atoms instead of selected ones. The same new algorithm has also been adapted into the find_pair/analyze etc programs in 3DNA, as of v2.2.
As always, the idea can be best illustrated with a worked example. Guanine in its standard base reference frame, with the following list of nine ring atoms coordinates, is chosen for the least-squares fitting. See file Atomic_G.pdb in the 3DNA distribution, and also Table 1 of the report A Standard Reference Frame for the Description of Nucleic Acid Base-pair Geometry.
ATOM 2 N9 G A 1 -1.289 4.551 0.000
ATOM 3 C8 G A 1 0.023 4.962 0.000
ATOM 4 N7 G A 1 0.870 3.969 0.000
ATOM 5 C5 G A 1 0.071 2.833 0.000
ATOM 6 C6 G A 1 0.424 1.460 0.000
ATOM 8 N1 G A 1 -0.700 0.641 0.000
ATOM 9 C2 G A 1 -1.999 1.087 0.000
ATOM 11 N3 G A 1 -2.342 2.364 0.001
ATOM 12 C4 G A 1 -1.265 3.177 0.000
By using a ls-fitting procedure, only (any) three atoms are needed. We no longer need to make explicit selection, as we did previously (N1,C2,C6 and N9), thus allowing for possible modification on these atoms.
Using four nts (G1, 2MG10, H2U16, and PSU39, see Fig. 1 above top) of 1ehz as examples, the following list gives the atomic coordinates of base ring atoms, and root-mean-squres devisions (rmsd) of the least-squares fit. Of course, when performing least-squares fitting, the names of corresponding atoms must match (note the different ordering of atoms for H2U and PSU in the list vs the above standard G reference).
#G1, rmsd=0.008
ATOM 14 N9 G A 1 51.628 45.992 53.798 1.00 93.67 N
ATOM 15 C8 G A 1 51.064 46.007 52.547 1.00 92.60 C
ATOM 16 N7 G A 1 51.379 44.966 51.831 1.00 91.19 N
ATOM 17 C5 G A 1 52.197 44.218 52.658 1.00 91.47 C
ATOM 18 C6 G A 1 52.848 42.992 52.425 1.00 90.68 C
ATOM 20 N1 G A 1 53.588 42.588 53.534 1.00 90.71 N
ATOM 21 C2 G A 1 53.685 43.282 54.716 1.00 91.21 C
ATOM 23 N3 G A 1 53.077 44.429 54.946 1.00 91.92 N
ATOM 24 C4 G A 1 52.356 44.836 53.879 1.00 92.62 C
#2MG10, rmsd=0.018
HETATM 207 N9 2MG A 10 61.581 47.402 18.752 1.00 42.14 N
HETATM 208 C8 2MG A 10 62.199 48.621 18.635 1.00 40.38 C
HETATM 209 N7 2MG A 10 63.494 48.534 18.422 1.00 40.70 N
HETATM 210 C5 2MG A 10 63.745 47.167 18.395 1.00 43.82 C
HETATM 211 C6 2MG A 10 64.965 46.449 18.205 1.00 43.45 C
HETATM 213 N1 2MG A 10 64.767 45.086 18.293 1.00 44.71 N
HETATM 214 C2 2MG A 10 63.541 44.482 18.486 1.00 47.21 C
HETATM 217 N3 2MG A 10 62.411 45.125 18.614 1.00 45.85 N
HETATM 218 C4 2MG A 10 62.574 46.451 18.582 1.00 43.27 C
#H2U16, rmsd=0.188
HETATM 336 N1 H2U A 16 77.347 53.323 34.582 1.00 91.19 N
HETATM 337 C2 H2U A 16 76.119 52.865 34.160 1.00 92.39 C
HETATM 339 N3 H2U A 16 75.123 52.894 35.107 1.00 93.28 N
HETATM 340 C4 H2U A 16 75.289 52.711 36.458 1.00 93.34 C
HETATM 342 C5 H2U A 16 76.696 52.479 36.909 1.00 93.77 C
HETATM 343 C6 H2U A 16 77.717 53.238 36.039 1.00 93.22 C
#PSU39, rmsd=0.004
HETATM 845 N1 PSU A 39 74.080 36.066 5.459 1.00 75.82 N
HETATM 846 C2 PSU A 39 74.415 36.835 4.354 1.00 75.59 C
HETATM 847 N3 PSU A 39 75.735 36.769 3.984 1.00 76.29 N
HETATM 848 C4 PSU A 39 76.728 36.038 4.591 1.00 77.28 C
HETATM 849 C5 PSU A 39 76.307 35.280 5.732 1.00 77.93 C
HETATM 850 C6 PSU A 39 75.025 35.316 6.112 1.00 76.07 C
As noted in the DSSR paper, the rmsd is normally <0.1 Å since base rings are rigid. To account for experimental error and special non-planar cases, such as H2U in 1ehz, the default rmsd cutoff is set to 0.28 Å by default.
With the above detailed algorithm, DSSR (and the 3DNA find_pair/analyze programs) can automatically identify virtually all ‘recognizable’ nts in the PDB. A survey performed in June 2015 detected 630 different types of modified nucleotides in the PDB.
It is worth noting the following points:
- The choice of standard G instead of A as the reference base has no impact on the results. As a matter of fact, the rmsd between G and A is only 0.04 Å. Note also the generous default cutoff of 0.28 Å.
- The method obviously depends on proper naming of the ring atoms. Specially, the base ring atoms must be named
N1,C2,N3,C4,C5,C6 consecutively, with purines having three additional atoms named N7,C8,N9. Thus, under this scheme, TPP (thiamine diphosphate) would not be recognized as a nt by default, simply because of the extra prime (′) of atoms in the six-membered ring. In nucleic acid structures, the prime symbol is normally associated with atoms of the sugar moiety (e.g., the C5′ atom).

Fig. 3: TPP (thiamine diphosphate) would not be recognized as a nt.
- On the other hand, nt cofactors in an otherwise ‘pure’ protein structure will also be recognized. One example is the two AMP (adenosine monophosphate) ligands in PDB entry 12as. This extra identification of nts does no harm in such cases. As shown in the analysis of the SAM-I riboswitch in the DSSR paper, taking the SAM ligand as a nt in base triplet recognition is a neat feature.
- Once a nucleotide has been identified and classified into purines and pyrimidines, exocyclic atoms can be used for further assignment:
O6 or N2 distinguishes guanine from adenine, N4 separates cytosine from thymine and uracil, and C7 (or C5M, the methyl group) differentiates thymine from uracil. For some modified nts, the distinctions within purines or pyrimidines may not be that obvious. For example, inosine may be taken as a modified guanine or adenine. However, this ambiguity does not pose any significant effect on the calculated base-pair parameters.
- In DSSR and 3DNA, each identified nt is assigned a one-letter shorthand code: the standard
..A, .DA, and ADE (among a few other common variations) is shortened to upper-case A, and similarly for C, G, T, and U. Modified nts, on the other hand, are shortened to their corresponding lower-case symbol. For example, modified guanine such as 2MG and M2G in the yeast phenylalanine tRNA (see Fig. 1 above) is assigned g. So in 3DNA/DSSR output, the upper and lower cases of bases (e.g., nts=3 gCG A.2MG10,A.C25,A.G45) convey special meanings.
Related topics:

As of v2.3-2016jan01, the 3DNA analyze program outputs a list of new ‘simple’ base-pair and step parameters, by default. Shown below is a sample output for PDB entry 1xvk. This echinomycin-(GCGTACGC)2 complex has a single DNA strand as the asymmetric unit. 3DNA needs the the biological unit (1xvk.pdb1) to analyze the duplex (with the -symm option). This structure contains two Hoogsteen base pairs, and has popped up on the 3DNA Forum for the zero or negative Rise values. Note that the ‘simple’ Rise values are all positive; for the middle (#4) TA/TA step, it is now 3.09 Å instead of 0.
# find_pair -symm 1xvk.pdb1 1xvk.bps
# analyze -symm 1xvk.bps
# OR by combing the above two commands:
# find_pair -symm 1xvk.pdb1 | analyze -symm
# The output is in file '1xvk.out'
This structure contains 4 non-Watson-Crick (with leading *) base pair(s)
----------------------------------------------------------------------------
Simple base-pair parameters based on RC8--YC6 vectors
bp Shear Stretch Stagger Buckle Propeller Opening
* 1 G+C -3.07 1.55 -0.35 -6.98 0.29 67.33
2 C-G 0.27 -0.17 0.35 -22.34 3.33 -2.80
3 G-C -0.39 -0.17 0.41 22.91 1.81 -2.73
* 4 T+A -3.29 1.56 0.31 -8.03 1.59 -70.46
* 5 A+T -3.29 1.56 -0.31 -8.03 1.59 70.46
6 C-G 0.39 -0.17 0.41 -22.91 1.81 -2.72
7 G-C -0.27 -0.17 0.35 22.34 3.32 -2.80
* 8 C+G -3.07 1.55 0.35 -6.98 0.30 -67.33
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ave. -1.59 0.69 0.19 -3.75 1.75 -1.38
s.d. 1.72 0.92 0.32 17.57 1.15 52.11
----------------------------------------------------------------------------
Simple base-pair step parameters based on consecutive C1'-C1' vectors
step Shift Slide Rise Tilt Roll Twist
* 1 GC/GC -0.55 0.39 7.41 6.40 -4.22 23.36
2 CG/CG -0.05 0.87 2.44 -0.55 3.94 -0.81
* 3 GT/AC 0.38 0.47 7.23 -8.62 3.75 25.70
* 4 TA/TA -0.00 4.73 3.09 -0.00 7.49 25.67
* 5 AC/GT -0.38 0.47 7.23 8.62 3.75 25.70
6 CG/CG 0.05 0.87 2.44 0.55 3.94 -0.82
* 7 GC/GC 0.55 0.39 7.41 -6.40 -4.22 23.36
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ave. -0.00 1.17 5.32 -0.00 2.06 17.45
s.d. 0.39 1.59 2.50 6.21 4.49 12.52
The simple parameters are ‘intuitive’ for non-Watson-Crick base pairs and associated base-pair steps, where the existing standard-reference-frame-based 3DNA parameters may look weird. Note that these simple parameters are for structural description only, not to be fed into the ‘rebuild’ program. Overall, they complement the rigorous characterization of base-pair geometry, as demonstrated by the original analyze/rebuild pair of programs in 3DNA.
In short, the ‘simple’ base-pair parameters employ the YC6—RC8 vector as the y-axis whereas the ‘simple’ step parameters use consecutive C1’—C1’ vectors. As before, the z-axis is the average of two base normals, taking consideration of the M–N vs M+N base-pair classification. In essence, the ‘simple’ parameters make geometrical sense by introducing an ad hoc base-pair reference frame in each case. More details will be provided in a series of blog posts shortly.
Overall, this new section of ‘simple’ parameters should be taken as experimental. The output can be turned off by specifying the analyze -simple=false command-line option explicitly. As always, I greatly appreciate your feedback.

In DSSR (and find_pair -p from the original 3DNA suite), multiplets is defined as “three or more bases associated in a coplanar geometry via a network of hydrogen-bonding interactions. Multiplets are identified through inter-connected base pairs, filtered by pair-wise stacking interactions and vertical separations to ensure overall coplanarity.”
DSSR detects multiplets automatically, and outputs a corresponding MODEL/ENDMDL delineated PDB file (dssr-multiplets.pdb by default) where each multiplet is laid in the most extended view in terms of base planes. The DSSR Nucleic Acids Research (NAR) paper contains four examples (in supplemental Figures 1, 3, 4, and 7) to illustrate this functionality. Please refer to Reproducing results published in the DSSR-NAR paper on the 3DNA Forum for details.
Recently, I read the article titled InterRNA: a database of base interactions in RNA structures by Appasamy et al. in NAR. In Figure 2 of the paper, the authors showcased a sextuple (hexaplet) identified in the E. coli ribosome (PDB id: 4tpe), along with six base-base H-bonds contained therein.
With interest, I tried to run DSSR on the PDB entry 4tpe. As it turns out, ‘4tpe’ has been merged into 4u27 in mmCIF format. I ran DSSR (v1.4.6-2015dec16) in its default settings on ‘4u27’ and get the following summary of results.
# x3dna-dssr -i=4u27.cif -o=4u27.out
total number of base pairs: 4822
total number of multiplets: 680
total number of helices: 264
total number of stems: 566
total number of isolated WC/wobble pairs: 193
total number of atom-base capping interactions: 615
total number of hairpin loops: 215
total number of bulges: 137
total number of internal loops: 244
total number of junctions: 108
total number of non-loop single-stranded segments: 83
total number of kissing loops: 14
total number of A-minor (type I and II) motifs: 246
total number of ribose zippers: 127
total number of kink turns: 15
Among the 680 DSSR-identified multiplets, two hexaplets (one on chain “AA”, and another on “CA”) match those reported by Appasamy et al., as shown below:
678 nts=6 GUUAAA 1:AA.G404,1:AA.U438,1:AA.U439,1:AA.A496,1:AA.A498,1:AA.A499
679 nts=6 GUUAAA 1:CA.G404,1:CA.U438,1:CA.U439,1:CA.A496,1:CA.A498,1:CA.A499
For illustration, the hexaplet #678 is extracted from dssr-multiplets.pdb to file 4u27-hexaplet.pdb (download the coordinates) and shown below. The figure is generated by DSSR and PyMOL, as detailed in Reproducing results published in the DSSR-NAR paper on the 3DNA Forum.
x3dna-dssr -i=4u27-hexaplet.pdb -o=4u27-hexaplet.pml --hbfile-pymol

DSSR-identified hexaplet GUUAAA in 4u27.
DSSR identifies 6 base pairs in the hexaplet:
# x3dna-dssr -i=4u27-hexaplet.pdb --idstr=short
List of 6 base pairs
nt1 nt2 bp name Saenger LW DSSR
1 G404 A498 G+A -- n/a tSS tm+m
2 G404 A499 G+A -- n/a cWH cW+M
3 U438 A496 U-A rHoogsteen 24-XXIV tWH tW-M
4 U439 A496 U-A -- n/a cH. cM-.
5 U439 A498 U-A WC 20-XX cWW cW-W
6 A496 A498 A+A -- n/a cWH cW+M
It detects a total of 9 H-bonds as shown below. In addition to the 6 base-base H-bonds noted by Appasamy et al., DSSR also finds 3 sugar-base H-bonds (#1, #2, and #4, labeled in green) that obviously play a role in stabilizing the high-order base association.
# x3dna-dssr -i=4u27-hexaplet.pdb --get-hbonds --idstr=short
11 59 #1 o 3.017 O:N O2'@G404 N3@U439
11 104 #2 o 2.578 O:N O2'@G404 N1@A498
18 125 #3 p 3.089 O:N O6@G404 N6@A499
21 96 #4 o 3.289 N:O N2@G404 O2'@A498
21 106 #5 p 2.797 N:N N2@G404 N3@A498
39 78 #6 p 2.944 N:N N3@U438 N7@A496
61 81 #7 p 3.167 O:N O4@U439 N6@A496
61 103 #8 p 2.662 O:N O4@U439 N6@A498
82 103 #9 p 3.152 N:N N1@A496 N6@A498

Over the years, I have played quite a few computer programming languages. ANSI C has become my top choice for ‘serious’ software projects, due to its small size, efficiency, flexibility, and ubiquitous support. Moreover, C is a mature language, with a rich ecosystem. As it turns out, C has also been consistently rated as one of the most popular computer languages (#1 or #2) over the past thirty years.
Needless to say, ANSI C has its own quirks, and it takes a steep learning curve. However, once you get over the hurdles, the language serves you. I cannot remember when, but it has been a long while that coding in ANSI C is no longer an issue. It is the understanding of scientific questions that takes most of my time, and coding helps greatly in refining my thoughts.
Not surprisingly, ANSI C was chosen as the sole language for DSSR (and SNAP, or 3DNA in general). The ensure the overall quality of the DSSR codebase, I have taken the following steps:
- The whole project is under git.
- The ANSI C source code is compiled with strict GCC options for full compliance to the standard:
-ansi -pedantic -W -Wall -Wextra -Wunused -Wshadow -Werror -O3
- The executable is checked with valgrind for any memory leak:
valgrind --leak-check=full x3dna-dssr -i=1ehz.pdb -o=1ehz.out --quiet
==19624== Memcheck, a memory error detector
==19624== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==19624== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright info
==19624== Command: x3dna-dssr -i=1ehz.pdb -o=1ehz.out --quiet
==19624==
==19624==
==19624== HEAP SUMMARY:
==19624== in use at exit: 0 bytes in 0 blocks
==19624== total heap usage: 52,829 allocs, 52,829 frees, 92,878,578 bytes allocated
==19624==
==19624== All heap blocks were freed -- no leaks are possible
==19624==
==19624== For counts of detected and suppressed errors, rerun with: -v
==19624== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
- Extensive tests (with the simple
diff command) to ensure the program is working as expected.
The above four measures combined allow me to add new features, refactor the code, and fix bugs, without worrying about accidentally breaking existing functionality. Reading literature (including citations to 3DNA/DSSR) and responding to user feedback on the 3DNA Forum keep me continuously improve DSSR. Some of the recent refinements to DSSR came about this way.

Recently, I became aware of the metallo-base pairs, such as T-Hg-T (PDB id: 4l24) and C-Ag-C (5ay2) from the work of Kondo et al (Pubmed: 24478025 and 26448329). As of v1.4.3-2015oct23, DSSR can detect such metallo-bps automatically, as shown below:
# x3dna-dssr -i=4l24.pdb -o=4l24.out
List of 12 base pairs
nt1 nt2 bp name Saenger LW DSSR
1 A.DC1 B.DG24 C-G WC 19-XIX cWW cW-W
2 A.DG2 B.DC23 G-C WC 19-XIX cWW cW-W
3 A.DC3 B.DG22 C-G WC 19-XIX cWW cW-W
4 A.DG4 B.DC21 G-C WC 19-XIX cWW cW-W
5 A.DA5 B.DT20 A-T WC 20-XX cWW cW-W
6 A.DT6 B.DT19 T-T Metal n/a cWW cW-W
7 A.DT7 B.DT18 T-T Metal n/a cWW cW-W
8 A.DT8 B.DA17 T-A WC 20-XX cWW cW-W
9 A.DC9 B.DG16 C-G WC 19-XIX cWW cW-W
10 A.DG10 B.DC15 G-C WC 19-XIX cWW cW-W
11 A.DC11 B.DG14 C-G WC 19-XIX cWW cW-W
12 A.DG12 B.DC13 G-C WC 19-XIX cWW cW-W
and
# x3dna-dssr -i=5ay2.pdb -o=5ay2.out
List of 24 base pairs
nt1 nt2 bp name Saenger LW DSSR
1 A.G1 B.C12 G-C WC 19-XIX cWW cW-W
2 A.G2 B.C11 G-C WC 19-XIX cWW cW-W
3 A.A3 B.U10 A-U WC 20-XX cWW cW-W
4 A.C4 B.C9 C-C Metal n/a cWW cW-W
5 A.U5 B.A8 U-A WC 20-XX cWW cW-W
6 A.CBR6 B.G7 c-G WC 19-XIX cWW cW-W
7 A.G7 B.CBR6 G-c WC 19-XIX cWW cW-W
8 A.A8 B.U5 A-U WC 20-XX cWW cW-W
9 A.C9 B.C4 C-C Metal n/a cWW cW-W
10 A.U10 B.A3 U-A WC 20-XX cWW cW-W
11 A.C11 B.G2 C-G WC 19-XIX cWW cW-W
12 A.C12 B.G1 C-G WC 19-XIX cWW cW-W
13 C.G1 D.C12 G-C WC 19-XIX cWW cW-W
14 C.G2 D.C11 G-C WC 19-XIX cWW cW-W
15 C.A3 D.U10 A-U WC 20-XX cWW cW-W
16 C.C4 D.C9 C-C Metal n/a cWW cW-W
17 C.U5 D.A8 U-A WC 20-XX cWW cW-W
18 C.CBR6 D.G7 c-G WC 19-XIX cWW cW-W
19 C.G7 D.CBR6 G-c WC 19-XIX cWW cW-W
20 C.A8 D.U5 A-U WC 20-XX cWW cW-W
21 C.C9 D.C4 C-C Metal n/a cWW cW-W
22 C.U10 D.A3 U-A WC 20-XX cWW cW-W
23 C.C11 D.G2 C-G WC 19-XIX cWW cW-W
24 C.C12 D.G1 C-G WC 19-XIX cWW cW-W
Note the name “Metal” for the metallo-bps. Moreover, the corresponding entries in the ‘dssr-pairs.pdb’ file also include the metal ions, as shown below:


It is worth noting that in a metallo-bp, the metal ion lies approximately in the bp plane. Moreover, it is in the middle of the two bases, which would otherwise not form a pair in the conventional sense.

From the Jmol mailing list, I noticed Jmol 14.4.0 was released yesterday (October 13, 2015) by Dr. Bob Hanson. Among the development highlights is the following item:
biomolecule annotations including DSSR, RNA3D, EBI sequence domains, and PDB validation data
I am glad to see that DSSR has been integrated into Jmol, one of the most popular molecular graphics visualization programs. To enable easy access to the DSSR functionality from Jmol, I’ve set up two websites with easy-to-remember URLs: http://jmol.x3dna.org and http://jsmol.x3dna.org. They both point to the same jsmol/ folder extracted from jsmol.zip of the Jmol distribution.
In retrospect, I first met Bob at the Workshop on the PDBx/mmCIF Data Exchange Format for Structural Biology held at Rutgers University during October 21-22, 2013. I approached him during a lunch break, asking for a possible collaboration on integrating DSSR into Jmol. The name DSSR may have played a role in convincing Bob, since it matches the well-known DSSP program for proteins. In the end, we were both excited about the project, talked into details after the meeting, and continued our conversation the next morning while I drove him to the airport.
Nothing real happened until early April 2014. Once getting started, however, we moved forward rapidly: it took less then three weeks to get the first functional version ready for the community to play. See Bob’s announcement RNA/DNA Secondary Structure, anyone? in the Jmol mailing list on April 9, 2014. During this process, we communicated extensively via email, up to 30 messages per day, on technical details for better communication between the two programs. The integration works by using Jmol as a front-end, which calls a web-serivce hosted at Columbia University for DSSR analysis. Jmol’s parsing of the DSSR output is facilitated by the dedicated --jmol option.
The above preliminary, yet functional, DSSR-Jmol integration had be in service without infrastructural changes until two months ago. In August 10, 2015, Bob contacted me:
I might make a significant request though. That would be for the server to deliver all this in JSON format. This is really the way to go. It is what people want and it is perfect for Jmol as well.
I’d played around with JSON or SQLite as a structured data exchange format for quite some time, and Bob’s request finally convinced me that JSON is the (better) way to go. And that began another around of intensive collaborative work that has switched the exchange format between DSSR and Jmol from plain text output to JSON. From August 10 to September 22, we had a total of over 170-email exchanges, plus Skype. JSON has really simplified lives of both parties, especially in the long run.
Overall, collaborating with Bob has been truly an enjoyable and rewarding experience. The DSSR-Jmol integration also serves as a concrete example of what can be achieved by two dedicated minds with complementary expertise.

Curves+ and 3DNA are currently the most widely used programs for analyzing nucleic acid structures (predominantly double helices). As noted in my blog post, Curves+ vs 3DNA, these two programs also complement each other in terms of features. It thus makes sense to run both to get a better understanding of the DNA/RNA structures one is interested in.
Indeed, over the past few years, I have seen quite a few articles citing both 3DNA and Curves+. Listed below are three recent examples:
The helical parameters were measured with 3DNA33 and Curves+.34 The local helical parameters are defined with regard to base steps and without regard to a global axis.
Structure analysis. Helix, base and base pair parameters were calculated with 3DNA or curve+ software packages23,24.
The major global difference between the native and mixed backbone structures is that the RNA backbone is compressed or kinked in strands containing the modified linkage (Fig. 3 B and C, by CURVES) (30). … To compare the three RNA structures at a more detailed and local level, we calculated the base pair helical and step parameters for all three structures using the 3DNA software tools (31) (Fig. 4 and Table S2). [In the Results section]
For each snapshot, the structural parameters—including six base pair parameters, six local base pair step parameters, and pseudorotation angles for each nucleotide—were calculated using 3DNA (31). The two terminal base pairs are omitted for the 3DNA analysis, because they unwind frequently in the triple 2′-5′-linked duplex. [In the Materials and Methods section]
Reading through these papers, however, it is not clear to me if the authors took advantage of the find_pair -curves+ option in 3DNA, as detailed in Building a bridge between Curves+ and 3DNA. Hopefully, this post will help draw more attention to this connection between Curves+ and 3DNA.
