March 2026
February 2026
Cover image provided by X3DNA-DSSR, an NIGMS National Resource for Structural Bioinformatics of Nucleic Acids (R24GM153869; skmatics.x3dna.org). Image generated using DSSR and PyMOL (Lu XJ. 2020. Nucleic Acids Res 48: e74).
As the developer of DSSR, I am thrilled to see its application in cutting-edge research across multiple disciplines. Below is a list of four recent publications that highlight how DSSR has been utilized, underscoring its versatility and significance in structural bioinformatics.
In the Geng et al. (2025) Nucleic Acids Research (NAR) paper, titled 'Revealing hidden protonated conformational states in RNA dynamic ensembles', DSSR is simply cited as follows:
All bp geometries, hydrogen-bond, backbone, stacking, and sugar dihedral angles were calculated using X3DNA-DSSR [77].
In the preprint by Gordan et al. (2025), titled 'High-throughput characterization of transcription factors that modulate UV damage formation and repair at single-nucleotide resolution', DSSR is cited as follows:
Step base stacking, base pair shift, base pair slide, interbase angle, pseudorotation angle, and sugar puckering classifications of nucleobases were computed using X3DNA-DSSR (v2.5.0)75. Base stacking was defined as the overlapping polygon area in Å2 when projecting the dipyrimidine base ring atoms (excluding exocyclic atoms) into the mean base pair plane76. The sugar ring pseudorotation phase angle of each pyrimidine was also calculated using X3DNA-DSSR as described by Altona, C. & Sundaralingam, M.77 Interbase angle was defined as sqrt(propeller2+buckle2) per the X3DNA-DSSR documentation.
Figure 6: TF Binding Induces Structural Distortion Favorable to UV Dimerization is highly informative, particularly panel (a), which illustrates the ensemble of structural parameters that predispose dipyrimidines to cyclobutane pyrimidine dimers (CPD) or 6-4 pyrimidine-pyrimidones (6-4 PP) formation. DSSR is designed as an integrated software tool, offering a comprehensive suite of structural parameters not found in any other single tool I am aware of. Despite this, the innovative use of DSSR by Gordan et al. exceeds my expectations and demonstrates its versatility.
In the preprint by Kubaney et al. (2025) from the Baker group, titled 'RNA sequence design and protein-DNA specificity prediction with NA-MPNN', DSSR is cited as follows:
On the pseudoknot subset, we evaluate additional structure‐ and reactivity‐based metrics. DSSR v2.3.241 is used to extract the ground‐truth secondary structure from the native crystal structures. For each designed sequence, RibonanzaNet predicts 2A3 reactivity profiles, from which we compute predicted OpenKnot scores (see https://github.com/eternagame/OpenKnotScore)31 using the predicted reactivity together with the DSSR ground truth.
In a recent NSMB paper from the Baker group, titled 'Computational design of sequence-specific DNA-binding proteins', 3DNA is cited as follows:
RIF docking of scaffolds onto DNA targets (DBP design step 1) Structures of B-DNA for each target (Supplementary Table 2) were generated by (1) using the DNA portion of PDB 1BC8 (ref. 60), PDB 1YO5 (ref. 61), PDB 1L3L (ref. 51) or PDB 2O4A (ref. 62) or (2) using the software X3DNA63, followed by a constrained Rosetta relax of the DNA structure.
Please note that 3DNA has been replaced by DSSR. The functionality for constructing B-DNA models, previously provided by 3DNA, is now directly available in DSSR via its fiber and rebuild modules.
In the preprint by Si et al. (2025), titled 'End-to-End Single-Stranded DNA Sequence Design with All-Atom Structure Reconstruction', DSSR is cited as follows:
Since ViennaRNA and NUPACK require secondary structures as input, we used DSSR35 to extract secondary structures from the corresponding ssDNA three-dimensional structures.
The above use cases are merely a sample of how DSSR is utilized in the scientific literature. It is reasonable to state that DSSR has emerged as a de facto standard tool within the field of nucleic acid structural bioinformatics. Overall, DSSR is a mature, robust, and efficient software product that is actively developed and maintained. I am committed to making DSSR synonymous with quality and value. Its unmatched functionality, usability, and support save users significant time and effort compared to alternative solutions.
DSSR is available free of charge for academic users. Additionally, it has been integrated into other high-profile bioinformatics resources, including NAKB, PDB-redo, and N•ESPript.
References
- Geng A, Roy R, Ganser L, Li L, Al-Hashimi HM. Revealing hidden protonated conformational states in RNA dynamic ensembles. Nucleic Acids Research. 2025;53:gkaf1366. https://doi.org/10.1093/nar/gkaf1366.
- Gordan R, Wasserman H, Chi B, Bohm K, Duan M, Sahay H, et al. High-throughput characterization of transcription factors that modulate UV damage formation and repair at single-nucleotide resolution. 2025. https://doi.org/10.21203/rs.3.rs-8197218/v1.
- Kubaney A, Favor A, McHugh L, Mitra R, Pecoraro R, Dauparas J, et al. RNA sequence design and protein–DNA specificity prediction with NA-MPNN. 2025. https://doi.org/10.1101/2025.10.03.679414.
- Glasscock CJ, Pecoraro RJ, McHugh R, Doyle LA, Chen W, Boivin O, et al. Computational design of sequence-specific DNA-binding proteins. Nat Struct Mol Biol. 2025;32:2252–61. https://doi.org/10.1038/s41594-025-01669-4.
- Si Y, Xu Y, Chen L. End-to-end single-stranded DNA sequence design with all-atom structure reconstruction. 2025. https://doi.org/10.64898/2025.12.05.692525.
RNA has three salient structural features (compared to DNA): it contains the ribose (not deoxyribose) sugar, it has the uracil (not thymine) base, and it is normally single (not double)-stranded. The O2′(G)…O2P(U) H-bond stabilized GpU dinucleotide platform may turn out to be the smallest unit with all those RNA hallmarks.

First, it must have the guanosine ribose to have the 2′-hydroxyl group form the O2′(G)…O2P(U) H-bond.
Second, the methyl group in position 5 of thymine would cause steric clash with guanosine, thus disrupting the N2(G)…O4(U) base-base H-bond to form the GpU dinucleotide platform.

Third, a dinucleotide, by definition, is single-standed. The two H-bonds, plus the covalent linkage, makes the GpU platform extremely rigid (see Figure 1 of our 2010 NAR paper).
Moreover, the GpU platform is directional: swapping the two bases while keeping the sugar-phosphate backbone fixed does not allow for a base-base H-bond, thus no UpG dinucleotide platform.

It worth noting that state-of-the-art quantum chemistry calculations have verified the importance of the O2′(G)…O2P(U) H-bond in stabilizing the GpU dinucleotide platform.

The least-squares (LS) fitting procedures presented below make use of well known mathematics. Indeed, the methods are so well known and widely used that it is somewhat difficult to locate the original references. In our previous effort to resolve the discrepancies among nucleic acid conformational analysis programs, we came across a variety of LS fitting procedures. Here we provide a detailed description, with step-by-step examples, of our implementation in 3DNA of two LS fitting algorithms based on a covariance matrix and its eigen-system. This post is the revised version of a note first made available in the “Technical Details” section of earlier 3DNA websites.
LS fitting between standard and experimental bases
Three analysis schemes — CompDNA, Curves/Curves+, and RNA — use LS procedures to fit a standard base with an embedded reference frame to an observed base structure. CompDNA and Curves/Curves+ take advantage of the conventional approach of McLachlan [“Least Squares Fitting of Two Structures.” J. Mol. Biol., 128, 74-79 (1979)], while the RNA program implements a closed-form solution of absolute orientation using unit quaternions first introduced by Horn. The two algorithms are mathematically equivalent for the most general cases, since the unit quaternion can be transformed to the rotation matrix given by McLachlan. The Horn method, however, is more straightforward and generally applicable; it can be applied even when one or both of the structures are perfectly planar, whereas the McLachlan approach fails.
Here we use the ideal adenine geometry derived from the high resolution crystal structures of model nucleosides, nucleotides, and bases. The x-, y-, and z-coordinates of the standard base, taken from the NDB, are listed below in the columns labeled sx, sy, and sz, respectively. s_(average) is the geometric center of the base.
sx sy sz
1 N9 0.213 0.660 1.287
2 C4 0.250 2.016 1.509
3 N3 0.016 2.995 0.619
4 C2 0.142 4.189 1.194
5 N1 0.451 4.493 2.459
6 C6 0.681 3.485 3.329
7 N6 0.990 3.787 4.592
8 C5 0.579 2.170 2.844
9 N7 0.747 0.934 3.454
10 C8 0.520 0.074 2.491
------------------------------------
s_(average): 0.4589 2.4803 2.3778
We similarly describe the coordinates of one of the adenine bases (the fifth nucleotide in the sequence strand) from the high resolution (1.4 Å) self-complementary d(CGCGAATTCGCG) dodecamer duplex determined by Williams and co-workers (PDB id: 355d). The experimental xyz coordinates are listed below in the columns labeled ex, ey, and ez. The geometric center is e_(average). Note that the atomic serial numbers from the PDB (first column) have been rearranged so that the atoms are in the same order as those of the ideal base listed above.
ex ey ez
91 N9 16.461 17.015 14.676
100 C4 15.775 18.188 14.459
99 N3 14.489 18.449 14.756
98 C2 14.171 19.699 14.406
97 N1 14.933 20.644 13.839
95 C6 16.223 20.352 13.555
96 N6 16.984 21.297 12.994
94 C5 16.683 19.056 13.875
93 N7 17.918 18.439 13.718
92 C8 17.734 17.239 14.207
------------------------------------
e_(average):16.1371 19.0378 14.0485
We collect the two sets of xyz coordinates in the 10 × 3 matrices S and E corresponding respectively to the standard and experimental bases. We then construct the 3 × 3 covariance matrix C between S and E using the following formula:
1 1
C = ------- [S' E - --- S' i i' E]
n - 1 n
=
0.2782 0.2139 -0.1601
-1.4028 1.9619 -0.2744
1.0443 0.9712 -0.6610
Here n, the number of atoms in each base, is 10, and i is an n x 1 column vector consisting of only ones. S' and i' are the transpose of matrix S and column vector i, respectively.
From the nine elements of the C matrix, we subsequently generate the 4 × 4 real symmetric matrix M using the expression:
| c11+c22+c33 c23-c32 c31-c13 c12-c21 |
M = | c23-c32 c11-c22-c33 c12+c21 c31+c13 |
| c31-c13 c12+c21 -c11+c22-c33 c23+c32 |
| c12-c21 c31+c13 c23+c32 -c11-c22+c33 |
=
1.5792 -1.2456 1.2044 1.6167
-1.2456 -1.0228 -1.1890 0.8842
1.2044 -1.1890 2.3447 0.6968
1.6167 0.8842 0.6968 -2.9011
The largest eigenvalue of matrix M is 4.0335, and its corresponding unit eigenvector is:
[ q0 q1 q2 q3 ] = [ 0.6135 -0.2878 0.7135 0.1780 ]
The rotation matrix R is deduced from the above eigenvector as below:
| q0q0+q1q1-q2q2-q3q3 2(q1q2-q0q3) 2(q1q3+q0q2) |
R = | 2(q2q1+q0q3) q0q0-q1q1+q2q2-q3q3 2(q2q3-q0q1) |
| 2(q3q1-q0q2) 2(q3q2+q0q1) q0q0-q1q1-q2q2+q3q3 |
=
-0.0817 -0.6291 0.7730
-0.1923 0.7710 0.6072
-0.9779 -0.0990 -0.1839
Following coordinate transformation with matrix R, the origin of the standard base is found to be displaced from the experimental structure by:
o = e_(average) - s_(average) R' = [15.8969 15.7701 15.1802]
The least-squares fitted coordinates (F) of the standard base atoms on the experimental structure are then given by:
F = S R' + i o
=
16.4592 17.0194 14.6699
15.7747 18.1925 14.4586
14.4899 18.4519 14.7542
14.1729 19.6974 14.4070
14.9343 20.6404 13.8420
16.2222 20.3472 13.5569
16.9832 21.2875 12.9925
16.6829 19.0585 13.8760
17.9183 18.4437 13.7219
17.7335 17.2396 14.2062
Here S is the (n x 3) matrix of original coordinates of the standard base, and as noted above, i is an n x 1 column vector consisting of only ones.
The difference matrix (D) between F and E, the (n x 3) matrix of original coordinates of the experimental base, and the root-mean-square (RMS) deviation between the two structures are found as:
D = E - F
=
0.0018 -0.0044 0.0061
0.0003 -0.0045 0.0004
-0.0009 -0.0029 0.0018
-0.0019 0.0016 -0.0010
-0.0013 0.0036 -0.0030
0.0008 0.0048 -0.0019
0.0008 0.0095 0.0015
0.0001 -0.0025 -0.0010
-0.0003 -0.0047 -0.0039
0.0005 -0.0006 0.0008
RMS deviation = 0.0054
It should be noted that if the standard base is already defined in terms of its reference frame, as in 3DNA (e.g., $X3DNA/config/Atomic_A.pdb), the vector o and the matrix R represent the best-fitted coordinate frame of the experimental base. Moreover, the three axes of the frame given by R are guaranteed to be orthonormal. If you want to get an insight of the LS fitting algorithm and a better understanding of how 3DNA derives its base reference frame, it’d be a valuable experience to repeat the above procedure with $X3DNA/config/Atomic_A.pdb.
Note: the algorithm does not apply to a molecule vs its inversion (an improper rotation) — thanks to Boris Averkiev for reporting this subtle point (see comments below). One possible remedy is to treat this edge case separately.
Base normal
Rather than fit a standard base to experimental coordinates, the CEHS, FREEHELIX, and NUPARM analyses perform a fitting of a LS plane to a set of atoms in order to define the base and base-pair normals. The covariance matrix based on the n x 3 matrix of experimental Cartesian coordinates E is diagonalized to find the vector normal to the best plane. Specifically, C is obtained using the above formula with S substituted by E. The normal vector then lies along the eigenvector that corresponds to the smallest eigenvalue. Note that the coefficient 1/(n-1) in the formula for calculating C has no effect on the direction of the eigenvectors but scales the magnitudes of the eigenvalues.
Using the above adenine base from the high resolution dodecamer duplex as an example, the covariance matrix C is:
C =
1.6680 -0.5015 -0.3253
-0.5015 2.0670 -0.5840
-0.3253 -0.5840 0.3061
The smallest eigenvalue of C, 8.26e-5, indicates that the base is almost perfectly planar. The corresponding unit eigenvector corresponding to the base normal is:
Base normal: 0.2737 0.3224 0.9062
Related topics:

Over the years, the fiber utility program has become a handy way to generate standard B-DNA and A-DNA structures, as evident from citations to 3DNA. Nevertheless, the currently collected 55 experimental fiber models, comprehensive as they are, do not include one for canonical double-stranded (ds) RNA or single-stranded (ss) RNA structures of generic A/C/G/U sequence.
This situation is best illustrated by a recent article by Charles Brooks and Hashim Al-Hashimi and their co-workers, titled Unraveling the structural complexity in a single-stranded RNA tail: implications for efficient ligand binding in the prequeuosine riboswitch [Nucleic Acids Research, 40(3) 1345–1355 (2012)] , where they wrote:
Idealized A-form structures were constructed using Insight II (Molecular Simulations, Inc.) correcting the propeller twist angles from +15° to –15° using an in-house program, as previously described (47). The complementary strand was removed and the resulting ssRNA used in NMR data analysis. B-form helices were constructed using W3DNA (48).
As of 3DNA v2.1, however, that’s no longer the case: now the fiber utility provides direct support for generating idealized dsRNA or ssRNA structures of arbitrary A/C/G/U sequence. As always, the new functionality can be best illustrated with examples. Let’s build ssRNAs of the wild-type (5’-AUAAAAAACUAA-3’) and A29C mutated form (5’-AUAACAAACUAA-3’) used in the work cited above:
fiber -r -s -seq=AUAAAAAACUAA wt-12nt.pdb
fiber -r -s -seq=AUAACAAACUAA mt-12nt.pdb
Here the -r option is for RNA, -s for a ss structure, and -seq for the specific base sequence. The generated ssRNA structure for the wild-type sequence is named wt-12nt.pdb, and that for the mutated sequence named mt-12nt.pdb.
Note that the new RNA model is based on Struther Arnott’s work of fiber A-DNA from calf thymus (#1 in the list). The dsRNA, as its dsDNA counterpart, has a helical twist of 32.7° and a helical rise of 2.548 Å. Relevant to the above citation, here the propeller twist angle of each base pair is –10.5°, a negative value similar to that observed in high-resolution x-ray crystal structures. Furthermore, you can easily verify the three numbers with the following commands:
fiber -r -seq=AUAAAAAACUAA wt-12nt.pdb
find_pair wt-12nt.pdb stdout | analyze stdin
In summary, it is very easy to generate canonical RNA structures with the revised fiber command. Through its integrated analysis routine, 3DNA can also be used to check structural features of the resultant RNA models. Moreover, as mentioned in the opening post What can 3DNA do for RNA structures? on the forum, 3DNA has much to offer in the filed of RNA structural bioinformatics.

At the C2B2 party this afternoon, I was asked the question: “Does 3DNA work for RNA?” Well, a good question, indeed. The short answer is definitely, YES. However, a detailed explanation is needed to address the underlying intuitive assumption: 3DNA is only for DNA.
- The name 3DNA was due to Dr. Olson, after we struggled quite a while. Initially, we played with NuStar (which was actually cited once by Richard Dickerson et al.), and Carnival etc. I still remember the day when Dr. Olson asked me “How about 3DNA?” We immediately reached an agreement: that’s it — what a cute name! Another advantage (as it becomes clear later): since 3DNA starts with ‘3’, it (mostly) shows up right at the top of many on-line lists of bioinformatics tools.
- Interpreted literally, 3DNA could mean 3-DNA, i.e., the three most common types of DNA: A-, B- and Z-form. That may be one of the reasons where the misconception that 3DNA is only for 3DNA comes from. Another reason could be that structural work on DNA is what the Olson lab best known for.
- The number ‘3’ in 3DNA should also be associated with its three key components: analysis, rebuilding and visualization. In a sense, this is my favorite.
- Of course, 3DNA stands for 3D-NA, 3-Dimensional Nucleic Acids, as expressed explicitly in the titles of our two 3DNA papers (2003 NAR and 2008 NP).
The applications of 3DNA to RNA structures can be broadly categorized as follows:
- Automatically detect all existing base-pairs, Watson-Crick (A-U, G-C, wobble G-U) or non-canonical, using a set of simple geometric criteria. Furthermore, it has a unique base-pair classification system based on the six numerical structural parameters, suitable for database storage and search.
- Automatically detect all triplets or higher-order base-associations.
- Automatically detect double helical regions, regardless of backbone connection, thus ideal for finding pseudo-continuous coaxial stacking.
- The above three features are seamlessly integrated with the visualization component to allow for easy generation of publication quality images. See the 3DNA 2008 NP paper for detailed examples.
As further examples, the following two RNA publications take advantage of find_pair from 3DNA:
It is well worth noting that the base-pair detecting algorithm in RNAView is based on an earlier version of find_pair, a basic fact ignored in the RNAView publication.
In summary, 3DNA works for RNA as well as for DNA, and more.

Recently, I (together with Drs. Wilma Olson and Harmen Bussemaker – a team with a unique combination of complementary expertise) published a new article in Nucleic Acids Research (NAR): The RNA backbone plays a crucial role in mediating the intrinsic stability of the GpU dinucleotide platform and the GpUpA/GpA mini duplex. The key findings of this work are summarized in the abstract:
The side-by-side interactions of nucleobases contribute to the organization of RNA, forming the planar building blocks of helices and mediating chain folding. Dinucleotide platforms, formed by side-by-side pairing of adjacent bases, frequently anchor helices against loops. Surprisingly, GpU steps account for over half of the dinucleotide platforms observed in RNA-containing structures. Why GpU should stand out from other dinucleotides in this respect is not clear from the single well-characterized H-bond found between the guanine N2 and the uracil O4 groups. Here, we describe how an RNA-specific H-bond between O2’(G) and O2P(U) adds to the stability of the GpU platform. Moreover, we show how this pair of oxygen atoms forms an out-of-plane backbone ‘edge’ that is specifically recognized by a non-adjacent guanine in over 90% of the cases, leading to the formation of an asymmetric miniduplex consisting of ‘complementary’ GpUpA and GpA subunits. Together, these five nucleotides constitute the conserved core of the well-known loop-E motif. The backbone-mediated intrinsic stabilities of the GpU dinucleotide platform and the GpUpA/GpA miniduplex plausibly underlie observed evolutionary constraints on base identity. We propose that they may also provide a reason for the extreme conservation of GpU observed at most 5’-splice sites.
As a nice surprise, this publication was selected by NAR as a featured article! According to the NAR website:
Featured Articles highlight the best papers published in NAR. These articles are chosen by the Executive Editors on the recommendation of Editorial Board Members and Referees. They represent the top 5% of papers in terms of originality, significance and scientific excellence.
I feel very gratified with the “extra” recognition. From my own perspective, I can easily rank this paper as the top one in my publication list: from the very beginning, I has been struck by the simplicity and elegance of the GpU story. Hopefully, time will verify the validity of this scientific contribution.
Behind the hood, though, there is a long, complex (sometimes perplexing), yet interesting story associated with this work. Here is how it got started. While writing the 3DNA 2008 Nature Protocols (NP) paper, I selected the (previously undocumented) ‘-p’ option of find_pair to showcase its capability to identify higher-order base associations, using the large ribosomal subunit (1jj2) as an example. I noticed the unexpected O2’(G)⋅⋅⋅O2P H-bond within the GpU dinucleotide platform in a pentaplet (Figure A below). I was/am well aware of Leontis-Westholf’s pioneering work on Geometric nomenclature and classification of RNA base pairs which involves three distinct edges – the Watson-Crick edge, the Hoogsteen edge, and the Sugar edge, yet without taking into consideration of possible sugar-phosphate backbone interactions (Figure B below). So I decided to double-check, just to be sure that the H-bond was not spurious due to defects in the H-bond detecting scheme of find_pair, and the finding was very surprising.

The following section was re-added into the 3DNA NP paper in the very last revision:
It is also worth noting that the G1971–U1972 platform is stabilized not only by the well-characterized G(N2)⋅⋅⋅U(O4) H-bond interaction, but also by a little-noticed G(O2’)⋅⋅⋅U(O2P) sugar-phosphate backbone interaction (Fig. 6a). Examination of the 50S large ribosomal unit (1JJ2) alone reveals ten such double H-bonded G–U platforms, far more occurrences than those registered by any other dinucleotide platform (including A–A) in this structure. Apparently, the G–U platform is more stable than other platforms with only a single base–base H-bond interaction. We are currently investigating this overrepresented G–U dinucleotide platform in other RNA structures. (p.1226)
See also Is the O2’(G)…O2P H-bond in GpU platforms real?

Structural analysis of nucleic acids used to be a rather tedious process, especially for irregular, complicated RNA structures and nucleic-acid/protein complexes [e.g., the large ribosomal subunit of H. marismortui (1jj2)]. Without valid base-pairing information arranged properly in a duplex fragment as input, analysis programs such as Curves+ and analyze/cehs in 3DNA would produce meaningless results. The program find_pair in 3DNA was originally created to solve this specific problem, i.e., to generate an input file to 3DNA analysis routines directly from a nucleic-acid containing structure in PDB format. It is what makes nucleic acids structural analysis a routine process — running through thousands of structures from NDB/PDB can be fully automated.
Overall, find_pair has more than fulfilled the goal of its initial design (as stated above). Over the past few years, its functionality has been expanded and continuously refined (kaizen 改善), making find_pair itself a full-featured application. Now, it is efficient, robust, and its simple command line interface allows for easy integration with other bioinformatics tools. Properly acknowledged or otherwise, find_pair has served (at least) as one of the key components in many other applications (RNAView, BPS, SwS, ARTS, to name just a few). Indeed, find_pair is by far the single program in 3DNA that has received the most questions (as evident from the 3DNA forum).
While I still have to write a method paper to describe the underlying algorithms of find_pair in detail — i.e., for identifying nucleotides, H-bonds, base pairs, high-order base associations, and double helical regions — the basic idea is intuitive and very easy to understand: as summarized in our recent GpU paper”, find_pair is purely geometric based (with user adjustable parameters) and allows for the identification of canonical Watson–Crick as well as non-canonical base pairs, made up of normal or modified bases, regardless of tautomeric or protonation state. For example, in the GpU paper”, we chose the following set of stringent parameters to ensure that the geometry of each identified base pair is nearly planar and supports at least one inter-base H-bond: (i) a vertical distance (stagger) between base planes ≤ 1.5 Å; (ii) an angle between base normal vectors ≤ 30°; and (iii) a pair of nitrogen and/or oxygen base atoms at a distance ≤ 3.3 Å. Other criteria (documented or otherwise), such as the distance between the origins of the two standard base reference frames, are just filters to speed up the calculations.
In a nutshell, find_pair has the following two core functionalities:
- The default is to generate input to the analysis routines in 3DNA (
analyze/cehs) for double helices. However, there are many more job to perform under the hood than just identifying base pairs: the base pairs must be in proper sequential order, and each strand must be in 5’ to 3’ direction, for the calculated step parameters (twist, roll etc) to make sense. Moreover, with the “-c” option, one gets an input file to Curves (but not Curves+, yet); with the “-s” or “-1” option, find_pair treats the whole structure as one single strand, and is useful for getting all backbone torsion angles.
- Detect all base pairs (regardless of double helical regions) and higher-oder (3+) base associations with the “-p” option. This feature (in its preliminary form) was there starting from at least v1.5, which was released at the end of 2002 (just before I left Rutgers), but it was intentionally undocumented. The source code of
find_pair (as part of 3DNA) was tested and shared within Rutgers (NDB and Dr. Olson’s laboratory) before any 3DNA paper was published, and served as the basis for several other projects. We also offered 3DNA (with source code) to a few RNA experts for comments; but we received either no responses or politely-worded negative ones. Things did not work out as (what I thought) they should have been, but that’s life and I have learned my lessons. The “-p” option was first explicitly mentioned in the 3DNA 2008 Nature Protocols paper, to illustrate how to identify the two pentaplets in the large ribosomal subunit of H. marismortui (1jj2).
It is interesting to mention the two papers I’ve recently come across: the first is on DNA-protein interactions and the second on RNA base-pairing, where new algorithms were developed to detect base pairs and their performances were compared with find_pair. In each of the two cases, it was claimed that find_pair missed certain pairs where the new methods succeeded. As it turned out, however, in the first case, simply relaxing find_pair’s default H-bond distance cut-off 4.0 Å to 4.5 Å, as used by the authors, virtually all the missing pairs were recovered. In the second case, the “-p” option, which should have been, was simply not specified.
After nearly a decade of extensive real-world applications and refinements, it is safe to say that find_pair is now a versatile and practical tool for nucleic acids structure analysis. Of course, I will continue to support and further refine find_pair as I see fit. Once in a while, I just cannot stop but to think that find_pair is to nucleic acids what DSSP is to proteins: simple and elegant. As more people become aware of its existence, I would expect find_pair to gain even more widespread usage, especially in RNA-structure related research areas.

While browsing Nucleic Acids Research recently, I noticed the paper titled Conformational analysis of nucleic acids revisited: Curves+ by Dr. Lavery et al. I read it through carefully during the weekend and played around with the software. Overall, I was fairly impressed, and also happy to see that “It [Curves+] adopts the generally accepted reference frame for nucleic acid bases and no longer shows any significant difference with analysis programs such as 3DNA for intra- or inter-base pair parameters.”
Anyone who has ever worked on nucleic acid structures (especially DNA) should be familiar with Curves, an analysis program that has been widely used over the past twenty years. Only in recent years has 3DNA become popular. By and large, though, it is my opinion that 3DNA and Curves are constructive competitors in nucleic acid structure analysis with complementary functionality. As I put it six years ago, before the 13th Conversation at Albany: “Curves has special features that 3DNA does not want to repeat/compete (e.g. global parameters, groove dimension parameters). Nevertheless, we provide an option in a 3DNA utility program (find_pair) to generate input to Curves directly from a PDB data file” on June 6, 2003, and emphasized again on June 09, 2003: “We also see Curves unique in defining global parameters, bending analysis and groove dimensions.” 3DNA’s real strength, as demonstrated in our 2008 Nature Protocols paper, lies in its integrated approach that combines nucleic acid structure analysis, rebuilding, and visualization into a single software package (see image below).

Now the nucleic acid structure community is blessed with the new Curves+, which “is algorithmically simpler and computationally much faster than the earlier Curves approach”, yet still provides its ‘hallmark’ curvilinear axis and “a full analysis of groove widths and depths”. When I read the text, I especially liked the INTRODUCTION section, which provides a nice summary of relevant background information on nucleic acid conformational analysis. An important feature of Curves+ is its integration of the analysis of molecular dynamics trajectories. In contrast, 3DNA lacks direct support in this area (even though I know of such applications from questions posted on the 3DNA forum), mostly due to the fact that I am not an ‘energetic’ person. Of special note is a policy-related advantage Curves+ has over 3DNA: Curves+ is distributed freely, and with source code available. On the other hand, due to Rutgers’ license constraints and various other (undocumented) reasons, 3DNA users are still having difficulty in accessing 3DNA v2.0 I compiled several months ago!
It is worth noting that the major differences in slide (+0.47 Å) and x-displacement (+0.77 Å) in Curves+ vs the old Curves (~0.5 Å and ~0.8 Å, respectively) are nearly exactly those uncovered a decade ago in Resolving the discrepancies among nucleic acid conformational analyses [Lu and Olson (1999), J. Mol. Biol., 285(4), 1563-75]:
Except for Curves, which defines the local frame in terms of the canonical B-DNA fiber structure (Leslie et al., 1980), the base origins are roughly coincident in the different schemes, but are significantly displaced (~0.8 Å along the positive x-axis) from the Curves reference. As illustrated below, this offset gives rise to systematic discrepancies of ~0.5 Å in slide and ~0.8 Å in global x-displacement in Curves compared with other programs, and also contributes to differences in rise at kinked steps. (p. 1566)
Please note that Curves+ has introduced new name list variables — most notably, lib= — and other subtle format changes, thus rendering the find_pair generated input files (with option ‘-c’) no longer valid. However, it would be easy to manually edit the input file to make it work for Curves+, since the most significant part — i.e., specifying paired nucleotides — does not change. Given time and upon user request, however, I would consider to write a new script to automate the process.
Overall, it is to the user community’s advantage to have both 3DNA and Curves+ or a choice between the two programs, and I am more than willing to build a bridge between them to make users’ lives easier.
