It gives me great pleasure to announce that the 3DNA/DSSR project is now funded by the NIH R24GM153869 grant, titled "X3DNA-DSSR: a resource for structural bioinformatics of nucleic acids". I am deeply grateful for the opportunity to continue working on a project that has basically defined who I am. It was a tough time during the funding gap over the past few years. Nevertheless, I have experienced and learned a lot, and witnessed miracles enabled by enthusiastic users.
Since late 2020 when I lost my R01 grant, DSSR has been licensed by the Columbia Technology Ventures (CTV). I appreciate the numerous users (including big pharma) who purchased a DSSR Pro License or a DSSR Basic paid License. Thanks to the NIH R24GM153869 grant, we are pleased to provide DSSR Basic free of charge to the academic community. Academic Users may submit a license request for DSSR Basic or DSSR Pro by clicking "Express Licensing" on the CTV landing page. Commercial users may inquire about pricing and licensing terms by emailing techtransfer@columbia.edu, copying xiangjun@x3dna.org.
The current version of DSSR is v2.4.5-2024sep24 which contains miscellaneous bug fixes (e.g., chain id with > 4 chars) and minor improvements. This release synchronizes with the new R24 funding, which will bring the project to the next level. All existing users are encouraged to upgrade their installation.
Lots of exciting things will happen for the project. The first is to make DSSR freely accessible to the academic community. In the past couple of weeks, CTV has already issued quite a few DSSR Basic Academic licenses to users from all over the world. So the demand is high, and it will become stronger as more academic users become aware of DSSR. I'm closely monitoring the 3DNA Forum, and am always ready to answer users' questions.
I am committed to making DSSR a brand that stands for quality and value. By virtue of its unmatched functionality, usability, and support, DSSR saves users a substantial amount of time and effort when compared to other options. My track record throughout the years has unambiguously demonstrated my dedication to this solid software product.
DSSR Basic contains all features described in the three DSSR-related papers, and includes the originally separate SNAP program (still unpublished) for analyzing DNA/RNA-protein complexes. The Pro version integrates the classic 3DNA functionality, plus advanced modeling routines, with email/Zoom/phone support.
Recently, I read with great interest an article titled A context-sensitive guide to RNA & DNA base-pair & base-stack geometry by Dr. Jane Richardson, published in CCN (Computational Crystallography Newsletter, 2015, 5, 42–49). Highlighted in the article are Buckle and Propeller twist (see bottom left of the figure below), two of the angular parameters that characterize base-pair (bp) non-planarity. In particular, I was intrigued by the “Notes on measures and figures” at the end:
Base normals were constructed in Mage (Richardson 2001) and twist torsions and buckle angles were measured from them; propeller-twists were measured as dihedral angles around an axis between N1/9 atoms.
The Richardson CCN article prompted me to think more about an intuitive description of bp geometry that can be easily grasped by experimentalists, especially X-ray crystallographers and cryo-EM practitioners. Without worrying about model building, as one must with the six rigid-body parameters, it is straightforward to come up with a new set of four ‘simple’ parameters (Shear, Stretch, Buckle and Propeller) with the following characteristics:
- Each parameter can be positive or negative. For type M–N pairs (as in the canonical cases), Shear and Buckle reverse their signs when the two bases are swapped (i.e. counted as N–M instead of M–N). In all other cases, the signs of the parameters remain unchanged. See the DSSR paper for the definition of M+N vs M–N type of pairs.
- Intuitive results for non-canonical pairs, even when Opening is ~180º.
- Consistent definition between Shear/Buckle (x-axis) vs Stretch/Propeller (y-axis).
- As in 3DNA and DSSR, Buckle^2 + Propeller^2 = interBase_angle^2. Either Buckle or Propeller can render the two base planes of a pair non-parallel. Combined together, they introduce a non-zero inter-base angle. By definition, each parameter should not be larger than the overall inter-base angle.
With the cartoon-block representation introduced in DSSR, base-stacking interactions and bp deformations (especially Buckle and Propeller) are immediately obvious. Two examples are illustrated in the figure below: one is the classic Dickerson B-DNA dodecamer (355d, DSSR output), and the other is the parallel double-stranded helix of poly(A) RNA (4jrd, DSSR output).
A portion of DSSR output for the B-DNA duplex 355d is shown below. Note that the first bp (at the bottom left in the figure above) has a Propeller of –17º (and a Buckle of +7º). As beautifully explained by Calladine et al. in their book Understanding DNA: The Molecule & How It Works, Watson-Crick pairs prefer to have negative Propeller in right-handed DNA double helices to improve same-strand base-stacking interactions. The average value of Propeller in A- and B-DNA crystal structures is around –11º (see Table 3 of the Olson et al. standard base reference frame paper).
nt1 nt2 bp name Saenger LW DSSR
1 A.DC1 B.DG24 C-G WC 19-XIX cWW cW-W
[-105.9(anti) ~C2'-endo lambda=53.5] [-141.3(anti) ~C3'-endo lambda=52.7]
d(C1'-C1')=10.71 d(N1-N9)=8.96 d(C6-C8)=9.88 tor(C1'-N1-N9-C1')=-21.4
H-bonds[3]: "O2(carbonyl)-N2(amino)[2.83],N3-N1(imino)[2.90],N4(amino)-O6(carbonyl)[2.98]"
interBase-angle=19 Simple-bpParams: Shear=0.28 Stretch=-0.13 Buckle=7.3 Propeller=-17.2
bp-pars: [0.28 -0.14 0.07 6.93 -17.31 -0.61]
2 A.DG2 B.DC23 G-C WC 19-XIX cWW cW-W
[-85.4(anti) ~C2'-endo lambda=53.4] [-150.3(anti) ~C3'-endo lambda=55.4]
d(C1'-C1')=10.61 d(N1-N9)=8.92 d(C6-C8)=9.83 tor(C1'-N1-N9-C1')=-21.7
H-bonds[3]: "O6(carbonyl)-N4(amino)[2.91],N1(imino)-N3[2.88],N2(amino)-O2(carbonyl)[2.88]"
interBase-angle=17 Simple-bpParams: Shear=-0.24 Stretch=-0.18 Buckle=9.0 Propeller=-14.5
bp-pars: [-0.24 -0.18 0.49 9.34 -14.30 -2.08]
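The relation Buckle^2 + Propeller^2 = interBase_angle^2 stated earlier can be checked against the two 355d pairs listed above. The snippet below is only a sanity check in Python: the numbers are copied from the excerpt, while the names PAIRS_355D and inter_base_angle are made up for this illustration and are not part of DSSR.

```python
import math

# (Buckle, Propeller, reported interBase-angle), copied from the 355d excerpt above.
PAIRS_355D = {
    "A.DC1-B.DG24": (7.3, -17.2, 19),
    "A.DG2-B.DC23": (9.0, -14.5, 17),
}


def inter_base_angle(buckle, propeller):
    """Combine 'simple' Buckle and Propeller (degrees) into the inter-base angle."""
    return math.hypot(buckle, propeller)


for name, (buckle, propeller, reported) in PAIRS_355D.items():
    angle = inter_base_angle(buckle, propeller)
    # Each component is, by construction, no larger than the combined angle.
    assert abs(buckle) <= angle and abs(propeller) <= angle
    print(f"{name}: computed {angle:.1f}, reported {reported}")
```

Within the rounding of the printed values, both computed angles agree with the interBase-angle reported by DSSR.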
A portion of DSSR output for the parallel poly(A) RNA duplex 4jrd is shown below. Note that the values of ‘simple’ Propeller are positive for both bps #7 and #8. In contrast, the rigid-body bp parameters flip their signs when Opening switches from –179.56º for bp #7 to +179.23º for bp #8. This sign ‘ambiguity’ around 180º Opening could be confusing. Yet all six bp parameters must be kept as they are for rigorous rebuilding, especially within a larger context than a bp per se. From the very beginning, 3DNA has adopted the convention of keeping angular parameters in the range of [–180º, +180º] instead of [0, 360º], allowing left-handed Z-DNA to have negative twist.
7 A.A8 B.A7 A+A -- 02-II tHH tM+M
[-175.8(anti) ~C3'-endo lambda=10.2] [-172.7(anti) ~C3'-endo lambda=12.6]
d(C1'-C1')=11.15 d(N1-N9)=8.29 d(C6-C8)=6.31 tor(C1'-N1-N9-C1')=160.1
H-bonds[4]: "OP2-N6(amino)[2.97],N7-N6(amino)[2.97],N6(amino)-OP2[2.92],N6(amino)-N7[2.91]"
interBase-angle=14 Simple-bpParams: Shear=-7.88 Stretch=0.66 Buckle=-7.8 Propeller=11.9
bp-pars: [-6.00 5.15 -0.02 0.63 14.22 -179.56]
8 A.A9 B.A8 A+A -- 02-II tHH tM+M
[-177.4(anti) ~C3'-endo lambda=12.4] [-175.8(anti) ~C3'-endo lambda=10.3]
d(C1'-C1')=11.01 d(N1-N9)=8.15 d(C6-C8)=6.18 tor(C1'-N1-N9-C1')=158.5
H-bonds[4]: "OP2-N6(amino)[2.93],N7-N6(amino)[2.88],N6(amino)-OP2[2.97],N6(amino)-N7[2.92]"
interBase-angle=15 Simple-bpParams: Shear=-7.91 Stretch=0.56 Buckle=-7.0 Propeller=13.7
bp-pars: [6.11 -5.06 -0.05 -2.26 -15.22 179.23]
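The sign flip in Opening looks dramatic, but on a circle the two values are close neighbours. Here is a small Python sketch of the wrapping arithmetic. The helpers wrap180 and angular_difference are hypothetical names for this illustration, not 3DNA or DSSR functions, and the interval used here is half-open, so +180 maps to -180.

```python
def wrap180(angle):
    """Map an angle in degrees onto the half-open interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def angular_difference(a, b):
    """Signed smallest rotation (in degrees) that takes angle a to angle b."""
    return wrap180(b - a)


# Opening values copied from the 4jrd excerpt above.
opening_bp7 = -179.56
opening_bp8 = 179.23

diff = angular_difference(opening_bp7, opening_bp8)
print(f"Opening changes by {diff:.2f} degrees from bp #7 to bp #8")
```

The difference comes out at about -1.21 degrees, so the two pairs are geometrically very similar even though their Opening values carry opposite signs.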
Standard nitrogenous bases in DNA and RNA (A, C, G, T, and U) are aromatic compounds, each with a planar geometry. In the analyses of three-dimensional (3D) nucleic acid structures, the planar bases are normally taken as rigid bodies. The relative geometry of the two bases in base pair (bp) can then be rigorously quantified by six rigid-body parameters (see figure below). The three translations along the x-, y- and z-axes are termed Shear, Stretch, and Stagger, respectively. The three corresponding rotations are called Buckle, Propeller (twist), and Opening.
3DNA is unique with its coupled analyze and rebuild programs. The former calculates six bp parameters given 3D atomic coordinates (in PDB or PDBx/mmCIF format), while the latter takes a set of such parameters to generate the corresponding structure. The rigor of the description can be easily verified in two equivalent ways: the close-to-zero root-mean-square deviation (RMSD) between the rebuilt structure and the original coordinates, after a least-squares superposition; or the identical six bp parameters when the rebuilt structure is analyzed.
As is often the case, a concrete example makes the point clear. Here I am using the reverse Hoogsteen (rHoogsteen) bp between U8 and A14 (see image below) in the yeast phenylalanine tRNA (1ehz) as an example. The PDB atomic coordinates of the U8–A14 rHoogsteen pair, excluding backbone atoms except for C1′, are stored in file 1ehz-U8-A14.pdb.
find_pair 1ehz-U8-A14.pdb stdout | analyze stdin
# bp parameters in file '1ehz-U8-A14.out'
# also generated 'bp_step.par' for rebuilding below
rebuild -atomic bp_step.par 1ehz-U8-A14-3DNA.pdb
# rmsd is 0.044 Å between '1ehz-U8-A14.pdb' and '1ehz-U8-A14-3DNA.pdb'
find_pair 1ehz-U8-A14-3DNA.pdb stdout | analyze stdin
# bp parameters of the rebuilt structure in '1ehz-U8-A14-3DNA.out'
rebuild -atomic bp_step.par 1ehz-U8-A14-3DNA-new.pdb
# rmsd is 0 Å between '1ehz-U8-A14-3DNA.pdb' and '1ehz-U8-A14-3DNA-new.pdb'
Note that the above commands should be performed in order, since the file bp_step.par is overwritten after each analyze run. For your verification, here are the links to the five files:
The 0.044 Å rmsd between the original PDB coordinates in 1ehz-U8-A14.pdb and the 3DNA rebuilt structure in 1ehz-U8-A14-3DNA.pdb is due to the slight non-planarity of experimental bases. The rmsd is 0 between the two rounds of 3DNA rebuilt structures, 1ehz-U8-A14-3DNA.pdb and 1ehz-U8-A14-3DNA-new.pdb, as expected.
The bp parameters in 1ehz-U8-A14.out and 1ehz-U8-A14-3DNA.out are identical, as expected, and they are shown below.
Local base-pair parameters
bp Shear Stretch Stagger Buckle Propeller Opening
1 U-A 4.14 -1.91 0.77 -4.62 12.12 -103.09
Running DSSR on 1ehz-U8-A14.pdb gives the following results. Note that the six bp parameters (last row, prefixed with bp-pars) are exactly the same as in 3DNA; the two programs are consistent.
# x3dna-dssr -i=1ehz-U8-A14.pdb --more
List of 1 base pair
nt1 nt2 bp name Saenger LW DSSR
1 A.U8 A.A14 U-A rHoogsteen 24-XXIV tWH tW-M
[n/a(n/a) ---- lambda=28.3] [n/a(n/a) ---- lambda=21.5]
d(C1'-C1')=9.63 d(N1-N9)=7.06 d(C6-C8)=6.00 tor(C1'-N1-N9-C1')=174.4
H-bonds[2]: "O2(carbonyl)-N6(amino)[3.00],N3(imino)-N7[2.74]"
interBase-angle=12.97 Simple-bpParams: Shear=4.28 Stretch=1.55 Buckle=-11.8 Propeller=5.4
bp-pars: [4.14 -1.91 0.77 -4.62 12.12 -103.09]
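The agreement between the two programs can also be checked mechanically. The Python sketch below parses the 3DNA row and the DSSR bp-pars line exactly as quoted above and compares the six numbers. The two parsing helpers are hypothetical and assume the whitespace-separated layout shown in this post, not a documented output format.

```python
import re

NAMES = ("Shear", "Stretch", "Stagger", "Buckle", "Propeller", "Opening")

# Lines copied verbatim from the 3DNA and DSSR excerpts above.
ANALYZE_ROW = "1 U-A 4.14 -1.91 0.77 -4.62 12.12 -103.09"
DSSR_LINE = "bp-pars: [4.14 -1.91 0.77 -4.62 12.12 -103.09]"


def params_from_analyze(row):
    """Skip the serial number and the bp label, keep the six parameters."""
    return tuple(float(field) for field in row.split()[2:8])


def params_from_dssr(line):
    """Extract the six numbers between the square brackets after 'bp-pars:'."""
    inside = re.search(r"bp-pars:\s*\[([^\]]*)\]", line).group(1)
    return tuple(float(field) for field in inside.split())


from_3dna = params_from_analyze(ANALYZE_ROW)
from_dssr = params_from_dssr(DSSR_LINE)

for name, a, d in zip(NAMES, from_3dna, from_dssr):
    status = "same" if a == d else "DIFFERENT"
    print(f"{name:10s} 3DNA={a:8.2f} DSSR={d:8.2f} {status}")
```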
As mentioned in the recent DSSR paper:
As in 3DNA (6,7), DSSR takes advantage of the six standard base-pair parameters––three translations (Shear, Stretch, Stagger) and three rotations (Buckle, Propeller, Opening)––to quantify the relative spatial position and orientation of any two interacting bases rigorously. Among the six parameters, only Shear, Stretch, and Opening are critical for characterizing different types of pairs. Buckle, Propeller and Stagger, on the other hand, describe the nonplanarity of a given pair (6). By virtue of the definition of the standard base reference frame, Shear, Stretch, and Opening are all close to zero for Watson-Crick pairs. Moreover, every other type of pair has a set of characteristic parameters. For example, the wobble G–U pair is characterized by an average Shear of –2.2 Å, and the Hoogsteen A+U pair is distinguished by a Stretch of approximately –3.5 Å and an Opening of near 66º.
In a follow-up post, I will talk about the “simple” bp parameters (Simple-bpParams in the above DSSR output list) recently introduced into DSSR. Stay tuned!
As of DSSR v1.3.0-2015aug27, the --json option is available for producing analysis results that are strictly compliant with the JSON data exchange format. The JSON file contains numerous DSSR-derived structural features, including those in the default main output, backbone torsions in dssr-torsions.txt, and a detailed list of hydrogen bonds.
According to the official JSON website:
JSON (JavaScript Object Notation) is a lightweight data-interchange format. It is easy for humans to read and write. It is easy for machines to parse and generate. It is based on a subset of the JavaScript Programming Language… JSON is a text format that is completely language independent… These properties make JSON an ideal data-interchange language.
Indeed, the JSON output file makes DSSR readily accessible for integration with other bioinformatics tools or normal usage from the command line. Using the classic yeast phenylalanine tRNA 1ehz as an example (1ehz.pdb), let’s go over some simple use-cases. Note that the following examples take advantage of jq, a lightweight and flexible command-line JSON processor.
x3dna-dssr -i=1ehz.pdb --json -o=1ehz-dssr.json
jq . 1ehz-dssr.json # reformatted for pretty output
x3dna-dssr -i=1ehz.pdb --json | jq . # the above 2 steps combined
With 1ehz-dssr.json in hand, we can easily extract DSSR-derived structural features of interest:
jq .pairs 1ehz-dssr.json # list of 34 pairs
jq .multiplets 1ehz-dssr.json # list of 4 base triplets
jq .hbonds 1ehz-dssr.json # list of hydrogen bonds
jq .helices 1ehz-dssr.json
jq .stems 1ehz-dssr.json
# list of nucleotide parameters, including torsion angles and suites
jq .ntParams 1ehz-dssr.json
# list of 14 modified nucleotides
jq '.ntParams[] | select(.is_modified)' 1ehz-dssr.json
# select nucleotide id, delta torsion, sugar puckering and cluster of suite name
jq '.ntParams[] | {nt_id, delta, puckering, cluster}' 1ehz-dssr.json
# same selection as above, but in 'Comma Separated Values' format
jq -r '.ntParams[] | [.nt_id, .delta, .puckering, .cluster] | @csv' 1ehz-dssr.json
Here is the result of running jq (v1.5) to select multiplets:
# jq .multiplets 1ehz-dssr.json
[
{
"index": 1,
"num_nts": 3,
"nts_short": "UAA",
"nts_long": "A.U8,A.A14,A.A21"
},
{
"index": 2,
"num_nts": 3,
"nts_short": "AUA",
"nts_long": "A.A9,A.U12,A.A23"
},
{
"index": 3,
"num_nts": 3,
"nts_short": "gCG",
"nts_long": "A.2MG10,A.C25,A.G45"
},
{
"index": 4,
"num_nts": 3,
"nts_short": "CGg",
"nts_long": "A.C13,A.G22,A.7MG46"
}
]
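For readers who prefer a scripting language to jq, the same data can be handled with Python's standard json module. The sketch below embeds the multiplets array shown above as a string so that it runs on its own. With the real file, one would presumably load 1ehz-dssr.json and take its "multiplets" key, as the jq command does. The helper multiplets_containing is a hypothetical name for this illustration.

```python
import json

# The array printed by `jq .multiplets 1ehz-dssr.json`, copied from above.
MULTIPLETS_JSON = """
[
  {"index": 1, "num_nts": 3, "nts_short": "UAA", "nts_long": "A.U8,A.A14,A.A21"},
  {"index": 2, "num_nts": 3, "nts_short": "AUA", "nts_long": "A.A9,A.U12,A.A23"},
  {"index": 3, "num_nts": 3, "nts_short": "gCG", "nts_long": "A.2MG10,A.C25,A.G45"},
  {"index": 4, "num_nts": 3, "nts_short": "CGg", "nts_long": "A.C13,A.G22,A.7MG46"}
]
"""

multiplets = json.loads(MULTIPLETS_JSON)


def multiplets_containing(items, nt_id):
    """Return the multiplets whose comma-separated nts_long field lists nt_id."""
    return [m for m in items if nt_id in m["nts_long"].split(",")]


for m in multiplets:
    print(m["index"], m["nts_short"], m["nts_long"].split(","))

# Which triplet involves A14 of the U8-A14 reverse Hoogsteen pair?
print(multiplets_containing(multiplets, "A.A14"))
```

Splitting nts_long on commas, rather than testing for a substring, avoids accidental matches such as "A.A2" inside "A.A21".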
With the JSON file, DSSR can now be connected with the bioinformatics community in a ‘structured’ way, with a clearly delineated boundary. Now I can enjoy the freedom of refining the default main output format, without worrying too much about breaking third-party parsers. Moreover, I no longer need to write an adapter for each integration of DSSR with other tools. So nice!
For your reference, here is the output file 1ehz-dssr.json. The identifiers (names) in the JSON output may be refined over the next few iterations. I welcome your comments to make the DSSR-derived JSON better suit your needs.
It is a great pleasure to note that a paper titled DSSR, an integrated software tool for dissecting the spatial structure of RNA has recently been published in Nucleic Acids Research (NAR). Co-authored by Harmen Bussemaker, Wilma Olson and me (a team with a unique combination of complementary expertise), this DSSR paper represents another solid piece of work that I feel proud of. In contrast to our previous GpU dinucleotide platform paper focusing on results, and the two major 3DNA papers concentrating on methods, the current NAR article describes significant scientific findings that are enabled by the novel analysis algorithms implemented in the program. Moreover, DSSR introduces an appealing and highly informative “cartoon-block” representation of RNA structures that combines PyMOL cartoon schematics with 3DNA base color-coded rectangular blocks.
The abstract of the paper is quoted below:
Insight into the three-dimensional architecture of RNA is essential for understanding its cellular functions. However, even the classic transfer RNA structure contains features that are overlooked by existing bioinformatics tools. Here we present DSSR (Dissecting the Spatial Structure of RNA), an integrated and automated tool for analyzing and annotating RNA tertiary structures. The software identifies canonical and noncanonical base pairs, including those with modified nucleotides, in any tautomeric or protonation state. DSSR detects higher-order coplanar base associations, termed multiplets. It finds arrays of stacked pairs, classifies them by base-pair identity and backbone connectivity, and distinguishes a stem of covalently connected canonical pairs from a helix of stacked pairs of arbitrary type/linkage. DSSR identifies coaxial stacking of multiple stems within a single helix and lists isolated canonical pairs that lie outside of a stem. The program characterizes ‘closed’ loops of various types (hairpin, bulge, internal, and junction loops) and pseudoknots of arbitrary complexity. Notably, DSSR employs isolated pairs and the ends of stems, whether pseudoknotted or not, to define junction loops. This new, inclusive definition provides a novel perspective on the spatial organization of RNA. Tests on all nucleic acid structures in the Protein Data Bank confirm the efficiency and robustness of the software, and applications to representative RNA molecules illustrate its unique features. DSSR and related materials are freely available at http://x3dna.org/.
During the review process, we were delighted that the referees confirmed the claim that we made in the cover letter: “We would also like to emphasize that our reported results are easily verifiable, and we assure rigorous reproducibility of the data and figures described in this article.” Now that the paper has been published, as a follow-up, I’ve made available all the scripts and data files associated with the paper in a new section DSSR-NAR paper on the 3DNA Forum. The DSSR User Manual has also been updated with additional, previously undocumented, auxiliary options.
Overall, it took me more than ten days to create the 19 posts in the DSSR-NAR paper section and to revise the DSSR User Manual, along with other minor refinements for consistency. During the process, I’ve tried to make the scripts and data files self-contained for wide accessibility and easy understanding.
Any interested party should now be able to reproduce the table and figures (including the supplementary data) reported in the article. Moreover, with the additional details given in the post RNA cartoon-block representations with PyMOL and DSSR, one can easily generate similar schematic images as shown below:
I feel confident to claim that the results reported in our DSSR paper are reproducible. If you have issues related to the paper, please post them on the 3DNA Forum. I strive to respond promptly to any questions asked there.
In summary, DSSR is an integrated computational tool, designed from the bottom up to streamline the analysis of RNA three-dimensional structures. It is built upon my extensive experience in supporting 3DNA, growing knowledge of RNA structures, and refined programming skills. DSSR has a combined set of functionalities well beyond the scope of any known specialized resources. The program may well serve as a cornerstone for RNA structural bioinformatics and will benefit a broad range of possible applications.
Nowadays, “big data” and “big science” are hot topics. They all sound good and certainly come about for a reason. Yet, to transform data to information to knowledge to understanding to wisdom, sophisticated software tools are required. The programs can be big and complicated, or small and self-contained, fitting different purposes. As long as they can get the claimed job done in a robust fashion, size should not be a concern.
Over the years, however, I have seen a trend of bloated software with many (fragile) dependencies in bioinformatics. Some tools are so picky and hard to use/maintain that instead of serving, they become sort of a master. As a more representative example, I recently tried to install an open-source software package associated with a paper published just a few years ago in a leading journal. The software has only a few dependencies, yet some of them have already become obsolete. I spent hours each time, on Mac OS X and two versions of Ubuntu Linux, but failed to get it running properly (it always aborted with error messages). The download page hosting the software has been inactive since around the publication of the paper. Presumably, the PhD student or postdoc who wrote the code had left the lab, and with a paper published, all is done!
As an active practitioner of bioinformatics for well over a decade, I can confidently claim to be well above average in familiarity with Linux/Mac OS X, the associated shell programming and tools such as make, and various common scripting and compiled programming languages. Yet, once in a while, I get frustrated when I try to download and install a software tool attached to a paper I am interested in. As I see it, the vast majority of software programs from research labs are publication-oriented: as long as a paper is published, the software is considered finished.
From my experience, I always see software as engineering. It needs careful design and meticulous attention to detail. A sophisticated piece of scientific software is a combination of science and engineering. Expertise in domain knowledge is a must, and refined skills in computer programming are indispensable. The DSSR program I created and have continuously refined over the past three years represents what, in my belief, scientific software should be.
Among other unique features, DSSR is tiny (< 1 MB), self-contained (without run-time dependencies), and runs on Windows, Mac OS X, and Linux. Getting DSSR up and running should take only minutes for anyone with basic familiarity with common computer systems. I have no doubt that the beauty of being small, as represented by DSSR, will be gradually appreciated by the community.
Over the past few weeks, I’ve had the pleasure to talk to Thomas Holder, the PyMOL Principal Developer at Schrödinger, on possible integration of DSSR into PyMOL. On Tuesday April 21, 2015, I wrote to Thomas:
Last year, I had the pleasure to collaborate with Dr. Robert Hanson to integrate DSSR into Jmol, see http://chemapps.stolaf.edu/jmol/jsmol/dssr.htm. I am wondering if you have any interest in connecting DSSR to PyMOL. This will not only benefit both parties, but also bring elaborate analyses of RNA structures to the general audience. As you may be aware, RNA is becoming increasingly important, yet the field of RNA structural bioinformatics is lagging (far) behind that of proteins.
After a few meet-ups, we all agree that the DSSR-PyMOL integration project would be meaningful/significant for RNA structural bioinformatics. Moreover, the community not only can benefit from the end result, but also should be able to make direct contributions through the process. On Friday May 08, 2015, Thomas sent out the following open invitation, titled Someone interested in writing a DSSR plugin for PyMOL?, to the PyMOL mailing list:
Is anyone interested in writing a DSSR plugin for PyMOL? DSSR is an integrated software tool for Dissecting the Spatial Structure of RNA (http://x3dna.bio.columbia.edu/docs/dssr-manual.pdf). Among other things, DSSR defines the secondary structure of RNA from 3D atomic coordinates in a way similar to DSSP does for proteins. Most of its output could be translated 1:1 into PyMOL selections, making it available for coloring and other selection based features. A PyMOL plugin could act as a wrapper which runs DSSR for an object or atom selection. Xiang-Jun Lu, the author of DSSR, is also working on base pair visualization (see http://x3dna.org/articles/seeing-is-understanding-as-well-as-believing), similar to (but more advanced) what’s already available from 3DNA (http://pymolwiki.org/index.php/3DNA).
Xiang-Jun would be happy to collaborate with someone who has experience with Python and the PyMOL API for writing an extension or plugin. Please contact me if this sounds appealing to you.
Get DSSR from http://x3dna.org/
See it hooked up with JSmol: http://chemapps.stolaf.edu/jmol/jsmol/dssr.htm
If you are self-motivated, care about software quality, have expertise in writing PyMOL plugins, and feel the pain in RNA structural analysis/visualization with currently available tools, now is the time to make a difference. The DSSR/PyMOL project would ideally be composed of a team of dedicated practitioners with complementary skills. We will communicate mostly via email or online forum, in a presumably open and highly interactive way. By working on the project, you will be able to sharpen your skills and make new friends. The end product would not only make RNA structural bioinformatics easier for yourself but also benefit the community at large.
The v1.2.1 (2015feb01) release of DSSR contains a new functionality to characterize the so-called H-type pseudoknots. In this classical and most common type of pseudoknots, nucleotides from a hairpin loop form Watson-Crick base pairs with a single-stranded region outside of the hairpin to create another (adjacent) stem, as shown in the following illustration (taken from the Huang et al. paper A heuristic approach for detecting RNA H-type pseudoknots).
Normally, L2 is absent (i.e., with zero nucleotides) due to direct coaxial stacking of the two stems. An example output of DSSR on 1ymo (a human telomerase RNA pseudoknot) is shown below:
The corresponding sections from DSSR output are:
****************************************************************************
List of 3 H-type pseudoknot loop segments
1 stem#1(hairpin#1) vs stem#2(hairpin#2) L1 groove=MAJOR nts=8 UUUUUCUC U7,U8,U9,U10,U11,C12,U13,C14
2 stem#1(hairpin#1) vs stem#2(hairpin#2) L2 groove=----- nts=0
3 stem#1(hairpin#1) vs stem#2(hairpin#2) L3 groove=minor nts=8 CAAACAAA C30,A31,A32,A33,C34,A35,A36,A37
****************************************************************************
Secondary structures in dot-bracket notation (dbn) as a whole and per chain
>1ymo-1-A #1 nts=47 [chain] RNA
GGGCUGUUUUUCUCGCUGACUUUCAGCCCCAAACAAAAAAGUCAGCA
[[[[[[........(((((((((]]]]]]........))))))))).
Checking against the three-dimensional image and the secondary structure in linear form shown above, the meaning of the new section should be obvious. If you want to see more details, click the link to the DSSR-output file on 1ymo.
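To make the correspondence between the dot-bracket string and the three loop segments concrete, here is a short Python sketch that parses the 1ymo dbn line quoted above and recovers L1, L2, and L3. It is only an illustration of the H-type geometry, not DSSR's algorithm. The function names are hypothetical, the positions are simple 1-based indices into the sequence, and the stem written with square brackets is assumed to come first along the chain.

```python
# Sequence and dot-bracket notation copied from the DSSR output above.
SEQ = "GGGCUGUUUUUCUCGCUGACUUUCAGCCCCAAACAAAAAAGUCAGCA"
DBN = "[[[[[[........(((((((((]]]]]]........)))))))))."


def parse_dbn(dbn):
    """Return sorted (i, j, bracket) tuples, with 1-based positions i < j."""
    openers = {"(": [], "[": []}
    closers = {")": "(", "]": "["}
    pairs = []
    for pos, ch in enumerate(dbn, start=1):
        if ch in openers:
            openers[ch].append(pos)
        elif ch in closers:
            partner = openers[closers[ch]].pop()
            pairs.append((partner, pos, closers[ch]))
    return sorted(pairs)


def segment(seq, first, last):
    """Nucleotides first..last (1-based, inclusive); empty when first > last."""
    return seq[first - 1:last]


pairs = parse_dbn(DBN)
stem_a = [(i, j) for i, j, b in pairs if b == "["]  # 5'-most stem
stem_b = [(i, j) for i, j, b in pairs if b == "("]  # pseudoknotted partner

loops = {
    # between the 5' strands of the two stems
    "L1": segment(SEQ, max(i for i, _ in stem_a) + 1, min(i for i, _ in stem_b) - 1),
    # between the 5' strand of stem_b and the 3' strand of stem_a
    "L2": segment(SEQ, max(i for i, _ in stem_b) + 1, min(j for _, j in stem_a) - 1),
    # between the 3' strands of the two stems
    "L3": segment(SEQ, max(j for _, j in stem_a) + 1, min(j for _, j in stem_b) - 1),
}

for name, nts in loops.items():
    print(f"{name} nts={len(nts)} {nts}")
```

The recovered segments match the DSSR listing: L1 and L3 each contain eight nucleotides, and L2 is empty.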
Recently I came across the following two citations to DSSR:
Base pair types were annotated with RNAview (45,46). Hydrogen bonds were annotated manually and with the help of DSSR of the 3DNA package (47,48). Helix parameters were obtained using the Curves+ web server (49). Structural figures were prepared using PyMol (50).
It is interesting to note that DSSR is cited here for its identification of hydrogen bonds, not its annotation of base pairs, among many other features. The simple geometry-based H-bonding identification algorithm, originally implemented in find_pair/analyze of 3DNA (and adopted by RNAView) and highly refined in DSSR, works well for nucleic acid structures. With the --get-hbonds option, users can now use DSSR as a tool just for its list of H-bonds outside of the program.
All figures were generated using PyMOL (60) or Chimera (48). The secondary structure diagram of the human mitoribosomal RNA was prepared by extracting base pairs from the model using DSSR (61). The secondary structure diagram was drawn in VARNA (62) and finalized in Inkscape.
I am very pleased to see that DSSR was cited for its ‘intended’ use in this important piece of work from a leading laboratory in structural biology. In the middle of last November (2013), I was approached by the lead author for proper citation of DSSR, and I suggested the two 3DNA papers. As far as I can remember, this was the first time I received such a question on DSSR citation. It prompted me to write a FAQ entry in the DSSR User Manual, titled “How to cite DSSR?”. Hopefully, this citation issue will be gone in the near future.
Over the past two years, I’ve devoted significant effort to making DSSR a handy tool for RNA structural bioinformatics; it certainly represents my view as to what a scientific software program should be like. As time passes, DSSR is becoming increasingly sophisticated, and citations to DSSR can only grow.
Recently, the PDB began releasing atomic coordinates of large (ribosomal) structures in mmCIF format. For nucleic-acid-containing structures, the largest one so far is 4v4g, the crystal structure of five 70S ribosomes from Escherichia coli in complex with protein Y. It is assembled from ten PDB entries (1voq, 1vor, 1vos, 1vou, 1vov, 1vow, 1vox, 1voy, 1voz, 1vp0), consisting of 22,345 nucleotides, and a total of 717,805 atoms.
This humongous structure poses no problems to DSSR at all, as shown below.
Command: x3dna-dssr -i=4v4g.cif -o=4v4g.out
Processing file '4v4g.cif' [4v4g]
total number of base pairs: 9277
total number of multiplets: 918
total number of helices: 1099
total number of stems: 1221
total number of isolated WC/wobble pairs: 603
total number of atom-base stacking interactions: 1736
total number of hairpin loops: 504
total number of bulges: 170
total number of internal loops: 775
total number of junctions: 214
total number of non-loop single-stranded segments: 429
total number of kissing loops: 5
total number of A-minor (type I and II) motifs: 100
total number of ribose zippers: 58 (1159)
total number of kink turns: 39
Time used: 00:00:10:45
It took less than 11 minutes to run on an iMac (and nearly 14 minutes on an Ubuntu Linux machine). Given the