X3DNA-DSSR Homepage -- Nucleic Acid Structures

Cover image provided by X3DNA-DSSR, an NIGMS National Resource for structural bioinformatics of nucleic acids (R24GM153869; skmatics.x3dna.org). Image generated using DSSR and PyMOL (Lu XJ. 2020. Nucleic Acids Res 48: e74; https://doi.org/10.1093/nar/gkaa426).

See the 2020 paper titled "DSSR-enabled innovative schematics of 3D nucleic acid structures with PyMOL" in Nucleic Acids Research and the corresponding Supplemental PDF for details. Many thanks to Drs. Wilma Olson and Cathy Lawson for their help in the preparation of the illustrations.

Details on how to reproduce the cover images are available on the 3DNA Forum.

June 2025

Structure of a group II intron ribonucleoprotein in the pre-ligation state (PDB id: 8T2R; Xu L, Liu T, Chung K, Pyle AM. 2023. Structural insights into intron catalysis and dynamics during splicing. Nature 624: 682–688). The pre-ligation complex of the Agathobacter rectalis group II intron reverse transcriptase/maturase with intron and 5′-exon RNAs makes it possible to construct a picture of the splicing active site. The intron is depicted by a green ribbon, with bases and Watson-Crick base pairs represented as color-coded blocks: A/A-U in red, C/C-G in yellow, G/G-C in green, U/U-A in cyan; the 5′-exon is shown by white spheres and the protein by a gold ribbon. Cover image provided by X3DNA-DSSR, an NIGMS National Resource for structural bioinformatics of nucleic acids (R24GM153869; skmatics.x3dna.org). Image generated using DSSR and PyMOL (Lu XJ. 2020. Nucleic Acids Res 48: e74).

May 2025

Complex of terminal uridylyltransferase 7 (TUT7) with pre-miRNA and Lin28A (PDB id: 8OPT; Yi G, Ye M, Carrique L, El-Sagheer A, Brown T, Norbury CJ, Zhang P, Gilbert RJ. 2024. Structural basis for activity switching in polymerases determining the fate of let-7 pre-miRNAs. Nat Struct Mol Biol 31: 1426–1438). The RNA-binding pluripotency factor LIN28A invades and melts the RNA and affects the mechanism of action of the TUT7 enzyme. The RNA backbone is depicted by a red ribbon, with bases and Watson-Crick base pairs represented as color-coded blocks: A/A-U in red, C/C-G in yellow, G/G-C in green, U/U-A in cyan; TUT7 is represented by a gold ribbon and LIN28A by a white ribbon. Cover image provided by X3DNA-DSSR, an NIGMS National Resource for structural bioinformatics of nucleic acids (R24GM153869; skmatics.x3dna.org). Image generated using DSSR and PyMOL (Lu XJ. 2020. Nucleic Acids Res 48: e74).

April 2025

Cryo-EM structure of the pre-B complex (PDB id: 8QP8; Zhang Z, Kumar V, Dybkov O, Will CL, Zhong J, Ludwig SE, Urlaub H, Kastner B, Stark H, Lührmann R. 2024. Structural insights into the cross-exon to cross-intron spliceosome switch. Nature 630: 1012–1019). The pre-B complex is thought to be critical in the regulation of splicing reactions. Its structure suggests how the cross-exon and cross-intron spliceosome assembly pathways converge. The U4, U5, and U6 snRNA backbones are depicted respectively by blue, green, and red ribbons, with bases and Watson-Crick base pairs shown as color-coded blocks: A/A-U in red, C/C-G in yellow, G/G-C in green, U/U-A in cyan; the proteins are represented by gold ribbons. Cover image provided by X3DNA-DSSR, an NIGMS National Resource for structural bioinformatics of nucleic acids (R24GM153869; skmatics.x3dna.org). Image generated using DSSR and PyMOL (Lu XJ. 2020. Nucleic Acids Res 48: e74).

February 2025

Structure of the Hendra henipavirus (HeV) nucleoprotein (N) protein-RNA double-ring assembly (PDB id: 8C4H; Passchier TC, White JB, Maskell DP, Byrne MJ, Ranson NA, Edwards TA, Barr JN. 2024. The cryoEM structure of the Hendra henipavirus nucleoprotein reveals insights into paramyxoviral nucleocapsid architectures. Sci Rep 14: 14099). The HeV N protein adopts a bi-lobed fold, where the N- and C-terminal globular domains are bisected by an RNA binding cleft. Neighboring N proteins assemble laterally and completely encapsidate the viral genomic and antigenomic RNAs. The two RNAs are depicted by green and red ribbons. The U bases of the poly(U) model are shown as cyan blocks. Proteins are represented as semitransparent gold ribbons. Cover image provided by X3DNA-DSSR, an NIGMS National Resource for structural bioinformatics of nucleic acids (R24GM153869; skmatics.x3dna.org). Image generated using DSSR and PyMOL (Lu XJ. 2020. Nucleic Acids Res 48: e74).

January 2025

Structure of the helicase and C-terminal domains of Dicer-related helicase-1 (DRH-1) bound to dsRNA (PDB id: 8T5S; Consalvo CD, Aderounmu AM, Donelick HM, Aruscavage PJ, Eckert DM, Shen PS, Bass BL. 2024. Caenorhabditis elegans Dicer acts with the RIG-I-like helicase DRH-1 and RDE-4 to cleave dsRNA. eLife 13: RP93979). Cryo-EM structures of Dicer-1 in complex with DRH-1, RNAi deficient-4 (RDE-4), and dsRNA provide mechanistic insights into how these three proteins cooperate in antiviral defense. The dsRNA backbone is depicted by green and red ribbons. The U-A pairs of the poly(A)·poly(U) model are shown as long rectangular cyan blocks, with minor-groove edges colored white. The ADP ligand is represented by a red block and the protein by a gold ribbon. Cover image provided by X3DNA-DSSR, an NIGMS National Resource for structural bioinformatics of nucleic acids (R24GM153869; skmatics.x3dna.org). Image generated using DSSR and PyMOL (Lu XJ. 2020. Nucleic Acids Res 48: e74).

Moreover, the following 30 [12(2021) + 12(2022) + 6(2023)] cover images of the RNA Journal were generated by the NAKB (nakb.org).

Cover image provided by the Nucleic Acid Database (NDB)/Nucleic Acid Knowledgebase (NAKB; nakb.org). Image generated using DSSR and PyMOL (Lu XJ. 2020. Nucleic Acids Res 48: e74).

Assignment of HETATM vs. ATOM records for modified nucleotides in PDB vs. PDBx/mmCIF format

Recently, I was surprised to find that modified nucleotides are assigned different record types (HETATM vs. ATOM) in PDB vs. PDBx/mmCIF format. As always, the issue is best illustrated with a concrete example. Here is what I observed in the PDB entry 1ehz, the crystal structure of yeast phenylalanine tRNA at 1.93 Å resolution.

DSSR identifies 14 modified nucleotides (of 11 types) in 1ehz as shown below:

List of 11 types of 14 modified nucleotides
      nt    count  list
   1 1MA-a    1    A.1MA58
   2 2MG-g    1    A.2MG10
   3 5MC-c    2    A.5MC40,A.5MC49
   4 5MU-t    1    A.5MU54
   5 7MG-g    1    A.7MG46
   6 H2U-u    2    A.H2U16,A.H2U17
   7 M2G-g    1    A.M2G26
   8 OMC-c    1    A.OMC32
   9 OMG-g    1    A.OMG34
  10 PSU-P    2    A.PSU39,A.PSU55
  11 YYG-g    1    A.YYG37

In file 1ehz.pdb downloaded from the RCSB PDB, all 14 modified nucleotides are assigned as HETATM, whereas in 1ehz.cif the corresponding records are ATOM. Here is the excerpt for 1MA58 in PDB format:

HETATM 1252  P   1MA A  58      73.770  67.765  34.057  1.00 30.65           P  
HETATM 1253  OP1 1MA A  58      72.638  67.886  33.105  1.00 32.84           O  
HETATM 1254  OP2 1MA A  58      73.621  68.229  35.450  1.00 29.49           O  
HETATM 1255  O5' 1MA A  58      74.315  66.273  34.254  1.00 28.81           O  
HETATM 1256  C5' 1MA A  58      74.592  65.439  33.080  1.00 29.42           C  
HETATM 1257  C4' 1MA A  58      74.279  63.972  33.383  1.00 33.42           C  
HETATM 1258  O4' 1MA A  58      74.880  63.685  34.667  1.00 32.36           O  
HETATM 1259  C3' 1MA A  58      72.789  63.573  33.509  1.00 35.13           C  
HETATM 1260  O3' 1MA A  58      72.625  62.168  33.250  1.00 36.80           O  
HETATM 1261  C2' 1MA A  58      72.560  63.667  35.012  1.00 34.80           C  
HETATM 1262  O2' 1MA A  58      71.525  62.828  35.506  1.00 36.27           O  
HETATM 1263  C1' 1MA A  58      73.908  63.150  35.551  1.00 33.62           C  
HETATM 1264  N9  1MA A  58      74.284  63.494  36.930  1.00 30.36           N  
HETATM 1265  C8  1MA A  58      73.887  64.574  37.688  1.00 34.55           C  
HETATM 1266  N7  1MA A  58      74.415  64.610  38.899  1.00 33.32           N  
HETATM 1267  C5  1MA A  58      75.204  63.469  38.953  1.00 33.37           C  
HETATM 1268  C6  1MA A  58      76.031  62.941  39.948  1.00 33.58           C  
HETATM 1269  N6  1MA A  58      76.184  63.488  41.134  1.00 41.19           N  
HETATM 1270  N1  1MA A  58      76.708  61.803  39.669  1.00 34.48           N  
HETATM 1271  CM1 1MA A  58      77.649  61.222  40.626  1.00 31.43           C  
HETATM 1272  C2  1MA A  58      76.527  61.216  38.479  1.00 28.43           C  
HETATM 1273  N3  1MA A  58      75.793  61.624  37.453  1.00 31.67           N  
HETATM 1274  C4  1MA A  58      75.142  62.771  37.747  1.00 33.02           C

The corresponding section in PDBx/mmCIF format is:

ATOM   1252 P  P     . 1MA A 1 58 ? 73.770 67.765 34.057  1.00 30.65  ? ? ? ? ? ? 58  1MA A P     1 
ATOM   1253 O  OP1   . 1MA A 1 58 ? 72.638 67.886 33.105  1.00 32.84  ? ? ? ? ? ? 58  1MA A OP1   1 
ATOM   1254 O  OP2   . 1MA A 1 58 ? 73.621 68.229 35.450  1.00 29.49  ? ? ? ? ? ? 58  1MA A OP2   1 
ATOM   1255 O  "O5'" . 1MA A 1 58 ? 74.315 66.273 34.254  1.00 28.81  ? ? ? ? ? ? 58  1MA A "O5'" 1 
ATOM   1256 C  "C5'" . 1MA A 1 58 ? 74.592 65.439 33.080  1.00 29.42  ? ? ? ? ? ? 58  1MA A "C5'" 1 
ATOM   1257 C  "C4'" . 1MA A 1 58 ? 74.279 63.972 33.383  1.00 33.42  ? ? ? ? ? ? 58  1MA A "C4'" 1 
ATOM   1258 O  "O4'" . 1MA A 1 58 ? 74.880 63.685 34.667  1.00 32.36  ? ? ? ? ? ? 58  1MA A "O4'" 1 
ATOM   1259 C  "C3'" . 1MA A 1 58 ? 72.789 63.573 33.509  1.00 35.13  ? ? ? ? ? ? 58  1MA A "C3'" 1 
ATOM   1260 O  "O3'" . 1MA A 1 58 ? 72.625 62.168 33.250  1.00 36.80  ? ? ? ? ? ? 58  1MA A "O3'" 1 
ATOM   1261 C  "C2'" . 1MA A 1 58 ? 72.560 63.667 35.012  1.00 34.80  ? ? ? ? ? ? 58  1MA A "C2'" 1 
ATOM   1262 O  "O2'" . 1MA A 1 58 ? 71.525 62.828 35.506  1.00 36.27  ? ? ? ? ? ? 58  1MA A "O2'" 1 
ATOM   1263 C  "C1'" . 1MA A 1 58 ? 73.908 63.150 35.551  1.00 33.62  ? ? ? ? ? ? 58  1MA A "C1'" 1 
ATOM   1264 N  N9    . 1MA A 1 58 ? 74.284 63.494 36.930  1.00 30.36  ? ? ? ? ? ? 58  1MA A N9    1 
ATOM   1265 C  C8    . 1MA A 1 58 ? 73.887 64.574 37.688  1.00 34.55  ? ? ? ? ? ? 58  1MA A C8    1 
ATOM   1266 N  N7    . 1MA A 1 58 ? 74.415 64.610 38.899  1.00 33.32  ? ? ? ? ? ? 58  1MA A N7    1 
ATOM   1267 C  C5    . 1MA A 1 58 ? 75.204 63.469 38.953  1.00 33.37  ? ? ? ? ? ? 58  1MA A C5    1 
ATOM   1268 C  C6    . 1MA A 1 58 ? 76.031 62.941 39.948  1.00 33.58  ? ? ? ? ? ? 58  1MA A C6    1 
ATOM   1269 N  N6    . 1MA A 1 58 ? 76.184 63.488 41.134  1.00 41.19  ? ? ? ? ? ? 58  1MA A N6    1 
ATOM   1270 N  N1    . 1MA A 1 58 ? 76.708 61.803 39.669  1.00 34.48  ? ? ? ? ? ? 58  1MA A N1    1 
ATOM   1271 C  CM1   . 1MA A 1 58 ? 77.649 61.222 40.626  1.00 31.43  ? ? ? ? ? ? 58  1MA A CM1   1 
ATOM   1272 C  C2    . 1MA A 1 58 ? 76.527 61.216 38.479  1.00 28.43  ? ? ? ? ? ? 58  1MA A C2    1 
ATOM   1273 N  N3    . 1MA A 1 58 ? 75.793 61.624 37.453  1.00 31.67  ? ? ? ? ? ? 58  1MA A N3    1 
ATOM   1274 C  C4    . 1MA A 1 58 ? 75.142 62.771 37.747  1.00 33.02  ? ? ? ? ? ? 58  1MA A C4    1

While I have not tested exhaustively, it seems that PDBx/mmCIF has adopted a different definition of what constitutes a HETATM residue. It is worth noting that results from 3DNA and DSSR/SNAP are not affected by the conflicting assignments.
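One way to check a given PDB-format file for this kind of assignment is to tally record types per residue name. The sketch below is a minimal illustration only, assuming the fixed-column layout seen in the excerpt above (record type in columns 1-6, residue name in columns 18-20). The function name count_record_types is made up for this example and is not part of 3DNA or DSSR; the two sample lines are copied from the 1MA58 excerpt.

```python
from collections import Counter


def count_record_types(pdb_lines, resname):
    """Tally ATOM vs. HETATM records for one residue name (PDB format only)."""
    counts = Counter()
    for line in pdb_lines:
        record = line[:6].strip()            # columns 1-6: record type
        if record not in ("ATOM", "HETATM"):
            continue
        if line[17:20].strip() == resname:   # columns 18-20: residue name
            counts[record] += 1
    return counts


# Two lines taken from the 1MA58 excerpt of 1ehz.pdb shown above.
sample = [
    "HETATM 1252  P   1MA A  58      73.770  67.765  34.057  1.00 30.65           P  ",
    "HETATM 1253  OP1 1MA A  58      72.638  67.886  33.105  1.00 32.84           O  ",
]

print(count_record_types(sample, "1MA"))
```

Running the same tally over a whole file, once per modified residue name, would show at a glance whether that file treats modified nucleotides as ATOM or HETATM. A PDBx/mmCIF file would need a proper mmCIF parser instead of column slicing.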


The value of tiny and self-contained software in the big-data era

Nowadays, “big data” and “big science” are hot topics. They all sound good and certainly come about for a reason. Yet, to transform data to information to knowledge to understanding to wisdom, sophisticated software tools are required. The programs can be big and complicated, or small and self-contained, fitting different purposes. As long as they can get the claimed job done in a robust fashion, size should not be a concern.

Over the years, however, I have seen a trend of bloated software with many (fragile) dependencies in bioinformatics. Some tools are so picky and hard to use or maintain that instead of serving, they become sort of a master. As a representative example, I recently tried to install an open-source software package associated with a paper published just a few years ago in a leading journal. The software has only a few dependencies, yet some of them have already become obsolete. I spent hours each time, on Mac OS X and two versions of Ubuntu Linux, but failed to get it running properly (it always aborted with error messages). The download page hosting the software has been inactive since around the publication of the paper. Presumably, the PhD student or postdoc who wrote the code had left the lab, and with a paper published, all is done!

As an active practitioner of bioinformatics for well over a decade, I can confidently claim to be well above average in familiarity with Linux/Mac OS X, shell programming, make and related tools, and various common scripting and compiled programming languages. Yet, once in a while, I get frustrated when I try to download and install a software tool attached to a paper I am interested in. As I see it, the vast majority of software programs from research labs are publication-oriented: as long as a paper is published, the software is considered finished.

In my experience, software is engineering. It needs careful design and meticulous attention to detail. A sophisticated piece of scientific software is a combination of science and engineering. Expertise in domain knowledge is a must, and refined skills in computer programming are indispensable. The DSSR program, which I created and have continuously refined over the past three years, represents what I believe scientific software should be.

Among other unique features, DSSR is tiny (< 1 MB), self-contained (without run-time dependencies) and runs on Windows, Mac OS X, and Linux. Getting DSSR up and running should take only minutes for anyone with basic familiarity with common computer systems. I have no doubt that the beauty of being small, as represented by DSSR, will be gradually appreciated by the community.


Open invitation on writing a DSSR plugin for PyMOL

Over the past few weeks, I’ve had the pleasure of talking with Thomas Holder, the PyMOL Principal Developer at Schrödinger, about possible integration of DSSR into PyMOL. On Tuesday April 21, 2015, I wrote to Thomas:

Last year, I had the pleasure to collaborate with Dr. Robert Hanson to integrate DSSR into Jmol, see
http://chemapps.stolaf.edu/jmol/jsmol/dssr.htm. I am wondering if you have any interest in connecting DSSR to PyMOL. This will not only benefit both parties, but also bring elaborate analyses of RNA structures to the general audience. As you may be aware, RNA is becoming increasingly important, yet the field of RNA structural bioinformatics is lagging (far) behind that of proteins.

After a few meet-ups, we all agreed that the DSSR-PyMOL integration project would be significant for RNA structural bioinformatics. Moreover, the community would not only benefit from the end result, but should also be able to make direct contributions along the way. On Friday May 08, 2015, Thomas sent out the following open invitation, titled Someone interested in writing a DSSR plugin for PyMOL?, to the PyMOL mailing list:

Is anyone interested in writing a DSSR plugin for PyMOL? DSSR is an integrated software tool for Dissecting the Spatial Structure of RNA (http://x3dna.bio.columbia.edu/docs/dssr-manual.pdf). Among other things, DSSR defines the secondary structure of RNA from 3D atomic coordinates in a way similar to DSSP does for proteins. Most of its output could be translated 1:1 into PyMOL selections, making it available for coloring and other selection based features. A PyMOL plugin could act as a wrapper which runs DSSR for an object or atom selection. Xiang-Jun Lu, the author of DSSR, is also working on base pair visualization (see http://x3dna.org/articles/seeing-is-understanding-as-well-as-believing), similar to (but more advanced) what’s already available from 3DNA (http://pymolwiki.org/index.php/3DNA).

Xiang-Jun would be happy to collaborate with someone who has experience with Python and the PyMOL API for writing an extension or plugin. Please contact me if this sounds appealing to you.

Get DSSR from http://x3dna.org/
See it hooked up with JSmol: http://chemapps.stolaf.edu/jmol/jsmol/dssr.htm

If you are self-motivated, care about software quality, have expertise in writing PyMOL plugins, and feel the pain of RNA structural analysis/visualization with currently available tools, now is the time to make a difference. The DSSR/PyMOL project would ideally be composed of a team of dedicated practitioners with complementary skills. We will communicate mostly via email or an online forum, in an open and highly interactive way. By working on the project, you will be able to sharpen your skills and make new friends. The end product would not only make RNA structural bioinformatics easier for yourself but also benefit the community at large.


Ambiguous 'analyze' and 'rebuild' program names

From the very beginning, 3DNA has contained two key programs, analyze and rebuild, for the analysis and rebuilding of nucleic acid 3D structures. The two names are short and to the point, but with one caveat: they are common verbs that can easily be picked up by other software packages. When 3DNA and such packages are installed on the same machine, naming clashes happen. If the 3DNA bin/ directory comes later in the command search path, the analyze or rebuild command that actually runs may have nothing to do with nucleic acid structures at all. Naturally, this naming ambiguity can lead to confusion and frustration.

I’ve been aware of the rebuild program name conflict for a long time. Recently, I was surprised by another analyze program on my Mac running OS X Yosemite. As the following output shows, this analyze program appears to have been installed via MacPorts, and it analyzes words in a dictionary file.

~ [540] which analyze
/opt/local/bin/analyze
~ [541] analyze -h
correct syntax is:
analyze affix_file dictionary_file file_of_words_to_check
use two words per line for morphological generation

The ambiguous names are exactly the reason that I use x3dna-dssr and x3dna-snap for the two new programs I’ve been working on over the past few years. As for the analyze and rebuild programs in 3DNA v2.x, I’d rather leave them as is: 3DNA is now too widely used in other structural bioinformatics pipelines to allow for name changes without causing compatibility issues. On the positive side, once you know the problem, fixing it is straightforward. This post is meant to raise awareness of such naming conflicts in the 3DNA user community.
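To see whether such a clash exists on a given machine, one can list every executable with the same name along the search path, in the order the shell would find them. The following is a minimal sketch using only the Python standard library; the helper name find_all_on_path is hypothetical and not part of 3DNA.

```python
import os


def find_all_on_path(name, path=None):
    """Return every executable called `name` on the search path, in lookup order."""
    if path is None:
        path = os.environ.get("PATH", "")
    hits = []
    for directory in path.split(os.pathsep):
        if not directory:
            continue  # skip empty PATH components
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            hits.append(candidate)
    return hits


for program in ("analyze", "rebuild"):
    # The first hit is the one a bare command name would run.
    print(program, find_all_on_path(program))
```

If the first hit for analyze or rebuild is not inside the 3DNA bin/ directory, either move that directory earlier in the search path or call the 3DNA programs by their full path.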


Name of base atoms in PDB formats

Canonical bases (A, C, G, T and U) in nucleic acid structures have standard atom names, shown below using the Watson-Crick A–T and G–C pairs. Ring atoms of adenine, for example, are named (N1, C2, N3, C4, C5, C6, N7, C8, N9) respectively.

Watson-Crick base pairs

Four characters are reserved for atom names in the PDB format. The convention, as seen in files downloaded from the RCSB PDB, is to put the two-character base name in the middle, as in .N1.. Note that here each dot (.) is used for a space character to make it stand out.

A long time ago, I became aware of a PDB format variant where the base name is left-aligned, as in N1... This case has been properly handled by 3DNA (including DSSR and SNAP) ever since. While checking entries submitted to web-DSSR, I recently noticed yet another PDB format variation, in which base names are right-aligned, as in ..N1. Without taking this special variant into consideration, 3DNA/DSSR reported “no nucleotides found!” Once the issue is known, however, fixing it is straightforward. As of May 4, 2015, 3DNA v2.2, DSSR and SNAP can all handle this special PDB variant correctly.

Over the years, I have come across many PDB variants claimed to be compliant with the loosely defined format. If you find that 3DNA or DSSR is not working as expected, it is likely that the coordinate file in the self-claimed ‘PDB format’ is at fault. Wherever practical, I’ve tried to accommodate as many non-standard variants as possible.
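The three alignments discussed above can be reduced to one form by reading the four-character atom-name field (columns 13-16) and stripping the padding. The snippet below is only an illustrative sketch, not how 3DNA or DSSR handle the variants internally; both function names are made up for this example.

```python
def atom_name_field(pdb_line):
    """Return the raw 4-character atom-name field (columns 13-16)."""
    return pdb_line[12:16]


def normalize_atom_name(raw):
    """Map left-, centre-, or right-aligned names onto one stripped form."""
    return raw.strip()


# The standard (centred), left-aligned, and right-aligned variants of N1.
for raw in (" N1 ", "N1  ", "  N1"):
    print(repr(raw), "->", repr(normalize_atom_name(raw)))

# A line from the 1ehz excerpt shown earlier on this page.
line = "HETATM 1264  N9  1MA A  58      74.284  63.494  36.930  1.00 30.36           N  "
print(repr(atom_name_field(line)))
```

Plain stripping is a simplification that is adequate for base atoms. In general PDB files the alignment can carry meaning: a calcium ion is written as CA.. while an alpha carbon is .CA. (dots standing for spaces), so a general-purpose reader should also consult the residue name or element column.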


Nucleic acid structures in the RCSB PDB

The NDB (Nucleic Acid Database) is a valuable resource dedicated to “information about experimentally-determined nucleic acids and complex assemblies.” Over the years, however, I’ve gradually switched from NDB to PDB (Protein Data Bank) for my research on nucleic acid structures. NDB is derived from PDB and presumably should contain all nucleic acid structures available in the PDB. However, at the time of this writing (on April 9, 2015), the NDB says: “As of 8-Apr-2015 number of released structures: 7430” and the PDB states “7611 Nucleic Acid Containing Structures”. So PDB has 7611-7430=181 more entries of nucleic acid structures than the NDB, possibly due to a lag in NDB’s processing of newly released PDB structures. Another issue is the inconsistency of the NDB identifier: early entries have e.g. bdl084 for B-DNA (355d in PDB), but now NDB seems to use the same id as the PDB (e.g., 4p5j).

The RCSB PDB maintains a weekly-updated, summary file named pdb_entry_type.txt in pure text format (check here for a list of useful summary files), containing “List of all PDB entries, identification of each as a protein, nucleic acid, or protein-nucleic acid complex and whether the structure was determined by diffraction or NMR.” An excerpt of the file is shown below:

108m    prot    diffraction
109d    nuc     diffraction
109l    prot    diffraction
109m    prot    diffraction
10gs    prot    diffraction
10mh    prot-nuc        diffraction
110d    nuc     diffraction
110l    prot    diffraction
.................................
102m    prot    diffraction
103d    nuc     NMR

Specifically, a nucleic acid structure contains the (sub)string nuc in the second field, where prot-nuc means a protein-RNA/DNA complex. This text file is trivial to parse, and the atomic coordinates files (in PDB or PDBx/mmCIF format) for all nucleic acid structures can be automatically downloaded from the RCSB PDB using a script.
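As a rough illustration of how little parsing is needed, the sketch below filters the entry-type list for nucleic acid structures. It assumes the whitespace-separated, three-field layout shown in the excerpt; the function name nucleic_acid_entries is hypothetical, and the download step is left out.

```python
def nucleic_acid_entries(lines):
    """Return PDB ids whose molecule-type field contains 'nuc'."""
    ids = []
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        pdb_id, mol_type = fields[0], fields[1]
        if "nuc" in mol_type:  # matches both 'nuc' and 'prot-nuc'
            ids.append(pdb_id)
    return ids


# A few lines from the excerpt above.
excerpt = """\
108m    prot    diffraction
109d    nuc     diffraction
10mh    prot-nuc        diffraction
103d    nuc     NMR
""".splitlines()

print(nucleic_acid_entries(excerpt))  # ['109d', '10mh', '103d']
```

The resulting list of ids can then drive whatever download mechanism one prefers for fetching the coordinate files.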

It is worth noting that DSSR is checked against all nucleic acid structures in the PDB at the time of each release to ensure that it does not crash. I update my local copy of nucleic acid structures each week, and run DSSR on the new entries. This process not only provides me an opportunity to keep pace with new developments in the field but also allows me to keep refining DSSR as needs arise.


Modified pseudouridines

Pseudouridine (5-ribosyluracil, PSU) is the most abundant modified nucleotide in RNA. It is unique in that it has a C-glycosidic bond (C-C1′) instead of the N-glycosidic bond (N-C1′) common to all other nucleotides, canonical or modified. In 3DNA, the one-letter code for PSU is assigned to the upper case ‘P’, reserving the lower case ‘p’ for its modified variants. Distinguishing PSU from standard U (or T) is important for deriving sensible base-pair parameters and the χ torsion angle.


PSU	3TD

Recently, I came across 3TD (see figure above) in PDB entry 5afi. 3TD is a modified variant of PSU, with a methyl group attached to N3. In 3DNA v2.1-2015mar11, 3TD is abbreviated to ‘p’ to signify its connection to PSU.

In the list of recognized nucleotides (‘baselist.dat’) distributed with 3DNA, there are two other residues mapped to ‘p’: FHU and P2U (see figure below). As is often the case, it is the chemical structure, not the 3-letter PDB ligand identifier (or even full chemical name), that shows clearly to what 3DNA 1-letter abbreviation a residue matches.


FHU	P2U


Exterior loop in RNA secondary structure

A single-stranded RNA molecule can fold back onto itself to form various loops delineated by double helical stems, as shown in the figure below [taken from the Nearest Neighbor Database website from the Turner group].

Of special note is the exterior loop (at the bottom) which includes the 5′ and 3′ ends of the sequence. The Mfold User Manual defines the exterior loop as such:

The collection of bases and base pairs not accessible from any base pair is called the exterior (or external) loop … . It is worth noting that if we imagine adding a 0^th and an (n + 1)^st base to the RNA, and a base pair 0.(n+1), then the exterior loop becomes the loop closed by this imaginary base pair. … The exterior loop exists only in linear RNA.

While each of the other loops (hairpin, bulge, internal or junction) forms a closed ‘circle’ with two neighboring bases connected by either a canonical pair or backbone covalent bond, the ‘exterior loop’ has only an imaginary pair to close the 5′ and 3′ ends of the sequence. Moreover, the two ends of an RNA molecule are not necessarily close in three-dimensional space, as may be implied by the above secondary structure diagram. For example, in the H-type pseudoknotted structure 1ymo from human telomerase RNA, the 5′ and 3′ ends are on opposite sides.

DSSR does not have the concept of an ‘exterior loop’, because such a loop lacks a closing pair to form a ‘circle’. Instead, each of the 5′ and 3′ dangling ends is taken as a ‘non-loop single-stranded segment’, if applicable. For the crystal structure of yeast phenylalanine tRNA (1ehz, see the figure at the bottom), the relevant portion of DSSR output is shown below. Note that since the 5′ end is paired, only the single-stranded region at the 3′ end is listed. Presumably, the ‘exterior loop’ in this case would also include the G1–C72 pair, with the imaginary closing pair connecting G1 and A76.

List of 1 non-loop single-stranded segment
   1 nts=4 ACCA A.A73,A.C74,A.C75,A.A76


DSSR-derived DBN for an input entry with multiple RNA molecules

Dot-bracket notation (dbn) is a popular format for representing RNA secondary structures. Initially introduced by the ViennaRNA package, dbn uses dots (.) for unpaired bases, and matched parentheses () for the canonical Watson-Crick A-U (A-T in DNA) and G-C pairs or the wobble G-U pairs. This compact representation was designed for fully nested (i.e., pseudoknot-free) RNA secondary structures in a single RNA molecule. Over the years, it has been extended to cover pseudoknots (of possibly higher orders) using matched pairs of [], {}, and <> etc.

To derive dbn from three-dimensional atomic coordinates with DSSR, I was faced with the issue of how to represent multiple RNA chains (molecules). A closely related, practical problem is chain breaks, as in x-ray crystal structures where disordered regions may not have fitted coordinates. I searched but failed to find any ‘standard’ way to account for chain breaks or multiple molecules in dbn. The commonly used programs for visualizing RNA secondary structure diagrams that I tested at that time did not take such cases into consideration; they simply showed all bases as if they were from a single continuous RNA chain.

I discussed the issue with Dr. Yann Ponty, the maintainer of the popular VARNA program. After a few rounds of email exchanges, we introduced an extra symbol (&) in both the sequence and the dbn to designate multiple chains or breaks within a chain, so that DSSR and VARNA can communicate.

As an example, the DSSR-derived dbn for the double-stranded DNA structure 355d (the famous Dickerson dodecamer) is as below:

Secondary structures in dot-bracket notation (dbn) as a whole and per chain
>355d nts=24 [whole]
CGCGAATTCGCG&CGCGAATTCGCG
((((((((((((&))))))))))))
>355d-A #1 nts=12 [chain] DNA
CGCGAATTCGCG
((((((((((((
>355d-B #2 nts=12 [chain] DNA
CGCGAATTCGCG
))))))))))))

As another example, the PDB entry 2fk6 contains a tRNA with chain breaks — nucleotides 26 to 45 are missing from the structure (see figure below). The DSSR-derived dbn is as follows — note the * at the end of the header line.

>2fk6-R #1 nts=53 [chain] RNA*
GCUUCCAUAGCUCAGCAGGUAGAGC&GUCAGCGGUUCGAGCCCGCUUGGAAGCU
(((((((..((((.....[..))))&...(((((..]....)))))))))))).
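For readers who want to consume such output programmatically, the sketch below converts a dbn string with break symbols into a list of paired positions. It is a minimal illustration under stated assumptions, not code from DSSR or VARNA: the function name parse_dbn is hypothetical, break symbols are skipped when numbering nucleotides, and only the four bracket types mentioned above are recognized.

```python
def parse_dbn(dbn, breaks="&"):
    """Return (pairs, break_positions) for a dot-bracket string.

    Pairs use 1-based nucleotide indices; break symbols are not counted
    as nucleotides. Each break position is the index of the nucleotide
    immediately before the break.
    """
    openers = {"(": ")", "[": "]", "{": "}", "<": ">"}
    closers = {close: open_ for open_, close in openers.items()}
    stacks = {open_: [] for open_ in openers}  # one stack per bracket type
    pairs, break_positions = [], []
    index = 0
    for symbol in dbn:
        if symbol in breaks:
            break_positions.append(index)
            continue
        index += 1
        if symbol in openers:
            stacks[symbol].append(index)
        elif symbol in closers:
            stack = stacks[closers[symbol]]
            if not stack:
                raise ValueError(f"unmatched {symbol!r} at nucleotide {index}")
            pairs.append((stack.pop(), index))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r}")
    for open_, stack in stacks.items():
        if stack:
            raise ValueError(f"unmatched {open_!r} at nucleotide {stack[-1]}")
    return sorted(pairs), break_positions


# The whole-structure dbn of 355d: 12 opening and 12 closing parentheses.
dbn_355d = "(" * 12 + "&" + ")" * 12
pairs, chain_breaks = parse_dbn(dbn_355d)
print(len(pairs), chain_breaks)   # 12 [12]
print(pairs[0], pairs[-1])        # (1, 24) (12, 13)
```

Keeping a separate stack for each bracket type is what lets pseudoknot brackets such as [ and ] interleave with the parentheses, as in the 2fk6 example.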

It is worth mentioning a subtle point in DSSR-derived dbn with multiple chains, i.e., the order of the chains may make a difference! The point is best illustrated with a concrete example — here, 4un3, the crystal structure of Cas9 bound to PAM-containing DNA target. Based on the data file downloaded directly from the PDB (4un3.pdb), the relevant portions of DSSR output are:

****************************************************************************
Special notes:
   o Cross-paired segments in separate chains, be *careful* with .dbn

****************************************************************************
This structure contains *1-order pseudoknot
   o You may want to run DSSR again with the '--nested' option which removes
     pseudoknots to get a fully nested secondary structure representation.
   o The DSSR-derived dbn may be problematic (see notes above).

****************************************************************************
Secondary structures in dot-bracket notation (dbn) as a whole and per chain
>4un3 nts=120 [whole]
AUAACUCAAUUUGUAAAAAAGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUG&CAATACCATTTTTTACAAATTGAGTTAT&AAATGGTATTG
((((((((((((((((((((((((((..((((....))))....))))))..(((..).)).......((((....)))).&[[[[[[[[))))))))))))))))))))&...]]]]]]]]
>4un3-A #1 nts=81 [chain] RNA
AUAACUCAAUUUGUAAAAAAGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUG
((((((((((((((((((((((((((..((((....))))....))))))..(((..).)).......((((....)))).
>4un3-C #2 nts=28 [chain] DNA
CAATACCATTTTTTACAAATTGAGTTAT
[[[[[[[[))))))))))))))))))))
>4un3-D #3 nts=11 [chain] DNA
AAATGGTATTG
...]]]]]]]]

The notes in the DSSR output are worth paying attention to. Specifically, DSSR reports a “*1-order pseudoknot” (note also the *). Here the target DNA chain C comes before DNA chain D in the PDB file. The 5′-end bases in chain C pair with bases in D, and the 3′-end bases in C pair with RNA bases in chain A. As a result, there are pairs that cross one another along the ‘linear’ sequence, hence the reported “pseudoknot”. However, by simply swapping DNA chains C and D, i.e., moving chain D before C (in file 4un3-ADC.pdb), the “pseudoknot” is gone, as shown below:

****************************************************************************
Secondary structures in dot-bracket notation (dbn) as a whole and per chain
>4un3-ADC nts=120 [whole]
AUAACUCAAUUUGUAAAAAAGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUG&AAATGGTATTG&CAATACCATTTTTTACAAATTGAGTTAT
((((((((((((((((((((((((((..((((....))))....))))))..(((..).)).......((((....)))).&...((((((((&))))))))))))))))))))))))))))
>4un3-ADC-A #1 nts=81 [chain] RNA
AUAACUCAAUUUGUAAAAAAGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUG
((((((((((((((((((((((((((..((((....))))....))))))..(((..).)).......((((....)))).
>4un3-ADC-D #2 nts=11 [chain] DNA
AAATGGTATTG
...((((((((
>4un3-ADC-C #3 nts=28 [chain] DNA
CAATACCATTTTTTACAAATTGAGTTAT
))))))))))))))))))))))))))))
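The effect of chain order can be reproduced with a toy model. The sketch below uses three short made-up chains that mimic the 4un3 topology (chain C pairs with D at its 5′ end and with A at its 3′ end). The chain lengths, the pairs, and both function names are invented for illustration; this is not DSSR's algorithm or the actual 4un3 data.

```python
from itertools import combinations


def global_pairs(pairs, lengths, order):
    """Convert (chain, position) pairs to positions along the concatenated sequence."""
    offsets, total = {}, 0
    for chain in order:
        offsets[chain] = total
        total += lengths[chain]
    result = []
    for (c1, p1), (c2, p2) in pairs:
        i, j = offsets[c1] + p1, offsets[c2] + p2
        result.append((min(i, j), max(i, j)))
    return sorted(result)


def count_crossings(pairs):
    """Count pairs (i, j), (k, l) with i < k < j < l, i.e. crossing arcs."""
    return sum(1 for (i, j), (k, l) in combinations(sorted(pairs), 2)
               if i < k < j < l)


# Toy chains (1-based positions): lengths and pairs are made up.
lengths = {"A": 4, "C": 6, "D": 3}
pairs = [
    (("A", 1), ("C", 6)), (("A", 2), ("C", 5)), (("A", 3), ("C", 4)),  # A with 3' end of C
    (("C", 1), ("D", 3)), (("C", 2), ("D", 2)), (("C", 3), ("D", 1)),  # D with 5' end of C
]

print(count_crossings(global_pairs(pairs, lengths, ["A", "C", "D"])))  # 9
print(count_crossings(global_pairs(pairs, lengths, ["A", "D", "C"])))  # 0
```

With the order A, C, D every A-C pair crosses every C-D pair, which is what a dbn writer has to render with a second bracket type. With the order A, D, C all pairs are nested and plain parentheses suffice.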

Notes added on March 19, 2015

It has come to my attention that NUPACK uses ‘+’ instead of ‘&’ as the symbol to separate multiple chains (or chain breaks). In fact, DSSR has an undocumented option --dbn_break which can be set to any of the characters in the string &.:,|+. The ‘&’ symbol was chosen for communication with VARNA, which requires ‘&’, at least up to now. This is an excellent example of the effort that I have put into the little details while developing DSSR.

The issue on proper ordering of multiple chains to avoid crossing lines (false pseudoknots) has been formally addressed by Dirks et al. in their 2007 article titled Thermodynamic analysis of interacting nucleic acid strands (SIAM Rev, 49, 65-88), specifically in Section 2.1 (Fig. 2.1). Applying that algorithm to nucleic acid structures, however, is beyond the scope of DSSR. The program strictly respects the ordering of chains and nucleotides within a given PDB or PDBx/mmCIF file, but outputs warning messages where necessary to draw users’ attention. As another example, I’ve recently noticed that DNA duplexes produced by Maestro (a product of Schrödinger) list nucleotides of the complementary strand in 3′ to 5′ order to match the 5′ to 3′ directionality of the leading strand for each Watson-Crick pair (See below).

****************************************************************************
Special notes:
   o nucleotides out of order

****************************************************************************
Secondary structures in dot-bracket notation (dbn) as a whole and per chain
>ga62_ca62_1m_in nts=24 [whole]
GGCGAATTCCGG&C&C&G&C&T&T&A&A&G&G&C&C
((((((((((((&)&)&)&)&)&)&)&)&)&)&)&)
>ga62_ca62_1m_in-1-A #1 nts=12 [chain] DNA
GGCGAATTCCGG
((((((((((((
>ga62_ca62_1m_in-1-B #2 nts=12 [chain] DNA
C&C&G&C&T&T&A&A&G&G&C&C
)&)&)&)&)&)&)&)&)&)&)&)



Thank you for printing this article from http://x3dna.org/. Please do not forget to visit back for more 3DNA-related information. — Xiang-Jun Lu

X3DNA-DSSR: a resource for structural bioinformatics of nucleic acids (An NIGMS National Resource supported by NIH grant R24GM153869)