Output of reference frames in DSSR JSON output

As of v1.3.3-2015sep03, DSSR outputs the reference frame of any base or base-pair (bp). With an explicit list of such reference frames, one can better understand how the 3DNA/DSSR bp parameters are calculated. Moreover, third-party bioinformatics tools can take advantage of the frames for further exploration of nucleic acid structures, including visualization.

Let’s use the G1–C72 bp (detailed below) in the yeast phenylalanine tRNA (1ehz) as an example:

1 A.G1           A.C72          G-C WC          19-XIX    cWW  cW-W

The standard base reference frame for A.G1 is:

{
  rsmd: 0.008,
  origin: [53.757, 41.868, 52.93],
  x_axis: [-0.259, -0.25, -0.933],
  y_axis: [-0.543, 0.837, -0.073],
  z_axis: [0.799, 0.488, -0.352]
}

And the one for A.C72 is:

{
  rsmd: 0.006,
  origin: [53.779, 42.132, 52.224],
  x_axis: [-0.402, -0.311, -0.861],
  y_axis: [0.451, -0.886, 0.109],
  z_axis: [-0.797, -0.345, 0.497]
}

The G1–C72 bp reference frame is:

{
  rsmd: null,
  origin: [53.768, 42, 52.577],
  x_axis: [-0.331, -0.283, -0.9],
  y_axis: [-0.497, 0.863, -0.089],
  z_axis: [0.802, 0.418, -0.427]
}

The beauty of the DSSR JSON output is that the above information can be extracted on the fly. For example, the following commands extract the above frames:

x3dna-dssr -i=1ehz.pdb --json | jq '.ntParams[] | select(.nt_id=="A.G1") | .frame'
x3dna-dssr -i=1ehz.pdb --json | jq '.ntParams[] | select(.nt_id=="A.C72") | .frame'
x3dna-dssr -i=1ehz.pdb --json --more | jq .pairs[0].frame

Note that in JSON, the array is 0-indexed, so the first bp (G1–C72) has an index of 0. In addition to jq, I also used underscore to pretty-print the frames.

Comment

---

Conformation of the sugar ring in nucleic acid structures

The conformation of the five-membered sugar ring in DNA/RNA structures can be characterized using the five corresponding endocyclic torsion angles (shown below).

Sugar torsion angles

v0: C4'-O4'-C1'-C2'
v1: O4'-C1'-C2'-C3'
v2: C1'-C2'-C3'-C4'
v3: C2'-C3'-C4'-O4'
v4: C3'-C4'-O4'-C1'

On account of the five-member ring constraint, the conformation can be characterized approximately by 5 - 3 = 2 parameters. Using the concept of pseudorotation of the sugar ring, the two parameters are the amplitude (τm) and phase angle (P, in the range of 0° to 360°).

One set of widely used formula to convert the five torsion angles to the pseudorotation parameters is due to Altona & Sundaralingam (1972): “Conformational Analysis of the Sugar Ring in Nucleosides and Nucleotides. A New Description Using the Concept of Pseudorotation” [J. Am. Chem. Soc., 94(23), pp 8205–8212]. As always, the concept is best illustrated with an example. Here I use the sugar ring of G4 (chain A) of the Dickerson-Drew dodecamer (1bna), with Matlab/Octave code:

# xyz coordinates of the sugar ring: G4 (chain A), 1bna
ATOM     63  C4'  DG A   4      21.393  16.960  18.505  1.00 53.00
ATOM     64  O4'  DG A   4      20.353  17.952  18.496  1.00 38.79
ATOM     65  C3'  DG A   4      21.264  16.229  17.176  1.00 56.72
ATOM     67  C2'  DG A   4      20.793  17.368  16.288  1.00 40.81
ATOM     68  C1'  DG A   4      19.716  17.901  17.218  1.00 30.52

# endocyclic torsion angles:
v0 = -26.7; v1 = 46.3; v2 = -47.1; v3 = 33.4; v4 = -4.4
Pconst = sin(pi/5) + sin(pi/2.5)  # 1.5388
P0 = atan2(v4 + v1 - v3 - v0, 2.0 * v2 * Pconst);  # 2.9034
tm = v2 / cos(P0);  # amplitude: 48.469
P = 180/pi * P0;  # phase angle: 166.35 [P + 360 if P0 < 0]

The Altona & Sundaralingam (1972) pseudorotation parameters are what have been adopted in 3DNA, following the NewHelix program of Dr. Dickerson. The Curves+ program, on the other hand, uses another (newer) set of formula due to Westhof & Sundaralingam (1983): “A Method for the Analysis of Puckering Disorder in Five-Membered Rings: The Relative Mobilities of Furanose and Proline Rings and Their Effects on Polynucleotide and Polypeptide Backbone Flexibility” [J. Am. Chem. Soc., 105(4), pp 970–976]. The two sets of formula, by Altona & Sundaralingam (1972) and Westhof & Sundaralingam (1983), give slightly different numerical values for the two pseudorotation parameters (τm and P).

Since 3DNA and Curves+ are currently two of the most widely used programs for conformational analysis of nucleic acid structures, the subtle differences in pseudorotation parameters may cause confusions for users who use (or are familiar with) both programs. Over the past few years, I have indeed received such questions via email.

With the same G4 (chain A, 1bna) sugar ring, here is the Matlab/Octave script showing how Curve+ calculates the pseudorotation parameters:

# xyz coordinates of sugar ring G4 (chain A, 1bna)

# endocyclic torsion angles, same as above
v0 = -26.7; v1 = 46.3; v2 = -47.1; v3 = 33.4; v4 = -4.4

v = [v2, v3, v4, v0, v1]; # reorder them into vector v[]
A = 0; B = 0;
for i = 1:5
    t = 0.8 * pi * (i - 1);
    A += v(i) * cos(t);
    B += v(i) * sin(t);
end
A *= 0.4;   # -48.476
B *= -0.4;  # 11.516

tm = sqrt(A * A + B * B);  # 49.825

c = A/tm; s = B/tm;
P = atan2(s, c) * 180 / pi;  # 166.64

For this specific example, i.e., the sugar ring of G4 (chain A, 1bna), the pseudorotation parameters as calculated by 3DNA per Altona & Sundaralingam (1972) and Curves+ per Westhof & Sundaralingam (1983) are as follows:

           amplitude        phase angle
3DNA        48.469             166.35
Curves+     49.825             166.64

Needless to say, the differences are subtle, and few people will notice/bother at all. For those who do care about such little details, however, hopefully this post will help you understand where the differences actually come from.

---

For consistency with the 3DNA output, DSSR (by default) also follows the Altona & Sundaralingam (1972) definitions of sugar pseudorotation. Nevertheless, DSSR also contains an undocumented option, --sugar-pucker=westhof83, to output τm and P according to the Westhof & Sundaralingam (1983) definitions.

Each sugar is assigned into one of the following ten puckering modes, by dividing the phase angle (P, in the range of 0° to 360°) into 36° ranges reach.

C3'-endo, C4'-exo,   O4'-endo, C1'-exo,  C2'-endo,
C3'-exo,  C4'-endo,  O4'-exo,  C1'-endo, C2'-exo

For sugars in nucleic acid structures, C3’-endo [0°, 36°) and C2’-endo [144°, 180°) are predominant. The former corresponds to sugars in ‘canonical’ RNA or A-form DNA, and the latter in sugars of standard B-form DNA. In reality, RNA structures as deposited in the PDB could also contain C2′-endo sugars. One significant example is the GpU dinucleotide platforms, where the 5′-ribose sugar (G) is in the C2′-endo form and the 3′-sugar (U) in the C3′-endo form — see my blog post, titled ‘Is the O2′(G)…O2P H-bond in GpU platforms real?’.

---

Notes:

  • This post is based on my 2011-06-11 blog post with the same title.
  • While visiting Lyon in July 2014, I had the opportunity to hear Dr. Lavery’s opinion on adopting the Westhof & Sundaralingam (1983) sugar-pucker definitions in Curves+. I learned that the new formula are more robust in rare, extreme cases of sugar conformation than the 1972 variants. After all, Dr. Sundaralingam is a co-author on both papers. It is possible that in future releases of DSSR, the new 1983 formula for sugar pucker would become the default.

Comment

---

The DSSR --prefix and --cleanup options

In the DSSR v1.2.7-2015jun09 release, I documented two additional command-line options (--prefix and --cleanup) that are related to the various auxiliary files. As a matter of fact, these two options (among quite a few others) have been there for a long time, but without being explicitly described. The point is not to hide but to simplify — one of the design goals of DSSR is simplicity. DSSR has already possessed numerous key functionality to be appreciated. Before DSSR is firmly established in the RNA bioinformatics field, I beleive too many nonessential “features” could be distracting. While writing and refining the DSSR code, I do feel that some ‘auxiliary’ features could be handy for experienced users (including myself). So along the way, I’ve added many ‘hidden’ options that are either experimental or potentially useful.

On one side, I sense it is acceptable for a scientific software to actually does more than it claims. On the other hand, I have always been quick in addressing users’ requests — as one example, check for the --select option recently introduced into DSSR in response to a user request, and the ‘hidden’ --dbn-break option for specifying the symbol to separate multiple chains or chain breaks in DSSR-derived dot-bracket notation.

Back to --prefix and --cleanup, the purposes of these two closely related options can be best illustrated using the yeast phenylalanine tRNA structure (1ehz) as an example. By default, running x3dna-dssr -i=1ehz.pdb will produce a total of 11 auxiliary files, with names prefixed with dssr-, as shown below:

List of 11 additional files
   1 dssr-stems.pdb -- an ensemble of stems
   2 dssr-helices.pdb -- an ensemble of helices (coaxial stacking)
   3 dssr-pairs.pdb -- an ensemble of base pairs
   4 dssr-multiplets.pdb -- an ensemble of multiplets
   5 dssr-hairpins.pdb -- an ensemble of hairpin loops
   6 dssr-junctions.pdb -- an ensemble of junctions (multi-branch)
   7 dssr-2ndstrs.bpseq -- secondary structure in bpseq format
   8 dssr-2ndstrs.ct -- secondary structure in connect table format
   9 dssr-2ndstrs.dbn -- secondary structure in dot-bracket notation
  10 dssr-torsions.txt -- backbone torsion angles and suite names
  11 dssr-stacks.pdb -- an ensemble of stacks

With ‘fixed’ generic names by default, users can run DSSR in a directory repeatedly without creating too many files. This practice follows that used in the 3DNA suite of programs. However, my experience in supporting 3DNA over the years has shown that users (myself included) may want to explore further some of the files, e.g. ‘dssr-multiplets.pdb’ for displaying the base multiplets (four triplets here). One could easily use command-line (script) to change a generic name to a more appropriate one: e.g., mv dssr-multiplets.pdb 1ehz-multiplets.pdb for 1ehz. A better solution, however, is by introducing a customized prefix to the additional files, and that’s exactly where the --prefix option comes in. The option is specified like this: --prefix=text where text can be any string as appropriate. So running x3dna-dssr -i=1ehz.pdb --prefix=1ehz, for example, will lead to the following output:

List of 11 additional files
   1 1ehz-stems.pdb -- an ensemble of stems
   2 1ehz-helices.pdb -- an ensemble of helices (coaxial stacking)
   3 1ehz-pairs.pdb -- an ensemble of base pairs
   4 1ehz-multiplets.pdb -- an ensemble of multiplets
   5 1ehz-hairpins.pdb -- an ensemble of hairpin loops
   6 1ehz-junctions.pdb -- an ensemble of junctions (multi-branch)
   7 1ehz-2ndstrs.bpseq -- secondary structure in bpseq format
   8 1ehz-2ndstrs.ct -- secondary structure in connect table format
   9 1ehz-2ndstrs.dbn -- secondary structure in dot-bracket notation
  10 1ehz-torsions.txt -- backbone torsion angles and suite names
  11 1ehz-stacks.pdb -- an ensemble of stacks

The --cleanup option, as its name implies, is to tidy up a directory by removing the auxiliary files generated by DSSR. The usage is very simple:

x3dna-dssr --cleanup
x3dna-dssr --cleanup --prefix=1ehz

The former gets rid of the default ‘fixed’ generic auxiliary files (dssr-pairs.pdb etc), whilst the latter deletes prefixed supporting files (1ehz-pairs.pdb etc).

Comment

---

Assignment of HETATM vs. ATOM records for modified nucleotides in PDB vs. PDBx/mmCIF format

Recently, I came across and have been surprised by the different assignment of HETATM vs. ATOM records for modified nucleotides in PDB vs. PDBx/mmCIF format. As always, the issue is best illustrated with a concrete example. Here is what I observed in the PDB entry 1ehz, the crystal structure of yeast phenylalanine tRNA at 1.93 Å resolution.

DSSR identifies 14 modified nucleotides (of 11 types) in 1ehz as shown below:

List of 11 types of 14 modified nucleotides
      nt    count  list
   1 1MA-a    1    A.1MA58
   2 2MG-g    1    A.2MG10
   3 5MC-c    2    A.5MC40,A.5MC49
   4 5MU-t    1    A.5MU54
   5 7MG-g    1    A.7MG46
   6 H2U-u    2    A.H2U16,A.H2U17
   7 M2G-g    1    A.M2G26
   8 OMC-c    1    A.OMC32
   9 OMG-g    1    A.OMG34
  10 PSU-P    2    A.PSU39,A.PSU55
  11 YYG-g    1    A.YYG37

In file 1ehz.pdb downloaded from RCSB PDB, all the 14 modified nucleotides are assigned as HETATM whereas in 1ehz.cif the corresponding records are ATOM. Here is the excerpt for 1MA58 in PDB format:

HETATM 1252  P   1MA A  58      73.770  67.765  34.057  1.00 30.65           P  
HETATM 1253  OP1 1MA A  58      72.638  67.886  33.105  1.00 32.84           O  
HETATM 1254  OP2 1MA A  58      73.621  68.229  35.450  1.00 29.49           O  
HETATM 1255  O5' 1MA A  58      74.315  66.273  34.254  1.00 28.81           O  
HETATM 1256  C5' 1MA A  58      74.592  65.439  33.080  1.00 29.42           C  
HETATM 1257  C4' 1MA A  58      74.279  63.972  33.383  1.00 33.42           C  
HETATM 1258  O4' 1MA A  58      74.880  63.685  34.667  1.00 32.36           O  
HETATM 1259  C3' 1MA A  58      72.789  63.573  33.509  1.00 35.13           C  
HETATM 1260  O3' 1MA A  58      72.625  62.168  33.250  1.00 36.80           O  
HETATM 1261  C2' 1MA A  58      72.560  63.667  35.012  1.00 34.80           C  
HETATM 1262  O2' 1MA A  58      71.525  62.828  35.506  1.00 36.27           O  
HETATM 1263  C1' 1MA A  58      73.908  63.150  35.551  1.00 33.62           C  
HETATM 1264  N9  1MA A  58      74.284  63.494  36.930  1.00 30.36           N  
HETATM 1265  C8  1MA A  58      73.887  64.574  37.688  1.00 34.55           C  
HETATM 1266  N7  1MA A  58      74.415  64.610  38.899  1.00 33.32           N  
HETATM 1267  C5  1MA A  58      75.204  63.469  38.953  1.00 33.37           C  
HETATM 1268  C6  1MA A  58      76.031  62.941  39.948  1.00 33.58           C  
HETATM 1269  N6  1MA A  58      76.184  63.488  41.134  1.00 41.19           N  
HETATM 1270  N1  1MA A  58      76.708  61.803  39.669  1.00 34.48           N  
HETATM 1271  CM1 1MA A  58      77.649  61.222  40.626  1.00 31.43           C  
HETATM 1272  C2  1MA A  58      76.527  61.216  38.479  1.00 28.43           C  
HETATM 1273  N3  1MA A  58      75.793  61.624  37.453  1.00 31.67           N  
HETATM 1274  C4  1MA A  58      75.142  62.771  37.747  1.00 33.02           C  

The corresponding section in PDBx/mmCIF format is:

ATOM   1252 P  P     . 1MA A 1 58 ? 73.770 67.765 34.057  1.00 30.65  ? ? ? ? ? ? 58  1MA A P     1 
ATOM   1253 O  OP1   . 1MA A 1 58 ? 72.638 67.886 33.105  1.00 32.84  ? ? ? ? ? ? 58  1MA A OP1   1 
ATOM   1254 O  OP2   . 1MA A 1 58 ? 73.621 68.229 35.450  1.00 29.49  ? ? ? ? ? ? 58  1MA A OP2   1 
ATOM   1255 O  "O5'" . 1MA A 1 58 ? 74.315 66.273 34.254  1.00 28.81  ? ? ? ? ? ? 58  1MA A "O5'" 1 
ATOM   1256 C  "C5'" . 1MA A 1 58 ? 74.592 65.439 33.080  1.00 29.42  ? ? ? ? ? ? 58  1MA A "C5'" 1 
ATOM   1257 C  "C4'" . 1MA A 1 58 ? 74.279 63.972 33.383  1.00 33.42  ? ? ? ? ? ? 58  1MA A "C4'" 1 
ATOM   1258 O  "O4'" . 1MA A 1 58 ? 74.880 63.685 34.667  1.00 32.36  ? ? ? ? ? ? 58  1MA A "O4'" 1 
ATOM   1259 C  "C3'" . 1MA A 1 58 ? 72.789 63.573 33.509  1.00 35.13  ? ? ? ? ? ? 58  1MA A "C3'" 1 
ATOM   1260 O  "O3'" . 1MA A 1 58 ? 72.625 62.168 33.250  1.00 36.80  ? ? ? ? ? ? 58  1MA A "O3'" 1 
ATOM   1261 C  "C2'" . 1MA A 1 58 ? 72.560 63.667 35.012  1.00 34.80  ? ? ? ? ? ? 58  1MA A "C2'" 1 
ATOM   1262 O  "O2'" . 1MA A 1 58 ? 71.525 62.828 35.506  1.00 36.27  ? ? ? ? ? ? 58  1MA A "O2'" 1 
ATOM   1263 C  "C1'" . 1MA A 1 58 ? 73.908 63.150 35.551  1.00 33.62  ? ? ? ? ? ? 58  1MA A "C1'" 1 
ATOM   1264 N  N9    . 1MA A 1 58 ? 74.284 63.494 36.930  1.00 30.36  ? ? ? ? ? ? 58  1MA A N9    1 
ATOM   1265 C  C8    . 1MA A 1 58 ? 73.887 64.574 37.688  1.00 34.55  ? ? ? ? ? ? 58  1MA A C8    1 
ATOM   1266 N  N7    . 1MA A 1 58 ? 74.415 64.610 38.899  1.00 33.32  ? ? ? ? ? ? 58  1MA A N7    1 
ATOM   1267 C  C5    . 1MA A 1 58 ? 75.204 63.469 38.953  1.00 33.37  ? ? ? ? ? ? 58  1MA A C5    1 
ATOM   1268 C  C6    . 1MA A 1 58 ? 76.031 62.941 39.948  1.00 33.58  ? ? ? ? ? ? 58  1MA A C6    1 
ATOM   1269 N  N6    . 1MA A 1 58 ? 76.184 63.488 41.134  1.00 41.19  ? ? ? ? ? ? 58  1MA A N6    1 
ATOM   1270 N  N1    . 1MA A 1 58 ? 76.708 61.803 39.669  1.00 34.48  ? ? ? ? ? ? 58  1MA A N1    1 
ATOM   1271 C  CM1   . 1MA A 1 58 ? 77.649 61.222 40.626  1.00 31.43  ? ? ? ? ? ? 58  1MA A CM1   1 
ATOM   1272 C  C2    . 1MA A 1 58 ? 76.527 61.216 38.479  1.00 28.43  ? ? ? ? ? ? 58  1MA A C2    1 
ATOM   1273 N  N3    . 1MA A 1 58 ? 75.793 61.624 37.453  1.00 31.67  ? ? ? ? ? ? 58  1MA A N3    1 
ATOM   1274 C  C4    . 1MA A 1 58 ? 75.142 62.771 37.747  1.00 33.02  ? ? ? ? ? ? 58  1MA A C4    1 

While I have not tested exhaustively, it seems true that PDBx/mmCIF has adopted a different definition of what constitutes a HETATM residue. It is worth noting that results from 3DNA and DSSR/SNAP are not effected by the conflicting assignments.

Comment

---

Ambiguous 'analyze' and 'rebuild' program names

From the very beginning, 3DNA contains two key programs, analyze and rebuild, for the analysis and rebuilding of nucleic acid 3D structures. The two names are short and to the point, but with one caveat. They are common verbs that can be easily picked up by other software packages. When 3DNA and such packages are installed on the same machine, naming clashes happen. If the 3DNA bin/ directory is searched afterwards, the analyze or rebuild command may have nothing to do with nucleic acid structures at all. Naturally, this naming ambiguity can lead to confusions and frustrations.

I’ve been aware of the rebuild program name conflict for a long time. Recently, I was surprised by another analyze program on my Mac OS X Yosemite. As shown from the following output, the analyze program seems to be installed via Mac port, and it is about analyzing words in a dictionary file.

~ [540] which analyze
/opt/local/bin/analyze
~ [541] analyze -h
correct syntax is:
analyze affix_file dictionary_file file_of_words_to_check
use two words per line for morphological generation

The ambiguous names are exactly the reason that I use x3dna-dssr and x3dna-snap for the two new programs I’ve been working over the past few years. As for the analyze and rebuild programs in 3DNA v2.x, I’d rather leave them as is. 3DNA is now in wide use in other structural bioinformatic pipelines to allow for easy name changes without causing compatibility issues. On a positive side, once you know the problem, fixing it is straightforward. This post is to raise the awareness of the 3DNA user community about such naming conflicts.

Comment

---

Name of base atoms in PDB formats

Canonical bases (A, C, G, T and U) in nucleic acid structures have standard atom names, shown below using the Watson-Crick A–T and G–C pairs. Ring atoms of adenine, for example, are named (N1, C2, N3, C4, C5, C6, N7, C8, N9) respectively.

Watson-Crick base pairs

Four characters are reserved for atom names in the PDB format. The convention, as seen in files downloaded from the RCSB PDB, is to put the two-character base name in the middle, as in .N1.. Note that here each dot (.) is used for a space character to make it stand out.

Long time ago, I became aware a PDB format variant where the base name is left-aligned, as in N1... This case has ever since been properly handled by 3DNA (including DSSR and SNAP). While checking submitted entries to web-DSSR, I recently noticed yet another PDB format variation in labeling base names with the format of ..N1 (i.e., right-aligned). Without taking this special variant of PDB format into consideration, 3DNA/DSSR reported that “no nucleotides found!” Once the issue is known, however, fixing it is straightforward. As of May 4, 2015, 3DNA v2.2, DSSR and SNAP can all handle this special PDB variant correctly.

Over the years, I have come across many PDB variants claimed to compliant with the loosely defined format. If you find 3DNA or DSSR is not working as expected, it is likely the coordinate file in the self-claimed ‘PDB format’ is at fault. Wherever practical, I’ve tried to incorporate as many non-standard variants as possible.

Comment

---

Nucleic acid structures in the RCSB PDB

The NDB (Nucleic Acid Database) is a valuable resource dedicated to “information about experimentally-determined nucleic acids and complex assemblies.” Over the years, however, I’ve gradually switched from NDB to PDB (Protein Data Bank) for my research on nucleic acid structures. NDB is derived from PDB and presumably should contain all nucleic acid structures available in the PDB. However, at the time of this writing (on April 9, 2015), the NDB says: “As of 8-Apr-2015 number of released structures: 7430” and the PDB states “7611 Nucleic Acid Containing Structures”. So PDB has 7611-7430=181 more entries of nucleic acid structures than the NDB, possibly due to a lag in NDB’s processing of newly released PDB structures. Another issue is the inconsistency of the NDB identifier: early entries have e.g. bdl084 for B-DNA (355d in PDB), but now NDB seems to use the same id as the PDB (e.g., 4p5j).

The RCSB PDB maintains a weekly-updated, summary file named pdb_entry_type.txt in pure text format (check here for a list of useful summary files), containing “List of all PDB entries, identification of each as a protein, nucleic acid, or protein-nucleic acid complex and whether the structure was determined by diffraction or NMR.” An excerpt of the file is shown below:

108m    prot    diffraction
109d    nuc     diffraction
109l    prot    diffraction
109m    prot    diffraction
10gs    prot    diffraction
10mh    prot-nuc        diffraction
110d    nuc     diffraction
110l    prot    diffraction
.................................
102m    prot    diffraction
103d    nuc     NMR

Specifically, a nucleic acid structure contains the (sub)string nuc in the second field, where prot-nuc means a protein-RNA/DNA complex. This text file is trivial to parse, and the atomic coordinates files (in PDB or PDBx/mmCIF format) for all nucleic acid structures can be automatically downloaded from the RCSB PDB using a script.

It is worth noting that DSSR is checked against all nucleic acid structures in the PDB at the time of each release to ensure that it does not crash. I update my local copy of nucleic acid structures each week, and run DSSR on the new entries. This process not only provides me an opportunity to keep pace with new developments in the field but also allows me to keep refining DSSR as needs arise.

Comment

---

Modified pseudouridines

Pseudouridine (5-ribosyluracil, PSU) is the most abundant modified nucleotide in RNA. It is unique in that it has a C-glycosidic bond (C-C1′) instead of the N-glycosidic bond (N-C’) common to all other nucleotides, canonical or modified. In 3DNA, the one-letter code for PSU is assigned to the upper case ‘P’, reserving the lower case ‘p’ for its modified variants. Distinguishing PSU from standard U (or T) is important for deriving sensible base-pair parameters and the χ torsion angle.

Pseudouridine (PSU) 3TD -- N3-methylated PSU
PSU 3TD

Recently, I came across 3TD (see figure above) in PDB entry 5afi. 3TD is a modified variant of PSU, with a methyl group attached to N3. In 3DNA v2.1 v2.1-2015mar11, 3TD is abbreviated to ‘p’ to signify its connection to PSU.

In the list of recognized nucleotides (‘baselist.dat’) distributed with 3DNA, there are two other residues mapped to ‘p’: FHU and P2U (see figure below). As is often the case, it is the chemical structure, not the 3-letter PDB ligand identifier (or even full chemical name), that shows clearly to what 3DNA 1-letter abbreviation a residue matches.

FHU -- 5-fluoro-6-hydroxy-pseudouridine P2U -- 2'-deoxy-pseudouridine
FHU P2U

Comment

---

Exterior loop in RNA secondary structure

A single-stranded RNA molecule can fold back onto itself to form various loops delineated by double helical stems, as shown in the figure below [taken from the Nearest Neighbor Database website from the Turner group].

Various loops in RNA secondary structure

Of special note is the exterior loop (at the bottom) which includes the 5′ and 3′ ends of the sequence. The Mfold User Manual defines the exterior loop as such:

The collection of bases and base pairs not accessible from any base pair is called the exterior (or external) loop … . It is worth noting that if we imagine adding a 0th and an (n + 1)st base to the RNA, and a base pair 0.(n+1), then the exterior loop becomes the loop closed by this imaginary base pair. … The exterior loop exists only in linear RNA.

While each of the other loops (hairpin, bulge, internal or junction) forms a closed ‘circle’ with two neighboring bases connected by either a canonical pair or backbone covalent bond, the ‘exterior loop’ has only an imaginary pair to close the 5′ and 3′ ends of the sequence. Moreover, the two ends of an RNA molecule are not necessarily close in three-dimenional space, as may be implied in the above secondary structure diagram. For example, in the H-type pseudoknotted structure 1ymo from human telomerase RNA, the 5′ and 3′ ends are on the opposite sides.

DSSR does not has the concept of an ‘exterior loop’ due to its lack of a closing pair to form a ‘circle’. Instead, each of the 5′ and 3′ dangling ends is taken as a ‘non-loop single-stranded segment’, if applicable. For the crystal structure of yeast phenylalanine tRNA (1ehz, see the figure at the bottom), the relevant portion of DSSR output is as below. Note that since the 5′ end is paired, only the single-stranded region at the 3′ end is listed. Presumably, the ‘exterior loop’ in this case would also include the G1—C72 pair, with the imaginary closing pair connecting G1 and A76.

List of 1 non-loop single-stranded segment
   1 nts=4 ACCA A.A73,A.C74,A.C75,A.A76

yeast phenylalanine tRNA

Comment

---

« Older · Newer »

Thank you for printing this article from http://x3dna.org/. Please do not forget to visit back for more 3DNA-related information. — Xiang-Jun Lu