The --structure-title option for DSSR .ct output

DSSR produces RNA secondary structures in connect table (.ct) format. According to "RNAstructure Command Line Help: File Formats" (with slight editing):


CT File Format

A CT (Connectivity Table) file contains secondary structure information for a sequence. These files are saved with a CT extension. When entering a structure to calculate the free energy, the following format must be followed.

  1. Start of first line: number of bases in the sequence
  2. End of first line: title of the structure
  3. Each of the following lines provides information about a given base in the sequence. Each base has its own line, with these elements in order:
    • Base number: index n
    • Base (A, C, G, T, U, X)
    • Index n-1
    • Index n+1
    • Number of the base to which n is paired. No pairing is indicated by 0 (zero).
    • Natural numbering. RNAstructure ignores the actual value given in natural numbering, so it is easiest to repeat n here.
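
As an illustration of the layout described above, here is a minimal Python sketch that reads a single-structure CT file into records. The names `CtRecord`, `parse_ct`, and the toy four-base `sample` are hypothetical and introduced only for this example; they are not part of DSSR or RNAstructure. The sketch assumes whitespace-separated columns and an integer in the natural-numbering column.

```python
from typing import NamedTuple


class CtRecord(NamedTuple):
    index: int    # base number n (1-based)
    base: str     # A, C, G, T, U, or X
    prev: int     # index n-1 (0 at the 5' end)
    next: int     # index n+1 (0 at the 3' end)
    pair: int     # index of the pairing partner, 0 if unpaired
    natural: int  # natural numbering


def parse_ct(text):
    """Return (count, title, records) for a single-structure CT file."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split(None, 1)
    count = int(header[0])
    # The title is everything after the base count; it may be absent.
    title = header[1].strip() if len(header) > 1 else ""
    records = []
    for ln in lines[1:count + 1]:
        f = ln.split()[:6]  # only the six documented columns are used
        records.append(CtRecord(int(f[0]), f[1], int(f[2]),
                                int(f[3]), int(f[4]), int(f[5])))
    if len(records) != count:
        raise ValueError("header count does not match number of base lines")
    return count, title, records


# Toy input (not derived from any PDB entry).
sample = """\
    4 toy hairpin
    1 G     0     2     4     1
    2 A     1     3     0     2
    3 A     2     4     0     3
    4 C     3     0     1     4
"""

count, title, records = parse_ct(sample)
print(count, repr(title), records[0])
```

Taking only the first six fields keeps the sketch tolerant of anything that may follow them on a line.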

Using PDB entry 1msy as an example (see Figure 1 below):



Figure 1. 3D and 2D structures of PDB entry 1msy. (A) 3D schematic auto-created via the DSSR-PyMOL integration. The labeled residues follow PDB coordinates. (B) 2D diagram rendered with VARNA using DSSR-derived 2D structural information in the .ct format. This figure was annotated using Inkscape.


Running DSSR with default options:

x3dna-dssr -i=1msy.pdb
cp dssr-2ndstrs.ct 1msy-dssr-default.ct

The file 1msy-dssr-default.ct has the following contents:

   27 ENERGY = 0.0 [1msy] -- secondary structure derived by DSSR
    1 U     0     2     0  2647
    2 G     1     3    26  2648
    3 C     2     4    25  2649
    4 U     3     5    24  2650
    5 C     4     6    23  2651
    6 C     5     7    22  2652
    7 U     6     8     0  2653
    8 A     7     9     0  2654
    9 G     8    10     0  2655
   10 U     9    11     0  2656
   11 A    10    12     0  2657
   12 C    11    13    17  2658
   13 G    12    14     0  2659
   14 U    13    15     0  2660
   15 A    14    16     0  2661
   16 A    15    17     0  2662
   17 G    16    18    12  2663
   18 G    17    19     0  2664
   19 A    18    20     0  2665
   20 C    19    21     0  2666
   21 C    20    22     0  2667
   22 G    21    23     6  2668
   23 G    22    24     5  2669
   24 A    23    25     4  2670
   25 G    24    26     3  2671
   26 U    25    27     2  2672
   27 G    26     0     0  2673

Here the first line contains 27 (the number of bases) followed by the title, "ENERGY = 0.0 [1msy] -- secondary structure derived by DSSR". Although RNAstructure ignores the values in the natural-numbering column, DSSR fills it with the residue numbers of the nucleotides in the PDB file (here U2647 through G2673).
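
To make the pairing and natural-numbering columns concrete, the following Python sketch converts the pairing column of the 1msy table above into dot-bracket notation. The `partners` list is copied from the fifth column of that listing; the function name `dot_bracket` is hypothetical. The sketch assumes a nested (pseudoknot-free) structure, which holds for this example but not in general.

```python
# Pairing partner of each base (fifth column of the 1msy table), n = 1..27.
partners = [0, 26, 25, 24, 23, 22, 0, 0, 0, 0, 0, 17, 0, 0,
            0, 0, 12, 0, 0, 0, 0, 6, 5, 4, 3, 2, 0]


def dot_bracket(partners):
    """Map a CT pairing column to dot-bracket (nested structures only)."""
    out = []
    for i, j in enumerate(partners, start=1):
        if j == 0:
            out.append(".")   # unpaired
        elif j > i:
            out.append("(")   # partner lies downstream
        else:
            out.append(")")   # partner lies upstream
    return "".join(out)


# Every pair must be listed from both sides.
for i, j in enumerate(partners, start=1):
    if j:
        assert partners[j - 1] == i

structure = dot_bracket(partners)
print(structure)  # .(((((.....(....)....))))).

# In this listing the natural numbering is simply n + 2646.
natural = [n + 2646 for n in range(1, len(partners) + 1)]
print(natural[0], natural[-1])  # 2647 2673
```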

With the DSSR option --structure-title (or --str-title; any spelling matched by the regex "^-?-?str(ucture)?[-_]?title" is accepted), users can set the title of the derived .ct file, as shown below:

x3dna-dssr -i=1msy.pdb --structure-title='CT file derived from DSSR'
cp dssr-2ndstrs.ct 1msy-dssr-title.ct

   27 CT file derived from DSSR
    1 U     0     2     0  2647
    2 G     1     3    26  2648
......
   26 U    25    27     2  2672
   27 G    26     0     0  2673
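
The regex quoted above for the option name can be explored with a short Python sketch. It uses Python's `re` module purely for illustration; how DSSR itself applies the pattern (for example, its case sensitivity) is not stated here, so treat the results as a reading of the pattern rather than a description of DSSR's parser. The `candidates` list is made up for this example.

```python
import re

# Pattern as quoted in the text.
OPTION_RE = re.compile(r"^-?-?str(ucture)?[-_]?title")

candidates = [
    "--structure-title",
    "--str-title",
    "-str_title",
    "--strtitle",
    "structure_title",
    "--title",          # lacks the "str" prefix
    "--struct-title",   # "struct" is neither "str" nor "structure"
]

matches = {c: bool(OPTION_RE.match(c)) for c in candidates}
for name, ok in matches.items():
    print(f"{name:20s} {'matches' if ok else 'no match'}")
```

Because the pattern is anchored only at the start, a full argument such as --str-title=abc also matches it as a prefix.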

The title can also be removed by passing an empty string (i.e., --str-title=""), or simply --str-title or --str-title= with no value.

x3dna-dssr -i=1msy.pdb --structure-title=""
cp dssr-2ndstrs.ct 1msy-dssr-notitle.ct

   27
    1 U     0     2     0  2647
    2 G     1     3    26  2648
......

With the --more option, DSSR also outputs additional info that can be used to easily identify a nucleotide and its pairing partner.

x3dna-dssr -i=1msy.pdb --more --structure-title="1msy with extra info"
cp dssr-2ndstrs.ct 1msy-dssr-extra.ct

   27 1msy with extra info
    1 U     0     2     0  2647 # name=A.U2647
    2 G     1     3    26  2648 # name=A.G2648, pairedNt=A.U2672
    3 C     2     4    25  2649 # name=A.C2649, pairedNt=A.G2671
......

Note that the extra info in the .ct format causes no trouble for VARNA when visualizing the 2D structure. This is unlike the .bpseq format, where output with extra info cannot be fed directly into VARNA.
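
The trailing comments produced with --more are easy to pick apart in a script. Below is a minimal Python sketch; the function name `parse_extra` is hypothetical, and the assumed layout (a '#' followed by comma-separated key=value items) is inferred only from the three sample lines shown above.

```python
def parse_extra(line):
    """Split one CT base line into (base index, dict of extra key=value info)."""
    body, _, comment = line.partition("#")
    index = int(body.split()[0])
    info = {}
    for item in comment.split(","):
        key, sep, value = item.strip().partition("=")
        if sep:  # skip empty or malformed items
            info[key] = value
    return index, info


# Lines copied from the --more output above.
lines = [
    "    1 U     0     2     0  2647 # name=A.U2647",
    "    2 G     1     3    26  2648 # name=A.G2648, pairedNt=A.U2672",
    "    3 C     2     4    25  2649 # name=A.C2649, pairedNt=A.G2671",
]

parsed = [parse_extra(ln) for ln in lines]
for index, info in parsed:
    print(index, info.get("name"), info.get("pairedNt"))
```

An unpaired nucleotide simply has no pairedNt key, so `info.get("pairedNt")` returns None for it.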

The --structure-title option is another small feature implemented in DSSR. It is currently not documented in the DSSR User Manual, since it is unlikely to be of general interest.


DSSR commands used, and the output .ct files:

x3dna-dssr -i=1msy.pdb
cp dssr-2ndstrs.ct 1msy-dssr-default.ct

x3dna-dssr -i=1msy.pdb --structure-title='CT file derived from DSSR'
cp dssr-2ndstrs.ct 1msy-dssr-title.ct

x3dna-dssr -i=1msy.pdb --structure-title=""
cp dssr-2ndstrs.ct 1msy-dssr-notitle.ct

x3dna-dssr -i=1msy.pdb --more --structure-title="1msy with extra info"
cp dssr-2ndstrs.ct 1msy-dssr-extra.ct

Thank you for printing this article from http://x3dna.org/. Please do not forget to visit back for more 3DNA-related information. — Xiang-Jun Lu