DSSR produces RNA secondary structures in connect table (.ct) format. According to "RNAstructure Command Line Help: File Formats" (with slight editing):
CT File Format
A CT (Connectivity Table) file contains secondary structure information for a sequence. These files are saved with a CT extension. When entering a structure to calculate the free energy, the following format must be followed.
- Start of first line: number of bases in the sequence
- End of first line: title of the structure
- Each of the following lines provides information about a given base in the sequence. Each base has its own line, with these elements in order:
  - Base number: index n
  - Base (A, C, G, T, U, X)
  - Index n-1
  - Index n+1
  - Number of the base to which n is paired. No pairing is indicated by 0 (zero).
  - Natural numbering. RNAstructure ignores the actual value given in natural numbering, so it is easiest to repeat n here.
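The six columns above map directly onto a small record type. Below is a minimal Python sketch of a reader for a single-structure CT file. The names `CTRecord` and `parse_ct` are hypothetical and not part of RNAstructure or DSSR. The sketch assumes whitespace-separated columns and a title that may be absent.

```python
from dataclasses import dataclass


@dataclass
class CTRecord:
    """One base line of a CT file (the six columns listed above)."""
    index: int       # base number n
    base: str        # A, C, G, T, U or X
    prev_index: int  # n-1 (0 for the first base)
    next_index: int  # n+1 (0 for the last base)
    partner: int     # index of the paired base, 0 if unpaired
    natural: int     # natural numbering (ignored by RNAstructure)


def parse_ct(text: str) -> tuple[int, str, list[CTRecord]]:
    """Parse a single-structure CT file into (length, title, records)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # First line: number of bases, optionally followed by the title.
    header = lines[0].split(maxsplit=1)
    length = int(header[0])
    title = header[1].strip() if len(header) > 1 else ""
    records = []
    for ln in lines[1:length + 1]:
        f = ln.split()  # any fields after the sixth are ignored
        records.append(CTRecord(int(f[0]), f[1], int(f[2]), int(f[3]),
                                int(f[4]), int(f[5])))
    if len(records) != length:
        raise ValueError(f"expected {length} base lines, got {len(records)}")
    return length, title, records
```

For example, `parse_ct(open("some.ct").read())` returns the base count, the title (an empty string when the first line holds only the count), and one `CTRecord` per base.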
Using PDB entry 1msy as an example (see Figure 1 below):
Figure 1. 3D and 2D structures of PDB entry 1msy. (A) 3D schematic auto-created via the DSSR-PyMOL integration. The labeled residues follow PDB coordinates. (B) 2D diagram rendered with VARNA using DSSR-derived 2D structural information in the .ct format. This figure was annotated using Inkscape.
With the following commands:
x3dna-dssr -i=1msy.pdb
cp dssr-2ndstrs.ct 1msy-dssr-default.ct
The file 1msy-dssr-default.ct has the following contents:
27 ENERGY = 0.0 [1msy] -- secondary structure derived by DSSR
1 U 0 2 0 2647
2 G 1 3 26 2648
3 C 2 4 25 2649
4 U 3 5 24 2650
5 C 4 6 23 2651
6 C 5 7 22 2652
7 U 6 8 0 2653
8 A 7 9 0 2654
9 G 8 10 0 2655
10 U 9 11 0 2656
11 A 10 12 0 2657
12 C 11 13 17 2658
13 G 12 14 0 2659
14 U 13 15 0 2660
15 A 14 16 0 2661
16 A 15 17 0 2662
17 G 16 18 12 2663
18 G 17 19 0 2664
19 A 18 20 0 2665
20 C 19 21 0 2666
21 C 20 22 0 2667
22 G 21 23 6 2668
23 G 22 24 5 2669
24 A 23 25 4 2670
25 G 24 26 3 2671
26 U 25 27 2 2672
27 G 26 0 0 2673
Here the first line contains 27 (the number of bases) and ENERGY = 0.0 [1msy] -- secondary structure derived by DSSR (the title). Although RNAstructure ignores the values in the natural-numbering column, DSSR fills that column with the residue numbers of the nucleotides in the PDB file (e.g., 2647 for U2647 and 2673 for G2673).
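Because DSSR writes PDB residue numbers into the natural-numbering column, that column can translate the CT pairing back into PDB-style labels. The following minimal Python sketch lists each base pair once and checks that the pairing column is symmetric. It assumes the whitespace-separated layout shown above. The function name is hypothetical, and the labels carry no chain ID because the CT file does not contain one.

```python
# The 1msy CT content shown above, as produced by DSSR.
CT_1MSY = """\
27 ENERGY = 0.0 [1msy] -- secondary structure derived by DSSR
1 U 0 2 0 2647
2 G 1 3 26 2648
3 C 2 4 25 2649
4 U 3 5 24 2650
5 C 4 6 23 2651
6 C 5 7 22 2652
7 U 6 8 0 2653
8 A 7 9 0 2654
9 G 8 10 0 2655
10 U 9 11 0 2656
11 A 10 12 0 2657
12 C 11 13 17 2658
13 G 12 14 0 2659
14 U 13 15 0 2660
15 A 14 16 0 2661
16 A 15 17 0 2662
17 G 16 18 12 2663
18 G 17 19 0 2664
19 A 18 20 0 2665
20 C 19 21 0 2666
21 C 20 22 0 2667
22 G 21 23 6 2668
23 G 22 24 5 2669
24 A 23 25 4 2670
25 G 24 26 3 2671
26 U 25 27 2 2672
27 G 26 0 0 2673
"""


def pairs_with_natural_numbers(ct_text: str) -> list[tuple[str, str]]:
    """Return each base pair once, labeled by base + natural number."""
    rows = [ln.split() for ln in ct_text.splitlines()[1:] if ln.strip()]
    base = {int(r[0]): r[1] for r in rows}
    partner = {int(r[0]): int(r[4]) for r in rows}
    natural = {int(r[0]): int(r[5]) for r in rows}
    pairs = []
    for i, j in partner.items():
        if j == 0:
            continue  # unpaired base
        if partner[j] != i:
            raise ValueError(f"pairing of {i} and {j} is not symmetric")
        if i < j:  # report each pair once
            pairs.append((f"{base[i]}{natural[i]}", f"{base[j]}{natural[j]}"))
    return pairs


for left, right in pairs_with_natural_numbers(CT_1MSY):
    print(left, right)
```

For the 1msy file, this prints six pairs, starting with G2648 U2672 and ending with C2658 G2663. These match the pairedNt labels that appear with the --more option.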
With the DSSR option --structure-title (or --str-title; the option name is actually matched via the regex "^-?-?str(ucture)?[-_]?title"), users can set the title of the derived .ct file, as shown below:
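To see which spellings the quoted pattern accepts, it can be tried with Python's `re` module. This only illustrates the pattern itself. The text does not say how DSSR applies it internally, for example whether matching is case-insensitive or how the `=value` part is split off.

```python
import re

# Pattern quoted in the text for matching the title option name.
TITLE_OPTION = re.compile(r"^-?-?str(ucture)?[-_]?title")

candidates = ["--structure-title", "--str-title", "--str_title",
              "-strtitle", "structuretitle", "--struct-title", "--title"]
for name in candidates:
    verdict = "matches" if TITLE_OPTION.match(name) else "no match"
    print(f"{name:20} {verdict}")
```

The first five spellings match. "--struct-title" and "--title" do not, because "str" must be followed by either the full "ucture" or nothing.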
x3dna-dssr -I=1msy.pdb --structure-title='CT file derived from DSSR'
cp dssr-2ndstrs.ct 1msy-dssr-title.ct
27 CT file derived from DSSR
1 U 0 2 0 2647
2 G 1 3 26 2648
......
26 U 25 27 2 2672
27 G 26 0 0 2673
One can also remove the title by using an empty string "" (i.e., --str-title="") or simply --str-title (or --str-title=).
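The header rule can be sketched with a small writer that appends the title after the base count only when the title is non-empty. This is a hypothetical Python helper that follows the six-column layout described earlier. It is not DSSR's own code, and it separates columns with single spaces, which may differ from DSSR's actual alignment.

```python
def format_ct(seq: str, partner: list[int], natural: list[int],
              title: str = "") -> str:
    """Render a CT file; an empty title leaves only the base count on line 1."""
    n = len(seq)
    if not (len(partner) == len(natural) == n):
        raise ValueError("seq, partner and natural must have equal lengths")
    header = f"{n} {title}".rstrip()  # just the count when title is empty
    lines = [header]
    for i, base in enumerate(seq, start=1):
        nxt = i + 1 if i < n else 0  # 0 marks no following base
        lines.append(f"{i} {base} {i - 1} {nxt} {partner[i - 1]} {natural[i - 1]}")
    return "\n".join(lines) + "\n"
```

For a made-up hairpin, `format_ct("GAAAC", [5, 0, 0, 0, 1], [1, 2, 3, 4, 5])` starts with the line `5`. Passing `title="toy"` gives `5 toy` instead.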
x3dna-dssr -I=1msy.pdb --structure-title=""
cp dssr-2ndstrs.ct 1msy-dssr-notitle.ct
27
1 U 0 2 0 2647
2 G 1 3 26 2648
......
With the --more option, DSSR also outputs additional info that makes it easy to identify each nucleotide and its pairing partner.
x3dna-dssr -I=1msy.pdb --more --structure-title="1msy with extra info"
cp dssr-2ndstrs.ct 1msy-dssr-extra.ct
27 1msy with extra info
1 U 0 2 0 2647 # name=A.U2647
2 G 1 3 26 2648 # name=A.G2648, pairedNt=A.U2672
3 C 2 4 25 2649 # name=A.C2649, pairedNt=A.G2671
......
Note that unlike the .bpseq format with extra info, which cannot be fed directly into VARNA, the .ct format with extra info causes no trouble for VARNA when visualizing the 2D structure.
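The extra info follows a "#" at the end of each base line, so a reader can separate it from the six standard columns. Below is a minimal Python sketch that assumes the `# key=value, key=value` layout seen in the output above. The function name is hypothetical.

```python
def split_ct_line(line: str) -> tuple[list[str], dict[str, str]]:
    """Split a CT base line into its columns and any '# key=value' extras."""
    body, _, comment = line.partition("#")
    columns = body.split()  # the six standard CT columns, as strings
    extras = {}
    for item in comment.split(","):
        key, sep, value = item.strip().partition("=")
        if sep:  # skip empty or malformed items
            extras[key] = value
    return columns, extras
```

For line 2 of 1msy-dssr-extra.ct, this yields the columns `2 G 1 3 26 2648` and the extras `name=A.G2648` and `pairedNt=A.U2672`. A line without "#" gives an empty dictionary.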
The --structure-title option is another small feature implemented in DSSR. It is currently not documented in the DSSR User Manual since it is unlikely to be of general interest.
DSSR commands used to produce the output .ct files:
x3dna-dssr -i=1msy.pdb
cp dssr-2ndstrs.ct 1msy-dssr-default.ct
x3dna-dssr -I=1msy.pdb --structure-title='CT file derived from DSSR'
cp dssr-2ndstrs.ct 1msy-dssr-title.ct
x3dna-dssr -I=1msy.pdb --structure-title=""
cp dssr-2ndstrs.ct 1msy-dssr-notitle.ct
x3dna-dssr -I=1msy.pdb --more --structure-title="1msy with extra info"
cp dssr-2ndstrs.ct 1msy-dssr-extra.ct