By default, DSSR produces RNA secondary structures in three commonly used file formats (ViennaRNA package dot-bracket .dbn, Mfold connect table .ct, and CRW bpseq), which can be fed directly into visualization tools such as VARNA. In this blog post, I want to dig deeper into the bpseq format and show the variations DSSR provides.
According to "RNA STRAND v2.0 - The RNA secondary STRucture and statistical ANalysis Database" (with slight editing):
BPSEQ format: The file name should end with the suffix ".bpseq", as in "mystr.bpseq". The bpseq format is a simple text format in which there is one line per base in the molecule, listing the position of the base (leftmost position is 1), the base name (A, C, G, U, or other alphabetical characters), and the position number of the base to which it is paired, with a 0 denoting that the base is unpaired. For more information, see the Comparative RNA Web Site. An example is as follows:
1 G 8
2 G 7
3 C 0
4 A 0
5 U 0
6 U 0
7 C 2
8 C 1
For complexes with more than one molecule, the molecules are listed in sequence, with the position numbers of each successive molecule continuing in order from the previous molecule.
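To make the format concrete, here is a minimal Python sketch that reads the bare three-column layout described above and recovers the base pairs from the example. It assumes every non-blank line has exactly three whitespace-separated fields and raises on anything else; parse_bpseq and base_pairs are hypothetical helper names, not part of DSSR or any bpseq library.

```python
from typing import Dict, List, Tuple

Record = Tuple[int, str, int]  # (position, base name, partner; 0 = unpaired)


def parse_bpseq(text: str) -> List[Record]:
    """Parse bare three-column bpseq text into (position, base, partner) records."""
    records: List[Record] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # tolerate blank lines
        if len(fields) != 3:
            raise ValueError(f"line {lineno}: expected 3 fields, got {len(fields)}")
        records.append((int(fields[0]), fields[1], int(fields[2])))
    return records


def base_pairs(records: List[Record]) -> List[Tuple[int, int]]:
    """Return each pair once as (i, j) with i < j, checking that pairing is mutual."""
    partner_of: Dict[int, int] = {pos: partner for pos, _, partner in records}
    pairs: List[Tuple[int, int]] = []
    for i, j in partner_of.items():
        if j == 0:
            continue  # unpaired base
        if partner_of.get(j) != i:
            raise ValueError(f"asymmetric pairing: {i} -> {j}")
        if i < j:
            pairs.append((i, j))
    return pairs


# The 8-nt example quoted above.
example = """\
1 G 8
2 G 7
3 C 0
4 A 0
5 U 0
6 U 0
7 C 2
8 C 1
"""

print(base_pairs(parse_bpseq(example)))  # [(1, 8), (2, 7)]
```

Each pair appears twice in a bpseq file (once from each side), which is why the sketch keeps only i < j and uses the duplicate as a consistency check.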
The bases in bpseq format are identified by position numbers starting from 1 for the leftmost position. That is the convention DSSR follows by default in its .bpseq output. For example, for the PDB entry 1msy, which contains 27 nucleotides, the command x3dna-dssr -i=1msy.pdb will generate a file named dssr-2ndstrs.bpseq with the following contents (abbreviated):
1 U 0
2 G 26
3 C 25
......
25 G 3
26 U 2
27 G 0
However, according to the PDB atomic coordinates, the nucleotides are numbered from U2647 (#1) to G2673 (#27), as shown in Figure 1 below:
Figure 1. 3D and 2D structures of PDB entry 1msy. (A) 3D schematic auto-created via the DSSR-PyMOL integration. The labeled residues follow PDB coordinates. (B) 2D diagram rendered with VARNA using DSSR-derived 2D structural information in the .ct format. This figure was annotated using Inkscape.
It makes sense for the labelling of bases in the 2D bpseq format to follow the numbering of the 3D atomic coordinates in the PDB file. That is, instead of starting from position 1 as shown above, the bpseq file would start with 2647. That's exactly what the DSSR --bpseq option was created for. With the command x3dna-dssr -i=1msy.pdb --bpseq, the output file dssr-2ndstrs.bpseq now has the following contents (abbreviated):
2647 U 0
2648 G 2672
2649 C 2671
......
2671 G 2649
2672 U 2648
2673 G 0
This .bpseq file can be read by VARNA (tested with VARNAv3-93.jar) to generate a 2D image as shown in Figure 1(B) above.
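For a single chain, the relationship between the default file and the --bpseq file amounts to mapping each 1-based position to the residue number of that nucleotide. Below is a minimal Python sketch of that mapping, assuming you already have the residue numbers in chain order (for 1msy, 2647 through 2673 with no gaps). renumber is a hypothetical helper for illustration only; it is not how DSSR implements --bpseq, which takes the numbers directly from the coordinates.

```python
from typing import List, Tuple

Record = Tuple[int, str, int]  # (position, base name, partner; 0 = unpaired)


def renumber(records: List[Record], resnums: List[int]) -> List[Record]:
    """Map 1-based bpseq positions to residue numbers.

    resnums[k] is the residue number of the (k+1)-th nucleotide.
    A partner of 0 (unpaired) is left as 0.
    """
    def to_resnum(pos: int) -> int:
        return 0 if pos == 0 else resnums[pos - 1]

    return [(to_resnum(p), base, to_resnum(q)) for p, base, q in records]


# Only the lines shown in the abbreviated default output above.
default = [(1, "U", 0), (2, "G", 26), (3, "C", 25),
           (25, "G", 3), (26, "U", 2), (27, "G", 0)]
resnums = list(range(2647, 2674))  # U2647 ... G2673 in 1msy

for rec in renumber(default, resnums):
    print(*rec)
```

Using an explicit list rather than a fixed offset (2646 here) keeps the sketch correct when the numbering has gaps. It still cannot express insertion codes, and since bpseq reserves 0 for "unpaired", a residue actually numbered 0 in a PDB file would be ambiguous; the sketch does not address either case.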
Moreover, with the command x3dna-dssr -i=1msy.pdb --bpseq=extra, the output file dssr-2ndstrs.bpseq now contains additional info to easily identify a nucleotide and its pairing partner:
2647 U 0 # name=A.U2647
2648 G 2672 # name=A.G2648, pairedNt=A.U2672
2649 C 2671 # name=A.C2649, pairedNt=A.G2671
......
2671 G 2649 # name=A.G2671, pairedNt=A.C2649
2672 U 2648 # name=A.U2672, pairedNt=A.G2648
2673 G 0 # name=A.G2673
It should be noted that this .bpseq output file no longer complies with the standard and cannot be fed into VARNA for visualization.
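Although the annotated file is not standard bpseq, its comments are easy to use in scripts. In the lines shown above, each comment is a # followed by comma-separated key=value items (name=, and pairedNt= for paired bases). Here is a minimal Python sketch that parses one such line, assuming that layout holds for every line; ExtraRecord and parse_extra_line are hypothetical names, not part of DSSR.

```python
import re
from dataclasses import dataclass
from typing import Dict, Optional

# Three columns, optionally followed by "# key=value, key=value".
LINE_RE = re.compile(r"^\s*(-?\d+)\s+(\S+)\s+(-?\d+)\s*(?:#\s*(.*?))?\s*$")


@dataclass
class ExtraRecord:
    pos: int
    base: str
    partner: int                     # 0 = unpaired
    name: Optional[str] = None       # e.g. "A.G2648"
    paired_nt: Optional[str] = None  # e.g. "A.U2672"; None if unpaired


def parse_extra_line(line: str) -> ExtraRecord:
    """Parse one line of --bpseq=extra style output (layout assumed from the example)."""
    m = LINE_RE.match(line)
    if m is None:
        raise ValueError(f"unrecognized line: {line!r}")
    pos, base, partner, comment = m.groups()
    items: Dict[str, str] = {}
    if comment:
        for item in comment.split(","):
            key, _, value = item.strip().partition("=")
            items[key] = value
    return ExtraRecord(int(pos), base, int(partner),
                       items.get("name"), items.get("pairedNt"))


for line in ["2647 U 0 # name=A.U2647",
             "2648 G 2672 # name=A.G2648, pairedNt=A.U2672"]:
    print(parse_extra_line(line))
```

The regular expression also accepts a plain three-column line (the comment group is optional), so the same helper works on standard --bpseq output, just with name and paired_nt left as None.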
The --bpseq option was added at users' request. The --bpseq=extra variation was implemented recently to ensure that the --bpseq option by itself produces a valid .bpseq file, without the extra info that could otherwise be triggered (e.g., by the global --more option). Now the extra info in the .bpseq output is enabled only by setting --bpseq=extra explicitly.
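The difference between a valid .bpseq file and one carrying extra info can be checked mechanically. Below is a minimal Python sketch of a strict checker, assuming "valid" means the bare layout quoted from RNA STRAND: exactly three fields per non-blank line, integer positions, and mutual pairing. standard_bpseq_problems is a hypothetical helper; it does not claim to reproduce the exact checks VARNA performs.

```python
from typing import Dict, List


def standard_bpseq_problems(text: str) -> List[str]:
    """Return reasons why text is not a bare three-column bpseq file ([] if none)."""
    problems: List[str] = []
    partner_of: Dict[int, int] = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # blank lines are ignored
        if len(fields) != 3:
            problems.append(f"line {lineno}: {len(fields)} fields instead of 3")
            continue
        try:
            pos, partner = int(fields[0]), int(fields[2])
        except ValueError:
            problems.append(f"line {lineno}: non-integer position")
            continue
        partner_of[pos] = partner
    # Every pair must be listed from both sides.
    for i, j in partner_of.items():
        if j != 0 and partner_of.get(j) != i:
            problems.append(f"position {i} -> {j} is not reciprocated")
    return problems


# Abbreviated --bpseq output vs. the first two lines of --bpseq=extra output.
plain = "2647 U 0\n2648 G 2672\n2649 C 2671\n2671 G 2649\n2672 U 2648\n2673 G 0\n"
extra = ("2647 U 0 # name=A.U2647\n"
         "2648 G 2672 # name=A.G2648, pairedNt=A.U2672\n")

print(standard_bpseq_problems(plain))  # []
print(standard_bpseq_problems(extra))  # ['line 1: 5 fields instead of 3', 'line 2: 6 fields instead of 3']
```

The trailing # comment splits into extra whitespace-separated fields, which is exactly what a strict three-column reader trips over.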
This --bpseq option and its evolution are a good example of how DSSR responds to community requests. I'm here to listen, and I'm always willing to improve DSSR to better fit users' needs. If you make use of DSSR in your pipeline and need some adaptations, please do not hesitate to contact me. I may consider adding a new option or otherwise revising the code to streamline the integration of DSSR into your project.
DSSR commands used and the resulting .bpseq files:
x3dna-dssr -i=1msy.pdb
cp dssr-2ndstrs.bpseq 1msy-dssr-default.bpseq
x3dna-dssr -i=1msy.pdb --bpseq
cp dssr-2ndstrs.bpseq 1msy-dssr-bpseq.bpseq
x3dna-dssr -i=1msy.pdb --bpseq=extra
cp dssr-2ndstrs.bpseq 1msy-dssr-bpseq-extra.bpseq