The `--bpseq` option in DSSR

By default, DSSR produces RNA secondary structures in three commonly used file formats: ViennaRNA package dbn, Mfold connect table (.ct), and CRW bpseq. These files can be fed directly into visualization tools such as VARNA. In this blog post, I want to dig deeper into the bpseq format and show the variations available from DSSR.

According to "RNA STRAND v2.0 - The RNA secondary STRucture and statistical ANalysis Database" (with slight editing):

BPSEQ format: The file name should end with the suffix ".bpseq", as in "mystr.bpseq". The bpseq format is a simple text format in which there is one line per base in the molecule, listing the position of the base (leftmost position is 1), the base name (A, C, G, U, or other alphabetical characters), and the position number of the base to which it is paired, with a 0 denoting that the base is unpaired. For more information, see the Comparative RNA Web Site. An example is as follows:

For complexes with more than one molecule, the molecules are listed in sequence, with the base-pair numbers of each successive molecule following in order from the previous molecule.
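The format description above maps naturally onto a small parser. The following Python sketch is not part of DSSR: the function names `parse_bpseq` and `pairs_are_reciprocal` and the four sample records are made up for illustration. It reads one record per line and checks that every pairing is listed in both directions.

```python
from typing import List, Tuple

# (position, base, partner position); a partner of 0 means unpaired
Record = Tuple[int, str, int]


def parse_bpseq(text: str) -> List[Record]:
    """Parse bpseq records: one 'position base partner' triple per line."""
    records = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, headers, and anything that is not a plain record.
        if len(fields) != 3 or not (fields[0].isdigit() and fields[2].isdigit()):
            continue
        records.append((int(fields[0]), fields[1], int(fields[2])))
    return records


def pairs_are_reciprocal(records: List[Record]) -> bool:
    """Return True if every paired base i -> j is matched by j -> i."""
    partner = {pos: mate for pos, _, mate in records}
    return all(mate == 0 or partner.get(mate) == pos
               for pos, mate in partner.items())


# Hypothetical four-base example, not taken from any real structure.
sample = """\
1 G 4
2 A 0
3 A 0
4 C 1
"""
records = parse_bpseq(sample)
print(records)
print(pairs_are_reciprocal(records))
```

The reciprocity check is a cheap sanity test worth running before handing a file to a drawing tool, since a one-sided pairing usually points to a truncated or hand-edited file.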

The bases in bpseq format are identified by position numbers starting from 1 for the leftmost position. That is the convention DSSR follows by default in its .bpseq output. For example, for the PDB entry 1msy, which contains 27 nucleotides, the command x3dna-dssr -i=1msy.pdb will generate a file named dssr-2ndstrs.bpseq with the following contents (abbreviated):

However, in the PDB atomic coordinates, the nucleotides are numbered from U2647 (#1) to G2673 (#27), as shown in Figure 1 below:

Figure 1. 3D and 2D structures of PDB entry 1msy. (A) 3D schematic auto-created via the DSSR-PyMOL integration. The labeled residues follow PDB coordinates. (B) 2D diagram rendered with VARNA using DSSR-derived 2D structural information in the .ct format. This figure was annotated using Inkscape.

It makes sense for the base labels in the 2D bpseq format to follow the residue numbers in the 3D atomic coordinates from the PDB. Instead of starting from position 1, as in the default output, the bpseq file would then start with 2647. That is exactly what the DSSR --bpseq option was created for. With the command x3dna-dssr -i=1msy.pdb --bpseq, the output file dssr-2ndstrs.bpseq now has the following contents (abbreviated):

  2647 U      0
  2648 G   2672
  2649 C   2671
......
  2671 G   2649
  2672 U   2648
  2673 G      0

This .bpseq file can be read by VARNA (tested with VARNAv3-93.jar) to generate a 2D image as shown in Figure 1(B) above.
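For 1msy the two numbering schemes differ only by a constant offset, because the 27 nucleotides run contiguously from U2647 to G2673. The Python sketch below illustrates that relationship using the six records visible in the abbreviated listing above. The function name `shift_numbering` is hypothetical, and the approach assumes a single contiguous stretch with no gaps or insertion codes; it is not a description of how DSSR itself converts between the two schemes.

```python
from typing import List, Tuple

# (position, base, partner position); a partner of 0 means unpaired
Record = Tuple[int, str, int]


def shift_numbering(records: List[Record], first_label: int) -> List[Record]:
    """Map PDB-style labels to positions counted from 1.

    Assumes the labels form one contiguous run starting at first_label.
    Unpaired bases (partner 0) stay at 0.
    """
    offset = first_label - 1
    return [(pos - offset, base, mate - offset if mate else 0)
            for pos, base, mate in records]


# The six records shown in the abbreviated --bpseq listing for 1msy.
pdb_numbered = [
    (2647, "U", 0),
    (2648, "G", 2672),
    (2649, "C", 2671),
    (2671, "G", 2649),
    (2672, "U", 2648),
    (2673, "G", 0),
]

sequential = shift_numbering(pdb_numbered, first_label=2647)
for pos, base, mate in sequential:
    # Same column widths as the listing above.
    print(f"{pos:6d} {base} {mate:6d}")
```

For structures with chain breaks or insertion codes a constant offset no longer holds, and a label-to-index lookup table would be needed instead.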

Moreover, with the command x3dna-dssr -i=1msy.pdb --bpseq=extra, the output file dssr-2ndstrs.bpseq now contains additional info to easily identify a nucleotide and its pairing partner:

  2647 U      0 # name=A.U2647
  2648 G   2672 # name=A.G2648, pairedNt=A.U2672
  2649 C   2671 # name=A.C2649, pairedNt=A.G2671
......
  2671 G   2649 # name=A.G2671, pairedNt=A.C2649
  2672 U   2648 # name=A.U2672, pairedNt=A.G2648
  2673 G      0 # name=A.G2673

It should be noted that this .bpseq output file is no longer compliant with the standard and cannot be fed into VARNA for visualization.
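If only the annotated file is at hand, the plain three-column records can be recovered by cutting each line at the `#` sign. The Python sketch below assumes the layout shown in the listing above (a `#` followed by comma-separated `key=value` items); the function name `split_extra` is hypothetical and not part of DSSR. Rerunning DSSR with --bpseq alone remains the simplest way to get a compliant file.

```python
import re
from typing import Dict, Tuple


def split_extra(line: str) -> Tuple[str, Dict[str, str]]:
    """Split one annotated line into the plain record and its annotations.

    Assumes everything after '#' is a list of key=value items, as in the
    --bpseq=extra listing above.
    """
    record, _, comment = line.partition("#")
    # Collect key=value items; values end at a comma or whitespace.
    fields = dict(re.findall(r"(\w+)=([^,\s]+)", comment))
    return record.rstrip(), fields


extra_lines = [
    "  2647 U      0 # name=A.U2647",
    "  2648 G   2672 # name=A.G2648, pairedNt=A.U2672",
]
for line in extra_lines:
    record, info = split_extra(line)
    print(record, info)
```

Keeping the annotations in a dictionary makes it easy to look up the full nucleotide identifier (chain, base, and number) for any position while still writing out only the standard columns.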

The --bpseq option was added at users' request. The --bpseq=extra variation was implemented recently to ensure that the --bpseq option by itself produces a valid .bpseq file without extra info (e.g., enabled with the global --more option). Now the extra info for .bpseq output is enabled only by setting --bpseq=extra explicitly.

The --bpseq option and its evolution are a good example of how DSSR responds to community requests. I'm here to listen, and I'm always willing to improve DSSR to better fit users' needs. If you make use of DSSR in your pipeline and need some adaptations, please do not hesitate to contact me. I may consider adding a new option or otherwise revising the code to streamline the integration of DSSR into your project.

DSSR commands used, and the output .bpseq files:

x3dna-dssr -i=1msy.pdb
cp dssr-2ndstrs.bpseq 1msy-dssr-default.bpseq

x3dna-dssr -i=1msy.pdb --bpseq
cp dssr-2ndstrs.bpseq 1msy-dssr-bpseq.bpseq

x3dna-dssr -i=1msy.pdb --bpseq=extra
cp dssr-2ndstrs.bpseq 1msy-dssr-bpseq-extra.bpseq

Thank you for printing this article from http://x3dna.org/. Please do not forget to visit back for more 3DNA-related information. — Xiang-Jun Lu

X3DNA-DSSR: a resource for structural bioinformatics of nucleic acids (An NIGMS National Resource supported by NIH grant R24GM153869)
