It gives me great pleasure to announce that the 3DNA/DSSR project is now funded by the NIH R24GM153869 grant, titled "X3DNA-DSSR: a resource for structural bioinformatics of nucleic acids". I am deeply grateful for the opportunity to continue working on a project that has basically defined who I am. It was a tough time during the funding gap over the past few years. Nevertheless, I have experienced and learned a lot, and witnessed miracles enabled by enthusiastic users.

Since late 2020 when I lost my R01 grant, DSSR has been licensed by the Columbia Technology Ventures (CTV). I appreciate the numerous users (including big pharma) who purchased a DSSR Pro License or a DSSR Basic paid License. Thanks to the NIH R24GM153869 grant, we are pleased to provide DSSR Basic free of charge to the academic community. Academic Users may submit a license request for DSSR Basic or DSSR Pro by clicking "Express Licensing" on the CTV landing page. Commercial users may inquire about pricing and licensing terms by emailing techtransfer@columbia.edu, copying xiangjun@x3dna.org.

The current version of DSSR is v2.4.5-2024sep24 which contains miscellaneous bug fixes (e.g., chain id with > 4 chars) and minor improvements. This release synchronizes with the new R24 funding, which will bring the project to the next level. All existing users are encouraged to upgrade their installation.

Lots of exciting things will happen for the project. The first thing is to make DSSR freely accessible to the academic community. In the past couple of weeks, CTV have already issued quite a few DSSR Basic Academic licenses to users from all over the world. So the demand is high, and it will become stronger as more academic users become aware of DSSR. I'm closely monitoring the 3DNA Forum, and is always ready to answer users questions.

I am committed to making DSSR a brand that stands for quality and value. By virtue of its unmatched functionality, usability, and support, DSSR saves users a substantial amount of time and effort when compared to other options. My track record throughout the years has unambiguously demonstrated my dedication to this solid software product.


DSSR Basic contains all features described in the three DSSR-related papers, and includes the originally separate SNAP program (still unpublished) for analyzing DNA/RNA-protein complexes. The Pro version integrates the classic 3DNA functionality, plus advanced modeling routines, with email/Zoom/phone support.

---

Single- and double-stranded Zp

From early on, 3DNA calculates the Zp parameter to separate A- and B-DNA double helical steps. First introduced in the paper A-form conformational motifs in ligand-bound DNA structures (see figure below), Zp is the mean projection of the two phosphorus atoms onto the z-axis of the dimer ‘middle frame’. Zp is greater than 1.5 Å for A-DNA, and it is less than 0.5 Å for B-DNA. As noted in the 3DNA NAR paper, other parameters such as slide should also be examined to confirm conformational assignments based on Zp.

definition of the Zp parameter for duplex DNA

As of v2.1, 3DNA has introduced the single-stranded variant for the Zp parameter (ssZp) as a more robust substitute for the Richardson phosphorus-glycosidic bond distance parameter (Dp) to characterize sugar puckers. See post Sugar pucker correlates with phosphorus-base distance for more details. In 3DNA/DSSR, ssZp is defined as the z-coordinate of the 3′ phosphorus atom expressed in the standard reference frame of the preceding base; it is positive when phosphorus lies on the +z-axis side (base in anti conformation) and negative if phosphorus is on the –z-axis side (base in syn conformation). Note that by definition, Dp should always be positive.

As in the previous post, here I am using G175 and U176 of PDB entry 1jj2 (the large ribosomal subunit of Haloarcula marismortui) as examples to illustrate how the ssZp parameters are calculated. The GpU forms a dinucleotide platform, where the sugar of G175 adopts a C2′-endo conformation, and that of U176 C3′-endo. For verification, here is the PDB data file for fragment 1jj2-G175-U176-A177.pdb (note A177 is included for its phosphorus atom). Run the following 3DNA commands:

find_pair -s 1jj2-G175-U176-A177.pdb stdout
frame_mol -1 ref_frames.dat 1jj2-G175-U176-A177.pdb ref-G175.pdb
frame_mol -2 ref_frames.dat 1jj2-G175-U176-A177.pdb ref-U176.pdb

File ref-G175.pdb contains the following line:

ATOM     24  P     U 0 176      -5.624   6.937   1.918  1.00 24.19           P 

The z-coordinate of U176 (which is 3′ to G175) is 1.918, which is the ssZp for G175. It is less than 2.9 Å, corresponding to the C2′-endo sugar conformation of G175.

Similarly, file ref-U176.pdb contains the following line:

ATOM     44  P     A 0 177      -3.841   6.592   4.377  1.00 25.91           P

So the ssZp for U176 is 4.377, which is greater than 2.9 Å, corresponding to the C3′-endo sugar conformation of U176.

To sum up, the double-stranded Zp as originally available from 3DNA can be used for discriminating A- and B-DNA double-helical steps: Zp > 1.5 Å for A-DNA, and Zp < 0.5 Å for B-DNA. The newly introduced single-stranded Zp is intended for characterizing sugar puckers: Zp > 2.9 Å for C3′-endo, and Zp < 2.9 Å for C2′-endo. Since A-DNA has predominately C3′-endo sugar conformation and B-DNA has C2′-endo sugar, the ssZp parameter would be helpful in classifying a dinucleotide into A- or B-like conformation. A survey of ssZp in well-defined A- and B-DNA structures (as performed for double-stranded Zp) should prove useful.

Realizing the naming confusions of double-stranded Zp vs single-stranded Zp, I am considering to rename single-stranded Zp as ssZp in future releases of 3DNA and DSSR. Do you have any comments or suggestions? Please let me know by leaving a comment!

Comment

---

Weird cases of nucleotides with missing atoms

Recently I was surprised by some cases of nucleotides with missing atoms in PDB entry 1pns. The story started like this: 3DNA/DSSR maps various nucleotide names to one-letter codes, based on the data file baselist.dat (see post Modified nucleotides in the PDB). In the meantime, 3DNA/DSSR internally assigns a nucleotide as either purine or pyrimidine, by virtue of coordinates of base atoms. Be definition, purines should only include A/a/G/g/I/i, and pyrimidines C/c/T/t/U/u/P/p. However, no consistency check has been implemented in DSSR until just now.

I first noticed the inconsistency between residue name and atom coordinates for nucleotide A6 on chain U (hereafter referred to as U.A6) in 1pns. The nucleotide has standard name ‘  A’, obviously a purine. However, somehow DSSR classified it as a pyrimidine based on atomic coordinates. Upon further check of the PDB data file, I found the following remarks:

REMARK 470 MISSING ATOM                                                         
REMARK 470 THE FOLLOWING RESIDUES HAVE MISSING ATOMS(M=MODEL NUMBER;            
REMARK 470 RES=RESIDUE NAME; C=CHAIN IDENTIFIER; SSEQ=SEQUENCE NUMBER;          
REMARK 470 I=INSERTION CODE):                                                   
REMARK 470   M RES CSSEQI  ATOMS                                                
REMARK 470       A U   6    N9   C8   N7                                        
REMARK 470       G U   8    N9   C8   N7                                        
REMARK 470       A U  12    N9   C8   N7                                        
REMARK 470       A U  13    N9   C8   N7                                        
REMARK 470       A U  14    N9   C8   N7                                        

The atomic coordinates for U.A6 are as below:

ATOM  34447  P     A U   6      81.861  37.210  78.651  1.00378.87           P  
ATOM  34448  OP1   A U   6      80.631  37.121  77.831  1.00378.87           O  
ATOM  34449  OP2   A U   6      81.665  37.221  80.119  1.00378.87           O  
ATOM  34450  O5'   A U   6      82.707  38.495  78.212  1.00378.87           O  
ATOM  34451  C5'   A U   6      83.948  38.777  78.887  1.00378.87           C  
ATOM  34452  C4'   A U   6      84.600  40.000  78.276  1.00378.87           C  
ATOM  34453  O4'   A U   6      84.975  39.698  76.901  1.00378.87           O  
ATOM  34454  C3'   A U   6      83.714  41.239  78.153  1.00378.87           C  
ATOM  34455  O3'   A U   6      83.654  41.968  79.369  1.00378.87           O  
ATOM  34456  C2'   A U   6      84.403  42.015  77.020  1.00378.87           C  
ATOM  34457  O2'   A U   6      85.564  42.655  77.474  1.00378.87           O  
ATOM  34458  C1'   A U   6      84.834  40.864  76.105  1.00378.87           C  
ATOM  34459  C5    A U   6      82.033  39.296  74.209  1.00378.87           C  
ATOM  34460  C6    A U   6      82.941  39.553  75.166  1.00378.87           C  
ATOM  34461  N6    A U   6      81.170  39.949  72.090  1.00378.87           N  
ATOM  34462  N1    A U   6      83.830  40.588  75.041  1.00378.87           N  
ATOM  34463  C2    A U   6      83.843  41.410  73.939  1.00378.87           C  
ATOM  34464  N3    A U   6      82.899  41.124  72.974  1.00378.87           N  
ATOM  34465  C4    A U   6      81.968  40.108  73.016  1.00378.87           C  

No atom records for N7, C8 and N9. So far, so good. However, surprise came when I visualized U.A6 in Jmol, as shown in the following image. Note here atom N1 is connected to C1’ as in pyrimidines, and N6 is bonded to C4!

Weird U.A6 with missing atoms (1pns)

The same issue also exists for U.G8 (see figure below), U.A12, U.A13, and U.A14.

Weird U.G8 with missing atoms (1pns)

It is beyond my imagination to understand why such weird cases exist in the PDB, even given the lousy resolution (8.7 Å) of 1pns.

Comment

---

3DNA/DSSR runs just fine under Mac OS X Mavericks

I recently upgraded my Macs to OS X Mavericks to check if 3DNA/DSSR works in the new operating system. I am glad to report that both run without a hitch, as expected.

Since OS X Mavericks is free from the Mac App Store, it will quickly become the de facto version virtually all Mac users would use. I also noticed that Ruby on Mavericks has been upgraded to ruby 2.0.0p247 (2013-06-27 revision 41674), a major step forward from the now retiring Ruby 1.8.7 distributed in previous versions of Mac OS X.

As a rule, I’d ensure that 3DNA/DSSR executes properly in major releases of the commonly used operating systems — Mac, Windows, and Linux.

Comment

---

DSSR works perfectly under DOS (in native Windows)

While having not used DOS for ages, I am glad to find that the DSSR version compiled for MinGW/MSYS on Windows works perfectly under this operating system (see screenshot below). The DSSR DOS command-line interface functions exactly the same as for Linux, Mac OS X, MinGW/MSYS, and CygWin. Among other possible usages, it allows for batch files to take advantage of DSSR.

Screenshot of a DSSR run in DOS

Implementing DSSR in strict ANSI C as a self-contained and zero-dependent command-line program pays off enormously: it simplifies code maintenance and ensures that the program is applicable wherever a C compiler exists. The easy web interface to DSSR makes the program universally accessible.

Comment

---

DSSR command-line processing

Aside from its extensive functionality for RNA structural analyses, DSSR also introduces a consistent and flexible way to process command-line options. Here, each option can be specified via a --key[=value] pair (or -key[=value] or key[=value]; i.e., two/one/zero preceding dashes are all accepted), key can be in either lower, UPPER or MiXed case, and value is optional for Boolean switches. Furthermore, options can be put in any order; if the same key is repeated more than once, the value specified last overwrites corresponding previous settings.

As always, the rules are best illustrated with concrete examples. Some typical use-cases are given below:

#1 analyze PDB entry '1msy', with default output to stdout
x3dna-dssr --input=1msy.pdb

#2 same as #1, with output directed to file '1msy.out'
x3dna-dssr --input=1msy.pdb --output=1msy.out

#3-6, same as #2
x3dna-dssr --output=1msy.out --input=1msy.pdb
x3dna-dssr --OUTPUT=1msy.out --Input=1msy.pdb
x3dna-dssr -output=1msy.out input=1msy.pdb
x3dna-dssr output=1msy.out --input=1msy.pdb

#7 the value '1ehz.pdb' overwrites '1msy.pdb'
x3dna-dssr --input=1msy.pdb input=1ehz.pdb

#8-12 with the switch --more set to true
x3dna-dssr -input=1msy.pdb --more
x3dna-dssr -input=1msy.pdb --more=true
x3dna-dssr -input=1msy.pdb --more=yes
x3dna-dssr -input=1msy.pdb --more=on
x3dna-dssr -input=1msy.pdb --more=1

#13 same as without specifying --more,
#      or with values set to false/no/0
x3dna-dssr -input=1msy.pdb --more=off

#14 shorthand forms for --input and --output
x3dna-dssr -i=1msy.pdb -o=1msy.out

#15 it can also be more verbose
x3dna-dssr --input-pdb-file=1msy.pdb

#16-18 within a key, separator dash(-) and underscore (_)
#      are treated the same, and can be omitted
x3dna-dssr -i=1msy.pdb -non-pair
x3dna-dssr -i=1msy.pdb -non_pair
x3dna-dssr -i=1msy.pdb -nonpair

By allowing for 2/1/0 dashes to precede each key and a dash/underscore character or none to separate words within the key, DSSR provides users with great flexibility in specifying command-line options to fit into their preferred styles. Not surprisingly, new programs to be added into 3DNA, or the version 3 release of the software will all follow the same convention.

Comment

---

Modified nucleotides in the PDB

In addition to the five canonical bases (A, C, G, T, and U), nucleic acid structures in the PDB contains numerous modified variants (natural or engineered) in the nucleobase, sugar, or the phosphate. For instance, the 76-nt (nucleotide) long yeast phenylalanine tRNA (1ehz) contains 14 modified bases: 2MG10, H2U16, H2U17, M2G26, OMC32, OMG34, YYG37, PSU39, 5MC40, 7MG46, 5MC49, 5MU54, PSU55, and 1MA58. Among which, the most prevalent and best-known example is pseudouridine. Note that in the PDB, each residue (including modified nt) is named with an up to three-letter identifier, e.g., PSU for pseudouridine. For a comprehensive list (with chemical and structural information) of small molecules, including modified nts, please refer to the Ligand Expo website hosted by the RCSB PDB.

Given the widespread occurrences of modified bases in nucleic acid structures, any practical structural bioinformatics software should be able to treat them effectively, as with the canonical bases. In 3DNA, from the very beginning, modified bases are mapped to standard counterparts, e.g. 5‐iodouracil (5IU) to uracil (U) and 1‐methyladenine (1MA) to adenine (A), allowing for easy analysis of unusual DNA and RNA structures (see the NAR03 reference). Specifically, in the 3DNA distribution the file baselist.dat contains the mappings explicitly.

As of v2.1, 3DNA automatically maps a new modified base not available in the file baselist.dat. Yet, I have continuously updated the list in line with new DNA/RNA entries released by the PDB. The process is automated with a Ruby script which calls find_pair -s on each nucleic-acid-containing structure to output unknown bases. As an extreme, the baselist.dat file below comprises only canonical bases:

  A   A
  C   C
  G   G
  T   T
  U   U
 DA   A
 DC   C
 DG   G
 DT   T

With the above minimum mapping list, running the command find_pair -s on 1ehz.pdb identifies all the 14 modified bases. A sample case for 2MG is shown below:

Match '2MG' to 'g' for residue 2MG   10  on chain A [#10]
    check it & consider to add line '2MG     g' to file <baselist.dat>

By parsing the output of a batch run on all DNA/RNA-containing entries in the PDB as of October 18, 2013, I identified a total of 596 modified bases. The top portion is as below:

02I     a
08Q     c
08T     a
0AD     g
 0C     c
0DC     c
0DG     g
0DT     t
 0G     g
0KL     u
0KX     c
0KZ     t

An explicit list of base mapping makes the correspondence transparent, and helps avoid ambiguous cases as to which canonical base a modified nt matches to. DSSR uses the same list internally. Hopefully, the information would also be useful to other related projects.

Comment [2]

---

Different names for the methyl group in DNA and RNA structures

Recently I was a bit surprised to find that the methyl group is named differently in the PDB: C7 in DT8 (thymine) of B-DNA 355d, CM5 in 5MC40 (5-methylated C) of tRNA 1ehz, and C5M in 5MU54 (5-methylated U, i.e., T) of the same tRNA 1ehz. See the three figures below for details.

I know that the previously named C5M of thymine in DNA has been renamed C7 as a result of the 2007 remediation effort (PDB v3). However, browsing through the wwPDB Remediation website and reading carefully the article Remediation of the protein data bank archive, I failed to see explanations of the obvious inconsistency of CM5 (5MC40) vs C5M (5MU54) in the nomenclature of the 5-methyl group in the same tRNA entry 1ehz, except for the following note:

As with the Chemical Component Dictionary, names for standard amino acids and nucleotides follow IUPAC recommendations (10) with the exception of the well-established convention for C-terminal atoms OXT and HXT. These nomenclature changes have been applied to standard polymeric chemical components only.

5-methyl is named C7 in DT8 of the DNA entry 355d

5-methyl in DT8 is named C7 in DNA (355d)

5-methyl is named CM5 in 5MC40 of the RNA entry 1ehz

5-methyl in 5MC40 is named CM5 in RNA (1ehz)

5-methyl is named C5M in 5MU54 of the RNA entry 1ehz

5-methyl in 5MU54 named C5M in RNA (1ehz)

Am I missing something obvious? If you have any further information, please leave a comment. Whatever the case, it helps (at least won’t hurt) to know the naming discrepancy for those who care about the small methyl group in nucleic acid structures.

Comment

---

Compiling ViennaRNA on Mac OS X

Recently, I upgraded my local ViennaRNA package installation from v2.0.7 to v2.1.3 on my Mac. Following Quickstart in the INSTALL file, I ran ./configure successfully, but make aborted with error messages. Since I previously had a working copy of the software, it must be configuration issues when I compiled this new version. After a few iterations of checking the error message and reading through the INSTALL file, I came up with the following settings:

./configure --disable-openmp --without-perl
make
sudo make install

Apart from some warning messages, the above make command ran successfully.

This post serves mainly as a note for my own reference. Hopefully, the information may prove useful to others who try to install the versatile ViennaRNA package on a Mac OS X machine.

Comment

---

Web-interface to DSSR

I’ve come up with a preliminary web-interface to DSSR, currently accessible at URL http://web.x3dna.org/dssr. The DSSR web-interface has been tested on Safari, Firefox, Chrome, and IE, with satisfying results. A screenshot of the home page is given below, using 1msy as an example:

Screenshot of the Web-DSSR homepage

After clicking the Submit button, users will be presented with the result page of a DSSR run. The beginning portion of the above example is as follows:

Screenshot of a DSSR-run

Note that the DSSR web-interface is being provided via a shared web hosting service, thus it has limited resources. Specifically, the uploaded file cannot be larger than two megabytes (2MB), and the process could be slow. Additionally, the file must have an extension of .pdb or .cif. To take full advantage of what DSSR has to offer, please install and run the software locally.

By design, DSSR is self-contained, command-line driven, with zero dependance on third-party libraries. Such features make it straightforward to build a GUI- or web-interface to DSSR, or integrate the program into other structural bioinformatics tools. As the need arises, I will refine the DSSR web-interface to better serve the community. The current simple, yet exploratory, web interface should make DSSR accessible to a much wider audience.

Comment

---

« Older · Newer »

Thank you for printing this article from http://x3dna.org/. Please do not forget to visit back for more 3DNA-related information. — Xiang-Jun Lu