3DNA Homepage -- Nucleic Acid Structures

DSSR works perfectly under DOS (in native Windows)

While having not used DOS for ages, I am glad to find that the DSSR version compiled for MinGW/MSYS on Windows works perfectly under this operating system (see screenshot below). The DSSR DOS command-line interface functions exactly the same as for Linux, Mac OS X, MinGW/MSYS, and CygWin. Among other possible usages, it allows for batch files to take advantage of DSSR.

Implementing DSSR in strict ANSI C as a self-contained and zero-dependent command-line program pays off enormously: it simplifies code maintenance and ensures that the program is applicable wherever a C compiler exists. The easy web interface to DSSR makes the program universally accessible.

Comment

DSSR command-line processing

Aside from its extensive functionality for RNA structural analyses, DSSR also introduces a consistent and flexible way to process command-line options. Here, each option can be specified via a --key[=value] pair (or -key[=value] or key[=value]; i.e., two/one/zero preceding dashes are all accepted), key can be in either lower, UPPER or MiXed case, and value is optional for Boolean switches. Furthermore, options can be put in any order; if the same key is repeated more than once, the value specified last overwrites corresponding previous settings.

As always, the rules are best illustrated with concrete examples. Some typical use-cases are given below:

#1 analyze PDB entry '1msy', with default output to stdout
x3dna-dssr --input=1msy.pdb

#2 same as #1, with output directed to file '1msy.out'
x3dna-dssr --input=1msy.pdb --output=1msy.out

#3-6, same as #2
x3dna-dssr --output=1msy.out --input=1msy.pdb
x3dna-dssr --OUTPUT=1msy.out --Input=1msy.pdb
x3dna-dssr -output=1msy.out input=1msy.pdb
x3dna-dssr output=1msy.out --input=1msy.pdb

#7 the value '1ehz.pdb' overwrites '1msy.pdb'
x3dna-dssr --input=1msy.pdb input=1ehz.pdb

#8-12 with the switch --more set to true
x3dna-dssr -input=1msy.pdb --more
x3dna-dssr -input=1msy.pdb --more=true
x3dna-dssr -input=1msy.pdb --more=yes
x3dna-dssr -input=1msy.pdb --more=on
x3dna-dssr -input=1msy.pdb --more=1

#13 same as without specifying --more,
#      or with values set to false/no/0
x3dna-dssr -input=1msy.pdb --more=off

#14 shorthand forms for --input and --output
x3dna-dssr -i=1msy.pdb -o=1msy.out

#15 it can also be more verbose
x3dna-dssr --input-pdb-file=1msy.pdb

#16-18 within a key, separator dash(-) and underscore (_)
#      are treated the same, and can be omitted
x3dna-dssr -i=1msy.pdb -non-pair
x3dna-dssr -i=1msy.pdb -non_pair
x3dna-dssr -i=1msy.pdb -nonpair

By allowing for 2/1/0 dashes to precede each key and a dash/underscore character or none to separate words within the key, DSSR provides users with great flexibility in specifying command-line options to fit into their preferred styles. Not surprisingly, new programs to be added into 3DNA, or the version 3 release of the software will all follow the same convention.

Comment

Different names for the methyl group in DNA and RNA structures

Recently I was a bit surprised to find that the methyl group is named differently in the PDB: C7 in DT8 (thymine) of B-DNA 355d, CM5 in 5MC40 (5-methylated C) of tRNA 1ehz, and C5M in 5MU54 (5-methylated U, i.e., T) of the same tRNA 1ehz. See the three figures below for details.

I know that the previously named C5M of thymine in DNA has been renamed C7 as a result of the 2007 remediation effort (PDB v3). However, browsing through the wwPDB Remediation website and reading carefully the article Remediation of the protein data bank archive, I failed to see explanations of the obvious inconsistency of CM5 (5MC40) vs C5M (5MU54) in the nomenclature of the 5-methyl group in the same tRNA entry 1ehz, except for the following note:

As with the Chemical Component Dictionary, names for standard amino acids and nucleotides follow IUPAC recommendations (10) with the exception of the well-established convention for C-terminal atoms OXT and HXT. These nomenclature changes have been applied to standard polymeric chemical components only.

5-methyl is named C7 in DT8 of the DNA entry 355d

5-methyl in DT8 is named C7 in DNA (355d)

5-methyl is named CM5 in 5MC40 of the RNA entry 1ehz

5-methyl in 5MC40 is named CM5 in RNA (1ehz)

5-methyl is named C5M in 5MU54 of the RNA entry 1ehz

5-methyl in 5MU54 named C5M in RNA (1ehz)

Am I missing something obvious? If you have any further information, please leave a comment. Whatever the case, it helps (at least won’t hurt) to know the naming discrepancy for those who care about the small methyl group in nucleic acid structures.

Comment

Compiling ViennaRNA on Mac OS X

Recently, I upgraded my local ViennaRNA package installation from v2.0.7 to v2.1.3 on my Mac. Following Quickstart in the INSTALL file, I ran ./configure successfully, but make aborted with error messages. Since I previously had a working copy of the software, it must be configuration issues when I compiled this new version. After a few iterations of checking the error message and reading through the INSTALL file, I came up with the following settings:

./configure --disable-openmp --without-perl
make
sudo make install

Apart from some warning messages, the above make command ran successfully.

This post serves mainly as a note for my own reference. Hopefully, the information may prove useful to others who try to install the versatile ViennaRNA package on a Mac OS X machine.

Comment

Web-interface to DSSR

I’ve come up with a preliminary web-interface to DSSR, currently accessible at URL http://web.x3dna.org/dssr. The DSSR web-interface has been tested on Safari, Firefox, Chrome, and IE, with satisfying results. A screenshot of the home page is given below, using 1msy as an example:

After clicking the Submit button, users will be presented with the result page of a DSSR run. The beginning portion of the above example is as follows:

Note that the DSSR web-interface is being provided via a shared web hosting service, thus it has limited resources. Specifically, the uploaded file cannot be larger than two megabytes (2MB), and the process could be slow. Additionally, the file must have an extension of .pdb or .cif. To take full advantage of what DSSR has to offer, please install and run the software locally.

By design, DSSR is self-contained, command-line driven, with zero dependance on third-party libraries. Such features make it straightforward to build a GUI- or web-interface to DSSR, or integrate the program into other structural bioinformatics tools. As the need arises, I will refine the DSSR web-interface to better serve the community. The current simple, yet exploratory, web interface should make DSSR accessible to a much wider audience.

Comment

3DNA forum registrations pass 1000

As of June 24, 2013, the number of 3DNA Forum registrations has passed the 1000 mark. On September 16, 2012, I wrote the post The number of 3DNA forum registrations has reached 500. Thus, in slightly over 9 months, the number has doubled, with approximately 2 registrations per day.

I am glad to see the steady increase of the 3DNA user base. Over the time, I have strived to be responsive to user questions, and made every effort to keep the forum spam free. By and large, employing simple 3DNA-related questions has turned out to be an effective anti-spam strategy. Since the launch of the new forum.x3dna.org in March 2012, I’ve received less than five requests (to the best of my memory) asking for help on registrations. As a recently example, a potential user got stuck with the question about what ‘w’ means in w3DNA. Based on user feedback, I have added hints to some questions to make their answers more obvious. Whatever the reasons, each reported issue has been promptly resolved.

With the release of DSSR and the continuous support of an enthusiastic user community, I have every reason to believe that 3DNA will gain more popularity in the years to come.

Comment

3DNA JoVE paper published

A new paper titled Analyzing and Building Nucleic Acid Structures with 3DNA has been published in JoVE (Journal of Visualized Experiments). Specifically, the article illustrates 3DNA’s unique capability to characterize and modify DNA structures at the level of the constituent base-pair steps, and highlights a new feature in v2.1 to analyze and align an ensemble of related structures determined with NMR or generated by MD simulations.

Here is the abstract:

The 3DNA software package is a popular and versatile bioinformatics tool with capabilities to analyze, construct, and visualize three-dimensional nucleic acid structures. This article presents detailed protocols for a subset of new and popular features available in 3DNA, applicable to both individual structures and ensembles of related structures. Protocol 1 lists the set of instructions needed to download and install the software. This is followed, in Protocol 2, by the analysis of a nucleic acid structure, including the assignment of base pairs and the determination of rigid-body parameters that describe the structure and, in Protocol 3, by a description of the reconstruction of an atomic model of a structure from its rigid-body parameters. The most recent version of 3DNA, version 2.1, has new features for the analysis and manipulation of ensembles of structures, such as those deduced from nuclear magnetic resonance (NMR) measurements and molecular dynamic (MD) simulations; these features are presented in Protocols 4 and 5. In addition to the 3DNA stand-alone software package, the w3DNA web server, located at http://w3dna.rutgers.edu, provides a user-friendly interface to selected features of the software. Protocol 6 demonstrates a novel feature of the site for building models of long DNA molecules decorated with bound proteins at user-specified locations.

A new section dedicated to the JoVE paper will be set up on the 3DNA Forum soon. It will contain all the data files and scripts so our published results can be strictly reproduced. The section should also serve as a platform for open discussions of related protocols.

Comment

Number of base pairs with at least two inter-base H-bonds: 28 or 29?

Early on when I started on DNA structures, I read Saenger’s book Principles of Nucleic Acid Structure and became familiar with his classification of the 28 possible base-pairs (bps) for A, G, U(T), and C involving at least two (cyclic) hydrogen bonds (see figure below).

Later on, I read from the 2nd edition of The RNA World book a list of 29 bps compiled by Burkard, Turner & Tinoco. While the one bp discrepancy (28 vs 29) has been in my mind for quite a long while, I had never paid much attention to the issue until recently while adding classifications of RNA bps (among many other functionalities) to 3DNA. A Google search did not help solve the puzzle, so I decided to dig it out by comparing the two lists.

The Burkard et al. list is titled Structures of Base Pairs Involving at Least Two Hydrogen Bonds and it mentions specifically Saenger’s list:

The structures of 29 possible base pairs that involve at least two hydrogen bonds are given in Figures 1–5 (for further descriptions, see Saenger, in Principles of nucleic acid structure, p. 120. Springer-Verlag [1984]).

However, in the five figures, Burkard et al. do not provide the corresponding Saenger numbers (I to XXVIII, 1—28) for the 28 common bps; thus it is not immediately obvious which one (i.e., the new addition by Burkard et al.) is missing from Saenger’s list. Under careful scrutiny, the absent bp turns out to be the “G•C N3-amino, amino-N3” pair in Figure 3: “Six possible flipped purine-pyrimidine mismatches.” One example of such G+C pair is found in the 5S ribosomal RNA (chain 9, G3022—C3026) of Haloarcula marismortui in PDB entry 1vq8.

The above figure shows clearly that the G+C bp does indeed have two canonical H-bonds between base atoms, and it is difficult to speculate how it escaped Saenger’s selection criteria. In the upcoming new 3DNA component, I am listing this bp as number XXIX (29), along with the other 28 base pairs.

Comment

The Calcutta U-U base pair

Recently, I came across the so-called Calcutta U-U base pair (bp) [see figure below] while reading articles on C-H…O contacts in nucleic acid structures. Not familiar with this named pair before, I was curious to find out what it’s about. After some searching, I traced the origin of the Calcutta U-U bp to the following two papers published by Sundaralingam’s group during the middle 1990s:

Wahl et al. (1996). The structure of r(UUCGCG) has a 5’-UU-overhang exhibiting Hoogsteen-like trans UU base pairs. Nat. Struct. Biol., 3(1), 24-31. In Note added in proof, the author wrote:

We have called the novel U•U base pair, where the Hoogsteen face of one of the pyrimidines is involved in a C5-H—O4 hydrogen bond, the ‘Calcutta Base Pair’, since it was announced at the International Seminar-cum-School on Macromolecular Crystallographic Data held in Calcutta, November 16-20, 1995.

Wahl & Sundaralingam (1997). C-H…O hydrogen bonding in biology. Trends Biochem Sci., 22(3), 97-102. In this review article, the authors noted:

We recently discovered a novel U•U base pair, referred to as the Calcutta base pair, in the crystal structure of an RNA hexamer UUCGCG (Ref. 18). The two uracil bases form a conventional N(3)-H…O(4) and an unconventional C(5)-H…O(2) hydrogen bond (Fig. 3a). The C-H…O interaction is entirely ‘voluntary’ and not ‘forced’, underlining its importance in base mispairing.

3DNA has no problem to identify the Calcutta U-U bps (or any pair for that matter); an example is shown below based on the RNA hexamer UUCGCG structure (PDB entry: 1osu) solved by Sundaralingam and colleagues.

In the new 3DNA component I’ve been working on (and to be released soon), the Calcutta U-U pair is characterized as below:

1/A.U1 3/A.U2 [U-U] Calcutta 00-n/a tHW -MW
  anti C3'-endo 8.9 --- anti C3'-endo 30.3
  dcc=11.18  dnn=8.48  dmm=7.58  tor=-174.1
  H-bonds[2]: "O4(carbonyl)-N3(imino)[2.76]; C5-O4(carbonyl)[3.27]"

  Shear=-3.67   Stretch=-0.52     Stagger=-0.89
  Buckle=-1.41  Propeller=-16.03  Opening=-90.67

The Calcutta pair is explicitly named, along with other named base pairs (e.g., Watson-Crick [WC], Wobble, and Hoogsteen bps). It is classified as type tHW (trans with Hoogsteen/WC interacting edges), following the commonly used Leontis-Westhof nomenclature. It does not belong to any of the 28 bps (00-n/a) with at least two conventional H-bonds, as categorized by Saenger. In 3DNA, the Calcutta U-U pair is of M-N type, designated as -MW.

Among the well-known named base pairs, some are after the scientists who discovered them (e.g., WC and Hoogsteen bps), while others are based on chemical/geometrical features (e.g., Wobble and Sheared G-A bps), or a combination of both (e.g., reversed WC/Hoogsteen bps). The Calcutta U-U pair is unique in that it is named after a place in India:

Kolkata, or Calcutta, is the capital of the Indian state of West Bengal. … While the city’s name has always been pronounced Kolkata or Kolikata in Bengali, the anglicized form Calcutta was the official name until 2001, when it was changed to Kolkata in order to match Bengali pronunciation.

Comment

« Older · Newer »

Thank you for printing this article from http://x3dna.org/. Please do not forget to visit back for more 3DNA-related information. — Xiang-Jun Lu

X3DNA-DSSR: a resource for structural bioinformatics of nucleic acids(An NIGMS National Resource supported by NIH grant R24GM153869)

5-methyl is named C7 in DT8 of the DNA entry 355d

5-methyl is named CM5 in 5MC40 of the RNA entry 1ehz

5-methyl is named C5M in 5MU54 of the RNA entry 1ehz

X3DNA-DSSR: a resource for structural bioinformatics of nucleic acids
(An NIGMS National Resource supported by NIH grant R24GM153869)