[Job] Staff Associate II (Computational Structural Biology) at Columbia University

The X3DNA-DSSR resource is at the forefront of structural bioinformatics, developing advanced tools for analyzing and modeling nucleic acid structures. We are seeking a highly motivated Staff Associate II to join our team and contribute to our next-generation analysis and visualization engine.

To see our resource in action, please visit wDSSR, our new web interface for dissecting and modeling 3D nucleic acid structures: https://web.x3dna-dssr.org/.

We are looking for a candidate with a strong scientific background in structural biology or bioinformatics and a desire to contribute to peer-reviewed publications through community-driven data analysis. We value individuals who are eager to learn, adapt to new technical challenges, and support the global research community.

For the full job description and to submit your application, please visit the official Columbia University posting: https://apply.interfolio.com/183705

---

Announcing wDSSR: The Next-Generation Web Interface to X3DNA-DSSR

Dear 3DNA/DSSR Community,

We are thrilled to announce the official launch of wDSSR (https://web.x3dna-dssr.org/), the powerful new web interface to the X3DNA-DSSR analytical engine.

Developed by Drs. Shuxiang Li and Xiang-Jun Lu and supported by NIH grant R24GM153869, wDSSR represents a major leap forward from our highly popular 2019 Web 3DNA 2.0 framework. While Web 3DNA 2.0 has faithfully served the community for the analysis, visualization, and modeling of 3D nucleic acid structures, wDSSR was built from the ground up to take full advantage of modern web technologies and the latest DSSR backend capabilities.

A Modern, Streamlined Scientific Workflow We have completely overhauled the user interface to provide a clean, intuitive, and task-driven experience. The core modeling and analysis tools are now seamlessly organized into a logical, single-word scientific workflow: Analyze, Rebuild, Model, Circularize, Mutate, Assemble, and Visualize.

Spotlight Feature: The "Assemble" Module One of the most exciting upgrades is the newly renamed Assemble tab (formerly "Composite"). This advanced composite model builder allows you to effortlessly construct complex, higher-order models by linking any combination of nucleic acid duplexes or protein-DNA/RNA complexes. You can quickly connect up to six distinct target structures, ranging from simple linked A-DNA and B-DNA duplexes to large, protein-decorated structural assemblies.

Immediate Global Adoption Although wDSSR has just launched, we are incredibly humbled to share that it is already seeing rapid worldwide adoption! According to recent network infrastructure data, the new interface is actively being used by researchers across North America, South America, Europe, and Asia. Within just a few days, we have recorded active sessions from prestigious institutions around the globe, including:

  • The Weizmann Institute of Science in Israel
  • Katholieke Universiteit Leuven in Belgium
  • Queen's University in Canada
  • Universidad Nacional Autonoma de Mexico (UNAM) in Mexico
  • Emory University and the Wadsworth Centers Laboratories and Research in the United States
  • Jawaharlal Nehru University and the China Education and Research Network in Asia

How to Cite While a dedicated paper for wDSSR is currently in preparation, researchers should cite the server using its URL (https://web.x3dna-dssr.org/) alongside the 2019 Web 3DNA 2.0 paper and the foundational 2015 DSSR paper. Full details and funding acknowledgements can be found on our newly consolidated About page.

We invite you all to try out the new wDSSR platform! As always, your feedback is invaluable to us, and we encourage you to share your thoughts, questions, and structural models via the newly updated Questions & Feedback link in the wDSSR footer.

Happy modeling!

---

3DNA/blocview-PyMOL images in covers of the RNA journal

I recently performed a quick survey of the cover images of the RNA journal in 2019. I was pleased to find that 9 out of the 12 cover images were provided by the Nucleic Acid Database where 3DNA/blockview and PyMOL were employed, as noted below:

The RNA backbone is displayed as a red ribbon; bases are shown as blocks with NDB coloring: A—red, C—yellow, G—green, U—cyan; geneticin ligands are shown in spacefill with element colors: C—white, N—blue, O—red. The image was generated using 3DNA/blocview and PyMol software.

Details of the 9 cover images are listed below:

  1. January 2019 Rhodobacter sphaeroides Argonaute with guide RNA/target DNA duplex containing noncanonical A-G pair (PDB code: 6d9k)
  2. April 2019 Group I self-splicing intron P4-P6 domain mutant U131A (PDB code: 6d8l)
  3. May 2019 Crystal structure of T. thermophilus 50S ribosomal protein L1 in complex with helices H76, H77, and H78 of 23S RNA (PDB code: 5npm)
  4. June 2019 Crystal structure of ykoY-mntP riboswitch chimera bound to cadmium (PDB code: 6cc3)
  5. July 2019 G96A mutant of the PRPP riboswitch from T. mathranii bound to ppGpp (PDB code: 6ck4)
  6. August 2019 Crystal structure of the metY SAM V riboswitch (PDB code: 6fz0)
  7. October 2019 Crystal structure of protease factor Xa bound to RNA aptamer 11F7t and rivaroxaban (PDB code: 5vof)
  8. November 2019 Drosophila melanogaster nucleosome remodeling complex (PDB code: 6f4g)
  9. December 2019 Crystal structure of the Homo Sapiens cytoplasmic ribosomal decoding site in complex with Geneticin (PDB code: 5xz1)

Here is the composite figure of the 9 cover images.

3DNA/blockview-PyMOL cartoon-block schematics in the covers of the RNA journal in 2019

See also:

Comment

---

Web API to 3DNA

I’ve created a web API to DSSR and SNAP, and fiber models. The overall help message is available via http://api.x3dna.org. Individually, each program is accessed as below.

Help message on x3dna-dssr (DSSR): http://api.x3dna.org/dssr/help

Usage with 'http' (HTTPie):
    http -f http://api.x3dna.org/dssr [options] url=|model@
    http http://api.x3dna.org/dssr/help   -- display this help message

Options:
    json=true-or-FALSE(default)    [e.g., json=true # JSON output]
    pair=true-or-FALSE(default)    [e.g., pair=1    # base-pair only]
    hbond=true-or-FALSE(default)   [e.g., hbond=t   # H-bonding info]
    more=true-or-FALSE(default)    [e.g., more=y    # further details]

Required parameter:
    url=URL-to-coordinate-file [e.g., url=https://files.rcsb.org/download/1ehz.pdb.gz]
    model@coordinate-file      [e.g., model@1ehz.cif]
    # Only one must be specified. 'url' precedes 'model' when both are specified.
    # The coordinate file must be in PDB or PDBx/mmCIF format, optionally gzipped.

Examples:
    http -f http://api.x3dna.org/dssr url=https://files.rcsb.org/download/1ehz.pdb.gz
    http -f http://api.x3dna.org/dssr model@1ehz.cif pair=1
    # with 'curl'
    curl http://api.x3dna.org/dssr -F 'url=https://files.rcsb.org/download/1ehz.pdb.gz'
    curl http://api.x3dna.org/dssr -F 'model=@1msy.pdb' -F 'pair=1'

Note:
    The web API has an upper limit on coordinate file size (gzipped): < 6 MB

Help message on x3dna-snap (SNAP): http://api.x3dna.org/snap/help

Usage with 'http' (HTTPie):
    http -f http://api.x3dna.org/snap [options] url=|model@
    http http://api.x3dna.org/snap/help   -- display this help message

Options:
    json=true-or-FALSE(default)    [e.g., json=true # JSON output]
    hbond=true-or-FALSE(default)   [e.g., hbond=t   # H-bonding info]

Required parameter:
    url=URL-to-coordinate-file [e.g., url=https://files.rcsb.org/download/1oct.pdb.gz]
    model@coordinate-file      [e.g., model@1oct.cif]
    # Only one must be specified. 'url' precedes 'model' when both are specified.
    # The coordinate file must be in PDB or PDBx/mmCIF format, optionally gzipped.

Examples:
    http -f http://api.x3dna.org/snap url=https://files.rcsb.org/download/1oct.pdb.gz
    http -f http://api.x3dna.org/snap model@1oct.cif json=1
    # with 'curl'
    curl http://api.x3dna.org/snap -F 'url=https://files.rcsb.org/download/1oct.pdb.gz'
    curl http://api.x3dna.org/snap -F 'model=@1oct.cif' -F 'json=1'

Note:
    The web API has an upper limit on coordinate file size (gzipped): < 6 MB

Help message on 56 fiber models: http://api.x3dna.org/fiber/help

Usage with 'http' (HTTPie):
    http http://api.x3dna.org/fiber/help    # display this help message
    http http://api.x3dna.org/fiber/list    # show a list of available fiber models (56 in total)
    http http://api.x3dna.org/fiber/str_id  # build model 'str_id' in the range of [1, 56]
    http http://api.x3dna.org/fiber/name    # generate a model with common names as shown below:
              A-DNA, B-dna, C_DNA, D-DNA, ZDNA, RNA, RNAduplex, PaulingTriplex, G4
              Case does not matter, and the separator can be '-' or '_' or omitted.
              So a-dna, A-dNA, a_DNA, or ADNA is valid for building an A-DNA model.

Options (via query strings, or form fields):
    seq=base-sequence # A, C, G, T, U for generic model
    repeat=number     # number of repeats of the sequence
    cif=1             # output file in mmCIF format

Examples with 'http' (HTTPie):
    http http://api.x3dna.org/fiber/1       # model no. 1 (i.e., calf thymus A-DNA model)
    http -f http://api.x3dna.org/fiber/1 seq=A3TTT repeat=2  # specific sequence, repeated twice
    http http://api.x3dna.org/fiber/rna     # single-stranded RNA model
    http http://api.x3dna.org/fiber/rna-ds  # double-stranded RNA model
    http http://api.x3dna.org/fiber/pauling # the triplex model of Pauling & Corey
    http http://api.x3dna.org/fiber/g4      # G-quadruplex model
    # with 'curl'
    curl http://api.x3dna.org/fiber/1
    curl http://api.x3dna.org/fiber/1 -d 'seq=A3TTT' -d 'repeat=2'
    curl http://api.x3dna.org/fiber/rna
    curl http://api.x3dna.org/fiber/rna-ds
    curl http://api.x3dna.org/fiber/pauling
    curl http://api.x3dna.org/fiber/g4

Note:
    The web API has two upper limits: repeats < 1,000, and nucleotides < 10,000.

Comment

---

DSSR-enhanced visualization of nucleic acid structures in PyMOL

The skmatic.x3dna.org website (see screenshot below) aims to showcase DSSR-enabled cartoon-block schematics of nucleic acid structures using PyMOL. It presents a simple interface to browse pre-calculated PDB entries with a set of default settings: long rectangular blocks for Watson-Crick base-pairs, square blocks for G-tetrads in G-quadruplexes, with minor-groove edges in black. Users can also specify an URL to a PDB- or mmCIF-formatted file or upload such an atomic coordinates file directly, and set several common options to customerize to the rendered image.

Moreover, a web API to DSSR-PyMOL schematics is available to allow for its easy integration into third-party tools.

Screenshot of the homepage of DSSR/PyMOL schematics

Input a PDB id

Pre-calculated cartoon-block images together with summary information are available for PDB entries of nucleic-acid-containing structures. Note that gigantic structures like ribosomes that are only represented in mmCIF format are excluded from the resource. The base block images are most effective for small to medium-sized structures.

Here are a few examples:

  • 1ehz, the crystal structure of yeast phenylalanine tRNA at 1.93-Å resolution
  • 2lx1, the major conformation of the internal loop 5’GAGU/3’UGAG
  • 2grb”, the crystal structure of an RNA quadruplex containing inosine-tetrad
  • 4da3, the crystal structure of an intramolecular human telomeric DNA G-quadruplex 21-mer bound by the naphthalene diimide compound MM41
  • 1oct, crystal structure of the Oct-1 POU domain bound to an octamer site
  • 2hoj, the crystal structure of an E. coli thi-box riboswitch bound to thiamine pyrophosphate, manganese ions

Each entry is shown with images in six orthogonal perspectives: front, back, right, left, top, bottom. The ‘front’ image (upper-left in the panel) is oriented into the most-extended view with the DSSR --blocview option. The corresponding PyMOL session file and PDB coordinate file are available for download. One can also visualize the structure interactively via 3Dmol.js.

Sample PDB entries

Users can browse random samples of pre-calculated PDB entries. The number should be between 3 and 99, with a default of 12 entries (see below for an example). Simply click the ‘Submit’ button or the “Random samples (3 to 99)”: http://skmatic.x3dna.org/pdb_entry link to see results of randomly picked 12 PDB entries each time.

Specify a coordinate file

The atomic coordinate file must be in PDB or mmCIF format, and can be optionally gzipped (.gz). One can either specify an URL to or select a coordinate file. Several common options are available to allow for user customizations.

Web API help message

Usage with 'http' (HTTPie):
    http -f http://skmatic.x3dna.org/api [options] url=|model@
    http http://skmatic.x3dna.org/api/pdb/pdb_id  -- for a pre-calculated PDB entry
    http http://skmatic.x3dna.org/api/help        -- display this help message
Options:
    block_file=styles-in-free-text-format [e.g., block_file=wc-minor]
    block_color=nt-selection-and-color    [e.g., block_color='A:pink']
    block_depth=thickness-of-base-block   [e.g., block_depth=1.2]
    r3d_file=true-or-FALSE(default)       [e.g., r3d_file=true]
    raw_xyz=true-or-FALSE(default)        [e.g., raw_xyz=true]
Required parameter
    url=URL-to-coordinate-file [e.g., url=https://files.rcsb.org/download/1ehz.pdb.gz]
    model@coordinate-file      [e.g., model@1ehz.cif]
    # Only one must be specified. 'url' precedes 'model' when both are specified.
    # The coordinate file must be in PDB or PDBx/mmCIF format, optionally gzipped.
Examples
    http -f http://skmatic.x3dna.org/api block_file='wc-minor' model@1ehz.cif r3d_file=t
    http -f http://skmatic.x3dna.org/api url=https://files.rcsb.org/download/1ehz.pdb.gz -d -o 1ehz.png
    http http://skmatic.x3dna.org/api/pdb/1ehz -d -o 1ehz.png
    # with 'curl'
    curl http://skmatic.x3dna.org/api -F 'model=@1msy.pdb' -F 'block_file=wc-minor' -F 'r3d_file=1'
    curl http://skmatic.x3dna.org/api -F 'url=https://files.rcsb.org/download/1ehz.pdb.gz' -o 1ehz.png
    curl http://skmatic.x3dna.org/api/pdb/1ehz -o 1ehz.png

Sample images











Comment

---

SNAP and DSSR in DNAproDB

While reading DNAproDB: an expanded database and web-based tool for structural analysis of DNA–protein complexes, I noticed SNAP and DSSR being mentioned. The detailed citations are as below:

Information about individual nucleotide–residue interactions is also provided, such as hydrogen bonding, interaction geometry (based on SNAP (10)), buried solvent accessible surface areas and identification of the interacting residue and nucleotide moieties …

DNAproDB assigns a geometry for every nucleotide–residue interaction identified using SNAP, a component of the 3DNA program suite (10). The residues for which probabilities are shown are those with planar side chains so that a stacking conformation can be defined.

Base pairing and base stacking between nucleotides are identified using the program DSSR (20).

SNAP and DSSR are two (relatively) new programs in the 3DNA software suite. As the author, I am always glad to see them being cited explicitly in literature. The fact that SNAP and DSSR are cited together by DNAproDB, however, is especially significant. I am aware of the initial version of DNAproDB, but I definitely like the updated one better. This is what I recently wrote in response to a question on the 3DNA Forum:

Regarding DNA-protein interactions in general, you may want to have a look of DNAproDB from the Remo Rohs laboratory. A new paper has just been published in NAR, ‘DNAproDB: an expanded database and web-based tool for structural analysis of DNA–protein complexes’.

I’ve no doubt that SNAP and DSSR would be widely used in applications related to DNA/RNA structural bioinformatics. DSSR (to a lesser extent, SNAP) represents my view of what a scientific software tool should be.

Comment

---

DSSR is used in RNAMake and 3dRNA 2.0

Recently I noticed two new citations to DSSR, an integrated software tool for dissecting the spatial structure of RNA. One is from the Yesselman et al. article Computational design of three-dimensional RNA structure and function in Nature Nanotechnology, and the other is from the Wang et al. article 3dRNA v2.0: An Updated Web Server for RNA 3D Structure Prediction in International Journal of Molecular Sciences.

Yesselman et al. has used DSSR in RNAMake for building the motif library. The relevant section is as follows:

We processed each RNA structure to extract every motif with Dissecting the Spatial Structure of RNA (DSSR)54 with the following command:

x3dna-dssr –i file.pdb –o file_dssr.out

We manually checked each extracted motif to confirm that it was the correct type, as DSSR sometimes classifies tertiary contacts as higher-order junctions and vice versa. For each motif collected from DSSR, we ran the X3DNA find_pair and analyze programs to determine the reference frame for the first and last base pair of each motif to allow for the alignment between motifs:

find_pair file.pdb 2> /dev/null stdout | analyze stdin >& /dev/null

It is worth noting the sentence that “DSSR sometimes classifies tertiary contacts as higher-order junctions and vice versa.” Presumably. the authors are referring to the inclusion of ‘isolated canonical pairs’ in junctions by default in DSSR. Overall, the default DSSR settings follow the most common practice in RNA literature. In the meantime, I am aware that the community may not agree on every detail. Thus DSSR provide many options (documented or otherwise) to cater for other potential use cases. See the Stems of junction structure have only one base pair and Junction definition threads on the 3DNA Forum for two examples. In the long run, DSSR is likely to help consolidate RNA nomenclature that can be applied in a pragmatic, consistent manner.

Note also that DSSR provides the reference frame of each identified base pair via the JSON option. Using 1ehz as an example, the following command provides detailed information about base pairs:

x3dna-dssr -i=1ehz.pdb --json --more | jq .pairs

In the 3dRNA 2.0 paper, DSSR is cited as below. This is the first time DSSR is integrated in the 3dRNA pipeline.

The predicted structures are built from the sequence and secondary structure, while the former is obtained from their native structures fetched from PDB (https://www.rcsb.org/), and the latter is calculated from DSSR (Dissecting the Spatial Structure of RNA) [39].

Comment

---

R wrapper to DSSR in VeriNA3d

I recently came across a Bioinformatics article VeriNA3d: an R package for nucleic acids data mining by Gallego et al. from IRB Barcelona. VeriNA3d can perform dataset analysis, single-structure analysis, and exploratory data analyses, with an emphasis on complex RNA structures. I am glad to see the DSSR is one of the third-party utilities that have been integrated into VeriNA3d, as shown below

VeriNA3d offers integration with third-party utilities such as the non-redundant lists of RNA structures (Leontis and Zirbel, 2012), the eRMSD suggested to compare RNA structures (Bottaro et al., 2014), a wrapper to the DSSR (Dissecting the Spatial Structure of RNA) software (Lu et al., 2015) and query functions to access the PDBe REST API (Velankar et al., 2016).

I browsed the GitLab repository and read through the supplemental documents. Clearly, VeriNA3d is a handy tool for the R community to perform RNA 3D structural analyses.

To DSSR users, Section “9 The dssr wrapper: getting the base pairs” of the supplemental PDF “VeriNA3d: introduction and use cases” is particularly relevant. The three paragraphs (with minor edits) are excerpted below:

The DSSR software (Dissecting the Spatial Structure of RNA) (Lu, Bussemaker, and Olson 2015) represents an invaluable resource to handle RNA structures. Some of the functions of veriNA3d overlap with the functionalities of DSSR, and both applications provide unique different features. We implement a wrapper to execute DSSR directly from R and get the best of both worlds in one place.

Note that installing veriNA3d does not automatically install DSSR, since we don’t redistribute third-party software. Before any user can use our wrapper, the dssr function, DSSR should be installed separately. To address this installation we redirect you to the DSSR manual, where anyone can find the specific instructions for their system. Once DSSR is installed and working in your computer, you will also be able to use it with our wrapper. If the DSSR executable (named x3dna-dssr) is in your path, dssr will find it automatically. If the wrapper does not find it, you can still use it specifying the absolute path to the executable with the argument exefile. Find more information running ?dssr.

One of the DSSR capabilities that users might be interested in is the detection and classification of base pairs. The following code shows a simple example. The output of the dssr wrapper is an object got from the json DSSR output. From R, json objects are parsed in the form of a tree of lists, with different types of information. Most of the interesting data is under the list models, sublist parameters, as shown herein.

I echo the authors’ policy of not redistributing third-party software with VeriNA3d. DSSR is under active development. Users should always visit the 3DNA Forum for downloading the latest version of DSSR, reporting bugs, and asking questions.

The R interface to DSSR (via JSON output) in VeriNA3d represents one of the intended use cases of DSSR’s many possible applications. No doubt DSSR is being increasingly integrated into other resources of RNA structural bioinformatics. Hopefully, more advanced DSSR features (than the detection and classification of base pairs) will also be widely appreciated in the future. Users would love DSSR better when they gain more experience in structural bioinformatics.

Comment

---

Web 3DNA 2.0 is highlighted in the NAR'19 web server cover

It is a great pleasure to see that our article Web 3DNA 2.0 for the analysis, visualization, and modeling of 3D nucleic acid structures has been highlighted in the cover page of the web server issue of NAR’19. According to the editor, This year, 331 proposals were submitted and 122, or 37%, were approved for manuscript submission. Of those approved, 94, or 77%, were ultimately accepted for publication. Overall, that corresponds to a ~28% acceptance rate.

The cover image and its caption are shown below. Moreover, details on how the cover image was created are available on the 3DNA Forum.

Web 3DNA 2.0 highlighted in the cover of the NAR'19 webserver issue

Caption: Examples of customized molecular models that can be generated with 3DNA: (top) a chromatin-like, nucleosome-decorated DNA with the structures of known histone-DNA assemblies placed at user-defined binding sites; (lower left) molecular schematic of a DNA trinucleotide diphosphate illustrating the base planes and reference frames used to construct and analyze 3D nucleic acid-containing structures; (lower right) customized single-stranded tRNA model built from a user-defined base sequence and a set of rigid-body parameters describing the desired placement of successive bases. Color code of base blocks: A, red; C, yellow; G, green; T, blue; U, cyan.

Comment

---

3DNA blocview image in the cover of the RNA journal

While browsing the June 2019 issue of the RNA journal, I was surprised to see a cover image with familiar schematic representations:

Crystal structure of ykoY-mntP riboswitch chimera bound to cadmium

The caption is as below:

Crystal structure of ykoY-mntP riboswitch chimera bound to cadmium (Protein Data Bank code: 6cc3; Bachas ST, Ferré-D’Amaré AR. 2018. Convergent use of heptacoordination for cation selectivity by RNA and protein metalloregulators. Cell Chem Biol 25: 962–973.e5). The RNA backbone is displayed as a red ribbon; bases are shown as blocks with NDB coloring: A—red, C—yellow, G—green, U—cyan; cadmium ions are shown as red spheres. The image was generated using 3DNA/blocview and PyMol software. Cover image provided by the Nucleic Acid Database (ndbserver.rutgers.edu).

In addition to the blocview script distributed with 3DNA v2.x, the block-view has been integrated into DSSR via the --blocview option. Notably, the DSSR-plugin introduces the dssr_block command to PyMOL for interactive visualization of nucleic acid structures. See the DSSR User Manual for more information.

Comment

---

Identification of nucleotides

Nucleic acids structural bioinformatics starts with the identification of nucleotides (nts) from atomic coordinates. As biopolymers, RNA and DNA have standard IUPAC names of atoms for the five bases (see the Figure below), sugars (ending with prime, e.g., C1’, O2’), and the phosphate (P, OP1, and OP2). The atomic coordinates (in PDB or mmCIF format) from the Protein Data Bank (PDB) follow the convention.

Standard bases, with names

Trained as a chemist, I am aware that the bases are aromatic, heterocyclic compounds (purines and pyrimidines). Moreover, the five standard bases (A, C, G, T, and U) also share a six-membered ring, with atoms named consecutively (N1, C2, N3, C4, C5, C6). This special feature can be employed to identify nts automatically, from PDB atomic coordinates. The ring skeleton is not influenced by protonation states, tautomeric forms, or modifications in base, sugar or phosphate. Early versions of 3DNA (up to v2.0) used only N1, C2, and C6 atoms to identify an nt: an additional N9 as purine, otherwise as pyrimidine. In 3DNA v2.3 and DSSR, the procedure has been refined to take advantage of all available rings atoms. It is thus more robust against distortions and still works even when any of the N1, C2, C6, or N9 atoms are mutated or missing. This blog post provides further technical details on how the method works.

The template used to identify nts is a purine, with nine base ring atoms. Purine is chosen since it contains atoms of the six-membered ring and N7, C8, and N9. Its atomic coordinates in PDB format are shown below. The coordinates are taken from ‘G’ in the standard reference frame ($X3DNA/config/Atomic_G.pdb). Using ‘A’ as reference won’t make any difference since the RMSD between them is only 0.038 Å.

ATOM      1  N9    G A   1      -1.289   4.551   0.000  1.00  0.00           N
ATOM      2  C8    G A   1       0.023   4.962   0.000  1.00  0.00           C
ATOM      3  N7    G A   1       0.870   3.969   0.000  1.00  0.00           N
ATOM      4  C5    G A   1       0.071   2.833   0.000  1.00  0.00           C
ATOM      5  C6    G A   1       0.424   1.460   0.000  1.00  0.00           C
ATOM      6  N1    G A   1      -0.700   0.641   0.000  1.00  0.00           N
ATOM      7  C2    G A   1      -1.999   1.087   0.000  1.00  0.00           C
ATOM      8  N3    G A   1      -2.342   2.364   0.001  1.00  0.00           N
ATOM      9  C4    G A   1      -1.265   3.177   0.000  1.00  0.00           C

The nt-identification process begins with a mapping of at least three atoms in the purine, followed by a least-squares fit between corresponding atoms. For the five standard bases and most modified ones, the RMSD is normally less than 0.12 Å, as seen in the Figure below. Even the unsaturated dihydrouridine in tRNA has an RMSD of less than 0.25 Å: for the yeast phenylalanine tRNA (PDB id: e1ehz), for example, it is 0.205 Å for H2U-16, and 0.226 Å for H2U-17. DSSR uses a cutoff of 0.28 Å, covering essentially all nucleotides in the PDB. As an extreme case, the DA1 residue on chain T of PDB id 4ki4 has only three base atoms: N7, C8, and N9 (i.e., no atoms from the six-membered ring). With an RMSD of only 0.005 Å, DSSR still takes it as an nt, properly assigned as ‘A’.

Molecular dynamics (MD) simulations sometimes produce heavily distorted bases, which is over the default cutoff. Users may change the cutoff to a larger value to accommodate such unusual cases.

Nucleotide identification in 3DNA-DSSR

In addition to dihydrouridine, the above Figure also shows pseudouridine (PSU), 1-methyladenosine (1MA), 4-thiouridine (4SU), and the heavily modified YYG in tRNA. They are all easily identified using the same scheme. Since the nt-identification method concentrates on base rings, modifications in sugar or the phosphate group do not pose any problem. For example, in tRNA 1ehz, DSSR also identifies O2’-methylguanosine (OMG) and O2’-methylcytidine (OMC) as modified nts.

Two special cases worth mentioning. The ligand IMD in PDB id 1r8e has a five-membered ring. Its atoms are named similarly to those of an nt, and the fitted RMSD is only 0.29 Å. IMD can be filtered out by its missing of the C6 atom and having an N1—C5 covalent bond. The ligand SPM in PDB id 355d is a linear molecule, and its RMSD (1.86 Å) is clearly far off to be taken as an nt.

Another particular case (of a different kind) is the abasic sites, especially in X-ray crystal structures in the PDB. By definition, abasic sites do not have base atoms available. Thus the described method is not applicable to their characterization as nts. As of v1.7.3-2017dec26, however, DSSR has also incorporated abasic sites into the analysis pipeline, by default. The program checks backbone linkage and residue name for appropriate nt assignment. The abasic sites could constitute part of (internal) loops which would otherwise be broken into segments by DSSR.

Overall, I feel confident to say that 3DNA-DSSR has practically solved the problem of identifying nts from atomic coordinates. The method detailed herein (and outlined in the DSSR paper) is simple and easy to understand/implement. Moreover, it has been extensively tested in real-world applications for well over a decade. I’ve yet to find a single case where it does not work as expected.

Comment

---

« Older · Newer »

Thank you for printing this article from http://x3dna.org/. Please do not forget to visit back for more 3DNA-related information. — Xiang-Jun Lu