[Job] Staff Associate II (Computational Structural Biology) at Columbia University

The X3DNA-DSSR resource is at the forefront of structural bioinformatics, developing advanced tools for analyzing and modeling nucleic acid structures. We are seeking a highly motivated Staff Associate II to join our team and contribute to our next-generation analysis and visualization engine.

To see our resource in action, please visit wDSSR, our new web interface for dissecting and modeling 3D nucleic acid structures: https://web.x3dna-dssr.org/.

We are looking for a candidate with a strong scientific background in structural biology or bioinformatics and a desire to contribute to peer-reviewed publications through community-driven data analysis. We value individuals who are eager to learn, adapt to new technical challenges, and support the global research community.

For the full job description and to submit your application, please visit the official Columbia University posting: https://apply.interfolio.com/183705

---

Announcing wDSSR: The Next-Generation Web Interface to X3DNA-DSSR

Dear 3DNA/DSSR Community,

We are thrilled to announce the official launch of wDSSR (https://web.x3dna-dssr.org/), the powerful new web interface to the X3DNA-DSSR analytical engine.

Developed by Drs. Shuxiang Li and Xiang-Jun Lu and supported by NIH grant R24GM153869, wDSSR represents a major leap forward from our highly popular 2019 Web 3DNA 2.0 framework. While Web 3DNA 2.0 has faithfully served the community for the analysis, visualization, and modeling of 3D nucleic acid structures, wDSSR was built from the ground up to take full advantage of modern web technologies and the latest DSSR backend capabilities.

A Modern, Streamlined Scientific Workflow

We have completely overhauled the user interface to provide a clean, intuitive, and task-driven experience. The core modeling and analysis tools are now seamlessly organized into a logical, single-word scientific workflow: Analyze, Rebuild, Model, Circularize, Mutate, Assemble, and Visualize.

Spotlight Feature: The "Assemble" Module

One of the most exciting upgrades is the newly renamed Assemble tab (formerly "Composite"). This advanced composite model builder allows you to effortlessly construct complex, higher-order models by linking any combination of nucleic acid duplexes or protein-DNA/RNA complexes. You can quickly connect up to six distinct target structures, ranging from simple linked A-DNA and B-DNA duplexes to large, protein-decorated structural assemblies.

Immediate Global Adoption

Although wDSSR has just launched, we are incredibly humbled to share that it is already seeing rapid worldwide adoption! According to recent network infrastructure data, the new interface is actively being used by researchers across North America, South America, Europe, and Asia. Within just a few days, we have recorded active sessions from prestigious institutions around the globe, including:

  • The Weizmann Institute of Science in Israel
  • Katholieke Universiteit Leuven in Belgium
  • Queen's University in Canada
  • Universidad Nacional Autonoma de Mexico (UNAM) in Mexico
  • Emory University and the Wadsworth Centers Laboratories and Research in the United States
  • Jawaharlal Nehru University and the China Education and Research Network in Asia

How to Cite

While a dedicated paper for wDSSR is currently in preparation, researchers should cite the server using its URL (https://web.x3dna-dssr.org/) alongside the 2019 Web 3DNA 2.0 paper and the foundational 2015 DSSR paper. Full details and funding acknowledgements can be found on our newly consolidated About page.

We invite you all to try out the new wDSSR platform! As always, your feedback is invaluable to us, and we encourage you to share your thoughts, questions, and structural models via the newly updated Questions & Feedback link in the wDSSR footer.

Happy modeling!

---

DSSR-derived DBN for an input entry with multiple RNA molecules

Dot-bracket notation (dbn) is a popular format for representing RNA secondary structures. Initially introduced by the ViennaRNA package, dbn uses dots (.) for unpaired bases and matched parentheses () for the canonical Watson-Crick A-U and G-C pairs or the wobble G-U pair. This compact representation was designed for fully nested (i.e., pseudoknot-free) RNA secondary structures in a single RNA molecule. Over the years, it has been extended to cover pseudoknots (possibly of higher order) using matched pairs of [], {}, <>, etc.
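
To make the notation concrete, here is a minimal Python sketch that turns a dot-bracket string into a list of base pairs. It is only an illustration of the idea, not code taken from DSSR or ViennaRNA; the function name dbn_to_pairs is made up for this example, and it handles a single molecule with the four bracket types mentioned above.

```python
def dbn_to_pairs(dbn):
    """Return the base pairs encoded in a dot-bracket string as 1-based (i, j) tuples."""
    closers = {")": "(", "]": "[", "}": "{", ">": "<"}
    stacks = {opener: [] for opener in closers.values()}
    pairs = []
    for position, symbol in enumerate(dbn, start=1):
        if symbol in stacks:                 # opening bracket: remember its position
            stacks[symbol].append(position)
        elif symbol in closers:              # closing bracket: pair with the latest opener
            stack = stacks[closers[symbol]]
            if not stack:
                raise ValueError(f"unmatched {symbol!r} at position {position}")
            pairs.append((stack.pop(), position))
        elif symbol != ".":                  # anything else is not plain dbn
            raise ValueError(f"unexpected symbol {symbol!r} at position {position}")
    unclosed = [p for stack in stacks.values() for p in stack]
    if unclosed:
        raise ValueError(f"unclosed bracket(s) at position(s) {sorted(unclosed)}")
    return sorted(pairs)


nested = dbn_to_pairs("((..))")    # a small hairpin
knotted = dbn_to_pairs("([.)]")    # two crossing pairs need a second bracket type
print(nested)   # [(1, 6), (2, 5)]
print(knotted)  # [(1, 4), (2, 5)]
```

Keeping one stack per bracket type is what lets crossing pairs be written down at all: each bracket type on its own remains properly nested.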

To derive dbn from three-dimensional atomic coordinates with DSSR, I faced the issue of how to represent multiple RNA chains (molecules). A closely related practical problem is chain breaks, as in x-ray crystal structures where disordered regions may not have fitted coordinates. I searched but failed to find any ‘standard’ way to account for chain breaks or multiple molecules in dbn. The commonly used programs for visualizing RNA secondary structure diagrams that I tested at that time did not take such cases into consideration: they simply showed all bases as if they were from a single continuous RNA chain.

I discussed the issue with Dr. Yann Ponty, the maintainer of the popular VARNA program. After a few rounds of email exchanges, we introduced an extra symbol (&) in both sequence and dbn to designate multiple chains or breaks within a chain, so that DSSR and VARNA can communicate.

As an example, the DSSR-derived dbn for the double-stranded DNA structure 355d (the famous Dickerson dodecamer) is as below:

Secondary structures in dot-bracket notation (dbn) as a whole and per chain
>355d nts=24 [whole]
CGCGAATTCGCG&CGCGAATTCGCG
((((((((((((&))))))))))))
>355d-A #1 nts=12 [chain] DNA
CGCGAATTCGCG
((((((((((((
>355d-B #2 nts=12 [chain] DNA
CGCGAATTCGCG
))))))))))))
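
The relation between the [whole] record and the per-chain records above can be sketched in a few lines of Python. This is an illustrative post-processing helper, not part of DSSR; the name split_segments is hypothetical. Note that a segment delimited by & may be a separate chain or a piece of one chain on either side of a break.

```python
def split_segments(sequence, dbn, sep="&"):
    """Split matching sequence and dbn strings at the separator into (seq, dbn) segments."""
    seq_parts = sequence.split(sep)
    dbn_parts = dbn.split(sep)
    if len(seq_parts) != len(dbn_parts):
        raise ValueError("sequence and dbn have different numbers of segments")
    for seq_part, dbn_part in zip(seq_parts, dbn_parts):
        if len(seq_part) != len(dbn_part):
            raise ValueError("a sequence segment and its dbn segment differ in length")
    return list(zip(seq_parts, dbn_parts))


# The [whole] record of 355d, written with repetition to keep the bracket counts obvious.
whole_sequence = "CGCGAATTCGCG&CGCGAATTCGCG"
whole_dbn = "(" * 12 + "&" + ")" * 12

segments = split_segments(whole_sequence, whole_dbn)
for number, (seq_part, dbn_part) in enumerate(segments, start=1):
    print(f"#{number} nts={len(seq_part)}")
    print(seq_part)
    print(dbn_part)
```

Because the separator sits at the same position in both strings, the two lines stay aligned column by column, which is what makes the format easy to read and to split.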

As another example, the PDB entry 2fk6 contains a tRNA with chain breaks — nucleotides 26 to 45 are missing from the structure (see figure below). The DSSR-derived dbn is as follows — note the * at the end of the header line.

>2fk6-R #1 nts=53 [chain] RNA*
GCUUCCAUAGCUCAGCAGGUAGAGC&GUCAGCGGUUCGAGCCCGCUUGGAAGCU
(((((((..((((.....[..))))&...(((((..]....)))))))))))).

2FK6: RNAse Z/tRNA(Thr) complex with chain break

It is worth mentioning a subtle point in DSSR-derived dbn with multiple chains, i.e., the order of the chains may make a difference! The point is best illustrated with a concrete example — here, 4un3, the crystal structure of Cas9 bound to PAM-containing DNA target. Based on the data file downloaded directly from the PDB (4un3.pdb), the relevant portions of DSSR output are:

****************************************************************************
Special notes:
   o Cross-paired segments in separate chains, be *careful* with .dbn

****************************************************************************
This structure contains *1-order pseudoknot
   o You may want to run DSSR again with the '--nested' option which removes
     pseudoknots to get a fully nested secondary structure representation.
   o The DSSR-derived dbn may be problematic (see notes above).

****************************************************************************
Secondary structures in dot-bracket notation (dbn) as a whole and per chain
>4un3 nts=120 [whole]
AUAACUCAAUUUGUAAAAAAGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUG&CAATACCATTTTTTACAAATTGAGTTAT&AAATGGTATTG
((((((((((((((((((((((((((..((((....))))....))))))..(((..).)).......((((....)))).&[[[[[[[[))))))))))))))))))))&...]]]]]]]]
>4un3-A #1 nts=81 [chain] RNA
AUAACUCAAUUUGUAAAAAAGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUG
((((((((((((((((((((((((((..((((....))))....))))))..(((..).)).......((((....)))).
>4un3-C #2 nts=28 [chain] DNA
CAATACCATTTTTTACAAATTGAGTTAT
[[[[[[[[))))))))))))))))))))
>4un3-D #3 nts=11 [chain] DNA
AAATGGTATTG
...]]]]]]]]

The notes in the DSSR output are worth paying attention to. Specifically, it reports a “*1-order pseudoknot”; note also the *! Here the target DNA chain C comes before DNA chain D in the PDB file. The 5′-end bases in chain C pair with bases in D, and the 3′-end bases in C pair with RNA bases in chain A. Some pairs therefore cross one another when positions are counted along the ‘linear’ sequence, hence the reported “pseudoknot”. However, simply swapping DNA chains C and D, i.e., moving chain D before C (in file 4un3-ADC.pdb), makes the “pseudoknot” disappear, as shown below:

****************************************************************************
Secondary structures in dot-bracket notation (dbn) as a whole and per chain
>4un3-ADC nts=120 [whole]
AUAACUCAAUUUGUAAAAAAGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUG&AAATGGTATTG&CAATACCATTTTTTACAAATTGAGTTAT
((((((((((((((((((((((((((..((((....))))....))))))..(((..).)).......((((....)))).&...((((((((&))))))))))))))))))))))))))))
>4un3-ADC-A #1 nts=81 [chain] RNA
AUAACUCAAUUUGUAAAAAAGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUG
((((((((((((((((((((((((((..((((....))))....))))))..(((..).)).......((((....)))).
>4un3-ADC-D #2 nts=11 [chain] DNA
AAATGGTATTG
...((((((((
>4un3-ADC-C #3 nts=28 [chain] DNA
CAATACCATTTTTTACAAATTGAGTTAT
))))))))))))))))))))))))))))
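
The effect of chain order can be reproduced on a toy system. The Python sketch below is not DSSR's algorithm; it uses three made-up chains (A, C and D, with invented lengths and pairs) arranged like 4un3: the 3′ end of C pairs with A and the 5′ end of C pairs with D. It maps each pair to positions along the concatenated sequence for a given chain order, then counts the pairs that cross.

```python
from itertools import combinations


def linear_pairs(order, lengths, pairs):
    """Map ((chain, index), (chain, index)) pairs to positions along the concatenated chains."""
    offsets, total = {}, 0
    for chain in order:
        offsets[chain] = total
        total += lengths[chain]
    mapped = []
    for (chain1, index1), (chain2, index2) in pairs:
        first = offsets[chain1] + index1
        second = offsets[chain2] + index2
        mapped.append((min(first, second), max(first, second)))
    return sorted(mapped)


def count_crossings(mapped):
    """Count pairs (i, j) and (k, l) with i < k < j < l, i.e. crossing lines in a diagram."""
    return sum(1 for (i, j), (k, l) in combinations(sorted(mapped), 2) if i < k < j < l)


# Hypothetical toy data: residue indices are 1-based within each chain.
lengths = {"A": 4, "C": 6, "D": 3}
pairs = [
    (("A", 1), ("C", 6)), (("A", 2), ("C", 5)), (("A", 3), ("C", 4)),  # A with 3' end of C
    (("C", 1), ("D", 3)), (("C", 2), ("D", 2)), (("C", 3), ("D", 1)),  # 5' end of C with D
]

crossings_acd = count_crossings(linear_pairs(["A", "C", "D"], lengths, pairs))
crossings_adc = count_crossings(linear_pairs(["A", "D", "C"], lengths, pairs))
print(crossings_acd)  # 9
print(crossings_adc)  # 0
```

The pairs themselves never change; only their positions along the concatenated sequence do, so the "pseudoknot" in the first order is an artifact of the ordering.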



Notes added on March 19, 2015

  • It has come to my attention that NUPACK uses ‘+’ instead of ‘&’ as the symbol to separate multiple chains (or chain breaks). In fact, DSSR has an undocumented option --dbn_break which can be set to any of the characters in the string &.:,|+. The ‘&’ symbol was chosen for communication with VARNA, which requires ‘&’, at least up to now. This is an excellent example of the effort that I have put into the little details while developing DSSR.
  • The issue of properly ordering multiple chains to avoid crossing lines (false pseudoknots) has been formally addressed by Dirks et al. in their 2007 article titled Thermodynamic analysis of interacting nucleic acid strands (SIAM Rev, 49, 65-88), specifically in Section 2.1 (Fig. 2.1). Applying that algorithm to nucleic acid structures, however, is beyond the scope of DSSR. The program strictly respects the ordering of chains and nucleotides within a given PDB or PDBx/mmCIF file, but outputs warning messages where necessary to draw users’ attention. As another example, I’ve recently noticed that DNA duplexes produced by Maestro (a product of Schrödinger) list nucleotides of the complementary strand in 3′ to 5′ order to match the 5′ to 3′ directionality of the leading strand for each Watson-Crick pair (see below).
****************************************************************************
Special notes:
   o nucleotides out of order

****************************************************************************
Secondary structures in dot-bracket notation (dbn) as a whole and per chain
>ga62_ca62_1m_in nts=24 [whole]
GGCGAATTCCGG&C&C&G&C&T&T&A&A&G&G&C&C
((((((((((((&)&)&)&)&)&)&)&)&)&)&)&)
>ga62_ca62_1m_in-1-A #1 nts=12 [chain] DNA
GGCGAATTCCGG
((((((((((((
>ga62_ca62_1m_in-1-B #2 nts=12 [chain] DNA
C&C&G&C&T&T&A&A&G&G&C&C
)&)&)&)&)&)&)&)&)&)&)&)
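
Regarding the separator symbol discussed in the first note: if a DSSR-derived dbn written with ‘&’ needs to be handed to a tool that expects ‘+’, the symbol can also be swapped afterwards. The Python sketch below is a hypothetical post-processing helper (the name change_break_symbol is made up); it does not call DSSR and makes no claim about how --dbn_break works internally.

```python
ALLOWED_BREAKS = "&.:,|+"   # the characters listed in the note above
BRACKETS = "()[]{}<>"


def change_break_symbol(text, old="&", new="+"):
    """Replace the chain/break separator in a sequence or dbn string."""
    if old not in ALLOWED_BREAKS or new not in ALLOWED_BREAKS:
        raise ValueError("separator must be one of " + ALLOWED_BREAKS)
    if old == "." or new == ".":
        # A dot also marks unpaired bases in dbn, so a plain replace would be ambiguous.
        raise ValueError("'.' cannot be swapped safely by simple replacement")
    if any(symbol in BRACKETS for symbol in (old, new)):
        raise ValueError("separator must not be a bracket")
    return text.replace(old, new)


sequence_plus = change_break_symbol("CGCGAATTCGCG&CGCGAATTCGCG")
dbn_plus = change_break_symbol("(" * 12 + "&" + ")" * 12)
print(sequence_plus)  # CGCGAATTCGCG+CGCGAATTCGCG
print(dbn_plus)
```

The same replacement must be applied to both the sequence line and the dbn line so that the two remain aligned.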

---

The Biophysical Society (BPS) 59th annual meeting at Baltimore

I’m going to attend the Biophysical Society (BPS) 59th Annual Meeting, to be held February 7-11 in Baltimore, Maryland. At last year’s BPS annual meeting (San Francisco, California), I was delighted to come across a few 3DNA users at poster sessions. I hope this post helps me connect with some DSSR/3DNA users at the coming meeting.

Want to have a meetup in Baltimore? Please drop me a message!

---

Weird atom names of ligand thiamine pyrophosphate (TPP)

Recently I came across the ligand thiamine pyrophosphate (TPP) in some RNA riboswitch structures. I was a bit surprised by the atom names adopted for the ligand by the PDB. See figures below for the chemical structure of TPP from the RCSB PDB website (first), and the three-dimensional structure of the ligand from the riboswitch 2gdi (second).

Chemical structure of ligand thiamine pyrophosphate

Ligand thiamine pyrophosphate in PDB entry 2gdi

Specifically, the planar base-like moiety at the right has atom names ending with prime. To my knowledge, only sugar atom names of DNA and RNA nucleotides have the prime suffix, such as the 2′-hydroxyl group in RNA.

The RCSB webpage for TPP shows that currently there are 107 entries in the PDB, among which 100 are from proteins, 6 from RNA, and one from an RNA-protein complex. It is not clear to me whether the prime-bearing names in TPP follow any documented ‘standard’ or convention. DSSR nevertheless takes note of such ‘weird’ cases.

---

The 3DNA Forum registered users have reached 2000

As of today, the number of registered users on the 3DNA Forum has reached 2000. Over the past three years, the annual average of registrations has been 650, corresponding to approximately 1.8 per day. While many registrations use free email services (gmail, hotmail, yahoo, etc.), a significant portion (especially the more recent ones) use their work email (e.g., .edu). This is a clear sign of the increasing trust the community puts in the Forum.

To keep the 3DNA Forum spam-free, I’ve adhered to a zero-tolerance policy toward any trolling or suspicious activities. The anti-spam software has played a big role in making this clean status feasible, as is evident from the note: “120,933 Spammers blocked up until today”.

From a scientific perspective, all posted questions have been addressed promptly, normally within hours. Instead of feeling like a burden, maintaining the Forum and answering user questions have been a pleasure. I’d love to see more questions or posts on the Forum.

---

Characterization of H-type pseudoknots with DSSR

The v1.2.1 (2015feb01) release of DSSR contains new functionality to characterize the so-called H-type pseudoknots. In this classical and most common type of pseudoknot, nucleotides from a hairpin loop form Watson-Crick base pairs with a single-stranded region outside of the hairpin to create another (adjacent) stem, as shown in the following illustration (taken from the Huang et al. paper A heuristic approach for detecting RNA H-type pseudoknots).

Schematic diagram of the H-type pseudoknot

Normally, L2 is absent (i.e., with zero nucleotides) due to direct coaxial stacking of the two stems. An example output of DSSR on 1ymo (a human telomerase RNA pseudoknot) is shown below:

3D and secondary structures of an H-type pseudoknot (1ymo)

The corresponding sections from DSSR output are:

****************************************************************************
List of 3 H-type pseudoknot loop segments
   1 stem#1(hairpin#1) vs stem#2(hairpin#2) L1 groove=MAJOR nts=8 UUUUUCUC U7,U8,U9,U10,U11,C12,U13,C14
   2 stem#1(hairpin#1) vs stem#2(hairpin#2) L2 groove=----- nts=0
   3 stem#1(hairpin#1) vs stem#2(hairpin#2) L3 groove=minor nts=8 CAAACAAA C30,A31,A32,A33,C34,A35,A36,A37

****************************************************************************
Secondary structures in dot-bracket notation (dbn) as a whole and per chain
>1ymo-1-A #1 nts=47 [chain] RNA
GGGCUGUUUUUCUCGCUGACUUUCAGCCCCAAACAAAAAAGUCAGCA
[[[[[[........(((((((((]]]]]]........))))))))).

Checking against the three-dimensional image and the secondary structure in linear form shown above, the meaning of the new section should be obvious. If you want to see more details, click the link to the DSSR-output file on 1ymo.
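
For this particular example, the three loop segments can also be read straight off the dbn line. The Python sketch below is only an illustration and not how DSSR identifies pseudoknots: it assumes an idealized H-type layout in which the first stem is written with [] and the second with (), both without bulges or internal loops, and the name h_type_loops is made up. The groove assignment (MAJOR/minor) comes from the 3D structure and cannot be recovered this way.

```python
import re

# Idealized H-type layout: [ stem1 5' ] L1 ( stem2 5' ) L2 [ stem1 3' ] L3 ( stem2 3' )
H_TYPE = re.compile(r"^\.*(\[+)(\.*)(\(+)(\.*)(\]+)(\.*)(\)+)\.*$")


def h_type_loops(sequence, dbn):
    """Return the L1, L2 and L3 loop sequences, or None if the dbn is not of this simple form."""
    if len(sequence) != len(dbn):
        raise ValueError("sequence and dbn must have the same length")
    match = H_TYPE.match(dbn)
    if match is None:
        return None
    loops = {}
    for name, group in (("L1", 2), ("L2", 4), ("L3", 6)):
        start, end = match.span(group)      # 0-based, end-exclusive span of the dot run
        loops[name] = sequence[start:end]
    return loops


# Sequence and dbn of 1ymo, copied from the DSSR output above.
sequence_1ymo = "GGGCUGUUUUUCUCGCUGACUUUCAGCCCCAAACAAAAAAGUCAGCA"
dbn_1ymo = "[[[[[[........(((((((((]]]]]]........)))))))))."

loops_1ymo = h_type_loops(sequence_1ymo, dbn_1ymo)
for name, loop in loops_1ymo.items():
    print(name, f"nts={len(loop)}", loop)
```

The result agrees with the listing above: L1 is UUUUUCUC (8 nts), L2 is empty, and L3 is CAAACAAA (8 nts).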

---

Two more citations to DSSR

Recently I came across the following two citations to DSSR:

Base pair types were annotated with RNAview (45,46). Hydrogen bonds were annotated manually and with the help of DSSR of the 3DNA package (47,48). Helix parameters were obtained using the Curves+ web server (49). Structural figures were prepared using PyMol (50).

It is interesting to note that DSSR is cited here for its identification of hydrogen bonds, not its annotation of base pairs, among many other features. The simple geometry-based H-bonding identification algorithm, originally implemented in find_pair/analyze of 3DNA (and adopted by RNAView) and highly refined in DSSR, works well for nucleic acid structures. With the --get-hbonds option, users can now run DSSR just to obtain its list of H-bonds for use outside of the program.

All figures were generated using PyMOL (60) or Chimera (48). The secondary structure diagram of the human mitoribosomal RNA was prepared by extracting base pairs from the model using DSSR (61). The secondary structure diagram was drawn in VARNA (62) and finalized in Inkscape.

I am very pleased to see that DSSR was cited for its ‘intended’ use in this important piece of work from a leading laboratory in structural biology. In the middle of last November (2013), I was approached by the lead author about the proper citation of DSSR, and I suggested the two 3DNA papers. As far as I can remember, this was the first time I received such a question on DSSR citation. It prompted me to write a FAQ entry in the DSSR User Manual, titled “How to cite DSSR?”. Hopefully, this citation issue will be gone in the near future.

Over the past two years, I’ve devoted significant effort to making DSSR a handy tool for RNA structural bioinformatics; it certainly represents my view as to what a scientific software program should be like. As time passes, DSSR is becoming increasingly sophisticated, and citations to DSSR can only increase.

---

Processing large structures in mmCIF format

Recently, the PDB began releasing atomic coordinates of large (ribosomal) structures in mmCIF format. For nucleic-acid-containing structures, the largest one so far is 4v4g, the crystal structure of five 70S ribosomes from Escherichia coli in complex with protein Y. It is assembled from ten PDB entries (1voq, 1vor, 1vos, 1vou, 1vov, 1vow, 1vox, 1voy, 1voz, 1vp0), consisting of 22,345 nucleotides and a total of 717,805 atoms.

This humongous structure poses no problems to DSSR at all, as shown below.

Command: x3dna-dssr -i=4v4g.cif -o=4v4g.out
Processing file '4v4g.cif' [4v4g]

total number of base pairs: 9277
total number of multiplets: 918
total number of helices: 1099
total number of stems: 1221
total number of isolated WC/wobble pairs: 603
total number of atom-base stacking interactions: 1736
total number of hairpin loops: 504
total number of bulges: 170
total number of internal loops: 775
total number of junctions: 214
total number of non-loop single-stranded segments: 429
total number of kissing loops: 5
total number of A-minor (type I and II) motifs: 100
total number of ribose zippers: 58 (1159)
total number of kink turns: 39

Time used: 00:00:10:45
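
Summary lines like those above are easy to collect programmatically, for example when comparing many structures. The Python sketch below is a hypothetical helper that works on the printed text only; it assumes the "total number of ...: N" wording shown here and reads the "Time used" stamp as days:hours:minutes:seconds, which is an interpretation rather than documented behavior.

```python
import re

SUMMARY_LINE = re.compile(r"^total number of (.+?): (\d+)")


def parse_summary(text):
    """Collect 'total number of X: N' lines into a dict mapping X to N."""
    counts = {}
    for line in text.splitlines():
        match = SUMMARY_LINE.match(line.strip())
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


def elapsed_seconds(stamp):
    """Convert a 'DD:HH:MM:SS' stamp to seconds (assumed field order)."""
    days, hours, minutes, seconds = (int(field) for field in stamp.split(":"))
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


# A few lines copied from the 4v4g run above.
sample = """\
total number of base pairs: 9277
total number of A-minor (type I and II) motifs: 100
total number of ribose zippers: 58 (1159)
total number of kink turns: 39
"""

counts = parse_summary(sample)
seconds = elapsed_seconds("00:00:10:45")
print(counts["base pairs"], counts["kink turns"])  # 9277 39
print(seconds)                                     # 645
```

Only the first integer after the colon is kept, so the parenthesized number on the ribose zipper line is ignored here.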

It took less than 11 minutes to run on an iMac (and nearly 14 minutes on an Ubuntu Linux machine). Given the

---

DNA/RNA molecular dynamics trajectory analysis with do_x3dna

With great pleasure, I read the following announcement from Rajendra Kumar on the 3DNA Forum:

Re: do_x3dna: a tool to analyze DNA/RNA in molecular dynamics trajectories 
« Reply #1 on: Today at 10:53:31 AM »

Hello,

I have now made a new website for do_x3dna
(http://rjdkmr.github.io/do_x3dna). This website contains detailed
documentation for do_x3dna program and Python APIs.

Documentation for Python API is now available
(http://rjdkmr.github.io/do_x3dna/apidoc.html).

Few tutorials about the Python APIs are also now available
(http://rjdkmr.github.io/do_x3dna/tutorial.html).

Thanks.

With best regards,
Rajendra

Browsing through the do_x3dna website, I am impressed by the extensive documentation and tutorial. Clearly, do_x3dna has pushed the boundaries (in applicability and documentation) of the x3dna_ensemble Ruby script distributed with 3DNA v2.1.

As noted on its GitHub page, do_x3dna has been developed to analyze fluctuations in DNA or RNA structures in molecular dynamics (MD) trajectories. It can be used for GROMACS MD trajectories, as well as those from NAMD and AMBER. There is no doubt that do_x3dna will boost 3DNA’s applications in the increasingly active field of DNA/RNA MD simulations.

---

List of modified nucleotides in DSSR output

From early on, 3DNA and DSSR have had native support for modified nucleotides. The baselist.dat file currently distributed with 3DNA contains over 700 entries. As of v1.1.4-2014aug09, a new section has been added to DSSR output to list explicitly the modified nucleotides in an analyzed structure.

Using the 76-nucleotide long yeast phenylalanine tRNA (1ehz) as an example, the pertinent section in DSSR output is as below.

List of 11 types of 14 modified nucleotides
      nt    count  list
   1 1MA-a    1    A.1MA58
   2 2MG-g    1    A.2MG10
   3 5MC-c    2    A.5MC40,A.5MC49
   4 5MU-t    1    A.5MU54
   5 7MG-g    1    A.7MG46
   6 H2U-u    2    A.H2U16,A.H2U17
   7 M2G-g    1    A.M2G26
   8 OMC-c    1    A.OMC32
   9 OMG-g    1    A.OMG34
  10 PSU-P    2    A.PSU39,A.PSU55
  11 YYG-g    1    A.YYG37

So 1ehz has 14 modified nucleotides of 11 different types, as listed in the rows following the header line. The meaning of each column should be obvious. For example, the third row means that 5MC (5-methylcytidine, abbreviated as 'c' in 1-letter code) occurs twice, identified as A.5MC40 and A.5MC49, respectively.

With the 3-letter id, one can search the RCSB ligand database for more information about a specified modified nucleotide. The URL would be like this, using pseudouridine (PSU) as an example, https://www.rcsb.org/ligand/PSU.
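
As a small illustration, the rows of this section can be parsed and turned into RCSB ligand links with a few lines of Python. This is a sketch based only on the layout printed above (index, id-code, count, comma-separated list); the function names are made up, and the URL simply follows the pattern given for PSU.

```python
def parse_modified_row(line):
    """Parse one row such as '   3 5MC-c    2    A.5MC40,A.5MC49' into a dict."""
    fields = line.split()
    if len(fields) != 4:
        raise ValueError(f"unexpected row layout: {line!r}")
    three_letter_id, one_letter_code = fields[1].rsplit("-", 1)
    return {
        "id": three_letter_id,           # e.g. 5MC
        "code": one_letter_code,         # e.g. c
        "count": int(fields[2]),
        "residues": fields[3].split(","),
    }


def rcsb_ligand_url(three_letter_id):
    """Build the RCSB ligand page URL following the pattern shown for PSU."""
    return f"https://www.rcsb.org/ligand/{three_letter_id}"


row = parse_modified_row("   3 5MC-c    2    A.5MC40,A.5MC49")
url = rcsb_ligand_url(row["id"])
print(row["id"], row["count"], row["residues"])
print(url)
```

A quick sanity check on a parsed row is that its count equals the number of listed residues.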

It is hoped that the newly added section, put at the very top of DSSR output, will draw more attention to modified nucleotides.

---

Thank you for printing this article from http://x3dna.org/. Please do not forget to visit back for more 3DNA-related information. — Xiang-Jun Lu