G-quadruplex -- notes on the Webba da Silva nomenclature

In late September of 2018, I contacted Dr. Mateus Webba da Silva requesting a copy of his 2007 article, titled "Geometric formalism for DNA quadruplex folding". At that time, I had implemented a G4 module within DSSR for the automatic identification, comprehensive annotation, and schematic visualization of G-quadruplexes from 3D atomic coordinates. I noticed the 2007 paper, and was intrigued by the following sentences in the abstract:

A formalism is presented describing the interdependency of a set of structural descriptors as a geometric basis for folding of unimolecular quadruplex topologies. It represents a standard for interpretation of structural characteristics of quadruplexes, and is comprehensive in explicitly harmonizing the results of published literature with a unified language.

Mateus kindly sent me a copy of the 2007 article, and shortly afterwards he also shared with me the Dvorkin et al. (2018) paper on "Encoding canonical DNA quadruplex structure". I carefully read both papers, plus the Karsisiotis et al. (2013) tutorial paper. I was impressed by the elegance of the formalism: simple and systematic, so I immediately decided to add this feature to the G4 module of DSSR.

As the Chinese saying goes, "纸上得来终觉浅,绝知此事要躬行" ("What you learn from books is always shallow. You must practice it yourself to know it well." -- Google Translate). The implementation process was challenging because of subtleties in the formalism, but very rewarding. It required both scientific understanding and careful software engineering: only with a thorough understanding and meticulous attention to detail can one create a robust and reliable software tool. Once properly implemented, however, the DSSR G4 module can be applied consistently. Any discrepancies between DSSR output and the literature merit further investigation. They could arise either from bugs in DSSR (which I will promptly address once identified) or, more likely, from typos or errors in the reported results.

Webba da Silva (2007) systematically described the interdependency of glycosidic bond (syn or anti), strand polarity (parallel or anti-parallel), groove width (narrow, medium, or wide), and loop type (lateral, propeller, or diagonal) in unimolecular G-quadruplexes. Figures 1-3 and Scheme 1 of Webba da Silva (2007) are very informative and easy to follow conceptually. The Karsisiotis et al. (2013) tutorial provided further details based on experimentally determined G-quadruplex structures from the PDB (e.g., Figure 3: the schematic for all possible combinations of glycosidic bond and the corresponding groove-width combinations in a G-tetrad). Some key observations:

  • Since the glycosidic bond can be either syn or anti, there are a total of 2x2x2x2 = 16 possible combinations in a G-tetrad.
  • The disposition of glycosidic bond of guanosines in a G-tetrad leads to only eight possible groove-width combinations.
  • Only tetrads with the same groove-width combinations may stack to form stable G-quadruplexes.
  • Propeller loops invariably link medium grooves within a G-quadruplex stem.
  • Lateral and diagonal loops bridge guanosines with different glycosidic bond orientations.
  • If a single-stranded quadruplex starts with a narrow groove, it can only be with a clockwise loop progression (i.e., +lateral).
  • There are 26 permissible looping combinations within a canonical unimolecular G-quadruplex (G4-stem).
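
The count in the first observation can be reproduced with a minimal sketch. It only enumerates the syn/anti assignments for the four guanosines of a tetrad; it does not attempt to derive groove widths, which depend on details of the formalism not restated here. The variable names are illustrative only.

```python
from collections import Counter
from itertools import product

# Each of the four guanosines in a G-tetrad is either syn or anti.
patterns = list(product(("syn", "anti"), repeat=4))
print(len(patterns))  # 16, i.e. 2x2x2x2

# Group the assignments by how many guanosines are syn.
by_syn_count = Counter(p.count("syn") for p in patterns)
for n_syn in sorted(by_syn_count):
    print(n_syn, "syn:", by_syn_count[n_syn], "assignment(s)")
```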

To unambiguously characterize a G4-stem, Webba da Silva (2007) defined a frame of reference where the 5'-G in a G4-stem is set as the origin, and the first strand progresses towards the viewer. Regardless of the clockwise or anti-clockwise progression of the base sequence, the scheme designates one orientation for the syn and anti glycosidic bond by following G+G H-bonding alignments. Put another way, grooves and strands are strictly related to the reference (first) strand in an anti-clockwise manner, irrespective of the progression of the base sequence. The point is illustrated in the figure below, using PDB entries 8ht7 (G1 in syn) and 5ua3 (G1 in anti) as examples of the syn and anti glycosidic bond of the 5'-guanosine, respectively.

[Figure: G4 frame of reference]

Based on previous work, the Dvorkin et al. (2018) paper proposed a systematic nomenclature for G4-stems. The single structural descriptor contains:

  • The number of G-tetrads (i.e., the G-tract length).
  • Loop types (lowercase l for lateral, p for propeller, and d for diagonal) and relative direction ("+" for clockwise and "-" for anti-clockwise progression, using the frame of reference described above).
  • For lateral loops, the groove widths ("w" for wide, and "n" for narrow) are denoted in subscript.

So a complete descriptor could be 2(+lnd−p), as shown in Figure 1A of the Dvorkin et al. (2018) paper. Significantly, Figure 1B therein further gave structural descriptors for six experimentally determined G4-stems from the PDB. These examples, plus the ones in the supplementary materials, were used to validate my implementation of the systematic nomenclature in the G4 module of DSSR. My results agree with those in the Dvorkin et al. (2018) paper, except for two cases, which are discussed below.

  • For PDB entry 2gku: 3(-p-ln-lw) (Dvorkin et al.) vs 3(-P-Lw-Ln) (DSSR), with swapped n (narrow) and w (wide) groove widths for both lateral loops.
  • For PDB entry 2lod: 3(-pd+ln) (Dvorkin et al.) vs 3(-PD+Lw) (DSSR), with swapped n (narrow) and w (wide) groove width for the lateral loop.

Note that in DSSR, I am using uppercase L/P/D for lateral/propeller/diagonal loop types, and lowercase n/w for narrow/wide groove widths, respectively. Doing so distinguishes between the different loop types and groove widths in pure text format.
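
To make the two text conventions concrete, here is a minimal sketch of a parser that reads a descriptor in either form, normalizes it to the uppercase L/P/D style, and reports which loops differ between two descriptors. It is not code from DSSR: the names `Loop`, `parse_descriptor`, `to_dssr_text`, and `differing_loops` are hypothetical, and treating typeset minus signs as equivalent to an ASCII "-" is an assumption.

```python
import re
from itertools import zip_longest
from typing import List, NamedTuple, Tuple


class Loop(NamedTuple):
    sign: str    # "+", "-", or "" when no sign is written (e.g. the "d" in "3(-pd+ln)")
    kind: str    # "L", "P", or "D" (normalized to uppercase)
    groove: str  # "n", "w", or "" when no groove width is given


# Assumption: typeset minus signs and en-dashes mean the same as ASCII "-".
_DASHES = str.maketrans({"\u2212": "-", "\u2013": "-"})
_OUTER = re.compile(r"(\d+)\((.+)\)")
_BODY = re.compile(r"(?:[+-]?[lpdLPD][nw]?)+")
_LOOP = re.compile(r"([+-]?)([lpdLPD])([nw]?)")


def parse_descriptor(text: str) -> Tuple[int, List[Loop]]:
    """Split a descriptor such as '3(-p-ln-lw)' into tetrad count and loops."""
    cleaned = text.translate(_DASHES).replace(" ", "")
    outer = _OUTER.fullmatch(cleaned)
    if outer is None or _BODY.fullmatch(outer.group(2)) is None:
        raise ValueError(f"unrecognized descriptor: {text!r}")
    loops = []
    for sign, kind, groove in _LOOP.findall(outer.group(2)):
        if groove and kind.upper() != "L":
            raise ValueError(f"groove width on a non-lateral loop: {text!r}")
        loops.append(Loop(sign, kind.upper(), groove))
    return int(outer.group(1)), loops


def to_dssr_text(n_tetrads: int, loops: List[Loop]) -> str:
    """Render loops with uppercase loop types and lowercase groove widths."""
    body = "".join(f"{lp.sign}{lp.kind}{lp.groove}" for lp in loops)
    return f"{n_tetrads}({body})"


def differing_loops(a: str, b: str) -> List[int]:
    """Return 1-based positions of the loops that differ between a and b."""
    _, loops_a = parse_descriptor(a)
    _, loops_b = parse_descriptor(b)
    pairs = zip_longest(loops_a, loops_b)
    return [i for i, (x, y) in enumerate(pairs, start=1) if x != y]


print(to_dssr_text(*parse_descriptor("3(-p-ln-lw)")))  # 3(-P-Ln-Lw)
print(differing_loops("3(-p-ln-lw)", "3(-P-Lw-Ln)"))   # [2, 3]  (2gku)
print(differing_loops("3(-pd+ln)", "3(-PD+Lw)"))       # [3]     (2lod)
```

Normalizing before comparing means that the two discrepancies listed above show up as differences in groove width only, not as differences in letter case.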

After careful examination of these discrepancies, I still couldn't find any errors in my implementation. So I contacted Mateus for verification (in early October 2018). Thankfully, he quickly responded and acknowledged the mistake for PDB entry 2gku in Dvorkin et al. (2018), saying "There can not be a –Ln after the –p." Clearly, the wrong descriptor for PDB entry 2gku in Dvorkin et al. (2018) was due to a typographical error. This example illustrates the power of a robust software tool like DSSR.
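
The quoted statement can be turned into a simple text check, sketched below. It encodes only that one sentence, literally, and is not a validator for the full formalism; the function name `has_minus_ln_after_minus_p` is hypothetical, and mapping typeset dashes to an ASCII "-" is an assumption.

```python
import re


def has_minus_ln_after_minus_p(descriptor: str) -> bool:
    """True if a '-Ln' loop directly follows a '-P' loop (either letter case)."""
    # Assumption: typeset minus signs and en-dashes mean the same as ASCII "-".
    text = descriptor.replace("\u2212", "-").replace("\u2013", "-")
    return re.search(r"-[pP]-[lL]n", text) is not None


print(has_minus_ln_after_minus_p("3(-p-ln-lw)"))  # True: the published 2gku descriptor
print(has_minus_ln_after_minus_p("3(-P-Lw-Ln)"))  # False: the DSSR descriptor
```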


References

Dvorkin,S.A. et al. (2018) Encoding canonical DNA quadruplex structure. Sci. Adv., 4, eaat3007.

Karsisiotis,A.I. et al. (2013) DNA quadruplex folding formalism – A tutorial on quadruplex topologies. Methods, 64, 28–35.

Webba da Silva,M. (2007) Geometric formalism for DNA quadruplex folding. Chem. Eur. J., 13, 9738–9745.

Thank you for printing this article from http://x3dna.org/. Please do not forget to visit back for more 3DNA-related information. — Xiang-Jun Lu