G-quadruplex -- notes on ElTetrado and related tools

Over the past few years, the Szachniuk group has made several significant contributions to the field of structural bioinformatics of G-quadruplexes. The following five publications are particularly noteworthy, and I am glad to see that 3DNA/DSSR have been cited in all of them.

1. Zok et al. (2020) -- ElTetrado: a tool for identification and classification of tetrads and quadruplexes

The BMC Bioinformatics paper introduced the ElTetrado software tool for identifying and classifying G-tetrads in unimolecular G-quadruplex structures into ONZ taxonomy. Here DSSR is employed to identify base-pairs and base-stacking interactions.

ElTetrado processes PDB and mmCIF files to identify quadruplexes and their component tetrads in nucleic acid structures (Fig. 2). It applies DSSR [24] to collect the preliminary information about base pairs and stacking.

We recommend that, apart from ElTetrado, the users should download the DSSR binary [24] and place it in the same local directory. DSSR is utilized for the preliminary analysis of base pairs in the input 3D structure. Its local installation allows the users to control DSSR execution. For example, one can decide to pass --symmetry parameter to x3dna-dssr binary when dealing with X-ray structures, which is necessary for some quadruplexes.

As documented in the DSSR Manual, by default, DSSR reads in the first model of an NMR ensemble. A biological unit of X-ray crystal structures in the PDB may contain symmetry-related components formatted as a MODEL/ENDMDL delimited, NMR-like ensemble. In such cases, the --symmetry (or --symm) option is required for DSSR to process the entire biological unit.

For example, x3dna-dssr -i=4ms9.pdb1 --symm leads to the identification of 10 Watson-Crick base pairs in the biological unit of PDB entry 4ms9 (uploaded). The --symm option is now enabled for user-uploaded PDB files on the skmatic.x3dna.org website. Without the --symm option, DSSR would not find any Watson-Crick base pairs in 4ms9.pdb1 since MODEL#1 is single stranded.

Noticing the confusion users may have in using the --symm option, I have revised DSSR to check for overlapped residues. When all models in an NMR ensemble are taken as a whole with the --symm option, there will be overlapped residues. In such cases, DSSR will report a diagnostic message and proceed with the first model only. The final result is as if --symm has not been specified. Put another way, specifying --symm for an NMR ensemble does no harm to the analysis. For example, analyzing PDB entry 8xeq with the --symm option would have the following message and only the first model would be processed.

x3dna-dssr -i=8xeq.pdb --symm -o=8xeq.out
[i] You specified --symm, but the input file is an (NMR) ensemble
    *** in the following, only the FIRST model will be processed ***

Alternatively, if the users do not want to have a local version of DSSR binary, they can obtain the DSSR output in JSON format from any place and use them as input data for ElTetrado (with --dssr-json parameter).

ElTetrado is started from the command line. The users enter the program name and either --pdb followed by an input file name (the file should be in PDB or mmCIF format), or --dssr-json followed by a path to JSON file generated by DSSR, or both switches at once.

The JSON output from DSSR can be obtained directly from the skmatic.x3dna.org website for pre-processed PDB entries or user-supplied coordinate files. For examples, for PDB entry 1ehz, the URL is http://skmatic.x3dna.org/pdb/1ehz/1ehz.json. Alternatively, users can use the web API to get the JSON file, as shown below:

# Pre-processed PDB entry:
curl http://skmatic.x3dna.org/api/pdb/1ehz/json

# With user-supplied PDB file
curl http://skmatic.x3dna.org/api -F 'model=@1ehz.pdb' -F 'type=json'
curl http://skmatic.x3dna.org/api -F 'url=https://files.rcsb.org/download/1ehz.pdb.gz' -F 'type=json'

2. Popenda et al. (2020) -- Topology-based classification of tetrads and quadruplex structures

This paper presents the ONZ scheme to clarify tetrads in unimolecular structures (see Figure 2 therein). Note that DSSR, with option --G4=ONZ, classifies G-tetrads into ONZ taxonomy. For example, the PDB entry 2gku has one G-tetrad (G3, G9, G17, G21) in the O- category, and two G-tetrads in O+.

Structures from both sets were analyzed using self-implemented programs along with DSSR software from the 3DNA suite (Lu et al., 2015). From DSSR, we acquired the information about base pairs and stacking.

3. Zurkowski et al. (2022) -- DrawTetrado to create layer diagrams of G4 structures

DrawTetrado generates static layer diagrams that represent structural data in a pseudo-3D perspective. The layer diagram is very informative and visually pleasing, and it complements the cartoon block schematics generated by the DSSR-PyMOL integration and the detailed DSSR characterization of G-quadruplexes.

So far, the only visual model designed for the 3D structure of quadruplexes is cartoon block schematics (Fig. 1C). These models are generated by DSSR-PyMOL integration and presented as static images of the structure viewed from six perspectives (Lu, 2020).

4. Adamczyk et al. (2023) -- WebTetrado: a webserver to explore quadruplexes in nucleic acid 3D structures

The topologies underlying the classification of quadruplexes and other parameters of their structures can be analyzed using a few computational tools. DSSR (7) was the first to target the detection of G-quadruplexes in 3D structure data saved in PDB and PDBx/mmCIF files and to describe their features. It runs systematically on all entries in the Protein Data Bank and collects motifs found in the DSSR-G4DB database. ElTetrado (8) can identify and analyze G4s and other kinds of tetrads and quadruplexes, classify them, and compute their parameters. It is the core of the computation pipeline running within the ONQUADRO database system (9). The most recent tool for processing atom coordinates in the search for quadruplexes is ASC-G4 (10). It calculates more features than DSSR and ElTetrado, but is limited to unimolecular quadruplexes and supports only the PDB format.

The example illustration in Figure 3 of the paper is on PDB entry 6h1k, the major G-quadruplex form of HIV-1 LTR (long terminal repeat). The layer diagram shown below, re-generated using the WebTetrado website, helps visualize the detailed characterization of DSSR very nicely.

Layer diagram of PDB entry 6h1k "In DSSR, a G4-helix is defined by stacking interactions of G-tetrads, regardless of backbone connectivity, and may contain more than one G4-stem." For PDB entry 6h1k, DSSR identifies a G-helix with three G-tetrads, ordered properly. Specifically, strand#1 consists of (G2, G1, and G25), even G1 and G25 are not covalently connected. On the other hand, "In DSSR, a G4-stem is defined as a G4-helix with backbone connectivity. Bulges are also allowed along each of the four strands." Thus, the G4-stem is composed of only two G-tetrads, as detailed below.

Stem#1, 2 G-tetrads, 3 loops, INTRA-molecular, UDDD, hybrid-(mixed), 2(D+PX), UD3(1+3)

 1  glyco-bond=s--- sugar=---- groove=w--n Major-->WC Z- nts=4 GGGG A.DG1,A.DG20,A.DG16,A.DG27
 2  glyco-bond=-sss sugar=.-.3 groove=w--n WC-->Major Z+ nts=4 GGGG A.DG2,A.DG19,A.DG15,A.DG26
  step#1  mm(<>,outward)  area=12.76 rise=3.47 twist=18.2
  strand#1  U DNA glyco-bond=s- sugar=-. nts=2 GG A.DG1,A.DG2
  strand#2  D DNA glyco-bond=-s sugar=-- nts=2 GG A.DG20,A.DG19
  strand#3  D DNA glyco-bond=-s sugar=-. nts=2 GG A.DG16,A.DG15
  strand#4  D DNA glyco-bond=-s sugar=-3 nts=2 GG A.DG27,A.DG26
  loop#1 type=diagonal  strands=[#1,#3] nts=12 GAGGCGTGGCCT A.DG3,A.DA4,A.DG5,A.DG6,A.DC7,A.DG8,A.DT9,A.DG10,A.DG11,A.DC12,A.DC13,A.DT14
  loop#2 type=propeller strands=[#3,#2] nts=2 GC A.DG17,A.DC18
  loop#3 type=diag-prop strands=[#2,#4] nts=5 GACTG A.DG21,A.DA22,A.DC23,A.DT24,A.DG25

List of 2 non-stem G4-loops (including the two closing Gs)
 1 type=lateral   helix=#1 nts=5 GACTG A.DG21,A.DA22,A.DC23,A.DT24,A.DG25
 2 type=V-shaped  helix=#1 nts=4 GGGG A.DG25,A.DG26,A.DG27,A.DG28

DSSR correctly identifies the 12-nt diagonal loop, containing a canonical duplex stem and a hairpin loop. Notably, the G-tetrad (25-21-17-28) does not belong to the G4-stem because of the broken backbone connectivity between G1 and G25. Instead, the Gs in the G-tetrad (25-21-17-28) become part of the following two loops, which are certainly unconventional yet follow naturally the DSSR definition of G4-stem.

  • The propeller loop, which now includes G17 (part of the G-tetrad), in addition to C18.
  • The unusual diag-prop (diagonal-propeller ) loop, which consists of G21, A22, C23, T24, and G25.

Moreover, DSSR also reports two loops that are not defined by the G4-stem: the V-shaped loop (G25-G26-G27-G28) and the lateral loop (G21-A22-C23-T24-G25). See the notes in the above layer diagram. V-shaped loop occurs when the 5’-endmost G-tetrad lies in the middle of the G-quartets stack as in the non-canonical G4 structures with snapbacks.

The G4-helix and G4-stem definitions parallel those for duplex helix and stem in DSSR. The characterization of loops follows naturally once G4-stem or duplex stem are identified. The unusual propeller and diagonal-propeller loops noted above are due to non-canonical structures, which also lead to the listing of non-stem G4-loops.

I may consider to add special handling of snapbacks (or other worthwhile classes of non-canonical G4 structures) so that the reported loops follow whatever consensus the community agrees upon in the future. Nevertheless, I would like to emphasize that the consistent definitions of G4-stem and loops in DSSR help pinpoint extraordinary features to draw users' attention to non-canonical G4 structures. The layer diagram from DrawTetrado and WebTetrado are very handy in illuminating the basic concept and technical details, as shown here for PDB entry 6h1k.

5. Zok et al. (2022) -- ONQUADRO: a database of experimentally determined quadruplex structures

The computational engine is composed of scripts utilising in-house and third-party procedures, responsible for data collection, quadruplex identification, computation of structure parameters, secondary structure annotation, visualisation of the secondary and tertiary structure models, database queries, generation of statistics, and newsletter preparation. DSSR (--pair-only mode) (36) and ElTetrado (39) functionalities are applied to identify quadruplexes, tetrads, and G4-helices in nucleic acid structures.


References

Adamczyk, B., Zurkowski, M., Szachniuk, M., & Zok, T. (2023). WebTetrado: a webserver to explore quadruplexes in nucleic acid 3D structures. Nucleic Acids Research, 51(W1), W607–W612. https://doi.org/10.1093/nar/gkad346

Popenda, M., Miskiewicz, J., Sarzynska, J., Zok, T., & Szachniuk, M. (2020). Topology-based classification of tetrads and quadruplex structures. Bioinformatics, 36(4), 1129–1134. https://doi.org/10.1093/bioinformatics/btz738

Zok, T., Kraszewska, N., Miskiewicz, J., Pielacinska, P., Zurkowski, M., & Szachniuk, M. (2022). ONQUADRO: a database of experimentally determined quadruplex structures. Nucleic Acids Research, 50(D1), D253–D258. https://doi.org/10.1093/nar/gkab1118

Zok, T., Popenda, M., & Szachniuk, M. (2020). ElTetrado: a tool for identification and classification of tetrads and quadruplexes. BMC Bioinformatics, 21(1), 40. https://doi.org/10.1186/s12859-020-3385-1

Zurkowski, M., Zok, T., & Szachniuk, M. (2022). DrawTetrado to create layer diagrams of G4 structures. Bioinformatics, 38(15), 3835–3836. https://doi.org/10.1093/bioinformatics/btac394

---

Comment

 
---

·

Thank you for printing this article from http://x3dna.org/. Please do not forget to visit back for more 3DNA-related information. — Xiang-Jun Lu