On January 29, 2025, I received the following email request from a long-time DSSR user:
... recently noted that 3DNA/DSSR automatically maps non-standard nucleotides to standard nucleotides. I wonder if you would be willing to share with us your most current version of mappings?
I responded to the user the same day with detailed information about the mapping process in DSSR. The user was happy with my response, and the thread was quickly closed on a positive note.
On April 22, 2025, a related question, titled "Can x3dna-dssr correctly handle N1-methyl-pseudouridine?", was asked on the 3DNA Forum. In answering the question on the Forum, I referred to my email response to the previous user.
I now realize that writing a detailed blog post explaining the mapping process would be beneficial for DSSR users. It would also enable me to easily reference this blog post in future interactions with users.
3DNA/DSSR performs automatic mapping of modified nucleotides (including pseudouridine) to their standard counterparts. Over the years, the method has proven to work well in real-world applications. It is one of the defining features that make DSSR just work. For example, for the tRNA 1ehz, DSSR automatically identifies the following 14 modified nucleotides (of 11 unique types):
# x3dna-dssr -i=1ehz.pdb
List of 11 types of 14 modified nucleotides
      nt    count  list
   1 1MA-a    1    A.1MA58
   2 2MG-g    1    A.2MG10
   3 5MC-c    2    A.5MC40,A.5MC49
   4 5MU-t    1    A.5MU54
   5 7MG-g    1    A.7MG46
   6 H2U-u    2    A.H2U16,A.H2U17
   7 M2G-g    1    A.M2G26
   8 OMC-c    1    A.OMC32
   9 OMG-g    1    A.OMG34
  10 PSU-P    2    A.PSU39,A.PSU55
  11 YYG-g    1    A.YYG37
Users could run DSSR on a set of structures of interest, and collect the unique mappings for a complete list of modified nucleotides.
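Collecting the mappings from many runs comes down to pulling the residue name and its one-letter code out of each row of the list above. The following is a minimal sketch in C, assuming each row has the layout shown for 1ehz (index, NAME-code, count, list). The function name parse_modified_nt_row is hypothetical and is not part of DSSR.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parse one row such as "4 5MU-t 1 A.5MU54" (layout assumed from the
 * 1ehz listing above). On success, copy the residue name ("5MU") into
 * res_name, store the one-letter code ('t') in *mapped, and return 1.
 * Return 0 for any line that does not look like a data row. */
int parse_modified_nt_row(const char *line, char *res_name, size_t size,
                          char *mapped)
{
    int index, count;
    char token[32];
    const char *dash;
    size_t len;

    assert(line != NULL && res_name != NULL && mapped != NULL);

    /* A data row starts with: <index> <NAME-code> <count> */
    if (sscanf(line, "%d %31s %d", &index, token, &count) != 3)
        return 0;

    /* The code is the single character after the last '-' */
    dash = strrchr(token, '-');
    if (dash == NULL || dash == token || dash[1] == '\0' || dash[2] != '\0')
        return 0;

    len = (size_t)(dash - token);
    if (len >= size)
        return 0; /* residue name does not fit in the caller's buffer */

    memcpy(res_name, token, len);
    res_name[len] = '\0';
    *mapped = dash[1];
    return 1;
}
```

Header lines and the command line itself are rejected because they do not start with an integer, so the function can be applied to every line of the output and the successful results merged into a set of unique name-to-code pairs.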
Moreover, DSSR has the --nt-mapping option that allows users to control the mapping process. The screenshot below is taken from the relevant part of the DSSR manual.
For example, DSSR automatically maps 5MU (5-methyluridine 5′-monophosphate) to t (i.e., modified thymine) because of the 5-methyl group. With the option --nt-mapping='5MU:u', DSSR would take 5MU as a modified uracil. This option allows for multiple mappings separated by commas. The mapping of 5MU to u or t is obviously arbitrary. DSSR is robust against the ambiguity in designating a modified nucleotide to its nearest canonical counterpart. For example, mapping 5MU to u or t has minimal influence on DSSR-derived base-pair parameters and other structural features.
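To make the option syntax concrete, here is a minimal C sketch that splits a comma-separated mapping string into name/code pairs. It assumes each item has the form NAME:c with a single-character code, as in '5MU:u'. The type nt_mapping and the function parse_nt_mapping are hypothetical names for illustration, and this is not how DSSR itself parses the option.

```c
#include <assert.h>
#include <string.h>

typedef struct {
    char res_name[8]; /* residue name, e.g. "5MU" */
    char base;        /* one-letter target, e.g. 'u' */
} nt_mapping;

/* Split a spec such as "5MU:u,PSU:P" into at most max_out entries.
 * Returns the number of entries parsed, or -1 on malformed input. */
int parse_nt_mapping(const char *spec, nt_mapping *out, int max_out)
{
    int n = 0;
    const char *p = spec;

    assert(spec != NULL && out != NULL);

    while (*p != '\0') {
        const char *end = strchr(p, ',');
        const char *colon;
        size_t name_len;

        if (end == NULL)
            end = p + strlen(p); /* last item: runs to end of string */

        colon = memchr(p, ':', (size_t)(end - p));
        if (colon == NULL || n >= max_out)
            return -1;

        name_len = (size_t)(colon - p);
        /* Require a non-empty name that fits, and exactly one code char */
        if (name_len == 0 || name_len >= sizeof out[n].res_name ||
            end - colon != 2)
            return -1;

        memcpy(out[n].res_name, p, name_len);
        out[n].res_name[name_len] = '\0';
        out[n].base = colon[1];
        n++;

        p = (*end == ',') ? end + 1 : end;
    }
    return n;
}
```

The second pair in a spec like "5MU:u,PSU:P" is only an illustration of the comma-separated form; the DSSR manual remains the reference for which values the option accepts.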
Background information on the mapping
Over the years, I've refined the heuristics of the mapping process. In the early days with 3DNA, I kept an ever-increasing list in the file baselist.dat, with hundreds of entries like MIA a, which maps MIA to a modified A, denoted by the lowercase a. In recent releases of DSSR, I keep only the standard ones, for a total of 48 entries such as ADE A and DG5 G. If a residue is not a standard one, the following C function is called to do the mapping. DSSR performs filtering to decide whether a residue is a nucleotide and, if so, whether it is a purine (R) or a pyrimidine (Y).
static void derive_new_nt_std_name(long resi, struct_mol *pdb, char *info)
{
    char str[BUF512];
    double d1 = DMAX, d2 = DMAX;
    long C1_prime, N1, C5;
    struct_residue *r = &pdb->residues[resi];

    if (r->type[RESIDUE_NT_UNKNOWN]) {
        sprintf(r->std_name, "__%c", Gvars.abasic);
        return;
    }

    if (is_R(resi, pdb)) { /* purine */
        if (residue_has_atom(" O6 ", resi, pdb)) /* with ' O6 ' */
            strcpy(r->std_name, "__g");
        else if (!residue_has_atom(" N6 ", resi, pdb) && /* no ' N6 ' but ' N2 ' */
                 residue_has_atom(" N2 ", resi, pdb))
            strcpy(r->std_name, "__g");
        else
            strcpy(r->std_name, "__a");
    } else { /* a pyrimidine */
        if (residue_has_atom(" N4 ", resi, pdb))
            strcpy(r->std_name, "__c");
        else if (residue_has_atom(" C7 ", resi, pdb))
            strcpy(r->std_name, "__t");
        else
            strcpy(r->std_name, "__u");

        C1_prime = find_atom_in_residue(" C1'", resi, pdb);
        N1 = find_atom_in_residue(" N1 ", resi, pdb);
        if (atoms_same_model_chain_altloc(C1_prime, N1, pdb))
            d1 = dist_atoms(C1_prime, N1, pdb);
        if (!dval_in_range(d1, 1.0, 2.0)) {
            C5 = find_atom_in_residue(" C5 ", resi, pdb);
            if (atoms_same_model_chain_altloc(C1_prime, C5, pdb))
                d2 = dist_atoms(C1_prime, C5, pdb);
            if (dval_in_range(d2, 1.0, 2.0))
                strcpy(r->std_name, "__p");
        }
    }

    if (!Gvars.standalone) {
        sprintf(str, "\n\tmatched nucleotide '%s' to '%c' for %s\n"
                     "\tverify and add an entry in <baselist.dat>\n",
                r->res_name, r->std_name[2], info);
        logit(str);
    }
}
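Since the function above depends on DSSR internals, the following self-contained C sketch reproduces the same decision order on a simplified input: look up the residue name among the standard entries first, and fall back to the atom-based heuristics otherwise. The struct nt_features, the helper names, and the inclusive 1.0 to 2.0 Å range check are assumptions for illustration. Only the two standard entries quoted above are listed, and the abasic case and the model/chain/altloc checks are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *res_name;
    char code;
} base_entry;

/* Only the two standard entries quoted in the post; the real list has 48. */
static const base_entry standard_bases[] = {
    { "ADE", 'A' },
    { "DG5", 'G' },
};

/* Hypothetical, simplified view of a residue's base atoms. */
typedef struct {
    int is_purine;              /* 1 for R, 0 for Y */
    int has_O6, has_N6, has_N2; /* used for purines */
    int has_N4, has_C7;         /* used for pyrimidines */
    double d_C1p_N1;            /* C1'...N1 distance in angstroms */
    double d_C1p_C5;            /* C1'...C5 distance in angstroms */
} nt_features;

/* Assumption: a covalent bond falls within [1.0, 2.0] angstroms. */
static int in_bond_range(double d)
{
    return d >= 1.0 && d <= 2.0;
}

/* Same order of tests as derive_new_nt_std_name() above. */
char derive_code(const nt_features *f)
{
    char code;

    assert(f != NULL);
    if (f->is_purine) {
        if (f->has_O6)
            return 'g';
        if (!f->has_N6 && f->has_N2)
            return 'g';
        return 'a';
    }

    if (f->has_N4)
        code = 'c';
    else if (f->has_C7)
        code = 't';
    else
        code = 'u';

    /* Pseudouridine-like: sugar attached via C5 rather than N1 */
    if (!in_bond_range(f->d_C1p_N1) && in_bond_range(f->d_C1p_C5))
        code = 'p';
    return code;
}

/* Standard names map to uppercase codes; anything else is derived. */
char map_residue(const char *res_name, const nt_features *f)
{
    size_t i;

    assert(res_name != NULL && f != NULL);
    for (i = 0; i < sizeof standard_bases / sizeof standard_bases[0]; i++)
        if (strcmp(standard_bases[i].res_name, res_name) == 0)
            return standard_bases[i].code;
    return derive_code(f);
}
```

With this layout, a pyrimidine that carries C7 and has a normal C1'-N1 bond comes out as t, which matches the default treatment of 5MU described earlier.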