By following DSSR citations, I recently noticed a bioRxiv preprint, titled "Assessment of nucleic acid structure prediction in CASP16" by Kretsch et al. The portion where DSSR is mentioned is as follows:
Secondary structures were extracted from CASP16 models with DSSR (v1.9.9-2020feb06). Some models, in particular due to large clashes, failed to run (Supplemental Table 1). The base-pair list was extracted from the table in the output file directly because the dot-bracket structure produced by DSSR, in particular for multimers, can contain errors.
While pleased to see DSSR cited in this significant study, I am concerned about the reported issues and would like to investigate the specific structures and error messages encountered. To better understand the problems and potentially find solutions, I have reached out to the authors for further details. Here is the message I sent initially:
You said DSSR failed to run on some models with large clashes. Could you please share the specific models and the error messages you encountered? I would also be interested in seeing the exact errors you observed in the DSSR-derived DBN for multi-mers. It would be a great opportunity for me to improve DSSR in this area, which would benefit both your group and the broader community. If you are willing to share them, please provide details—preferably on the public 3DNA Forum. Don’t hesitate to share openly any bugs or limitations you’ve encountered with DSSR.
The authors responded promptly and provided detailed information about the specific models and error messages encountered. After several iterations, I successfully resolved the issues and released an updated version of DSSR, namely v2.5.4-2025jun04. You can find the release notes here. In this blog post, I aim to share the specifics of these issues and the steps taken to address them. This experience underscores the importance of proactively engaging with the community to enhance the functionality and reliability of a software tool.
Do note, the predictors in casp submit some truly atrocious models --- eg 14 atoms all at the exact same x-y-z coordinate. These errors would be with his v1.9.9-2020feb06 install though not your latest version. Would you still like them? --- from the authors
Yes, I would like to see how DSSR behaves with these models. Ideally, it should not crash, but output some warning messages. Only through such testing can we improve the robustness of DSSR. Overall, the more feedback I get, the better.
Buffer overflow bug in DSSR
Most of errors I had with dssr were due to clashes and all zero xyz predictions by predictors, for all of which dssr did not give an error message when dssr failed. There was a case where the prediction looked reasonable but dssr failed with the error message dssr error*** buffer overflow detected ***. Please see attached for the 2 pdbs that gave this error. --- from the authors
The two PDB files I received were R1283v3TS294_1o and R1283v3TS294_2o, as listed in Supplementary Table 1: "List of unscored models," with the "Reasons" column indicating a dssr error*** buffer overflow detected ***. I immediately acknowledged receipt of these files, as shown in the following message:
Thank you for sending me the two PDB files which caused DSSR to fail. I can verify the issue and will try to fix the bug ASAP. I'll keep you posted.
Using these data files, I was able to quickly fix the buffer overflow bug. The following is my response to the authors within one day after receiving the files:
With your sample PDB files, I have traced the issue that caused DSSR to fail. The bug was due to a 53-way (R1283v3TS294_1o) and 40-way (R1283v3TS294_2o) junction loops which are far from the norm. DSSR sets a default limit for the summary line for each loop which is more than sufficient for all normal PDB entries, but falls short for these unusual cases, leading to out of array boundaries. See the attached DSSR output after the bug fix for more details.
This is a clear example where user feedback is crucial for improving the software, which makes it better serve the community.
Zero xyz coordinates and large clashes
After fixing the out-of-bound bug, I also requested other problematic predicted models from the authors, as shown in the following message:
Along the line, please provide the sample PDB files:
- with zero xyz predictions -- I am curious to see what it looks like.
- where the DSSR-derived DBN is problematic for multi-mers
After solving these issues, I will release a new version of DSSR that would make your analysis more straightforward, and benefit other users as well.
The authors responded with the following message:
Thanks for looking into this. Here are some more examples with superimposed structures, large clash, and all zero xyzs in the zip file.
The ZIP file (error_examples.zip) contains three folders (all_zero_xyz, clash and superimposed), each with some problematic models in PDB format. Once again, I promptly acknowledged receipt of the files and was able to reproduce the reported issues.
Garbage in, garbage out. Given these problematic models, DSSR would not be able to extract any meaningful information from them. Nonetheless, I am committed to enhancing the software so that it can handle such cases more effectively by providing clear error messages and terminating gracefully rather than crashing.
After several days of intensive coding and testing, I developed a solution, which I communicated to the authors in the following message:
Thanks for the sample PDB files (error_examples) with all zero XYZ coordinates, large clashes, and superimposed structures. They helped me to understand the issues, think in context, and find solutions. Let's look at them one by one:
all_zero_xyz: These two files R1211TS159_1 and R1211TS159_2 have identical contents, except for the MODEL IDs (1 and 2, respectively). Atoms with all-zero XYZ coordinates are a special case of duplicated coordinates. This has led me to implement a check for duplicated coordinates in an input file. The revised DSSR now reports duplicated coordinates and their corresponding atoms, and it quits if the number of duplicated atoms exceeds a certain threshold. For R1211TS159_1, the revised DSSR output would be as below:
1 [e] xyz repeated 1904 times:[0.000 0.000 0.000] 1509-P@0.G1 3412-C6@0.C90
[w] no-of-repeats=1 max-freq=1904
...too many duplicates... quit!
clash: Both files R1250TS208_1o and R1250TS417_1o contain multiple models, as visible in PyMOL. Each PDB file uses a single MODEL/END pair to include all its models. This setup is akin to an NMR ensemble but without MODEL/ENDMDL delimiters, which leads to clashes when analyzed together. I have revised DSSR to explicitly check for such clashes and terminate execution if too many are detected. Using R1250TS208_1o as an example, the DSSR output would be as below:
[i] 0.G1 and 1.G1 in clashes: min_dist=0.57
[i] 0.G1 and 3.G1 in clashes: min_dist=0.35
[i] 0.G1 and 4.G1 in clashes: min_dist=0.41
...too many clashes... quit!
The above list contains only three of the many clashes detected in this file. One can notice immediately the G1 nucleotides from chains 0, 1, 3, and 4 are in clashes (see the attached file clashes_208.pdb, which contains only G1 nucleotides from the four chains).
superimposed: The five example files (R1283v3TS304_1o ... R1283v3TS304_5o) have similar issues as the clash cases. Running the revised DSSR on R1283v3TS304_1o would produce the following output:
[i] 0.A1 and 2.A1 in clashes: min_dist=0.74
[i] 0.A1 and 3.A1 in clashes: min_dist=0.78
[i] 0.A1 and 4.A1 in clashes: min_dist=0.56
...too many clashes... quit!
Here A1 nucleotides from chains 0, 2, 3, and 4 are in clashes (see the attached superimpose-1.pdb).
How are the clash and superimposed categories supposed to be different? They look similar to me.
Overall, the error_examples (in all_zero_xyz, clash, and superimposed) pose problems because they do not contain valid DNA/RNA structures as a whole. DSSR cannot extract meaningful information from these files. However, the revised DSSR explicitly highlights these issues, saving users from spending time on invalid data. Do these DSSR revisions make sense to you?
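The duplicated-coordinate check described in the message above can be illustrated with a short Python sketch. It is not DSSR's actual code: the function names, the fixed-column parsing, and the max_repeats threshold are all assumptions made for this example, and DSSR's real cutoff is not stated in the correspondence.

```python
from collections import Counter


def pdb_atom_line(serial, name, resname, chain, resseq, x, y, z):
    """Format a minimal fixed-column PDB ATOM record (demo helper only)."""
    return (f"ATOM  {serial:5d} {name:<4s} {resname:>3s} {chain}{resseq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")


def read_xyz(pdb_text):
    """Return the (x, y, z) tuple of every ATOM/HETATM record, in file order."""
    coords = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM  ", "HETATM")):
            # Columns 31-38, 39-46 and 47-54 hold x, y and z in the PDB format.
            coords.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
    return coords


def duplicated_xyz(coords):
    """Map each coordinate that occurs more than once to its number of occurrences."""
    return {xyz: n for xyz, n in Counter(coords).items() if n > 1}


def too_many_duplicates(coords, max_repeats=10):
    """True if any single coordinate repeats more than max_repeats times.

    max_repeats is a hypothetical threshold chosen for this sketch.
    """
    repeats = duplicated_xyz(coords)
    return max(repeats.values(), default=0) > max_repeats


# Usage: three atoms collapsed onto the origin plus one normal atom.
demo = "\n".join([
    pdb_atom_line(1, "P", "G", "0", 1, 0.0, 0.0, 0.0),
    pdb_atom_line(2, "OP1", "G", "0", 1, 0.0, 0.0, 0.0),
    pdb_atom_line(3, "C6", "C", "0", 2, 0.0, 0.0, 0.0),
    pdb_atom_line(4, "N1", "C", "0", 2, 1.5, -2.25, 3.0),
])
coords = read_xyz(demo)
print(duplicated_xyz(coords))
print(too_many_duplicates(coords, max_repeats=2))
```

Counting exact coordinate triples treats the all-zero case as just one instance of duplication, which matches the reasoning in the message: a file with every atom at the origin and a file with 14 atoms stacked on some other point are caught by the same test.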
In the end, I was glad to receive the following feedback from the authors:
Thanks, these revisions all make sense! The examples I sent on clashes and superimposed were actually similar and I think the error output makes sense as well. --- from the authors
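Since the clash and superimposed cases turned out to be the same problem, a single minimum-distance test covers both. The following Python sketch shows the general idea under stated assumptions: the residue identifiers mimic the chain.residue style seen in the output above, while the 1.0 cutoff, the max_clashes limit, and the brute-force pairwise loop are hypothetical choices for illustration and are not taken from DSSR.

```python
import math
from itertools import combinations


def min_distance(atoms_a, atoms_b):
    """Smallest inter-atomic distance between two lists of (x, y, z) tuples."""
    return min(math.dist(p, q) for p in atoms_a for q in atoms_b)


def find_clashes(residues, cutoff=1.0):
    """Return (id_a, id_b, min_dist) for every residue pair closer than cutoff.

    residues maps an identifier such as "0.G1" to its atom coordinates.
    cutoff is a hypothetical value for this sketch, in the same units as
    the coordinates.
    """
    clashes = []
    # Brute-force all-against-all comparison; fine for a sketch, slow for large models.
    for (id_a, xyz_a), (id_b, xyz_b) in combinations(residues.items(), 2):
        d = min_distance(xyz_a, xyz_b)
        if d < cutoff:
            clashes.append((id_a, id_b, d))
    return clashes


def too_many_clashes(clashes, max_clashes=5):
    """True if the number of clashing pairs exceeds the (hypothetical) limit."""
    return len(clashes) > max_clashes


# Usage: two nearly superimposed G1 residues from chains 0 and 1, plus a distant C2.
residues = {
    "0.G1": [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)],
    "1.G1": [(0.3, 0.4, 0.0), (5.0, 5.0, 5.0)],
    "2.C2": [(10.0, 10.0, 10.0)],
}
clashes = find_clashes(residues)
for id_a, id_b, d in clashes:
    print(f"{id_a} and {id_b}: min_dist={d:.2f}")
print(too_many_clashes(clashes))
```

A model that stacks several copies of a structure in one coordinate set, as in the examples from the authors, would produce a clashing pair for nearly every residue, so a simple count of pairs is enough to decide whether to stop.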
References
Kretsch,R.C. et al. (2025) Assessment of nucleic acid structure prediction in CASP16. bioRxiv; https://doi.org/10.1101/2025.05.06.652459.