RosettaFold-3: Addressing Key Failure Modes in Diffusion-Based Biomolecular Modeling

Written by Danial Gharaie Amirabadi | Published 2026-2-19

Introduction: Diffusion Models Have Expanded the Scope of Structure Prediction

Fig 1. Introduction

AtomWorks pipeline overview showing how a unified data framework supports RF3 and other biomolecular models. Taken from Corley et al., “Accelerating Biomolecular Modeling with AtomWorks and RF3,” Figure 1.

Accurate biomolecular structure prediction has become a central capability in modern structural biology and drug discovery. While AlphaFold2 demonstrated that deep learning can reliably predict folded protein structures, many biologically and therapeutically relevant systems involve interactions beyond single protein chains. These include protein-protein assemblies, protein-ligand complexes, antibody-antigen systems, and interactions involving DNA and RNA.

AlphaFold3 represented a major step forward by extending structure prediction to general biomolecular complexes using a diffusion-based framework. This shift has established diffusion models as one of the most effective paradigms for modeling heterogeneous molecular systems. However, reproducing and extending these methods in open-source settings remains difficult. In practice, the challenges are not limited to model architecture. A significant portion of the difficulty comes from the complexity of biomolecular training data, including inconsistent annotations, edge cases in the Protein Data Bank, and the need for robust preprocessing pipelines that can correctly handle ligands, covalent modifications, stereochemistry, and missing structural information.

To address these limitations, Corley et al. introduce AtomWorks, a modular and research-oriented data framework designed to support scalable training of biomolecular foundation models. Using AtomWorks, they train RosettaFold-3 (RF3), an all-atom structure prediction model designed to improve performance on diverse biomolecular complexes while explicitly addressing known weaknesses of diffusion-based predictors, particularly stereochemical errors and limited controllability.

RF3 is positioned both as a practical open-source structure prediction tool and as a demonstration of how improved data infrastructure can accelerate the development of future biomolecular modeling systems.

Why Diffusion-Based Structure Predictors Still Fail in Practice

Diffusion-based models have become the dominant approach for modeling biomolecular structures because they can generate full atomic coordinates while naturally supporting flexible, probabilistic sampling. In principle, this makes them well-suited for complex systems such as protein-ligand interactions, multimeric assemblies, and nucleic acid complexes. In practice, however, diffusion-based structure predictors still exhibit failure modes that limit their reliability in real-world scientific and drug discovery workflows.

A major issue is that many diffusion models do not consistently enforce chemical validity. Small geometric deviations that might appear minor at the coordinate level can correspond to chemically incorrect structures. This is especially problematic for ligands, where stereochemistry and local geometry determine biological activity. Incorrect chirality, unrealistic bond geometries, and distorted conformations can make a predicted complex unusable for downstream applications such as docking validation, binding affinity estimation, or structure-guided design.

A second limitation is the lack of controllability. Many workflows require incorporating prior structural information, such as known ligand conformers, experimentally derived restraints, or partial holo structures. Without a mechanism to condition predictions on these constraints, diffusion models often produce plausible but misaligned solutions, particularly in binding site regions where small errors can dominate functional interpretation.

Finally, the difficulty of building robust open-source pipelines remains an underappreciated bottleneck. Training diffusion models on heterogeneous biomolecular data requires extensive preprocessing to handle edge cases in the Protein Data Bank, including inconsistent bond annotations, missing atoms, alternate conformations, and covalent modifications. Without standardized and well-tested infrastructure, model development becomes slow and difficult to reproduce, even when architectures are publicly available.

These limitations define the gap between diffusion models that generate visually plausible structures and models that consistently produce chemically faithful predictions that can be trusted in practical applications. RF3 is designed explicitly around these failure modes, focusing on stereochemical correctness, user conditioning, and data pipeline robustness as first-class requirements.

RF3 Improvement #1: Chirality as a First-Class Training Signal

Fig 2. Chirality

RF3 chirality benchmarking results and example prediction showing improved stereochemical correctness compared to AF3 and Boltz. Taken from Corley et al., “Accelerating Biomolecular Modeling with AtomWorks and RF3,” Figure 2.

Stereochemistry is not a secondary detail in biomolecular modeling. For small molecules and modified residues, the handedness of a chiral center can determine binding mode, potency, and selectivity. A structure that is globally accurate but locally inverted at key stereocenters may be unusable for downstream applications such as docking validation, binding affinity modeling, or structure-based design.

Diffusion-based structure predictors are particularly vulnerable to chirality errors. Because the denoising process operates on continuous coordinates, mirror-image solutions can satisfy geometric constraints while remaining chemically incorrect. In many models, stereochemistry is not explicitly encoded as a training signal, which increases the likelihood of incorrect handedness in ligands and mixed-chirality peptides.

RF3 addresses this issue by representing chirality geometrically during training and inference. Instead of relying solely on categorical labels such as R and S, RF3 encodes chirality using signed dihedral geometry around chiral centers. During each denoising step, the model receives a feature proportional to the gradient of the error between the current noisy structure and the ideal chiral angle. This effectively provides the network with directional information about how to correct stereochemical deviations as it refines coordinates.
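The idea of a signed-dihedral chirality signal can be illustrated with a minimal NumPy sketch. This is not RF3's implementation: the error function (1 minus the cosine of the angle difference), the finite-difference gradient, and the coordinates are all assumptions chosen for illustration. It shows only that a mirror image flips the sign of the dihedral and that the gradient of the error vanishes at the ideal geometry while pointing toward a correction otherwise.

```python
import numpy as np


def signed_dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in radians, in (-pi, pi], defined by four points."""
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1n = b1 / np.linalg.norm(b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = b0 - np.dot(b0, b1n) * b1n
    w = b2 - np.dot(b2, b1n) * b1n
    x = np.dot(v, w)
    y = np.dot(np.cross(b1n, v), w)
    return float(np.arctan2(y, x))


def chirality_error(coords, ideal_angle):
    """Smooth, periodic error (an assumed form): 0 at the ideal angle."""
    return 1.0 - np.cos(signed_dihedral(*coords) - ideal_angle)


def chirality_gradient(coords, ideal_angle, eps=1e-5):
    """Central finite-difference gradient of the error for each coordinate."""
    grad = np.zeros_like(coords)
    for idx in np.ndindex(*coords.shape):
        plus, minus = coords.copy(), coords.copy()
        plus[idx] += eps
        minus[idx] -= eps
        grad[idx] = (chirality_error(plus, ideal_angle)
                     - chirality_error(minus, ideal_angle)) / (2.0 * eps)
    return grad


# Four atoms around a hypothetical stereocenter (made-up coordinates).
coords = np.array([[1.0, 0.0, 0.0],
                   [0.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0],
                   [1.0, 1.0, 1.0]])
ideal = signed_dihedral(*coords)

# Reflecting through the z = 0 plane inverts the handedness.
mirrored = coords * np.array([1.0, 1.0, -1.0])

grad_correct = chirality_gradient(coords, ideal)     # near zero everywhere
grad_mirrored = chirality_gradient(mirrored, ideal)  # clearly non-zero
```

In this toy case the correct structure yields an essentially zero gradient, while the mirrored structure yields a non-zero one. A per-atom signal of that kind is what a denoiser could use as a directional hint.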

In addition, RF3 introduces a training-time augmentation in which chirality is inverted in a small fraction of examples. This encourages the model to learn to distinguish and correct stereochemical configurations rather than passively reproduce them.

Benchmark results indicate that this approach improves stereochemical fidelity. On a test set of ligand-containing complexes, RF3 predicts the correct chirality for 88 percent of ligand chiral centers, compared to 84 percent for AlphaFold3 and 76 percent for Boltz-2 without inference-time guidance. On a set of mixed L and D macrocyclic peptides deposited after the training cutoff of all methods, RF3 predicts 86 percent of chiral centers correctly, compared to 70 percent for AlphaFold3.

These results suggest that explicitly incorporating stereochemical geometry into the denoising process improves chemical correctness without requiring inference-time steering or post hoc corrections.

RF3 Improvement #2: Constraint-Based Atom-Level Conditioning

Fig 3. Conditioning

Examples of RF3 atom-level conditioning, including ligand conformer templating and improved docking performance. Taken from Corley et al., “Accelerating Biomolecular Modeling with AtomWorks and RF3,” Figure 3.

In many structure prediction workflows, the objective is not to generate an unconstrained complex from sequence alone. Practitioners often have partial structural information that should guide the prediction process. This may include experimentally derived restraints, known ligand conformations, or holo structures where only part of the system is unknown. Without a mechanism to incorporate these priors, diffusion-based predictors may generate plausible structures that fail to reproduce key geometric relationships in the binding site.

RF3 introduces a flexible conditioning mechanism that allows users to specify desired distances between arbitrary atoms. These constraints can be used to enforce known interactions, guide docking against a fixed ligand conformer, or fold a protein around a predefined small-molecule geometry. This provides a direct method for incorporating structural priors into the diffusion process rather than relying solely on the model’s learned distribution.
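As a rough illustration of what atom-level distance conditioning might consume, the sketch below derives protein-ligand distance constraints from a reference (holo) structure and measures how well another structure satisfies them. The function names, the tuple format, and the 6 Å cutoff are hypothetical and are not taken from RF3 or AtomWorks. The sketch covers only the bookkeeping around constraints, not how the model uses them.

```python
import numpy as np


def derive_distance_constraints(protein_xyz, ligand_xyz, cutoff=6.0):
    """Return (protein_idx, ligand_idx, distance) for pairs closer than cutoff."""
    diff = protein_xyz[:, None, :] - ligand_xyz[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)          # shape (n_protein, n_ligand)
    p_idx, l_idx = np.nonzero(dist < cutoff)
    return [(int(i), int(j), float(dist[i, j])) for i, j in zip(p_idx, l_idx)]


def mean_violation(constraints, protein_xyz, ligand_xyz):
    """Mean absolute deviation (in Angstrom) from the target distances."""
    if not constraints:
        return 0.0
    deviations = [
        abs(np.linalg.norm(protein_xyz[i] - ligand_xyz[j]) - target)
        for i, j, target in constraints
    ]
    return float(np.mean(deviations))


# Toy holo structure: two protein atoms and one ligand atom (made-up values).
holo_protein = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]])
holo_ligand = np.array([[3.0, 0.0, 0.0]])
constraints = derive_distance_constraints(holo_protein, holo_ligand)

# A hypothetical prediction in which the ligand sits 1 Angstrom too far away.
pred_ligand = np.array([[4.0, 0.0, 0.0]])
violation = mean_violation(constraints, holo_protein, pred_ligand)
```

Here only the first protein atom lies within the cutoff, so a single constraint is produced, and the shifted ligand violates it by 1 Å.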

The RF3 paper evaluates this capability in two settings: folding proteins around a rigid ligand conformer and constraining predictions using distances derived from a holo structure. In both cases, conditioning improves protein-ligand interface accuracy. When folding around a rigid ligand, median interface accuracy improves from 0.821 to 0.882. When constraining distances from the holo structure, accuracy improves from 0.821 to 0.890. The model also closely preserves the provided ligand conformation, achieving a median ligand-only lDDT of 0.991.
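For readers unfamiliar with lDDT, the following sketch computes a simplified all-atom variant: the fraction of reference inter-atomic distances (within a 15 Å inclusion radius) that are preserved in the prediction, averaged over tolerances of 0.5, 1, 2, and 4 Å. It is an assumption-laden simplification, not the benchmark code used in the paper. It scores every atom pair, with no same-residue exclusion and no interface selection, which is roughly what a ligand-only score over a single small molecule would look like.

```python
import numpy as np


def pairwise_distances(xyz):
    """Dense (n, n) matrix of Euclidean distances between atoms."""
    return np.linalg.norm(xyz[:, None, :] - xyz[None, :, :], axis=-1)


def simple_lddt(ref_xyz, pred_xyz, inclusion_radius=15.0,
                thresholds=(0.5, 1.0, 2.0, 4.0)):
    """Simplified all-atom lDDT in [0, 1]; NaN if no pair is within the radius."""
    ref_d = pairwise_distances(ref_xyz)
    pred_d = pairwise_distances(pred_xyz)
    n_atoms = len(ref_xyz)
    # Score pairs that are close in the reference, excluding self-pairs.
    mask = (ref_d < inclusion_radius) & ~np.eye(n_atoms, dtype=bool)
    if not mask.any():
        return float("nan")
    delta = np.abs(ref_d - pred_d)[mask]
    return float(np.mean([(delta < t).mean() for t in thresholds]))


# Toy three-atom "ligand" on a line (made-up coordinates).
reference = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])

# A rigid shift preserves all internal distances.
shifted = reference + np.array([5.0, -2.0, 1.0])

# Moving the last atom 3 Angstrom distorts two of the three distances.
distorted = reference.copy()
distorted[2, 0] += 3.0

score_shifted = simple_lddt(reference, shifted)
score_distorted = simple_lddt(reference, distorted)
```

Because the score depends only on internal distances, it needs no superposition. A value close to 1, such as the 0.991 reported for the templated ligand, means the internal geometry of the conformer is almost entirely preserved.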

This type of atom-level conditioning is particularly relevant for drug discovery applications, where ligand conformer correctness and binding-site geometry are often more important than global fold accuracy.

RF3 Improvement #3: Modeling Disordered Regions Without Hallucinated Structure

A persistent challenge in biomolecular structure prediction is the treatment of unresolved or intrinsically disordered regions. Many experimentally determined structures contain missing segments that are not resolved in electron density or cryo-EM maps. In these cases, some predictive models tend to generate artificially compact helices or secondary structure elements that are not supported by experimental evidence. While such predictions may appear structurally reasonable, they can be misleading in mechanistic interpretation and downstream modeling.

RF3 addresses this by incorporating a disordered distillation dataset into training. Instead of relying on repredictions of the PDB from another model, the authors generate extended backbone conformations for disordered regions using Rosetta. These extended conformations are used as training targets in a subset of cases, encouraging RF3 to represent unresolved regions in a way that more closely matches their expected physical behavior.

The model is trained with extended disordered regions in approximately 2 percent of PDB-derived examples. This strategy is intended to reduce the tendency of diffusion predictors to hallucinate structured elements in regions where flexibility is more realistic.

Although disorder modeling is not typically captured by standard benchmark metrics such as lDDT, it represents an important qualitative improvement for real-world modeling scenarios, especially for proteins with flexible linkers, regulatory regions, or partially resolved binding interfaces.

RF3 Improvement #4: AtomWorks as Infrastructure for Reliable Open Biomolecular Modeling

While improvements in architecture and conditioning are important, the RF3 paper argues that one of the largest barriers to progress in open biomolecular modeling is not the neural network itself. It is the difficulty of constructing robust and reusable data pipelines. Training on biomolecular structures requires handling numerous edge cases in the Protein Data Bank, including inconsistent annotations, missing coordinates, incorrect bond definitions, alternate occupancies, and chemically complex ligands. These challenges create significant engineering overhead and often lead to model-specific pipelines that are difficult to reproduce or extend.

AtomWorks is introduced as a general framework designed to standardize these processes. It emphasizes high-quality preprocessing and consistent atom-level representations across datasets. The framework explicitly handles issues such as leaving group removal, bond order correction, charge fixing, covalent geometry parsing, coordinate imputation, and appropriate treatment of multi-occupancy structures and symmetry-related ligands.

A key design principle of AtomWorks is modularity. Instead of building a monolithic featurization pipeline that directly produces model tensors, AtomWorks uses a transform-based system operating on a shared atom-level structural representation based on Biotite’s AtomArray. This allows operations such as cropping, conformer generation, MSA loading, chirality feature construction, and conditioning to be implemented as reusable components.
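The transform-based design can be pictured with a small, self-contained sketch. It deliberately does not use the AtomWorks or Biotite APIs. The `Example` container, `compose`, and the three transforms are hypothetical stand-ins, with a plain dataclass playing the role of the shared atom-level representation. The point is only that each step takes and returns the same structure type, so steps can be reordered or reused across models.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

import numpy as np


@dataclass
class Example:
    """Hypothetical stand-in for a shared atom-level representation."""
    coords: np.ndarray                 # (n_atoms, 3) coordinates
    elements: List[str]                # one element symbol per atom
    features: Dict[str, np.ndarray] = field(default_factory=dict)


Transform = Callable[[Example], Example]


def compose(*transforms: Transform) -> Transform:
    """Chain transforms so that the output of one feeds the next."""
    def pipeline(example: Example) -> Example:
        for transform in transforms:
            example = transform(example)
        return example
    return pipeline


def _select(example: Example, keep: np.ndarray) -> Example:
    # Per-atom features would be stale after a selection, so they are dropped.
    elements = [e for e, k in zip(example.elements, keep) if k]
    return Example(example.coords[keep], elements, {})


def remove_hydrogens(example: Example) -> Example:
    keep = np.array([e != "H" for e in example.elements], dtype=bool)
    return _select(example, keep)


def crop_around(center_index: int, radius: float) -> Transform:
    """Keep atoms within `radius` of the atom at `center_index`."""
    def transform(example: Example) -> Example:
        center = example.coords[center_index]
        dist = np.linalg.norm(example.coords - center, axis=1)
        return _select(example, dist <= radius)
    return transform


def add_distance_matrix(example: Example) -> Example:
    diff = example.coords[:, None, :] - example.coords[None, :, :]
    features = dict(example.features)
    features["distances"] = np.linalg.norm(diff, axis=-1)
    return Example(example.coords, example.elements, features)


# Made-up four-atom input; the oxygen lies outside the crop radius.
raw = Example(
    coords=np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0],
                     [2.0, 0.0, 0.0], [9.0, 0.0, 0.0]]),
    elements=["C", "H", "N", "O"],
)
pipeline = compose(remove_hydrogens, crop_around(0, 5.0), add_distance_matrix)
processed = pipeline(raw)
```

In this toy pipeline the model-specific featurization (the distance matrix) is just the last transform, so a different model could swap that step and reuse the earlier ones unchanged.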

The paper reports that this modular design enables extensive code reuse across multiple biomolecular models, with more than 80 percent of code shared across networks including RF3, RF All-Atom, ProteinMPNN, and LigandMPNN. This addresses a recurring limitation of open-source biomolecular modeling, where model development is often slowed by duplicated effort in dataset handling and feature engineering.

In this context, RF3 is presented not only as a structure prediction model, but also as a demonstration of how improved infrastructure can make high-quality biomolecular foundation modeling more accessible and more reproducible.

Performance: How Close Does RF3 Get to AlphaFold3?

While RF3 introduces several methodological improvements aimed at chemical fidelity and controllability, it is also important to evaluate how these translate into overall predictive accuracy across biologically relevant tasks. The RF3 paper benchmarks performance against AlphaFold3 and Boltz on a diverse test set of recent Protein Data Bank structures, covering protein-protein interactions, protein-ligand complexes, nucleic acid interfaces, RNA-only structures, and other heterogeneous assemblies.

Fig 4. Benchmark Performance

Benchmark comparison of RF3 against AlphaFold3 and Boltz across diverse biomolecular interaction categories. Taken from Corley et al., “Accelerating Biomolecular Modeling with AtomWorks and RF3,” Figure 4.

Across most benchmark categories, RF3's accuracy falls between that of AlphaFold3 and existing open-source alternatives, which suggests that it narrows the performance gap to AlphaFold3 while remaining competitive with the best available open models. The authors also report that extending the training cutoff from September 2021 to January 2024 improves accuracy across the board. For example, median protein-protein interface lDDT increases from 0.571 to 0.607, and median protein-ligand interface lDDT increases from 0.766 to 0.798. Performance on protein-DNA interfaces also improves substantially, with median lDDT increasing from 0.415 to 0.523.

Antibody-antigen prediction is evaluated separately due to its practical importance in therapeutic development and its distinct structural challenges. On a clustered, de-leaked antibody-antigen benchmark, RF3 achieves DockQ greater than 0.23 on 33 percent of examples. This compares to 44 percent for AlphaFold3, 22 percent for Boltz-2, and 28 percent for Chai-1. These results indicate that RF3 provides meaningful improvements over other open models in this setting, although AlphaFold3 remains the strongest performer.

Overall, the benchmark results suggest that RF3 delivers strong general-purpose accuracy across diverse biomolecular complexes, with particularly notable gains in areas where stereochemistry and ligand handling are critical.

RF3 vs Boltz vs AlphaFold3: Practical Positioning

RF3 enters a landscape that is already shaped by AlphaFold3 and rapidly improving open-source alternatives such as Boltz. While these systems share a diffusion-based modeling paradigm and similar high-level goals, they differ substantially in licensing, controllability, stereochemical fidelity, and downstream usability.

AlphaFold3 remains the strongest overall baseline for general biomolecular structure prediction, particularly due to its extensive training regime and engineering maturity. However, its restrictive licensing limits its adoption in commercial workflows. Open-source models have therefore become essential for many organizations that require deployable structure prediction systems.

Among open models, Boltz has established itself as a strong general-purpose predictor, with Boltz-2 further extending the paradigm by incorporating binding affinity prediction. RF3 is positioned differently. Rather than focusing on affinity regression, RF3 emphasizes chemical correctness and extensibility. Its most distinctive contributions are stereochemistry-aware diffusion, atom-level conditioning, and the AtomWorks framework for scalable model development.

A practical comparison is summarized below:

Capability	RF3	Boltz-2	AlphaFold3
Open-source availability	Yes	Yes	No
Commercial usability	Yes	Yes	Restricted
General complex prediction	Strong	Strong	Strongest
Chirality handling	Explicitly trained	Improved, but often guided	Strong, but not always reliable
Constraint-based conditioning	Atom-level distance conditioning	Steering and template-based controls	Limited public control
Binding affinity prediction	No	Yes	No
Framework for training new models	AtomWorks	Not designed for reuse	Not public

In practice, RF3 is particularly well-suited for use cases where stereochemical correctness and user conditioning are central requirements, including chiral ligands, mixed-chirality peptides, and docking-like workflows where conformers or distance constraints must be respected. Boltz-2 is more appropriate when binding affinity prediction is required in addition to structural modeling. AlphaFold3 remains the strongest reference baseline for pure structure prediction, but its licensing limits its applicability in many commercial settings.

Key Takeaways

RF3 improves stereochemical correctness by explicitly incorporating chirality signals into the diffusion denoising process.
RF3 enables controllable structure prediction through atom-level distance conditioning, improving usability for docking and experimentally guided modeling workflows.
RF3 narrows the open-source performance gap with AlphaFold3 across diverse biomolecular complexes, while remaining fully open and commercially usable.
AtomWorks provides reusable infrastructure that reduces the engineering burden of training and extending biomolecular foundation models.

Try the RF3 Service on Neurosnap

The RF3 Service is now available on Neurosnap for all-atom structure prediction across protein-protein, protein-ligand, and nucleic acid complexes.

If you are working with stereochemically sensitive ligands, mixed chirality peptides, or constraint-guided modeling workflows, you can evaluate the RF3 Service directly on your systems and compare it alongside other models available on the platform.
