OpenFold3: Open Reproduction of AlphaFold3-Style Biomolecular Co-folding
Written by Danial Gharaie Amirabadi | Published 2026-2-18
Biomolecular structure prediction has moved beyond the “single protein chain” problem. Many of the highest-impact applications in mechanistic biology and drug discovery depend on modeling heterogeneous assemblies, including protein–protein complexes, protein–nucleic acid interactions, and protein–ligand binding. In these settings, accuracy is driven by interaction geometry and joint context rather than isolated folds.
AlphaFold3-class systems represent a step change in this direction by treating structure prediction as a generative problem. Instead of relying primarily on a deterministic structure refinement module, these models apply a diffusion-based denoising process over coordinates. This is a more natural fit for multi-component assemblies and diverse chemistries.
OpenFold3 is motivated by a straightforward objective: provide an open implementation pathway to this diffusion-based, multi-modal modeling paradigm. In practice, open reproduction matters for three reasons. First, it enables transparent evaluation across tasks and datasets using clearly defined protocols. Second, it allows the community to inspect and improve the modeling stack, including data processing, inference configuration, and calibration. Third, it lowers barriers to integration with downstream scientific workflows, where structure prediction is typically one component in a larger pipeline such as screening, rescoring, design, or mechanistic interpretation.
OpenFold3 is a diffusion-based biomolecular structure model designed to predict structures of heterogeneous complexes within a single framework. Conceptually, it replaces the “predict and refine” style that characterized earlier generations with a process that iteratively denoises a noisy structural hypothesis into a physically plausible configuration.
A central design principle is a unified representation that can encode proteins, nucleic acids, and small molecules in a common modeling space. This includes tokenization that spans residues, nucleotides, and ligand atoms, allowing the model to learn interaction patterns across modalities rather than treating each as a separate problem.
The representation is paired with interaction-aware components, such as pair features and cross-attention over partners, which are necessary for reasoning about interfaces, binding geometry, and joint assembly constraints.
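The unified tokenization described above can be illustrated with a minimal sketch. It assumes the AlphaFold3-style convention of one token per standard residue or nucleotide and one token per ligand atom; the `Token` class and helper names are hypothetical and are not taken from the OpenFold3 codebase, whose actual data structures may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Token:
    """One entry in the shared token sequence (hypothetical structure)."""
    chain_id: str
    kind: str   # "protein", "nucleic", or "ligand"
    label: str  # residue/nucleotide letter, or ligand atom name


def tokenize_polymer(chain_id: str, kind: str, sequence: str) -> List[Token]:
    # Assumption: one token per residue (protein) or nucleotide (nucleic acid).
    return [Token(chain_id, kind, letter) for letter in sequence]


def tokenize_ligand(chain_id: str, atom_names: List[str]) -> List[Token]:
    # Assumption: one token per ligand atom.
    return [Token(chain_id, "ligand", name) for name in atom_names]


def tokenize_complex(protein: str, dna: str, ligand_atoms: List[str]) -> List[Token]:
    # All modalities are concatenated into a single token sequence,
    # so downstream pair features can span protein, nucleic acid, and ligand.
    tokens: List[Token] = []
    tokens += tokenize_polymer("A", "protein", protein)
    tokens += tokenize_polymer("B", "nucleic", dna)
    tokens += tokenize_ligand("C", ligand_atoms)
    return tokens


if __name__ == "__main__":
    for token in tokenize_complex("MKT", "ACG", ["C1", "O1", "N1"]):
        print(token)
```

Because every modality lands in the same sequence, a pair representation indexed by token position covers protein–ligand and protein–nucleic acid contacts without a separate code path for each.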
Practically, this framing changes how users should think about prediction. OpenFold3 does not just estimate a single final structure; it samples from a conditional distribution over plausible complexes given the inputs. As a result, protocol choices such as how many samples are generated, how candidates are selected, and what confidence signals are used become part of the effective modeling system rather than an afterthought.
To interpret OpenFold3 benchmark results correctly, it is important to separate the model architecture from the inference protocol used during evaluation. Because OpenFold3 is diffusion-based, evaluation is framed as a sampling problem rather than a single deterministic forward pass.
In the reported benchmark setup, each target is evaluated using multiple stochastic runs, generating a total of 25 candidate structures per target. This design reflects a realistic usage pattern, because sampling diversity can significantly influence whether the model identifies a high-quality structural solution.
A key methodological distinction in the benchmark reporting is between oracle and ranking-based performance. Oracle metrics score the best candidate among the sampled structures, as judged against the reference, and so measure the model’s capacity to generate an accurate solution when multiple hypotheses are explored. Ranking-based metrics instead score the candidate that the model’s own confidence ranks first, and so measure its ability to pick a good structure from its own candidate set.
In practical terms, oracle performance reflects generative capacity, while ranking performance reflects end-to-end usability. If strong solutions are frequently generated but not selected, downstream workflows may require additional rescoring or manual inspection. If ranking aligns closely with oracle performance, the model behaves more reliably as a single-stage predictor.
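The distinction between oracle and ranking-based performance can be made concrete with a short sketch. It assumes each candidate carries a model confidence score and a ground-truth quality score (for example DockQ or lDDT, where higher is better); the `Candidate` class, function names, and numbers are illustrative and do not come from the benchmark itself.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    confidence: float  # model's own ranking score (higher = more trusted)
    quality: float     # ground-truth metric vs. the reference (higher = better)


def oracle_quality(candidates: List[Candidate]) -> float:
    # Best achievable score if selection were perfect.
    return max(c.quality for c in candidates)


def ranked_quality(candidates: List[Candidate]) -> float:
    # Score of the candidate the model itself would return (top confidence).
    return max(candidates, key=lambda c: c.confidence).quality


def selection_gap(candidates: List[Candidate]) -> float:
    # Zero when ranking picks the best sample; positive otherwise.
    return oracle_quality(candidates) - ranked_quality(candidates)


if __name__ == "__main__":
    # Made-up values: the most confident sample is not the most accurate one.
    samples = [
        Candidate(confidence=0.9, quality=0.40),
        Candidate(confidence=0.7, quality=0.80),
        Candidate(confidence=0.5, quality=0.55),
        Candidate(confidence=0.3, quality=0.20),
    ]
    print("oracle:", oracle_quality(samples))
    print("ranked:", ranked_quality(samples))
    print("gap:", selection_gap(samples))
```

A persistent positive gap across targets is the signal that external rescoring or manual inspection is likely to pay off.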
The benchmark suite evaluates OpenFold3 relative to AlphaFold3 and other open systems. This comparative framing helps distinguish differences in generative quality from differences in candidate selection behavior, which are especially relevant in diffusion-based structure prediction systems.
OpenFold3 adopts the diffusion-based formulation introduced in AlphaFold3, modeling structure prediction as iterative denoising over atomic coordinates rather than deterministic refinement. The high-level architecture and modeling assumptions follow the published AlphaFold3 design. This section focuses only on implementation details that required clarification or adjustment during reproduction.
OpenFold3 uses a unified molecular representation spanning proteins, nucleic acids, and small molecules within a shared modeling space. Interaction-aware components such as pair representations and cross-attention across partners follow the AlphaFold3 paradigm.
Several practical modifications were required for stable reproduction:
Distance supervision uses 38 equal-width bins from 3.25 Å to 50.75 Å (1.25 Å per bin), plus an overflow bin for larger distances. In addition, certain training-stage step counts were not explicitly disclosed in the original work, so they were inferred using the model-selection metric described there.
An implementation issue related to inter-chain masking in the template module was also identified and corrected.
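The distance binning mentioned above can be sketched as follows. The bin count and range come from the text; the function name is hypothetical, and the handling of distances below 3.25 Å (assigned to the first bin here) is an assumption, since the text does not specify it.

```python
MIN_DIST = 3.25   # lower edge of the first bin, in angstroms
MAX_DIST = 50.75  # upper edge of the last regular bin, in angstroms
NUM_BINS = 38     # equal-width bins; one extra overflow bin follows
BIN_WIDTH = (MAX_DIST - MIN_DIST) / NUM_BINS  # 1.25 angstroms


def distance_bin(distance: float) -> int:
    """Map a distance in angstroms to a class index in [0, NUM_BINS].

    Indices 0..NUM_BINS-1 are the equal-width bins; index NUM_BINS is
    the overflow bin for distances at or beyond MAX_DIST.
    """
    if distance >= MAX_DIST:
        return NUM_BINS  # overflow bin
    # Assumption: distances below MIN_DIST fall into the first bin.
    clamped = max(distance, MIN_DIST)
    index = int((clamped - MIN_DIST) // BIN_WIDTH)
    return min(index, NUM_BINS - 1)


if __name__ == "__main__":
    for d in (1.0, 3.25, 4.5, 50.0, 50.75, 60.0):
        print(f"{d:6.2f} A -> bin {distance_bin(d)}")
```

With the overflow bin included, the supervision target has 39 classes in total.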
OpenFold3 does not introduce a new architecture relative to AlphaFold3. Its contribution lies in translating that design into an open and reproducible implementation, with documented engineering decisions that affect stability and behavior.
OpenFold3 was evaluated across protein monomers, protein complexes, and protein–ligand systems using the sampling protocol described earlier, and compared against AlphaFold3 and other open systems.

On the CASP16 monomer set, OpenFold3 performs slightly below AlphaFold3 and is broadly comparable to other reproduction efforts. For protein–protein complexes, AlphaFold3 maintains a performance advantage overall, while OpenFold3 remains competitive on general protein interaction benchmarks. The largest gap is observed in antibody–antigen complexes, where AlphaFold3 shows substantially stronger results.
These trends are reflected in standard structural metrics such as lDDT for monomers and DockQ for complexes.
Protein–ligand performance was evaluated on the Runs N’ Poses benchmark, stratified by pocket similarity to the training distribution. AlphaFold3 achieves the strongest overall performance, particularly in more novel settings.
Within open systems, OpenFold3 demonstrates strong oracle performance but weaker ranking-based performance. This indicates that high-quality ligand poses are often present among sampled candidates, but the internal confidence and ranking mechanisms do not always select them optimally.
Overall, the results show that OpenFold3 reproduces much of the generative capacity of AlphaFold3-class systems, with remaining differences concentrated in specific modalities and in candidate selection behavior.
OpenFold3 brings diffusion-based, multi-modal structure prediction into an open framework. It closely follows the AlphaFold3 design while documenting the concrete engineering decisions required for stable and reproducible implementation.
Across evaluated benchmarks, OpenFold3 is competitive with other open systems and approaches AlphaFold3 performance in several settings. In protein–ligand modeling, strong oracle results indicate that high-quality poses are frequently generated, even when ranking does not always select them optimally. This highlights an important practical consideration for diffusion-based systems: sampling strategy and candidate selection materially affect end-to-end performance.
Three practical takeaways:
OpenFold3 is best viewed as a structural hypothesis generator that integrates naturally into broader computational pipelines, including docking refinement, rescoring, and molecular simulation.
Curious about what OpenFold3 can do? Explore our OpenFold3 service and see it in action on your own protein complexes and protein–ligand systems.