OpenFold3: Open Reproduction of AlphaFold3-Style Biomolecular Co-folding

Written by Danial Gharaie Amirabadi | Published 2026-2-18

Biomolecular structure prediction has moved beyond the “single protein chain” problem. Many of the highest-impact applications in mechanistic biology and drug discovery depend on modeling heterogeneous assemblies, including protein–protein complexes, protein–nucleic acid interactions, and protein–ligand binding. In these settings, accuracy is driven by interaction geometry and joint context rather than isolated folds.

AlphaFold3-class systems represent a step change in this direction by treating structure prediction as a generative problem. Instead of relying primarily on a deterministic structure refinement module, these models apply a diffusion-based denoising process over coordinates. This is a more natural fit for multi-component assemblies and diverse chemistries.

OpenFold3 is motivated by a straightforward objective: provide an open implementation pathway to this diffusion-based, multi-modal modeling paradigm. In practice, open reproduction matters for three reasons. First, it enables transparent evaluation across tasks and datasets using clearly defined protocols. Second, it allows the community to inspect and improve the modeling stack, including data processing, inference configuration, and calibration. Third, it lowers barriers to integration with downstream scientific workflows, where structure prediction is typically one component in a larger pipeline such as screening, rescoring, design, or mechanistic interpretation.

What OpenFold3 is

OpenFold3 is a diffusion-based biomolecular structure model designed to predict structures of heterogeneous complexes within a single framework. Conceptually, it replaces the “predict and refine” style that characterized earlier generations with a process that iteratively denoises a noisy structural hypothesis into a physically plausible configuration.
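To make the idea of iterative denoising concrete, here is a toy sketch in Python. It is not OpenFold3's sampler: the noise schedule, the `ideal_denoiser` stand-in (which simply returns a known target), and every name below are assumptions made for illustration. In a real model the denoiser is a learned network conditioned on the input molecules.

```python
import random

def noise_schedule(sigma_max=10.0, sigma_min=0.01, steps=20):
    """Geometrically decreasing noise levels, ending at exactly 0 (hypothetical schedule)."""
    ratio = (sigma_min / sigma_max) ** (1.0 / (steps - 1))
    return [sigma_max * ratio**i for i in range(steps)] + [0.0]

def ideal_denoiser(x, sigma, target):
    """Stand-in for a learned network: returns the clean coordinates directly."""
    return list(target)

def sample(target, seed=0):
    """Deterministic Euler-style denoising from pure noise down to sigma = 0."""
    rng = random.Random(seed)
    sigmas = noise_schedule()
    # Start from Gaussian noise at the highest noise level.
    x = [rng.gauss(0.0, sigmas[0]) for _ in target]
    for sigma, sigma_next in zip(sigmas, sigmas[1:]):
        denoised = ideal_denoiser(x, sigma, target)
        # Direction from the denoised estimate toward the current noisy sample.
        d = [(xi - di) / sigma for xi, di in zip(x, denoised)]
        # Move to the next (lower) noise level.
        x = [xi + (sigma_next - sigma) * di for xi, di in zip(x, d)]
    return x

target = [1.0, -2.0, 0.5]  # flattened toy "coordinates"
result = sample(target)
```

Because the stand-in denoiser is perfect, every seed converges to the same target here. With a learned denoiser and stochastic steps, different seeds would generally produce different candidates, which is why the number of samples matters in practice.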

A central design principle is a unified representation that can encode proteins, nucleic acids, and small molecules in a common modeling space. This includes tokenization that spans residues, nucleotides, and ligand atoms, allowing the model to learn interaction patterns across modalities rather than treating each as a separate problem.
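As an illustration of a shared token space, the sketch below assigns one token per amino-acid residue, one per nucleotide, and one per ligand atom. The `Token` class, the `tokenize` function, and the input layout are hypothetical and are not taken from the OpenFold3 codebase.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Token:
    index: int      # position in the flat token sequence
    chain_id: str   # which molecule the token belongs to
    kind: str       # "protein", "nucleic", or "ligand"
    name: str       # residue code, nucleotide code, or atom name

def tokenize(chains):
    """Flatten heterogeneous chains into one token list.

    `chains` is a list of (chain_id, kind, units) tuples, where `units`
    holds residue codes, nucleotide codes, or ligand atom names.
    """
    tokens = []
    for chain_id, kind, units in chains:
        if kind not in {"protein", "nucleic", "ligand"}:
            raise ValueError(f"unknown molecule kind: {kind}")
        for name in units:
            tokens.append(Token(len(tokens), chain_id, kind, name))
    return tokens

complex_input = [
    ("A", "protein", ["MET", "LYS", "GLY"]),
    ("B", "nucleic", ["A", "U", "G"]),
    ("L", "ligand", ["C1", "C2", "O1", "N1"]),
]
tokens = tokenize(complex_input)
print(len(tokens), tokens[6])
```

All three molecule types end up in a single indexed sequence, so pairwise features between, say, a residue token and a ligand-atom token can be addressed in the same way as residue-residue pairs.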

The representation is paired with interaction-aware components, such as pair features and cross-attention over partners, which are necessary for reasoning about interfaces, binding geometry, and joint assembly constraints.

Practically, this framing changes how users should think about prediction. OpenFold3 does not just estimate one final structure; it samples from a conditional distribution over plausible complexes given the inputs. As a result, protocol choices such as how many samples are generated, how candidates are selected, and which confidence signals are used become part of the effective modeling system rather than an afterthought.

Evaluation methodology

To interpret OpenFold3 benchmark results correctly, it is important to separate the model architecture from the inference protocol used during evaluation. Because OpenFold3 is diffusion-based, evaluation is framed as a sampling problem rather than a single deterministic forward pass.

In the reported benchmark setup, each target is evaluated using multiple stochastic runs that together generate 25 candidate structures per target. This design reflects a realistic usage pattern, because sampling diversity can significantly influence whether the model finds a high-quality structural solution.

A key methodological distinction in the benchmark reporting is the difference between oracle performance and ranking-based performance. Oracle metrics score the best candidate among the sampled structures. They measure the model’s capacity to generate an accurate solution when multiple hypotheses are explored. Ranking-based metrics instead score the candidate that the model’s own ranking places first, so they measure the model’s ability to select the best structure from its own candidate set.

In practical terms, oracle performance reflects generative capacity, while ranking performance reflects end-to-end usability. If strong solutions are frequently generated but not selected, downstream workflows may require additional rescoring or manual inspection. If ranking aligns closely with oracle performance, the model behaves more reliably as a single-stage predictor.
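The distinction between oracle and ranked scores can be sketched in a few lines of Python. The function name, the `(confidence, quality)` pair layout, and the numbers are hypothetical; `quality` stands for whatever structural metric is being reported, with higher values being better.

```python
from statistics import mean

def oracle_and_ranked(candidates):
    """Return (oracle, ranked) quality for one target.

    `candidates` is a list of (confidence, quality) pairs, one per sampled
    structure. Higher is better for both values.
    """
    # Oracle: the best quality among all sampled candidates.
    oracle = max(quality for _, quality in candidates)
    # Ranked: the quality of the candidate with the highest confidence.
    _, ranked = max(candidates, key=lambda c: c[0])
    return oracle, ranked

# Made-up numbers for two targets (a real run would have 25 candidates each).
targets = {
    "target_1": [(0.91, 0.62), (0.85, 0.88), (0.40, 0.30)],
    "target_2": [(0.77, 0.81), (0.60, 0.55), (0.52, 0.79)],
}
per_target = {name: oracle_and_ranked(c) for name, c in targets.items()}
mean_oracle = mean(o for o, _ in per_target.values())
mean_ranked = mean(r for _, r in per_target.values())
print(f"oracle={mean_oracle:.3f} ranked={mean_ranked:.3f} "
      f"gap={mean_oracle - mean_ranked:.3f}")
```

In this toy data, the first target has a better candidate than the one its confidence score selects, so the ranked score falls below the oracle score. The size of that gap is a rough indicator of how much a downstream rescoring step could recover.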

The benchmark suite compares OpenFold3 with AlphaFold3 and with other open systems. This comparative framing helps distinguish differences in generative quality from differences in candidate selection behavior, which are especially relevant in diffusion-based structure prediction systems.

Technical approach and reproduction details

OpenFold3 adopts the diffusion-based formulation introduced in AlphaFold3, modeling structure prediction as iterative denoising over atomic coordinates rather than deterministic refinement. The high-level architecture and modeling assumptions follow the published AlphaFold3 design. This section focuses only on implementation details that required clarification or adjustment during reproduction.

OpenFold3 uses a unified molecular representation spanning proteins, nucleic acids, and small molecules within a shared modeling space. Interaction-aware components such as pair representations and cross-attention across partners follow the AlphaFold3 paradigm.

Several practical modifications were required for stable reproduction:

  1. Distance supervision uses 38 equal-width bins from 3.25 Å to 50.75 Å (1.25 Å per bin), plus an overflow bin.
  2. Certain training-stage step counts were not explicitly disclosed, so they were inferred using the model-selection metric described in the original work.
  3. An implementation issue related to inter-chain masking in the template module was identified and corrected.
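The distance binning described above can be sketched as follows. The bin count and range come from the text; the function names and the choice to clamp distances below 3.25 Å into the first bin are assumptions for illustration, not a description of the OpenFold3 code.

```python
import math

NUM_BINS = 38
MIN_DIST = 3.25   # Å, lower edge of the first bin
MAX_DIST = 50.75  # Å, upper edge of the last regular bin
BIN_WIDTH = (MAX_DIST - MIN_DIST) / NUM_BINS  # 1.25 Å

def distance_bin(distance):
    """Map a distance in Å to a bin index in [0, 38].

    Indices 0..37 are the equal-width bins; index 38 is the overflow bin.
    Assumption: distances below MIN_DIST are clamped into bin 0.
    """
    if distance >= MAX_DIST:
        return NUM_BINS
    index = math.floor((distance - MIN_DIST) / BIN_WIDTH)
    return max(0, min(index, NUM_BINS - 1))

def one_hot(distance):
    """One-hot vector of length 39 for a single pairwise distance."""
    vec = [0] * (NUM_BINS + 1)
    vec[distance_bin(distance)] = 1
    return vec

for d in (3.25, 10.0, 50.74, 50.75, 120.0):
    print(d, distance_bin(d))
```

Each pairwise distance becomes a 39-way classification target: 38 regular bins plus the overflow bin for anything at or beyond 50.75 Å.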

OpenFold3 does not introduce a new architecture relative to AlphaFold3. Its contribution lies in translating that design into an open and reproducible implementation, with documented engineering decisions that affect stability and behavior.

Results across modalities

OpenFold3 was evaluated across protein monomers, protein complexes, and protein–ligand systems using the sampling protocol described earlier, and compared against AlphaFold3 and other open systems.

Figure: OpenFold3 benchmark results. Benchmark comparison of OpenFold3 against AlphaFold3 and other open models across protein/RNA monomers, protein multimers, and protein–ligand complexes, reporting both oracle and ranked performance. Taken from the OpenFold3-preview technical white paper.

Protein monomers and complexes

On the CASP16 monomer set, OpenFold3 performs slightly below AlphaFold3 and is broadly comparable to other reproduction efforts. For protein–protein complexes, AlphaFold3 maintains a performance advantage overall, while OpenFold3 remains competitive on general protein interaction benchmarks. The largest gap is observed in antibody–antigen complexes, where AlphaFold3 shows substantially stronger results.

These trends are reflected in standard structural metrics such as lDDT for monomers and DockQ for complexes.

Protein–ligand systems

Protein–ligand performance was evaluated on the Runs N’ Poses benchmark, stratified by pocket similarity to the training distribution. AlphaFold3 achieves the strongest overall performance, particularly in more novel settings.
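Stratified reporting of this kind can be sketched as follows. The similarity bin edges, the 2 Å ligand RMSD success cutoff, and the example records are assumptions for illustration, not the benchmark's official definitions or results.

```python
from collections import defaultdict

# Hypothetical bin edges on a 0-100 pocket-similarity scale.
BIN_EDGES = [0, 20, 40, 60, 80, 100]
SUCCESS_RMSD = 2.0  # Å, assumed success cutoff

def similarity_bin(similarity):
    """Return a label such as '20-40' for a similarity score in [0, 100]."""
    for low, high in zip(BIN_EDGES, BIN_EDGES[1:]):
        is_last = high == BIN_EDGES[-1]
        # Bins are half-open, except the last one, which includes its upper edge.
        if low <= similarity < high or (is_last and similarity == high):
            return f"{low}-{high}"
    raise ValueError(f"similarity out of range: {similarity}")

def stratified_success(records):
    """Success rate per similarity bin from (similarity, ligand_rmsd) pairs."""
    counts = defaultdict(lambda: [0, 0])  # label -> [successes, total]
    for similarity, rmsd in records:
        label = similarity_bin(similarity)
        counts[label][0] += rmsd <= SUCCESS_RMSD
        counts[label][1] += 1
    return {label: hits / total for label, (hits, total) in counts.items()}

# Made-up (similarity, ligand RMSD in Å) records.
records = [(12, 3.4), (15, 1.8), (55, 1.1), (58, 2.6), (95, 0.7), (100, 0.9)]
rates = stratified_success(records)
print(rates)
```

Reporting one rate per bin, rather than a single pooled number, makes it visible whether performance holds up on pockets that differ from the training distribution.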

Among open systems, OpenFold3 demonstrates strong oracle performance but weaker ranking-based performance. This indicates that high-quality ligand poses are often present among the sampled candidates, but the internal confidence and ranking mechanisms do not always place them first.

Overall, the results show that OpenFold3 reproduces much of the generative capacity of AlphaFold3-class systems, with remaining differences concentrated in specific modalities and in candidate selection behavior.

Summary and practical takeaways

OpenFold3 brings diffusion-based, multi-modal structure prediction into an open framework. It closely follows the AlphaFold3 design while documenting the concrete engineering decisions required for stable and reproducible implementation.

Across evaluated benchmarks, OpenFold3 is competitive with other open systems and approaches AlphaFold3 performance in several settings. In protein–ligand modeling, strong oracle results indicate that high-quality poses are frequently generated, even when ranking does not always select them optimally. This highlights an important practical consideration for diffusion-based systems: sampling strategy and candidate selection materially affect end-to-end performance.

Three practical takeaways:

  1. OpenFold3 provides a unified architecture for proteins, complexes, and ligand-bound systems.
  2. Generative capacity and ranking behavior should be evaluated separately when interpreting results.
  3. Multi-sample inference and structured candidate selection can improve practical usability in complex modeling workflows.

OpenFold3 is best viewed as a structural hypothesis generator that integrates naturally into broader computational pipelines, including docking refinement, rescoring, and molecular simulation.

