GeoFlow-V2 technical report review

Introduction

GeoFlow-V2 is better read as a company-led technical report than as a paper. "GeoFlow-V2: A Unified Atomic Diffusion Model for Protein Structure Prediction and De Novo Design," released by BioGeometry in 2025, aims to unify protein structure prediction and de novo design within a single all-atom diffusion framework. The later GeoFlow-V3 puts a low-N VHH wet-lab campaign front and center. GeoFlow-V2, by contrast, belongs to the preceding method/infrastructure stage.

The core idea is to place prediction and generation in the same framework. Given a complete protein sequence, the model performs structure prediction. Given a partially masked protein input, it jointly generates the sequence and structure of the masked region. In effect, design problems involving antibodies, VHHs, binders, and ligand-binding proteins are recast as conditional inpainting.

This framing is interesting because it does not use an AlphaFold3-like all-atom predictor merely as an evaluator. Instead, it feeds masked protein pseudo-sequences and structure constraints into the predictor and uses it like a design model. However, most of GeoFlow-V2's evidence consists of in silico benchmarks and virtual screening proxies. The report defers experimental antibody/VHH results to a future update, and direct low-N wet-lab hit discovery evidence comes more strongly from GeoFlow-V3.

Unifying prediction and design through inpainting

GeoFlow-V2's basic formulation treats both structure prediction and protein design as conditional generation problems. Given a full sequence, it predicts the entire structure. Given a masked sequence, it generates both the structure and the amino-acid identities of the masked residues. For protein binder design, the target sequence/structure serves as the condition and the binder chain's sequence/structure is generated.

From this perspective, GeoFlow-V2 is more than a structure predictor. At the same time, it would be premature to call it an experimental design platform. More precisely, it is an infrastructure report that tries to fit structure prediction, all-atom generation, conditional protein generation, and binder generation into one framework.

Masked residues are represented by an `UNK` pseudo-residue that carries only the backbone atoms N, Cα, C, and O. Depending on the task, CDR residues or binder-chain residues can be masked. The scheme applies fairly directly to protein masking, but according to the report, small-molecule generation and nucleic-acid generation remain future work.
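
To make the masking scheme concrete, here is a minimal sketch of how a design input might be assembled from a simple residue list. The `Residue` class, `mask_positions`, and `task_mode` are hypothetical illustrations, not GeoFlow-V2's actual data format. The only details taken from the report are that a masked residue becomes `UNK` with backbone atoms only, and that a full sequence means prediction while a masked one means design.

```python
from dataclasses import dataclass

# Backbone atoms kept on a masked residue (the report lists N, CA, C, O).
BACKBONE_ATOMS = ("N", "CA", "C", "O")


@dataclass(frozen=True)
class Residue:
    name: str     # three-letter code such as "SER", or "UNK" when masked
    atoms: tuple  # atom names present in the input


def mask_positions(residues, positions):
    """Return a copy where residues at `positions` become UNK pseudo-residues.

    A masked residue loses its identity and side-chain atoms and keeps only
    backbone atom slots, so the model has to generate both sequence and structure.
    """
    return [
        Residue("UNK", BACKBONE_ATOMS) if i in positions else res
        for i, res in enumerate(residues)
    ]


def task_mode(residues):
    """Full sequence -> structure prediction; any UNK present -> joint design."""
    return "design" if any(r.name == "UNK" for r in residues) else "prediction"


# Toy 3-residue fragment (atom lists truncated for brevity).
fragment = [
    Residue("SER", ("N", "CA", "C", "O", "CB", "OG")),
    Residue("TYR", ("N", "CA", "C", "O", "CB", "CG")),
    Residue("GLY", ("N", "CA", "C", "O")),
]
designed_input = mask_positions(fragment, {1})
print(task_mode(fragment))                  # prediction
print([r.name for r in designed_input])     # ['SER', 'UNK', 'GLY']
print(task_mode(designed_input))            # design
```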

Structure constraint: epitope, contact, apo/holo

To inject the structural conditions that design requires, GeoFlow-V2 encodes the target structure's distance map as noise-perturbed, binned features. The report emphasizes three types of constraint.

An epitope constraint requires a specific antigen residue or set of residues to lie close to the binder chain. A contact constraint is a residue-residue pairwise distance restraint. Apo/holo structure conditioning supplies a coarse-grained distance map of the target structure. During training, each constraint type is activated with some probability, and constraint dropout is also applied.
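
The constraint encoding can be sketched roughly as follows. This is an illustration under assumptions: the bin edges, noise scale, and dropout probability are made-up values, and the function names are hypothetical; the report's actual featurization may differ. It shows only the general pattern: perturb target distances with noise, map them to one-hot bins, and randomly drop whole constraint types during training.

```python
import random

# Hypothetical bin edges in Å; distances >= 16 Å fall into the last bin.
BIN_EDGES = (4.0, 8.0, 12.0, 16.0)
N_BINS = len(BIN_EDGES) + 1


def bin_index(distance):
    """Index of the first edge above `distance`, or the overflow bin."""
    for i, edge in enumerate(BIN_EDGES):
        if distance < edge:
            return i
    return len(BIN_EDGES)


def one_hot(index, size=N_BINS):
    vec = [0] * size
    vec[index] = 1
    return vec


def encode_pair_constraints(pair_distances, noise_std, rng):
    """Map {(i, j): distance} to {(i, j): one-hot bin} after adding Gaussian noise.

    The noise keeps the model from treating the restraint as an exact distance.
    """
    features = {}
    for pair, d in pair_distances.items():
        noisy = max(0.0, d + rng.gauss(0.0, noise_std))
        features[pair] = one_hot(bin_index(noisy))
    return features


def sample_active_constraints(constraint_sets, p_active, rng):
    """Constraint dropout: keep each constraint type independently with prob. p_active."""
    return {name: c for name, c in constraint_sets.items() if rng.random() < p_active}


rng = random.Random(0)
constraints = {
    "contact": {(10, 57): 6.5},                # residue-residue distance restraint
    "holo_map": {(0, 1): 3.8, (0, 5): 14.2},   # coarse target distance map entries
}
active = sample_active_constraints(constraints, p_active=0.5, rng=rng)
features = {name: encode_pair_constraints(c, noise_std=0.5, rng=rng)
            for name, c in active.items()}
```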

This design makes GeoFlow-V2 closer to a steerable design model than to a blind generator. In antibody/VHH design in particular, how well the model respects the target epitope matters. However, a predicted complex that satisfies the constraints does not confirm measured binding affinity or specificity. GeoFlow-V2's constraint evidence is most naturally read first as a structure/pose proxy.

Sequence co-design and design recycling

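GeoFlow-V2 includes a sequence co-design module that predicts the amino-acid identity of masked residues at every diffusion step. It updates the sequence from the diffusion module's structural representation and the previous step's amino-acid prediction. In other words, rather than generating only a structure and handing sequence design off to an external tool such as ProteinMPNN, it aims to handle sequence and structure together inside the model's own loop.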
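
The per-step update can be pictured with the control-flow sketch below. It is a toy under stated assumptions: `denoise_step` stands in for the diffusion module and `predict_aa_logits` for the co-design head, both hypothetical callables supplied by the caller, and the greedy argmax choice is an assumption. The point is only the loop structure described above: at each step, masked identities are re-predicted from the current structural state and the previous step's sequence, while unmasked positions stay fixed.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def argmax(values):
    return max(range(len(values)), key=lambda i: values[i])


def co_design(sequence, masked, structure, n_steps, denoise_step, predict_aa_logits):
    """Refine structure and masked residue identities together over diffusion steps.

    sequence: string with placeholder "X" at masked positions.
    masked: set of positions the model may redesign; all others stay fixed.
    denoise_step(structure, seq, t) -> updated structure (hypothetical).
    predict_aa_logits(structure, seq, pos) -> 20 scores, one per AMINO_ACIDS letter (hypothetical).
    """
    seq = list(sequence)
    for t in range(n_steps):
        structure = denoise_step(structure, seq, t)
        prev = list(seq)  # condition on the previous step's sequence prediction
        for pos in masked:
            logits = predict_aa_logits(structure, prev, pos)
            seq[pos] = AMINO_ACIDS[argmax(logits)]  # greedy choice; sampling is also possible
    return "".join(seq), structure


# Dummy stand-ins: the "structure" is a step counter and the head always prefers Tyr.
def dummy_denoise(structure, seq, t):
    return structure + 1


def dummy_logits(structure, seq, pos):
    return [1.0 if aa == "Y" else 0.0 for aa in AMINO_ACIDS]


print(co_design("GXXS", {1, 2}, 0, 3, dummy_denoise, dummy_logits))  # ('GYYS', 3)
```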

Another important component is design recycling. Designed sequences/structures are fed back into GeoFlow-V2 to refold them or to estimate confidence. This step feeds into candidate filtering and binder/non-binder discrimination, which is why GeoFlow-V2 can also serve as a scoring model in addition to a design model.
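
A recycle-then-filter loop might look like the sketch below, assuming a hypothetical `refold` callable that returns a confidence score for a designed sequence. The `Candidate` record and the threshold are illustrative, not values from the report. Note that the output is a shortlist ranked by model confidence, which is exactly the kind of proxy the following paragraph cautions about.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Candidate:
    name: str
    sequence: str
    confidence: float = 0.0  # filled in after recycling


def recycle_and_filter(candidates: List[Candidate],
                       refold: Callable[[str], float],
                       min_confidence: float) -> List[Candidate]:
    """Re-score each design by refolding it, drop low-confidence ones, rank the rest.

    `refold` is a hypothetical stand-in for feeding the complete designed sequence
    back into the model and reading out a confidence score.
    """
    kept = []
    for cand in candidates:
        cand.confidence = refold(cand.sequence)
        if cand.confidence >= min_confidence:
            kept.append(cand)
    return sorted(kept, key=lambda c: c.confidence, reverse=True)


scores = {"AAA": 0.9, "CCC": 0.4, "DDD": 0.75}  # toy confidences per sequence
designs = [Candidate(f"d{i}", s) for i, s in enumerate(scores)]
shortlist = recycle_and_filter(designs, refold=scores.get, min_confidence=0.7)
print([c.name for c in shortlist])  # ['d0', 'd2']
```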

Here, too, interpretation calls for caution. Recycling and confidence scores can be useful proxies for narrowing down candidates. But high confidence does not imply a better measured KD, nor does it guarantee target-specific function. In antibody design especially, pose confidence, binder/non-binder discrimination, and affinity ranking are distinct problems.

Antibody-antigen structure prediction benchmark

GeoFlow-V2 presents an antibody-antigen complex prediction benchmark. It collects PDB antibody-antigen complexes released between June 30, 2024 and January 30, 2025, then applies Fv cropping and a length filter to assemble 104 complexes. The comparison models are Protenix, Chai-1, Boltz, AlphaFold Multimer v2.3, and GeoFlow-V1. AlphaFold3 is excluded from direct comparison because of license restrictions.

GeoFlow-V2 reports a Top-1 success rate of 45.19% at DockQ > 0.23 and claims this exceeds the comparison models. It also reports that adding four epitope constraints raises the antibody-antigen Top-1 acceptable success rate from 45% to 75%. The holo antigen constraint is described as particularly helpful for high-DockQ predictions.
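
For readers less familiar with the metric, DockQ combines three CAPRI quantities (Fnat, interface RMSD, and ligand RMSD) into a score in [0, 1], and 0.23 is the conventional lower bound of the "acceptable" class. The sketch below computes DockQ with the published formula (Basu & Wallner, 2016) and a Top-1 success rate over a benchmark. The function names and per-complex numbers are invented for illustration. As an arithmetic check on the headline figure, 45.19% of 104 complexes corresponds to 47 successes.

```python
def dockq(fnat, irms, lrms):
    """DockQ: mean of Fnat and two RMSD terms rescaled to [0, 1]."""
    irms_scaled = 1.0 / (1.0 + (irms / 1.5) ** 2)
    lrms_scaled = 1.0 / (1.0 + (lrms / 8.5) ** 2)
    return (fnat + irms_scaled + lrms_scaled) / 3.0


def top1_success_rate(benchmark, threshold=0.23):
    """benchmark: one list per complex of (ranking_score, dockq) samples.

    Only the highest-ranked sample counts. The report states DockQ > 0.23;
    the CAPRI convention is >= 0.23, which differs only at the boundary.
    """
    hits = 0
    for samples in benchmark:
        _, top_dockq = max(samples, key=lambda s: s[0])
        if top_dockq > threshold:
            hits += 1
    return hits / len(benchmark)


toy = [
    [(0.9, 0.61), (0.7, 0.12)],  # top-ranked sample is acceptable
    [(0.8, 0.05), (0.6, 0.70)],  # a good sample exists but is not ranked first
]
print(round(dockq(fnat=1.0, irms=0.0, lrms=0.0), 3))  # 1.0 (perfect prediction)
print(top1_success_rate(toy))                          # 0.5
print(round(47 / 104, 4))                              # 0.4519
```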

These results show that GeoFlow-V2 can produce strong proxy signals in antibody-antigen pose prediction and constraint-guided docking. But DockQ success is an interface pose metric. It says nothing directly about binding, affinity, specificity, function, or wet-lab validation. Before this can be read as design utility, one has to check separately whether better pose prediction actually leads to better candidate selection and assay hits.

Protein-ligand and antibody structure prediction

GeoFlow-V2 also includes a protein-ligand structure prediction benchmark. On a PoseBusters-style benchmark, it counts a top-ranked prediction as a success when the pocket-aligned ligand RMSD is below 2 Å, and claims a 77% success rate. This result shows the intent for the all-atom model to handle ligand poses as well.
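
The success criterion can be written out as a short sketch. It assumes the predicted and reference ligand atoms are already matched one-to-one and expressed in a common frame after pocket alignment. Real PoseBusters-style evaluation also handles symmetry-equivalent atom mappings, which this sketch omits. `ligand_rmsd` and `success_rate` are hypothetical helper names.

```python
import math


def ligand_rmsd(pred, ref):
    """RMSD (Å) over matched atoms; pred and ref are equal-length lists of (x, y, z).

    Assumes both poses are already pocket-aligned and atoms are listed in the
    same order (no symmetry-aware atom matching).
    """
    if len(pred) != len(ref) or not pred:
        raise ValueError("need the same, non-zero number of matched atoms")
    sq = sum(math.dist(p, r) ** 2 for p, r in zip(pred, ref))
    return math.sqrt(sq / len(pred))


def success_rate(top1_rmsds, cutoff=2.0):
    """Fraction of targets whose top-ranked pose has ligand RMSD below `cutoff`."""
    return sum(r < cutoff for r in top1_rmsds) / len(top1_rmsds)


ref = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
pred = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]   # whole ligand shifted by 1 Å
print(ligand_rmsd(pred, ref))              # 1.0
print(success_rate([1.0, 2.5, 0.8, 3.1]))  # 0.5
```

A low RMSD on its own does not rule out the chirality errors discussed below; PoseBusters-style suites check stereochemistry as a separate validity test.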

Still, correct ligand pose prediction is closer to a necessary condition for ligand-binding protein design than a sufficient one. A correct pose does not imply binding affinity, selectivity, or catalytic activity. Notably, the report itself lists as a limitation that the pure all-atom model occasionally produces incorrect chirality in ligand conformations. This is a very important caveat for all-atom generation.

The report also presents GeoFlow-V2-ab, a lightweight variant dedicated to antibody structure prediction. It reduces the input embedding layers and, after SAbDab pretraining, uses as its distillation dataset the predictions with confidence score > 0.70 on roughly 1.86M paired antibody sequences from the OAS database. The report claims that the distilled GeoFlow-V2-ab produces fewer outliers than ABodyBuilder3, NanoBodyBuilder2, and AlphaFold Multimer v2.3, and runs inference 150–250× faster than AlphaFold Multimer v2.3.
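
Building the distillation set amounts to a confidence filter over model predictions. A minimal sketch, assuming each prediction is a record with hypothetical `heavy`, `light`, and `confidence` fields; only the strict > 0.70 cutoff comes from the report, and the sequences in the example are placeholder fragments.

```python
from typing import Iterable, List, NamedTuple


class PairedPrediction(NamedTuple):
    heavy: str          # heavy-chain sequence
    light: str          # light-chain sequence
    confidence: float   # model confidence for the predicted structure


def build_distillation_set(predictions: Iterable[PairedPrediction],
                           min_confidence: float = 0.70) -> List[PairedPrediction]:
    """Keep only predictions strictly above the cutoff as pseudo-labels for training."""
    return [p for p in predictions if p.confidence > min_confidence]


preds = [
    PairedPrediction("EVQL", "DIQM", 0.82),
    PairedPrediction("QVQL", "EIVL", 0.70),  # exactly at the cutoff: excluded
    PairedPrediction("EVKL", "DIVM", 0.55),
]
kept = build_distillation_set(preds)
print(len(kept))  # 1
```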

This part is better viewed as high-throughput antibody structure generation/screening infrastructure than as evidence for de novo antibody design.

De novo antibody design: proxy-centered evidence

GeoFlow-V2's evaluation of de novo antibody design is centered on in silico results. The benchmark includes 7 published therapeutic antibody targets and 3 in-house nanobody targets. Nanobodies use the standardized humanized VHH framework `h-NbBcII10FGLA`, and conventional antibodies use the Trastuzumab framework. For each target antigen, five binding-critical hotspot residues are selected, and GeoFlow-V2 and RFAntibody each generate 1,000 de novo designed structures.

The evaluation metrics are hotspot pass rate, CDR interaction pass rate, overall pass rate, framework recovery rate, and diversity. Hotspot pass rate checks whether at least 3 of the 5 antigen hotspots lie within 9 Å (Cα distance) of the designed antibody. CDR interaction pass rate checks whether the interface is CDR-mediated. Overall pass rate requires the hotspot, CDR-interface, and clash-free criteria to hold simultaneously.
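
The hotspot and overall criteria can be made concrete with a small sketch. It assumes each hotspot is represented by its antigen Cα coordinate and that "close" means the minimum Cα-Cα distance to any residue of the designed antibody is below 9 Å. The report's exact distance definition may differ, and the function names and coordinates are hypothetical.

```python
import math


def hotspot_contacts(hotspot_ca, antibody_ca, cutoff=9.0):
    """Count antigen hotspots whose Cα is within `cutoff` Å of any antibody Cα."""
    return sum(
        1 for h in hotspot_ca
        if min(math.dist(h, a) for a in antibody_ca) < cutoff
    )


def hotspot_pass(hotspot_ca, antibody_ca, min_hits=3, cutoff=9.0):
    """Pass if at least `min_hits` hotspots (3 of 5 in the report) are contacted."""
    return hotspot_contacts(hotspot_ca, antibody_ca, cutoff) >= min_hits


def overall_pass(hotspot_ok, cdr_mediated, clash_free):
    """Overall pass requires all three checks to hold at once."""
    return hotspot_ok and cdr_mediated and clash_free


def pass_rate(flags):
    """Fraction of designs that pass a given check."""
    return sum(flags) / len(flags)


hotspots = [(0, 0, 0), (5, 0, 0), (10, 0, 0), (30, 0, 0), (40, 0, 0)]
antibody = [(0, 8, 0), (5, 8, 0), (10, 8, 0)]
print(hotspot_contacts(hotspots, antibody))   # 3
print(hotspot_pass(hotspots, antibody))       # True
print(pass_rate([True, False, True, True]))   # 0.75
```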

GeoFlow-V2 claims a median overall pass rate of at least 0.5 and greater robustness than RFAntibody. The report highlights its strength on targets where RFAntibody is weak, such as ACVR2B, FXI, and some in-house nanobody targets. Diversity, on the other hand, is higher for RFAntibody. The report interprets this as a tradeoff: because GeoFlow-V2 is integrated with the structure prediction task, it carries a strong structural inductive bias, which reduces conformational exploration.

These results indicate the direction of GeoFlow-V2's design capability, but they are not wet-lab hit rates. A predicted design that passes the hotspot/CDR/clash criteria is not thereby shown to bind the actual target. GeoFlow-V2's de novo antibody design evidence centers on structural plausibility and interface proxies.

Binder discrimination and virtual screening

GeoFlow-V2 also presents a binder/non-binder discrimination benchmark built on 10 wet-lab-labeled datasets: HER2/Trastuzumab, 5A12 VEGF, 5A12 ANG-2, TSLP, FXI, IL36R, TNFRSF9, C5, ACVR2B, and IL17A. GeoFlow-V2-Score refolds each designed antibody-antigen complex from its complete sequence plus five hotspot constraints, then uses the folding-stage confidence score and the interface PAE.

According to the report, several methods show predictive power, but no single one dominates across all targets. GeoFlow-V2 is claimed to be robust, yet target-to-target variation is large. Moreover, because this benchmark includes designs derived from reference complexes or from inverse folding/combinatorial mutation, it does not directly reflect fully de novo antibody design.
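
Binder/non-binder discrimination is usually summarized by how well a score ranks labeled binders above non-binders. One common metric is AUROC; the sketch below computes it through the Mann-Whitney pairwise formulation, using negative interface PAE as the score (lower PAE means a more confident interface). This illustrates the evaluation idea only: it is an assumption that the report uses this exact metric or score, and the labels and PAE values are invented. Given the large target-to-target variation, computing it per target rather than pooling seems the more informative choice.

```python
def auroc(scores, labels):
    """AUROC = P(score of a random binder > score of a random non-binder); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one binder and one non-binder")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


ipae = [4.2, 6.0, 11.5, 9.8, 5.1]   # interface PAE in Å (toy values)
labels = [1, 1, 0, 0, 0]            # 1 = wet-lab binder, 0 = non-binder
scores = [-x for x in ipae]         # higher score = more confident interface
print(round(auroc(scores, labels), 3))  # 0.833
```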

For virtual screening, GeoFlow-V2 and RFAntibody are compared on the same 10 targets with 1,000 designs per target, and the pass rate is computed with a GeoFlow-V2-Score interface PAE threshold of 8.5 Å. The median pass rate is 0.179 for GeoFlow-V2 and 0.077 for RFAntibody. However, because the comparison relies on GeoFlow-V2's own scoring module, it may be biased, and the report itself acknowledges this.
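
The screening figures are medians of per-target pass rates. A minimal sketch of that computation, assuming a design passes when its interface PAE is strictly below 8.5 Å (whether the report uses < or ≤ is not stated here). Target names and values are placeholders, not data from the report.

```python
from statistics import median


def target_pass_rate(ipae_values, threshold=8.5):
    """Fraction of one target's designs whose interface PAE (Å) is below the threshold."""
    return sum(v < threshold for v in ipae_values) / len(ipae_values)


def median_pass_rate(ipae_by_target, threshold=8.5):
    """Median over targets of per-target pass rates, so one easy target cannot dominate."""
    rates = {t: target_pass_rate(v, threshold) for t, v in ipae_by_target.items()}
    return median(rates.values()), rates


toy = {
    "target_a": [5.0, 9.0, 7.9, 12.0],
    "target_b": [10.1, 11.0, 8.6, 9.9],
    "target_c": [3.0, 4.0, 8.4, 20.0],
}
med, rates = median_pass_rate(toy)
print(rates)  # {'target_a': 0.5, 'target_b': 0.0, 'target_c': 0.75}
print(med)    # 0.5
```

Because the same model family both generates and scores the designs in this setting, the resulting number inherits the self-scoring bias noted above.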

The virtual screening pass rate is therefore the fraction of designs that clear a model-specific score, not an experimental hit rate. It is meaningful as a candidate prioritization proxy but should be kept separate from any assay denominator.

Binder design examples: FAD, OKT3, DNA

GeoFlow-V2 also shows binder design examples beyond antibodies. In a flavin-binding protein example, FAD or FMN is given as the condition and a binder is generated with 150 masked residues. PLIP is used to visualize hydrophobic interactions, hydrogen bonds, π-stacking, π-cation interactions, and salt bridges. However, there is no experimental validation of FAD/FMN binding, no spectroscopy, no activity data, and no structural validation.

There is also an OKT3-masking peptide example. The OKT3 light/heavy chain structures and a 15-residue masked peptide are given as input to generate masking peptide candidates. The report states that the target conformation is not held completely rigid and that flexibility is allowed. However, there are no peptide binding, competition, or T-cell modulation assays.

In the DNA-binding protein example, the NhaR regulatory DNA sequence is the target and a protein binder with 85 masked residues is generated. Some designs follow the intended groove, but cases that dock into an alternative groove also appear. Here, too, there are no DNA binding assays, transcriptional repression assays, or specificity/function validation.

These examples illustrate the breadth of GeoFlow-V2's formulation well. But the evidence layer is a qualitative in silico demonstration and says nothing about actual function.

Connection to GeoFlow-V3

Taken on its own, GeoFlow-V2 is a broad and ambitious infrastructure report. It covers structure prediction, protein-ligand poses, antibody structure prediction, antibody/VHH design proxies, binder discrimination, virtual screening, and illustrative binder design. Wet-lab validation, however, is not yet the central evidence in the report.

GeoFlow-V3 can be understood as the next step. It pushes antibody-antigen prediction, confidence-based filtering, and epitope-conditioned VHH design toward more direct low-N wet-lab validation, reporting direct synthesis and BLI hit screening across a campaign of 5 targets and 8 epitopes.

On the timeline, then, GeoFlow-V2 is the unified predictor-generator infrastructure milestone and GeoFlow-V3 is the low-N VHH hit discovery milestone. Conflating the two reports risks presenting GeoFlow-V2's proxy evidence as if it were wet-lab evidence. Reading them separately is more accurate.

Assessment: a precursor to unifying prediction and design behind one interface

GeoFlow-V2's significance lies in its broad task formulation: structure prediction when a full sequence is given, and sequence/structure generation when a masked protein input is given. It adds epitope, contact, and apo/holo structure constraints, and uses sequence co-design and design recycling to generate and evaluate candidates. BioGeometry's aim of merging its prediction and design models into a single product-like interface comes through clearly.

Its strength is breadth. A single report demonstrates antibody-antigen docking, protein-ligand poses, antibody structure prediction, de novo antibody design proxies, binder discrimination, virtual screening, and illustrative binder generation. It is a good resource for understanding the technical foundation that preceded GeoFlow-V3.

The depth of evidence, by contrast, is limited. GeoFlow-V2 itself does not demonstrate experimental antibody/VHH hit discovery. Most design results rest on computational proxies such as DockQ, ligand RMSD, hotspot/CDR/clash pass rates, interface PAE, and confidence scores. Small-molecule/nucleic-acid generation is still future work, and physical caveats of the all-atom model, such as the ligand chirality issue, remain.

GeoFlow-V2 is therefore best introduced not as “an experimentally validated antibody design platform” but as “a unified all-atom predictor-generator infrastructure leading to GeoFlow-V3.” With this framing in place, it also becomes clearer why GeoFlow-V3's low-N BLI evidence constitutes a separate milestone.

References

• BioGeometry Team, “GeoFlow-V2: A Unified Atomic Diffusion Model for Protein Structure Prediction and De Novo Design”, bioRxiv technical report, 2025.
• DOI: https://doi.org/10.1101/2025.05.06.652551
• Web server: https://prot.design
• Main points of comparison: GeoFlow-V3, RFAntibody, Protenix, Chai-1, Boltz, AlphaFold Multimer v2.3.