Genie3 Paper Review

Introduction

From Genie2 to Genie3

Branched polymers and partial atomization

Training: using monomers and multimers together

Hotspot conditioning and inference-time scaling

Unconditional generation and motif scaffolding

Binder benchmark: the scope of the AF2M+ oracle

Complementarity of Genie3 and BindCraft

Nipah Glycoprotein G wet-lab result

Figure by figure

Position relative to RFdiffusion3, BoltzGen, and Latent-X

Reading it in balance

Assessment: SE(3)-equivariant diffusion enters binder design

References

Introduction

The binder design field has recently been moving in two directions. On one side are hallucination/optimization methods such as BindCraft, which directly optimize an AlphaFold-style oracle; on the other are generative models such as RFdiffusion, BoltzGen, and Latent-X, which directly sample target-conditioned binders. Genie3 belongs to the latter group but asks a slightly different question.

That question is: “Can a model keep SE(3)-equivariance and still do sidechain-level atomistic reasoning and binder design?” RFdiffusion3 and AlphaFold3-style all-atom diffusion lean strongly toward non-equivariant transformers and absolute-coordinate reasoning. Genie3, by contrast, keeps Genie2's frame-based SE(3)-equivariant diffusion lineage and shows that partial atomization, multimer training, and hotspot conditioning can be added on top of it.

The paper covers a lot of ground: unconditional generation, motif scaffolding, an in silico binder benchmark, inference-time scaling, and a wet-lab binder against Nipah Glycoprotein G. The evidence layers need to be read separately, though. Most of the benchmarks rest on in silico proxies such as ProteinMPNN, ESMFold/AF2M, the AF2M+ Benchmark, and TM-score clustering, while the wet-lab validation amounts to 1 SPR binder out of 8 submitted designs, with a KD of about 92 nM.

From Genie2 to Genie3

Genie2 holds an important place in backbone-level motif scaffolding and diverse structure generation. A Cα/backbone-centric representation, however, is not enough to handle the binder-target interface directly. Binder design depends on sidechain packing, hotspot residue engagement, interface geometry, and multimeric context.

Genie3 fills this gap with partial atomization. It treats a protein as a branched polymer and atomizes segments whose sequence or motif/interface is known, down to the sidechain heavy atoms. Unknown segments, such as the scaffold, can instead be generated at the backbone level. Rather than treating every residue at full-atom resolution, this is a compromise that puts atomistic detail where it is needed.

This compromise defines Genie3. It does all-atom generation without giving up SE(3)-equivariance, pulling sidechain atoms into the frame cloud. That makes it a good point of comparison with the atomized context of RFAA/RFdiffusionAA, the all-atom diffusion of BoltzGen, and the closed all-atom complex generation of Latent-X.

Branched polymers and partial atomization

Genie3's forward process treats a partially atomized protein as an atom point cloud and applies Gaussian diffusion to it. The reverse process reasons over a frame cloud. Frenet-Serret frames are built along the backbone Cα trace, and the frames are extended to the sidechain heavy-atom trace following atom14 ordering.
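As a rough illustration of the frame construction, the sketch below builds discrete Frenet-Serret frames along an ordered trace of atom positions. The function name `frenet_serret_frames`, the forward-difference tangent, and the toy coordinates are assumptions made for illustration; the paper's exact discretization and its handling of chain ends and sidechain branches may differ.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _unit(v, eps=1e-8):
    # eps keeps the division finite for degenerate (near-zero) vectors.
    norm = math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)
    return (v[0] / (norm + eps), v[1] / (norm + eps), v[2] / (norm + eps))

def frenet_serret_frames(trace):
    """Return (origin, tangent, normal, binormal) for each interior point.

    Hypothetical helper: the first and last points are skipped because
    they lack a neighbor on one side.
    """
    frames = []
    for i in range(1, len(trace) - 1):
        prev_step = _sub(trace[i], trace[i - 1])
        next_step = _sub(trace[i + 1], trace[i])
        tangent = _unit(next_step)
        # Collinear neighbors give a near-zero binormal; a real
        # implementation would need a fallback for that case.
        binormal = _unit(_cross(prev_step, next_step))
        normal = _cross(binormal, tangent)
        frames.append((trace[i], tangent, normal, binormal))
    return frames

# Toy Calpha trace (made-up coordinates, not a real protein).
ca_trace = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 1.0)]
frames = frenet_serret_frames(ca_trace)
```

Each frame depends only on relative positions of neighboring atoms, so rotating or translating the whole trace rotates and translates the frames with it. In principle the same helper could be applied to a sidechain heavy-atom trace, assuming each branch is handled as its own ordered trace.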

The architecture also extends Genie2. The input embedder turns the noisy structure and timestep into single and pair representations, and a latent transformer exchanges single-pair information at every layer. The pair representation is updated with an outer product, and the single representation is updated with pair-biased attention and value injection. Global tokens are added to capture whole-structure context. The structural decoder uses IPA to update the input frames and predict the final noise vectors.
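To make the single-pair exchange more concrete, here is a minimal sketch of the two generic operations named above: an outer-product update from single to pair features, and single-head attention whose logits are shifted by a pair-derived bias. This is not Genie3's implementation. Learned projections, multiple heads, normalization, global tokens, and value injection are all omitted, and the function names are hypothetical.

```python
import math

def outer_product_pair_update(single):
    """pair[i][j] holds the flattened outer product of single[i] and single[j]."""
    return [[[a * b for a in s_i for b in s_j] for s_j in single]
            for s_i in single]

def pair_biased_attention(queries, keys, values, pair_bias):
    """Single-head attention with an additive scalar bias per residue pair."""
    n, d = len(queries), len(queries[0])
    outputs = []
    for i in range(n):
        logits = [
            sum(q * k for q, k in zip(queries[i], keys[j])) / math.sqrt(d)
            + pair_bias[i][j]
            for j in range(n)
        ]
        top = max(logits)  # subtract the max for a numerically stable softmax
        weights = [math.exp(x - top) for x in logits]
        total = sum(weights)
        weights = [w / total for w in weights]
        outputs.append([
            sum(weights[j] * values[j][c] for j in range(n))
            for c in range(len(values[0]))
        ])
    return outputs

# Toy example: two residues with 2-dimensional single features.
single = [[1.0, 0.0], [0.0, 1.0]]
pair = outer_product_pair_update(single)
# A large off-diagonal bias makes each residue attend to the other one.
pair_bias = [[0.0, 50.0], [50.0, 0.0]]
attended = pair_biased_attention(single, single, single, pair_bias)
```

In the toy example the bias dominates the dot-product term, so each residue's output is almost exactly the other residue's value vector. That is the sense in which pair information steers the single-representation update.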

The key point here is not to overstate the word “atomistic”. It does not mean that Genie3 has experimentally reproduced full all-atom binding poses on every task. It means that sidechain-aware partial atomization was added methodologically, and that its effect is evaluated within the benchmarks and a limited wet-lab demonstration.

Training: using monomers and multimers together

Genie3 is trained on both monomers and multimers. As in Genie2, monomers come from Foldseek-clustered AFDB with pLDDT ≥ 80. The paper states that it does not use experimentally determined monomeric structures. This detail matters when looking at long-monomer generalization.

Multimers come from the Pinder dimer dataset. An interface cluster is sampled first and a complex is then picked from it, with one chain treated as the target and the other as the binder. A subset of interface residues is masked so that the model learns to reconstruct them, and for computational feasibility only interface residues are atomized.
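The sampling order described above can be sketched as follows. The data layout, the function name `sample_training_example`, the random target/binder assignment, and the `mask_fraction` value are all assumptions for illustration; the review does not give the actual masking ratio or how the roles are assigned.

```python
import random

def sample_training_example(clusters, rng, mask_fraction=0.5):
    """Pick a cluster, then a complex, then split its interface residues.

    clusters: {cluster_id: [complex, ...]} where each complex is a dict with
    "id", "chains" (two chain ids) and "interface" (list of (chain, resnum)).
    mask_fraction is a hypothetical parameter, not a value from the paper.
    """
    # Sampling the cluster first keeps large clusters from dominating.
    cluster_id = rng.choice(sorted(clusters))
    complex_ = rng.choice(clusters[cluster_id])

    chains = list(complex_["chains"])
    rng.shuffle(chains)  # assumption: roles are assigned at random
    target_chain, binder_chain = chains

    interface = sorted(complex_["interface"])
    n_masked = round(mask_fraction * len(interface))
    masked = set(rng.sample(interface, n_masked))
    return {
        "cluster_id": cluster_id,
        "complex_id": complex_["id"],
        "target_chain": target_chain,
        "binder_chain": binder_chain,
        "masked_interface": masked,                   # to be reconstructed
        "visible_interface": set(interface) - masked,  # given as context
    }

# Toy dataset with made-up identifiers.
toy_interface = [("A", 10), ("A", 11), ("B", 5), ("B", 6)]
clusters = {
    "cluster_a": [{"id": "cplx_1", "chains": ("A", "B"), "interface": toy_interface}],
    "cluster_b": [
        {"id": "cplx_2", "chains": ("A", "B"), "interface": toy_interface},
        {"id": "cplx_3", "chains": ("A", "B"), "interface": toy_interface},
    ],
}
example = sample_training_example(clusters, random.Random(0))
```

Only the residues in the two interface sets would be candidates for sidechain atomization in this sketch; everything else stays at the backbone level.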

Binder generation introduces a distribution shift between the training-time interface definition and user-provided hotspot residues. During training the interface is defined from the complex structure, whereas a real user supplies hotspot residues. Genie3 uses extended hotspots and convergent hotspots to reduce this mismatch.

Hotspot conditioning and inference-time scaling

Extended hotspots are a heuristic that expands the interface to surface residues around the user-provided hotspots. Convergent hotspots take the interface residues shared by successful designs from earlier design rounds and feed them back in as the condition for the next round. So rather than simply sampling more, the method reuses interface information gained in earlier rounds as conditioning.
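Both heuristics can be sketched in a few lines. The distance cutoff `radius`, the usage threshold `min_fraction`, the one-coordinate-per-residue representation, and the function names are assumptions for illustration; the review does not state the actual criteria used in the paper.

```python
import math
from collections import Counter

def extend_hotspots(hotspots, surface_coords, radius=8.0):
    """Add surface residues that lie near any user-provided hotspot.

    surface_coords: {residue_id: (x, y, z)}, one representative atom per
    surface residue. radius is a hypothetical cutoff in angstroms.
    """
    extended = set(hotspots)
    for residue, xyz in surface_coords.items():
        for hotspot in hotspots:
            if hotspot in surface_coords and \
                    math.dist(xyz, surface_coords[hotspot]) <= radius:
                extended.add(residue)
                break
    return extended

def convergent_hotspots(successful_interfaces, min_fraction=0.5):
    """Keep target residues used by at least min_fraction of successful designs."""
    n_designs = len(successful_interfaces)
    if n_designs == 0:
        return set()
    counts = Counter(r for iface in successful_interfaces for r in set(iface))
    return {r for r, c in counts.items() if c / n_designs >= min_fraction}

# Toy data: residue ids and coordinates are made up.
surface = {10: (0.0, 0.0, 0.0), 11: (5.0, 0.0, 0.0), 12: (20.0, 0.0, 0.0)}
extended = extend_hotspots({10}, surface)
next_round = convergent_hotspots([{10, 11}, {11, 12}, {11}])
```

In a multi-round campaign, `next_round` would replace the hotspot set passed to the generator, which is how the conditioning distribution changes between rounds instead of only the sample count.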

The paper reports that on hard targets such as TNFα, applying the convergent hotspot heuristic repeatedly across design rounds raises the success rate almost monotonically. This is a form of inference-time scaling: it spends more compute, but by changing the conditioning distribution rather than by blindly increasing the number of samples.

This result, however, is based on the AF2M+ Benchmark. It does not demonstrate actual binding, specificity, or developability. Evidence that hotspot conditioning raises the success rate within the oracle should be kept separate from a wet-lab binder hit rate.

Unconditional generation and motif scaffolding

Unconditional generation is evaluated with ProteinMPNN inverse folding, ESMFold prediction, self-consistency, and TM-score clustering. Short monomers span 50–250 residues and long monomers 300–800 residues, with 100 structures generated per length. The paper claims that Genie3 is comparable to or better than models such as Ambient and La-Proteina on short monomers, and better on long monomers.

The long-monomer result is interesting. Because Genie3 was trained on proteins of at most 256 residues, generating 300–800 residues is out-of-domain generalization. It is still a structure-prediction/self-consistency proxy, however. Actual expression, folding, and function would require separate evidence.

Motif scaffolding is evaluated on MotifBench, which consists of 30 challenges. The paper reports that Genie3 solves as many problems as Protpardelle-1c while achieving a higher MotifBench score. This suggests that sidechain-aware motif atomization may help motif fidelity more than a Cα-only constraint does. A motif benchmark score, however, is not wet-lab evidence of function.

Binder benchmark: the scope of the AF2M+ oracle

The binder design benchmark builds on an AlphaProteo-style in silico pipeline. The original AF3 Benchmark uses agreement between the AlphaFold3 prediction and the designed complex, but because of license/availability issues the main comparison switches to the AF2M Benchmark. AF2M prediction uses 5 AF2M models with up to 20 recycles, and the rank score is 0.8·ipTM + 0.2·pTM.
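The rank score is a simple weighted sum, sketched below. The step of picking the highest-scoring prediction among the five models is an assumption about how the score is used, and the ipTM/pTM values are made up for illustration.

```python
def rank_score(iptm, ptm):
    """AF2M rank score as quoted in the review: 0.8 * ipTM + 0.2 * pTM."""
    return 0.8 * iptm + 0.2 * ptm

def best_prediction(predictions):
    """Assumed usage: keep the prediction with the highest rank score."""
    return max(predictions, key=lambda p: rank_score(p["iptm"], p["ptm"]))

# Made-up confidence values for the five AF2M models.
predictions = [
    {"model": "model_1", "iptm": 0.62, "ptm": 0.80},
    {"model": "model_2", "iptm": 0.90, "ptm": 0.70},
    {"model": "model_3", "iptm": 0.55, "ptm": 0.91},
    {"model": "model_4", "iptm": 0.71, "ptm": 0.75},
    {"model": "model_5", "iptm": 0.40, "ptm": 0.88},
]
best = best_prediction(predictions)
```

Because ipTM carries four times the weight of pTM, a prediction with a confident interface but a slightly less confident overall fold (`model_2` here) outranks one with the opposite profile (`model_3`).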

Genie3 adds a hotspot/interface compliance constraint on top of this to form the AF2M+ Benchmark. A design is called successful only when at least 80% of the user-specified interface residues lie within 5 Å of the binder. This condition is meant to check whether hotspot prompting is actually reflected in the interface.
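A minimal sketch of the compliance check follows. Measuring the 5 Å distance between any atom of the interface residue and any binder atom is an assumption (the review does not say which atoms are used), and the function names and coordinates are hypothetical.

```python
import math

def hotspot_compliance(interface_atoms, binder_atoms, cutoff=5.0):
    """Fraction of interface residues with any atom within cutoff of the binder.

    interface_atoms: {residue_id: [(x, y, z), ...]}
    binder_atoms: [(x, y, z), ...]
    """
    if not interface_atoms:
        return 0.0
    contacted = sum(
        1
        for atoms in interface_atoms.values()
        if any(math.dist(a, b) <= cutoff for a in atoms for b in binder_atoms)
    )
    return contacted / len(interface_atoms)

def is_af2m_plus_success(passes_af2m, interface_atoms, binder_atoms,
                         min_fraction=0.8):
    """AF2M success plus the 80% interface-compliance constraint."""
    compliance = hotspot_compliance(interface_atoms, binder_atoms)
    return passes_af2m and compliance >= min_fraction

# Toy geometry: one binder atom at the origin, five interface residues.
binder = [(0.0, 0.0, 0.0)]
interface = {
    1: [(3.0, 0.0, 0.0)],
    2: [(0.0, 4.0, 0.0)],
    3: [(0.0, 0.0, 5.0)],
    4: [(1.0, 1.0, 1.0)],
    5: [(30.0, 0.0, 0.0)],  # too far away to count as contacted
}
compliance = hotspot_compliance(interface, binder)
```

Here four of the five residues are contacted, so the design sits exactly at the 80% threshold and passes only if the underlying AF2M criteria pass too.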

It matters that the paper itself acknowledges the low precision of this oracle. In a retrospective analysis of the Cao et al. dataset, the precision of the in silico evaluation pipelines is at most about 12%, and for H3, TGFβ, and TIE2 they sometimes fail to find any true binder at all. AF2M+ ranking is therefore a useful proxy but should not be read as an experimental hit rate.

Under a fixed sampling budget, 200 structures are generated for each model/problem combination across 10 AlphaProteo-derived binder design problems, and 8 ProteinMPNN sequences are inverse-folded per generated structure. The paper reports that Genie3 produced the most successful designs on 7 of the 10 problems, with BindCraft best on the remaining 3. This is an in silico signal that Genie3 can compete with a hallucination-based method.

Complementarity of Genie3 and BindCraft

The paper clusters the successful designs from Genie3 and BindCraft at a TM-score threshold of 0.6 and reports that the overlap is small. It interprets this as the generation-based and hallucination-based approaches exploring different structural solution spaces.
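One way to quantify such overlap is sketched below, assuming pairwise TM-scores have already been computed with an external tool. The greedy leader clustering used here is a stand-in: the review does not say which clustering algorithm the paper uses, and the design names and scores are made up.

```python
def greedy_cluster(names, tm_score, threshold=0.6):
    """Leader clustering: join the first cluster whose representative matches.

    tm_score(a, b) returns a precomputed pairwise TM-score.
    """
    clusters = []  # each cluster is a list; element 0 is its representative
    for name in names:
        for cluster in clusters:
            if tm_score(name, cluster[0]) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters

def shared_cluster_count(clusters, method_of):
    """Number of clusters that contain designs from more than one method."""
    return sum(1 for c in clusters if len({method_of[n] for n in c}) > 1)

# Made-up pairwise TM-scores for two designs from each method.
scores = {
    frozenset(("genie3_a", "genie3_b")): 0.3,
    frozenset(("genie3_a", "bindcraft_a")): 0.7,
    frozenset(("genie3_a", "bindcraft_b")): 0.2,
    frozenset(("genie3_b", "bindcraft_a")): 0.3,
    frozenset(("genie3_b", "bindcraft_b")): 0.4,
    frozenset(("bindcraft_a", "bindcraft_b")): 0.5,
}
method_of = {"genie3_a": "Genie3", "genie3_b": "Genie3",
             "bindcraft_a": "BindCraft", "bindcraft_b": "BindCraft"}
names = ["genie3_a", "genie3_b", "bindcraft_a", "bindcraft_b"]

clusters = greedy_cluster(names, lambda a, b: scores[frozenset((a, b))])
n_shared = shared_cluster_count(clusters, method_of)
```

A small `n_shared` relative to the total number of clusters corresponds to the paper's "small overlap" reading. Note that greedy clustering depends on input order, so a real analysis would want to check that the conclusion is stable under reordering.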

This matters. The claim is not that Genie3 fully replaces BindCraft but that it can produce a different solution family. In practical binder design, having different methods supply non-overlapping candidate pools can be an advantage.

Here too, though, success is defined by the AF2M+ oracle. Different structural solution spaces do not yet imply different wet-lab hit spaces. Whether the two methods are complementary in real experiments requires a broader wet-lab comparison.

Nipah Glycoprotein G wet-lab result

Genie3's wet-lab anchor is the AdaptyvBio Nipah binder design challenge. The target is Nipah virus Glycoprotein G (NiV-G), the viral attachment protein that initiates host-cell attachment by binding the human Ephrin-B2/B3 receptors.

The design uses the public NiV-G/Ephrin-B2 crystal structure, PDB 2VSM. Target residues within 8 Å of Ephrin-B2 are taken as the interface region, and 200 designs are generated with Genie3. The AF2M+ Benchmark yields 34 successful designs in 33 unique structural clusters; Boltz-2 ipSAE, the challenge's selection criterion, is then computed and the top 8 designs are submitted.
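The selection funnel can be sketched as a filter followed by a ranking. That the top 8 are drawn only from AF2M+-successful designs, and that a higher ipSAE is better, are assumptions here. The field names are hypothetical, and the scores are synthetic values constructed only to mirror the 200 → 34 → 8 counts, not data from the paper.

```python
def select_submissions(designs, k=8):
    """Keep AF2M+-successful designs, rank by Boltz-2 ipSAE, take the top k."""
    passed = [d for d in designs if d["af2m_plus_success"]]
    ranked = sorted(passed, key=lambda d: d["boltz2_ipsae"], reverse=True)
    return ranked[:k]

# Synthetic stand-in for 200 generated designs (every sixth one "passes",
# which happens to give 34 successes; the ipSAE values are arbitrary).
designs = [
    {
        "name": f"design_{i:03d}",
        "af2m_plus_success": i % 6 == 0,
        "boltz2_ipsae": (i * 37 % 100) / 100,
    }
    for i in range(200)
]
n_success = sum(d["af2m_plus_success"] for d in designs)
submitted = select_submissions(designs)
```

Writing the funnel out this way makes the later caveat concrete: the wet-lab hit rate is measured on `submitted`, which has already passed two in silico filters, so it reflects the whole pipeline rather than the generator alone.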

All 8 submitted designs expressed, and 1 of them showed measurable binding by SPR, with a reported KD of about 92 nM. The wet-lab hit rate is therefore 1/8, or 12.5%, of submitted designs.

This result is meaningful, because it is the point at which the Genie lineage obtains actual binding evidence in binder design. Its scope is narrow, however. Selectivity, viral neutralization, receptor competition, developability, structural pose validation, and in vivo properties are not established in this paper. And since the selection funnel includes AF2M+ and Boltz-2 ipSAE, this is pipeline-level evidence rather than a raw model-only hit rate.

The paper notes that the aggregate hit rates of RFDiffusion, BindCraft, and BoltzGen among the challenge submissions were 3/60, 1/100, and 2/288, respectively. This comparison should be read with caution: teams, procedures, expertise, and selection criteria all differ, so it is hard to treat it as a controlled model comparison. The paper states this caveat itself.
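To put these counts side by side, the sketch below converts them to percentages and attaches a 95% Wilson score interval to each. The interval is an addition for illustration and does not come from the paper; it only reflects sample size and says nothing about the differences in teams and selection procedures noted above.

```python
import math

def wilson_interval(hits, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    p = hits / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

# Hit counts as quoted in the review: (hits, submitted designs).
reported = {
    "Genie3": (1, 8),
    "RFDiffusion": (3, 60),
    "BindCraft": (1, 100),
    "BoltzGen": (2, 288),
}

for name, (hits, n) in reported.items():
    low, high = wilson_interval(hits, n)
    print(f"{name}: {hits}/{n} = {hits / n:.1%} "
          f"(95% Wilson interval {low:.1%} to {high:.1%})")

genie3_low, genie3_high = wilson_interval(1, 8)
```

With only 8 submissions, the Genie3 interval spans roughly 2% to 47%, which is another reason to treat the 12.5% figure as a single data point rather than a stable hit rate.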

Figure by figure

Figure 1 shows Genie3's branched-polymer / partial atomization architecture. The takeaway is that sidechain-aware atomization is placed inside SE(3)-equivariant frame reasoning.

Figure 2 covers the unconditional generation and MotifBench results. This is where the paper argues that Genie3 keeps the backbone/motif generation lineage while broadening performance beyond Genie2.

Figure 3 is the central figure, combining the binder benchmark with the Nipah SPR result. It includes the AF2M/AF3 filtering assessment, the Genie3 vs BindCraft complementarity, and the Nipah binder SPR curve. The benchmark proxy and the wet-lab SPR evidence should be read separately here.

Figure 4 covers inference-time scaling and the effect of hotspot conditioning. It shows whether the extended/convergent hotspot heuristics raise the oracle success rate. Figure 5 is a Genie2 vs Genie3 ablation showing how the architecture changes affect long unconditional generation.

Table 13 reports the retrospective precision/recall of the AF2M benchmark and is an important reference for interpretation. Table 14 lists the binder design benchmark targets and hotspot residues. Appendix E.5 / Figure 17 covers structural and sequence diversity under a fixed ColabFold budget.

Position relative to RFdiffusion3, BoltzGen, and Latent-X

Genie3 occupies a distinctive position within the all-atom/atomistic binder design trend. RFdiffusion3 and BoltzGen sit on the broader all-atom generation/platform side, and Latent-X is a closed all-atom binder platform that reports strong wet-lab hit rates. Genie3 has narrower wet-lab breadth than these, but carries a strong methodological message: an SE(3)-equivariant generative model can move into sidechain-aware binder design.

It also contrasts with BoltzDesign1, which is predictor inversion/hallucination: it backpropagates through the Boltz-1 Pairformer and Confidence module to produce design candidates. Genie3 is a standalone generative diffusion model. Both sit on the open side of the all-atom/atomistic design trend, but one repurposes a predictor in reverse while the other learns a generative distribution.

With BindCraft, the key point is practical complementarity. BindCraft has shown strong practical hits through AF2-style optimization, while Genie3 argues that a generative model can yield a different structural solution space. This difference may become important for wet-lab candidate pooling.

Reading it in balance

Genie3 can reasonably be seen as the first meaningful step of the Genie lineage into binder design. ProteinMPNN, ESMFold, AF2M, AF2M+, TM-score clustering, and the hotspot-compliance filter make up a broad benchmark, and in the Nipah Glycoprotein G challenge 1 of 8 submitted designs showed SPR binding with a KD of about 92 nM.

The benchmarks and the wet-lab anchor are not at the same level of evidence, however. The paper itself describes the retrospective precision of the AF2M+ oracle as low, and hotspot-compliance success does not translate directly into an experimental hit rate. The Nipah result is real SPR binding evidence, but it does not extend to specificity, neutralization, a solved structure, or developability.

Inference-time scaling can be read the same way. That extended/convergent hotspots raise oracle success makes them a useful design heuristic. Whether the experimental hit rate rises along with it is a question for a separate campaign. The still-limited release of dedicated Genie3 code/weights is also worth noting from a reproducibility standpoint.

Assessment: SE(3)-equivariant diffusion enters binder design

The value of Genie3 is not that it “solved binder design” but that it marks the Genie lineage's first meaningful step into binder design. Sidechain-aware partial atomization, multimer training, hotspot conditioning, AF2M+ filtering, and the Nipah SPR hit form one continuous thread.

The choice to pursue atomistic reasoning while keeping SE(3)-equivariance is particularly interesting. It offers a counterexample to the idea that all-atom design must go through non-equivariant absolute-coordinate transformers. This is why Genie3 needs to be understood on its own terms when compared with RFdiffusion3, BoltzGen, and Latent-X.

Rather than declaring Genie3 a therapeutic binder platform, it is more natural to see it as a method milestone showing that SE(3)-equivariant generative diffusion can compete with hallucination-based binder design. Setting the benchmark breadth next to the Nipah 1/8 SPR hit makes both the potential and the current evidence boundary clear.

References

- Paper: “Fast and Ultra-Capable Protein Design: Advancing the Frontier Through Atomistic SE(3)-Equivariance with Genie 3”
- Authors: Yeqing Lin, Minji Lee, Aakarsh Vermani, Ellena Jiang, Sebastiaan De Cooman, Matej Špetko, Mohammed AlQuraishi
- bioRxiv DOI: https://doi.org/10.64898/2026.05.01.722168
- Raw source: `raw/papers/Genie3/genie3.pdf`
- Extracted source: `raw/papers/Genie3/extracted/genie3.txt`