Review of the AlphaProteo technical report

Introduction

Separating generation from validation in binder design

Target set and benchmark scope

Figure 1 and Table 1: numbers from a single medium-throughput screen

The RFdiffusion comparison and assay context

Distinguishing AF2/AF3 filtering from validation

Figures 2–3: intended epitope and the scope of specificity

Figure 4: functional validation cases

Figure 5: designed poses versus experimental structures

Scope of the developability evidence

Limits as a closed-system benchmark

Assessment: AlphaProteo as a wet-lab hit-rate benchmark

References

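Introduction

When discussing protein binder design, the sentence to be most careful with is “the model made a binder.” A structure can render nicely, an AlphaFold-family model can assign high confidence, and the interface can look plausible, yet whether any of that translates into actual target binding is a separate question. In de novo binder design in particular, candidate generation, sequence realization, filtering, expression, binding assays, and functional assays each introduce their own bottlenecks.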

AlphaProteo is best read in this light, as a company-led technical report from Google DeepMind. The 2024 report “De novo design of high-affinity protein binders with AlphaProteo” takes a target structure and, optionally, hotspot residues as input, generates binder candidates, filters them to choose designs for experiments, and then presents wet-lab evidence for binding and function, plus structural validation for some designs. The headline results are strong: of the 8 targets tested experimentally, 7 yielded binding hits. Experimental success rates on the successful targets ranged from 9–88%, and the best binder affinities for several targets were in the low-nanomolar or sub-nanomolar range.

Reading the report as “AlphaProteo has solved binder design,” however, would be a mistake. The released material is closer to a performance-and-validation report than to a methods paper. The authors state explicitly that, for biosecurity and commercial reasons, they do not disclose the machine learning methods, training details, model weights, or inference details. So rather than treating AlphaProteo as an open method like RFdiffusion or ProteinMPNN that we could drop into our own pipeline and reproduce, it is safer to read it as a reference point: a measure of how well a closed, high-performance system performed under medium-throughput validation.

In this post I will avoid both extremes, “it is closed, so it is meaningless” and “it performs well, so it is complete.” Instead, from a binder design pipeline perspective, I will work through what the report actually validated, which denominator each number comes from, and where caution is still warranted. For hit rates in particular, I will keep separating whether the denominator is post-filter wet-lab candidates, raw generated samples, or a purified follow-up subset.

Separating generation from validation in binder design

Figure 1 lays out AlphaProteo’s overall architecture fairly simply. The system has two main parts, a generator and a filter. The generator takes a target protein structure and, optionally, specified hotspot residues, and produces candidate binder structures and sequences. The filter selects the designs worth taking to experiments. These then proceed to yeast surface display, affinity measurement, functional assays, and structural validation.

The description is short, but it is a key point for reading AlphaProteo. The hit rates in the report are not raw generation rates. They do not say what percentage of all generated candidates were binders; they are the fraction of candidates that passed AlphaProteo’s internal filter, were actually sent to experiments, and showed a binding signal. These numbers therefore measure pipeline-level performance, combining the generator, filtering, candidate selection, and assay setup, not the generator alone.
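A toy calculation makes the denominator point concrete. The sketch below uses made-up counts (none of these numbers come from the report) to show how the same set of hits yields very different rates depending on what goes into the denominator; `hit_rate` is a hypothetical helper name.

```python
# Illustrative only: all counts below are HYPOTHETICAL, not figures from the report.

def hit_rate(hits: int, denominator: int) -> float:
    """Return hits / denominator for whichever denominator is being reported."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return hits / denominator


raw_generated = 10_000    # hypothetical: designs sampled by the generator
post_filter_tested = 100  # hypothetical: designs that passed the filter and were tested
hits = 20                 # hypothetical: tested designs above the binding-signal threshold

pipeline_rate = hit_rate(hits, post_filter_tested)  # report-style, post-filter hit rate
raw_rate = hit_rate(hits, raw_generated)            # raw generation rate for the same hits

print(f"post-filter hit rate: {pipeline_rate:.1%}")  # 20.0%
print(f"raw generation rate:  {raw_rate:.2%}")       # 0.20%
```

The hits are identical in both lines; only the denominator changes, which is why a post-filter rate should not be compared directly with a raw generation rate.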

Training sources are disclosed only partially. The report says it used PDB structure/sequence data and a distillation set of AlphaFold predictions, but it does not disclose the architecture or training recipe. That makes it hard to compare against public models directly or to build a reproduction pipeline. What we can examine instead is the validation stack on which the results rest.

Target set and benchmark scope

The report tested 8 targets experimentally. The viral proteins are BHRF1, an oncogenic EBV protein, and the SARS-CoV-2 receptor-binding domain. The human therapeutic targets are IL-7RA, PD-L1, TrkA, IL-17A, VEGF-A, and TNF-alpha.

The target set spans a wide range of difficulty. Some targets, such as BHRF1 with its hydrophobic groove, are relatively easy; others, such as PD-L1 and TrkA, had been hard for earlier design efforts. The report introduces VEGF-A as a target with no published computationally designed binder at the time. TNF-alpha is included as a very hard case, aimed at a flat, polar homotrimer interface.

Still, the target selection should not be read as a fully prospective blind benchmark. The authors write directly that SC2RBD, PD-L1, and TrkA were used during AlphaProteo’s development, so success rates on these targets may overestimate performance on novel targets. By contrast, the results for BHRF1, IL-7RA, VEGF-A, and IL-17A come from a single round of medium-throughput testing and are more prospective in character. It helps to keep these two groups separate.

Figure 1 and Table 1: numbers from a single medium-throughput screen

AlphaProteo’s main results are collected in Table 1. For each target, 47–172 designs were tested by yeast surface display, and candidates above a binding-signal threshold were counted as experimental successes. The threshold was generally 0.2; for IL-17A, which had high background, a threshold of 1.3 was used.
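To make the threshold rule concrete, here is a minimal hit-calling sketch. It assumes the binding signal is a single scalar per design and that a hit means strictly exceeding the cutoff; the report’s actual normalization of the yeast-display signal is not reproduced here, the function names are hypothetical, and the signal values are invented.

```python
# Cutoffs as summarized in the text; the signal values used below are HYPOTHETICAL.
DEFAULT_THRESHOLD = 0.2
TARGET_THRESHOLDS = {"IL-17A": 1.3}  # higher cutoff because of high background


def threshold_for(target: str) -> float:
    """Per-target cutoff, falling back to the default."""
    return TARGET_THRESHOLDS.get(target, DEFAULT_THRESHOLD)


def call_hits(target: str, signals: list[float]) -> list[bool]:
    """Assumption: a design is a hit when its signal strictly exceeds the cutoff."""
    cutoff = threshold_for(target)
    return [s > cutoff for s in signals]


def success_rate(target: str, signals: list[float]) -> float:
    calls = call_hits(target, signals)
    return sum(calls) / len(calls)


signals = [0.05, 0.3, 0.9, 1.5]  # hypothetical per-design binding signals
print(success_rate("PD-L1", signals))   # 0.75
print(success_rate("IL-17A", signals))  # 0.25
```

The same four signals give different success rates under the two cutoffs, so it is worth checking which threshold a given target’s number was computed with.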

The per-target results are as follows. BHRF1: 88% of 94 designs were hits, best KD 8.5 nM. SARS-CoV-2 RBD: 12% of 172, best KD 26 nM. IL-7RA: 25% of 94, best KD 0.082 nM. PD-L1: 15% of 159, best KD 0.18 nM. TrkA: 9% of 131, best KD 0.96 nM. IL-17A: 14% of 63, best KD 8.4 nM. VEGF-A: 33% of 94, best KD 0.48 nM. TNF-alpha: 54 designs tested, no hits.
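Because the results mix percentages and denominators, it can help to back out approximate hit counts. The sketch below does only that arithmetic on the numbers quoted in the paragraph above; since the rates are rounded percentages, the counts are estimates, not the report’s exact tallies.

```python
# Designs tested and reported success rate per target, as quoted in the text.
TABLE1 = {
    "BHRF1": (94, 0.88),
    "SC2RBD": (172, 0.12),
    "IL-7RA": (94, 0.25),
    "PD-L1": (159, 0.15),
    "TrkA": (131, 0.09),
    "IL-17A": (63, 0.14),
    "VEGF-A": (94, 0.33),
    "TNF-alpha": (54, 0.0),
}


def approx_hits(tested: int, rate: float) -> int:
    """Back-calculate a hit count; rates are rounded percentages, so this is approximate."""
    return round(tested * rate)


for target, (tested, rate) in TABLE1.items():
    print(f"{target:<10} tested={tested:>3}  rate={rate:>4.0%}  ~hits={approx_hits(tested, rate):>2}")

total_tested = sum(tested for tested, _ in TABLE1.values())
print("total designs tested:", total_tested)  # 861
```

Summed over the eight targets, that is 861 designs, fewer than a thousand in total.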

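What stands out here is the denominator. These are not results from bulk screening of a 10,000- or 100,000-member library; multiple hits came out of roughly one or two 96-well plates’ worth of candidates. For binder design to work as a practical research tool, this is exactly what matters. The practical bottleneck is less “how many candidates can we generate” than “does a usable binder show up among the tens to hundreds of candidates we can actually test.”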
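A back-of-the-envelope way to see why a plate-scale denominator can be enough: if we treat each tested design as an independent trial at the reported success rate, the chance of at least one hit from a single plate is already high. Both the independence assumption and the one-design-per-well plate count are simplifications of mine, not the report’s statistics; the helper names are hypothetical.

```python
import math


def plates_needed(n_designs: int, wells_per_plate: int = 96) -> int:
    """96-well plates needed to test n designs, assuming one design per well."""
    return math.ceil(n_designs / wells_per_plate)


def p_at_least_one_hit(hit_rate: float, n_designs: int) -> float:
    """P(>= 1 hit), treating each design as an independent trial (a strong simplification)."""
    return 1.0 - (1.0 - hit_rate) ** n_designs


for n in (54, 94, 172):  # denominators quoted in the text
    print(f"{n:>3} designs -> {plates_needed(n)} plate(s)")

for rate in (0.09, 0.15, 0.33):  # success rates quoted in the text
    print(f"rate={rate:.2f}: expected hits per plate={rate * 96:.1f}, "
          f"P(>=1 hit in one plate)={p_at_least_one_hit(rate, 96):.4f}")
```

In practice, design outcomes are positively correlated (similar designs tend to succeed or fail together), so these probabilities are optimistic; the sketch only shows that a plate-scale budget is not hopeless at these rates.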

The affinities are also strong. For IL-7RA, PD-L1, TrkA, and VEGF-A, the best binders fall in the sub-nanomolar range. The report emphasizes that these are non-optimized computational designs: not final leads that went through directed evolution or affinity maturation, but binders that received only limited experimental confirmation after computational design.

The RFdiffusion comparison and assay context

The AlphaProteo report also compares directly against RFdiffusion, on IL-7RA, PD-L1, and TrkA. On IL-7RA, AlphaProteo reported a 25% hit rate and a best KD of 0.082 nM, while RFdiffusion designs that the authors re-measured by yeast display showed a 17% hit rate and a best KD of 14 nM. On PD-L1, AlphaProteo had 15% and 0.18 nM versus 13% and 1.6 nM for RFdiffusion. On TrkA, AlphaProteo had 9% and 0.96 nM, while RFdiffusion had a 0% hit rate in the authors’ assay.
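Putting the quoted numbers side by side makes the size of the gap easier to see. The sketch below reports the hit-rate difference and the best-KD fold ratio per target. Note that the fold ratio compares only the single best binder per method, within the authors’ yeast-display setup, so it says nothing about the distribution of affinities or about other assay formats; `kd_fold` is a hypothetical helper.

```python
# Hit rates and best KD (nM) as quoted in the text; None means no hit in the authors' assay.
COMPARISON = {
    "IL-7RA": {"alphaproteo": (0.25, 0.082), "rfdiffusion": (0.17, 14.0)},
    "PD-L1": {"alphaproteo": (0.15, 0.18), "rfdiffusion": (0.13, 1.6)},
    "TrkA": {"alphaproteo": (0.09, 0.96), "rfdiffusion": (0.0, None)},
}


def kd_fold(kd_a, kd_b):
    """How many times tighter A's best binder is than B's (KD_B / KD_A); None if undefined."""
    if kd_a is None or kd_b is None:
        return None
    return kd_b / kd_a


for target, row in COMPARISON.items():
    (rate_a, kd_a), (rate_b, kd_b) = row["alphaproteo"], row["rfdiffusion"]
    fold = kd_fold(kd_a, kd_b)
    fold_txt = f"{fold:.0f}x" if fold is not None else "n/a (no RFdiffusion hit)"
    print(f"{target:<7} hit-rate diff={rate_a - rate_b:+.2f}  best-KD fold={fold_txt}")
```

On these numbers the best-KD gap is roughly 171-fold for IL-7RA and about 9-fold for PD-L1, while the hit-rate differences are only a few percentage points.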

This result is evidence that AlphaProteo performs very well, and the affinity gap in particular is not small. In cross-method comparisons, however, the assay format has to be read alongside the numbers. RFdiffusion’s published success rates came from 96-well BLI, whereas the AlphaProteo report re-measured some RFdiffusion designs by yeast display at lower target concentrations. The authors themselves note that for IL-7RA and TrkA, RFdiffusion’s yeast-display success rate may come out lower than its published BLI rate.

What we can conclude, then, is that “AlphaProteo showed stronger experimental performance than RFdiffusion within the authors’ assay setup.” Going further, to “it is superior to RFdiffusion under all conditions” or “the RFdiffusion results are invalid,” overreaches. Success rates in binder design are sensitive to the assay, target concentration, construct, and candidate selection.

Distinguishing AF2/AF3 filtering from validation

The AlphaProteo report does not emphasize the generator alone. The Supplementary Methods describe both an AF2/RFdiffusion-style benchmark and an AF3-based benchmark. The AF2-style benchmark uses criteria such as interchain pAE from AF2 initial guess, binder RMSD, and pLDDT. The AF3-based benchmark retrospectively optimizes the filter on 640,000 previously characterized de novo binder designs and uses criteria such as minimum interchain PAE, binder pTM, and complex RMSD.
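To show what filtering on structure-prediction confidence means mechanically, here is a minimal sketch built on the metric names listed above. The thresholds, the ranking rule, the assumed meaning of complex RMSD, and all class and function names are hypothetical illustrations; this is not AlphaProteo’s filter, whose details are not fully disclosed.

```python
from dataclasses import dataclass


@dataclass
class DesignScores:
    name: str
    min_interchain_pae: float  # lower is better
    binder_ptm: float          # higher is better (0..1)
    complex_rmsd: float        # Å; assumed: design model vs. predicted complex, lower is better


# HYPOTHETICAL cutoffs for illustration only.
MAX_MIN_PAE = 1.5
MIN_BINDER_PTM = 0.8
MAX_COMPLEX_RMSD = 2.0


def passes_filter(d: DesignScores) -> bool:
    """A design must clear every cutoff to be considered for the wet lab."""
    return (d.min_interchain_pae <= MAX_MIN_PAE
            and d.binder_ptm >= MIN_BINDER_PTM
            and d.complex_rmsd <= MAX_COMPLEX_RMSD)


def select_for_wet_lab(designs: list[DesignScores], budget: int) -> list[DesignScores]:
    """Keep passing designs, rank by min interchain PAE (assumed rule), take up to `budget`."""
    passing = [d for d in designs if passes_filter(d)]
    passing.sort(key=lambda d: d.min_interchain_pae)
    return passing[:budget]


candidates = [
    DesignScores("d1", 1.2, 0.86, 1.1),
    DesignScores("d2", 3.4, 0.90, 0.9),  # fails the PAE cutoff
    DesignScores("d3", 0.9, 0.82, 1.8),
    DesignScores("d4", 1.0, 0.70, 1.0),  # fails the pTM cutoff
]
print([d.name for d in select_for_wet_lab(candidates, budget=2)])  # ['d3', 'd1']
```

Whatever the real cutoffs are, everything downstream in the wet lab only sees designs that already passed such a gate, which is why the reported hit rates are pipeline-level numbers.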

This tells the reader two things. First, AlphaProteo’s performance is not generation performance alone. It is the performance of the whole pipeline: generating candidates, screening them by structural plausibility and interface-confidence proxies, and sending a subset to the wet lab. Second, an AF3-based in silico benchmark is not the same as wet-lab generalization. The report says in silico performance on random PDB targets was similar to that on the experimental targets, but that does not mean all 200 PDB targets were tested experimentally.

The TNF-alpha failure illustrates this. After broad in silico screening, the authors picked TNF-alpha as a very hard target, and indeed obtained no hits from 54 designs. The report suggests the flat, highly polar homotrimer interface may have been the problem. This failure is not noise that weakens AlphaProteo’s headline; it is an important data point showing that target difficulty remains a major factor in binder design.

Figures 2–3: intended epitope and the scope of specificity

A binding signal alone does not establish that a binder engages the designed epitope in the designed way. AlphaProteo checks this with competition assays and interface mutations: it tests whether a known competitor that shares the target site reduces the binding signal, and whether mutating binder interface residues reduces binding. The results broadly support the conclusion that the designed interface is involved in binding.

Specificity is tested as well. A subset of top binders was cross-tested against the 7 target proteins for which design succeeded, and the report states that each binder showed observable binding only to its intended target. This supports specificity within the target panel.
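The panel-specificity readout reduces to a simple check on a binder-by-target matrix. In the sketch below, the binder names and observed-binding sets are hypothetical (b3 is a deliberately cross-reactive example and b4 a non-binder, not results from the report), and `panel_specific` is a name of my own. Passing this check says something only about the targets in the panel, not about the rest of the proteome.

```python
def panel_specific(observed: dict[str, set[str]], intended: dict[str, str]) -> dict[str, bool]:
    """True for a binder whose observable binding across the panel is exactly its intended target.

    A binder with no observable binding at all also returns False (it is not a demonstrated binder).
    """
    return {binder: targets == {intended[binder]} for binder, targets in observed.items()}


# HYPOTHETICAL cross-test results; names and sets are invented for illustration.
intended = {"b1": "PD-L1", "b2": "VEGF-A", "b3": "TrkA", "b4": "IL-7RA"}
observed = {
    "b1": {"PD-L1"},
    "b2": {"VEGF-A"},
    "b3": {"TrkA", "IL-7RA"},  # made-up cross-reactive case
    "b4": set(),               # made-up non-binder
}
print(panel_specific(observed, intended))
# {'b1': True, 'b2': True, 'b3': False, 'b4': False}
```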

This should not, however, be extended to global specificity or therapeutic developability. Seeing no off-target binding across a 7-target panel is a different claim from proteome-wide off-target risk being low. The report itself notes that downstream applications will need broader specificity tests, such as proteome-wide off-target assays. For the reader, an accurate summary is roughly: “there is support for the intended epitope and for specificity across a small target panel, but broad specificity has not been directly demonstrated.”

Figure 4: functional validation cases

Another notable point is that AlphaProteo goes beyond a binding-only report. The report presents functional validation for two targets.

The first is the SARS-CoV-2 RBD binders. Four SC2RBD binders were tested in a live-virus neutralization assay and neutralized the ancestral strain with EC50 values of 89–300 nM. Two binders neutralized three variants, and each of the four variants was neutralized by at least one designed binder. EC50 values were 2–10 times higher than the in vitro binding affinities; the authors note that monoclonal antibodies show a similar gap in the same assay.

The second is the VEGF-A binders. GDM_VEGFA_54 and GDM_VEGFA_71 were tested in HUVECs under VEGF-A stimulation. GDM_VEGFA_54 substantially reduced VEGFR2, ERK, and AKT phosphorylation; the report describes the effect as comparable to the VEGFR2 inhibitor ki8751 and stronger than equimolar bevacizumab. GDM_VEGFA_71 also produced a weaker but visible reduction.

These two results thicken AlphaProteo’s evidence. Binding assays alone cannot establish functional antagonism or neutralization, so these cases move the evidence up a layer: beyond showing that the binders attach to purified target, they go part of the way to biological readouts, namely virus neutralization and inhibition of cell signaling. Functional validation is, of course, limited to two targets, and it does not mean every AlphaProteo binder is a functional modulator. Still, connecting binding affinity, target engagement, and functional readout within one report is a clear strength for a binder design technical report.

Figure 5: designed poses versus experimental structures

AlphaProteo also provides structural validation. Complex structures for four SARS-CoV-2 spike binders were determined by cryo-EM at 4.5–6.0 Å resolution, with target-aligned binder C-alpha RMSDs of 0.84–3.14 Å relative to the design models. The VEGF-A binder GDM_VEGFA_71 was confirmed by X-ray crystallography at 2.56–2.65 Å resolution; the binder monomer had a C-alpha RMSD of 0.78 Å to the design, and the target-aligned binder RMSD was 1.65 Å. Some of the designed hydrogen bonds are also reproduced.
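The two RMSD numbers measure different things, and the distinction is easy to blur. A binder-monomer RMSD superimposes binder onto binder and asks whether the fold matches; a target-aligned binder RMSD superimposes on the target and then measures the binder without refitting, so it also captures errors in binding position and orientation. Below is a minimal NumPy sketch of both using the Kabsch algorithm, assuming matched C-alpha coordinate arrays (same residues, same order) for design and experiment. The function names are my own, and the report’s exact alignment protocol is not reproduced here.

```python
import numpy as np


def kabsch(mobile: np.ndarray, reference: np.ndarray):
    """Rotation R and translation t minimizing RMSD of mobile @ R.T + t onto reference (N x 3 each)."""
    mob_c = mobile.mean(axis=0)
    ref_c = reference.mean(axis=0)
    H = (mobile - mob_c).T @ (reference - ref_c)  # 3 x 3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) > 0 else -1.0  # avoid an improper rotation (reflection)
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = ref_c - mob_c @ R.T
    return R, t


def rmsd(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.sqrt(((a - b) ** 2).sum(axis=1).mean()))


def binder_monomer_rmsd(design_binder: np.ndarray, exp_binder: np.ndarray) -> float:
    """Superimpose binder on binder: does the fold itself match the design?"""
    R, t = kabsch(exp_binder, design_binder)
    return rmsd(exp_binder @ R.T + t, design_binder)


def target_aligned_binder_rmsd(design_target, design_binder, exp_target, exp_binder) -> float:
    """Superimpose on target CA atoms only, then measure binder CA deviation without refitting."""
    R, t = kabsch(exp_target, design_target)
    return rmsd(exp_binder @ R.T + t, design_binder)


# Synthetic check: rigidly moving the whole complex changes nothing, but shifting only the
# binder by 2 Å shows up in the target-aligned RMSD and not in the monomer RMSD.
rng = np.random.default_rng(0)
design_target = rng.normal(scale=10.0, size=(20, 3))
design_binder = rng.normal(scale=5.0, size=(12, 3)) + 15.0
theta = np.deg2rad(30.0)
rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                [np.sin(theta), np.cos(theta), 0.0],
                [0.0, 0.0, 1.0]])
shift = np.array([3.0, -1.0, 2.0])
exp_target = design_target @ rot.T + shift
exp_binder = (design_binder + np.array([2.0, 0.0, 0.0])) @ rot.T + shift

print(round(binder_monomer_rmsd(design_binder, exp_binder), 3))  # 0.0
print(round(target_aligned_binder_rmsd(design_target, design_binder, exp_target, exp_binder), 3))  # 2.0
```

This is why a small monomer RMSD (0.78 Å) can sit alongside a larger target-aligned RMSD (1.65 Å) for the same binder: the second number also absorbs any shift in how the binder sits on the target.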

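These structural results make AlphaProteo more interesting than a plain hit-rate report. In de novo binder design, even when binding is observed, the actual binding mode can differ from the design pose. Here there is evidence that, at least for the selected cases, the designed binding mode agrees well with the experimental structure.

The cryo-EM resolution and target coverage need to be taken into account, though. The SC2RBD structures support the designed pose, but their resolution is too limited for strong claims about side-chain-level interface details. The VEGF-A X-ray structure is stronger pose validation. In other words, the strength of structural validation also varies by target.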

Scope of the developability evidence

The report also gives partial expression and biophysical characterization for selected hits. Of the designs chosen for follow-up, 93% expressed in E. coli, and most were monodisperse by SEC. In the subset examined by CD spectroscopy, the expected secondary structure and high thermostability were observed. AlphaProteo binders are generally small de novo proteins of roughly 5–15 kDa.

For research binders or early-stage binder candidates, this is a good sign. Small size, bacterial expression, monodispersity, and thermal stability are the properties of binders that are easy to work with in the lab.

It does not, however, demonstrate therapeutic developability as a whole. Immunogenicity, PK/PD, tissue distribution, formulation stability, manufacturability, and broad off-target specificity all lie outside the direct evidence in this report. The binders AlphaProteo produced are not therapeutic leads as they stand; it is more accurate to say it produced strong candidates as high-affinity research binders or starting points.

Limits as a closed-system benchmark

AlphaProteo’s biggest limitation is method disclosure. Because the machine learning architecture, training details, weights, and inference procedure are not released, independent reproduction and module-level comparison are difficult. The report is less about “how to build it” than about “what validation results a closed system achieved.”

The second limitation is the number of targets. Seven successes out of 8 targets is impressive, but the universe of binder design targets is much larger. Flat, polar, and conformationally flexible interfaces in particular remain hard, as the TNF-alpha failure shows.

Third, the evidence layers differ across targets. Binding hits and affinities are reported for every successful target, but functional validation is concentrated on SC2RBD and VEGF-A, and the strongest high-resolution structural validation is the VEGF-A case. Specificity is likewise established only across a seven-target panel. When reading AlphaProteo’s headline, it is worth asking which assays each target actually went through.

Finally, cross-method comparisons are hard to interpret outside their assay context. The AlphaProteo versus RFdiffusion comparison is very interesting, but the assay (yeast display versus BLI), target concentration, and candidate source all differ somewhat. Rather than reading the numbers as a leaderboard, it is safer to read them as showing that AlphaProteo achieved high post-filter hit rates and affinities in a medium-throughput experimental setup.

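Assessment: AlphaProteo as a wet-lab hit-rate benchmark

In my view, AlphaProteo means something more specific than “DeepMind built another model.” The report shows de novo binder design moving past the stage where it was judged by attractive structure renderings or in silico confidence scores alone, into a stage where methods compete on how many real binders they can deliver within a limited experimental budget.

Obtaining 9–88% hit rates from small denominators of 47–172 designs, and reporting sub-nanomolar affinities on several targets, is particularly strong. Connecting binding, cell-signaling inhibition, and an X-ray structure for VEGF-A, a target with no prior computationally designed binder, is also impressive. Had AlphaProteo been released openly, the practical pipelines of the binder design field would likely have changed quickly.

At the same time, the report leaves the limits of a closed system in sharp relief. We can use AlphaProteo’s results as a performance benchmark, but it is hard to take the method apart to reproduce or improve it. So rather than placing it on the same level as open pipelines such as RFdiffusion, ProteinMPNN, or BindCraft, it is more accurate to treat it as “a high baseline set by a closed, target-conditioned binder design system under medium-throughput validation.”

AlphaProteo is less the end of binder design than a case that raised the baseline. A good generator alone is not enough; design claims become strong only when filtering and validation are attached. Conversely, the report makes a convincing case that with a good filtering and validation stack, high-affinity research binders can be obtained from only tens to hundreds of candidates. This balance is the most important thing to keep in mind when reading AlphaProteo: the performance is clearly strong, openness is limited, and the validation is broad but uneven. Holding all three statements at once is what places this report most clearly.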

References

• Zambaldi et al., “De novo design of high-affinity protein binders with AlphaProteo”, Google DeepMind technical report, 2024. arXiv: https://arxiv.org/abs/2409.08022

• Watson et al., “De novo design of protein structure and function with RFdiffusion”, Nature, 2023.