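BindCraft Paper Review

Introduction

A different starting point from RFdiffusion

Figure 1: From AF2 hallucination to experimental candidates

Filtering is half the performance

The numbers in Figure 1: 12 targets, 46.3% on average

Figures 2–5: Beyond binding to function

Structural validation and specificity

How to read “one-shot”

Assessment: A practical milestone that promotes AF2 from filter to objective

References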


Introduction

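When you read protein binder design papers, the headline hit rate is usually the first thing you notice. It tells you how many targets were tried, what percentage of designs bound, and what Kd the best binder reached. The question that actually matters is a little different: where did that binder come from? Did it come straight out of a raw generator, or did it pass through many computational trajectories and filtering steps? And what is the experimental denominator: all generated designs, only the filter-passing designs, or the small set of candidates that were ordered and actually expressed?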

BindCraft addresses this question head-on. “One-shot design of functional protein binders with BindCraft,” published in Nature in 2025, presents an open-source pipeline that uses AlphaFold2 to design de novo miniprotein binders. The paper reports experimental success rates of 10–100% across 12 targets, with an average of 46.3%. It also claims nanomolar-affinity binders for several targets.

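These numbers are impressive, but taking the word “one-shot” literally invites a misunderstanding. BindCraft is not a system in which a single prompt yields a single binder. The pipeline works in several steps:
- it hallucinates binders by backpropagating through AF2;
- it redesigns the sequence with MPNNsol;
- it filters aggressively using AF2 monomer/complex predictions and PyRosetta metrics;
- only then does it pass a small number of candidates to experiments.

The paper's significance is not a promise of “binders that work without filtering.” Its value is in showing how much stronger practical binder discovery becomes once AF2 confidence and interface objectives are built into the design loop.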

A different starting point from RFdiffusion

Since RFdiffusion, target-conditioned backbone generation has become a leading approach to binder design. A binder backbone is generated around the target surface, ProteinMPNN assigns a sequence, and a structure predictor such as AF2 or ESMFold checks that the design refolds as intended. In this approach, the predictor mainly serves as a downstream filter.

BindCraft starts from a different place. It puts AlphaFold2 Multimer inside the design loop and treats the binder sequence as an optimization variable. In effect, it extends ColabDesign-style AF2 hallucination to binder design. It combines several objectives: pLDDT, i_pTM, intra- and inter-chain pAE, a contact loss, radius of gyration, and a helicity loss. It then backpropagates through the AF2 network to shape the binder sequence and its interface with the target together.
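A minimal sketch can make “combining objectives” concrete. The code below computes a weighted composite loss over AF2-style outputs. The term names follow the list above. The weights, the pAE rescaling constant, the precomputed Rg/helicity terms, and the names `Af2Outputs` and `composite_loss` are all illustrative assumptions, not BindCraft's actual settings. In the real pipeline these terms come out of AF2 and are differentiated with respect to the binder sequence. Here they are plain floats so the arithmetic is visible.

```python
from dataclasses import dataclass


@dataclass
class Af2Outputs:
    """Hypothetical per-step AF2 outputs for one binder design trajectory."""
    plddt: float      # mean pLDDT, scaled to [0, 1]
    i_ptm: float      # interface pTM, in [0, 1]
    pae_intra: float  # mean intra-binder pAE, in Angstrom
    pae_inter: float  # mean binder-target pAE, in Angstrom
    contact: float    # contact loss (lower = more binder-target contacts)
    rg: float         # radius-of-gyration loss (lower = more compact binder)
    helix: float      # helicity loss term


# Illustrative weights only; they are NOT BindCraft's published settings.
WEIGHTS = {
    "plddt": 0.1, "i_ptm": 0.05, "pae_intra": 0.4, "pae_inter": 0.1,
    "contact": 1.0, "rg": 0.3, "helix": 0.3,
}
PAE_MAX = 31.0  # assumed pAE ceiling used to rescale pAE into [0, 1]


def composite_loss(out: Af2Outputs, weights: dict[str, float] = WEIGHTS) -> float:
    """Weighted sum of loss terms; confidence scores become losses as 1 - score."""
    terms = {
        "plddt": 1.0 - out.plddt,
        "i_ptm": 1.0 - out.i_ptm,
        "pae_intra": out.pae_intra / PAE_MAX,
        "pae_inter": out.pae_inter / PAE_MAX,
        "contact": out.contact,
        "rg": out.rg,
        "helix": out.helix,
    }
    return sum(weights[name] * value for name, value in terms.items())
```

Lower is better. In an actual hallucination run, the gradient of this scalar with respect to the sequence representation drives each update, which plain floats cannot show.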

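The difference is substantial. RFdiffusion is closer to “generate a plausible binder backbone first, then attach a sequence and filters.” BindCraft is closer to “push the sequence and interface directly toward a target-binder complex that AF2 predicts with confidence.” It is therefore safer to read BindCraft's performance as the output of a pipeline in which AF2-guided optimization and downstream filtering work together, not as the strength of a single generative model.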

Figure 1: From AF2 hallucination to experimental candidates

Figure 1 gives a compact overview of the BindCraft pipeline. The input is a target protein structure. Users can specify the binder length and hotspots. If hotspots are left empty, the pipeline searches for a plausible binding site guided by the composite loss. The target structure can be experimental, or it can be a model from AlphaFold2, AlphaFold3, Boltz, Rosetta, or MD.
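The inputs just described fit in a small container. The sketch below is hypothetical: the class name, field names, length-range representation, and the choice to treat an empty hotspot list as “not specified” are illustrative assumptions, and BindCraft's real settings format is not described here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BinderDesignRequest:
    """Hypothetical container for the design inputs; not BindCraft's real settings format."""
    target_pdb: str                        # experimental or predicted target structure
    target_chains: str                     # chain ID(s) to keep from the target file
    min_length: int                        # shortest binder length to sample
    max_length: int                        # longest binder length to sample
    hotspots: Optional[list[int]] = None   # target residue numbers; None = free site search

    def __post_init__(self) -> None:
        if not 0 < self.min_length <= self.max_length:
            raise ValueError("expected 0 < min_length <= max_length")
        if not self.hotspots:
            # An empty hotspot list is treated the same as "not specified".
            self.hotspots = None

    @property
    def site_mode(self) -> str:
        if self.hotspots:
            return "hotspot-guided"
        return "free site search (composite loss picks the site)"
```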

In the first stage, AF2 Multimer hallucinates the binder backbone and sequence. The target is not held completely rigid during this step: the paper explains that some flexibility is allowed in the target's side chains and backbone. This can help the interface adapt to the target surface. It can also make results sensitive to how the target is prepared and trimmed.

In the second stage, MPNNsol redesigns the binder sequence. The key point is that residues around the hallucinated interface are preserved while the binder core and surface are redesigned. The paper notes that the soluble MPNN weights tend to make the binder surface relatively negatively charged. The aim is not only to fit the binding interface but also to keep candidates that are easy to handle as soluble proteins.
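“Keep the interface, redesign the rest” can be expressed in two steps. First, pick binder residues that lie within a distance cutoff of the target. Then pass only the remaining positions to the sequence designer. In the minimal sketch below, the 4.0 Å cutoff and the function names are assumptions for illustration. The text does not say how BindCraft defines interface residues, and the actual MPNNsol call is not shown.

```python
import math

Coord = tuple[float, float, float]


def interface_residues(binder: dict[int, list[Coord]],
                       target: dict[int, list[Coord]],
                       cutoff: float = 4.0) -> set[int]:
    """Binder residue numbers with any atom within `cutoff` Angstrom of any target atom."""
    target_atoms = [atom for atoms in target.values() for atom in atoms]
    return {
        res_id
        for res_id, atoms in binder.items()
        if any(math.dist(a, t) <= cutoff for a in atoms for t in target_atoms)
    }


def positions_to_redesign(binder_length: int, interface: set[int]) -> list[int]:
    """1-based binder positions handed to the sequence designer; interface positions stay fixed."""
    return [i for i in range(1, binder_length + 1) if i not in interface]
```

The brute-force distance loop is fine for a sketch. A real implementation would use a spatial index such as a k-d tree.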

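In the final stage, candidates are filtered with the AF2 monomer model and PyRosetta metrics. The AF2 monomer model was not trained on multi-chain complexes. So if it still stably recapitulates the target-binder complex, that is evidence of a strong interface signal. The flip side is that this filter is stringent, and the paper acknowledges that potential binders can be discarded at this step.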
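The three stages above chain together as a loop over trajectories. The skeleton below is a hypothetical orchestration. Each stage is injected as a stub callable, so none of these are real BindCraft, AF2, MPNNsol, or PyRosetta calls. The signature is assumed, and so is the choice to stop once `max_final` designs pass.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Optional


@dataclass
class Candidate:
    sequence: str
    metrics: dict = field(default_factory=dict)


def run_pipeline(
    n_trajectories: int,
    hallucinate: Callable[[int], Optional[str]],  # stage 1: AF2 hallucination; None = terminated early
    redesign: Callable[[str], Iterable[str]],     # stage 2: MPNNsol-style sequence variants
    repredict: Callable[[str], dict],             # stage 3: AF2 monomer re-prediction + Rosetta metrics
    passes: Callable[[dict], bool],               # stage 3: final filter on those metrics
    max_final: int,
) -> list[Candidate]:
    """Hypothetical three-stage loop; every stage is an injected stub."""
    final: list[Candidate] = []
    for seed in range(n_trajectories):
        trajectory = hallucinate(seed)
        if trajectory is None:  # early termination: no redesign or filtering for this seed
            continue
        for seq in redesign(trajectory):
            metrics = repredict(seq)
            if passes(metrics):
                final.append(Candidate(seq, metrics))
                if len(final) >= max_final:  # stop once enough filtered designs exist
                    return final
    return final
```

Injecting the stages keeps the control flow testable without a GPU. It also makes clear that many trajectories and sequence variants are spent to produce each filtered design.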

Filtering is half the performance

Filtering is the most important part of BindCraft to understand. According to the Methods, the final filters include, among others:
- complex pLDDT > 0.8;
- i_pTM > 0.5;
- i_pAE < 0.35;
- Rosetta shape complementarity > 0.60;
- more than 3 interface hydrogen bonds;
- fewer than 4 unsatisfied hydrogen bonds;
- binder surface hydrophobicity < 35%;
- bound/unbound binder RMSD < 3.5 Å;
- a limit on the number of interface lysines and methionines.

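These criteria operate at different levels:
- pLDDT, i_pTM, and i_pAE capture AF2 confidence and interface plausibility.
- Shape complementarity, hydrogen bonds, and unsatisfied polar groups capture interface geometry.
- Surface hydrophobicity and bound/unbound RMSD capture whether the binder is likely to behave as a soluble protein and whether its conformation is consistent.

In other words, BindCraft does not stop at “AF2 predicts that it binds.” It filters for structural plausibility and for the minimum requirements of an experimental candidate at the same time.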
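The filter list works naturally as a table of named checks. The minimal sketch below applies the thresholds quoted above and reports failures grouped by the layers just described. Several details are assumptions. The metric field names are hypothetical. i_pAE is taken as already normalized, as the 0.35 threshold implies. The combined Lys/Met count and its limit of 3 are placeholders, because no exact limit is given. The grouping of that check is also assumed.

```python
from dataclasses import dataclass


@dataclass
class DesignMetrics:
    plddt: float                   # complex pLDDT, scaled to [0, 1]
    i_ptm: float                   # interface pTM
    i_pae: float                   # interface pAE, normalized as the 0.35 threshold implies
    shape_comp: float              # Rosetta shape complementarity
    interface_hbonds: int          # interface hydrogen bonds
    unsat_hbonds: int              # unsatisfied hydrogen bonds
    surface_hydrophobicity: float  # fraction of binder surface, 0.35 == 35%
    bound_unbound_rmsd: float      # binder RMSD bound vs. unbound, Angstrom
    interface_lys_met: int         # combined interface Lys + Met count (hypothetical field)


MAX_INTERFACE_LYS_MET = 3  # placeholder: no exact limit is given in the text

CHECKS = {
    "AF2 confidence": [
        ("pLDDT > 0.8", lambda m: m.plddt > 0.8),
        ("i_pTM > 0.5", lambda m: m.i_ptm > 0.5),
        ("i_pAE < 0.35", lambda m: m.i_pae < 0.35),
    ],
    "interface geometry": [
        ("shape complementarity > 0.60", lambda m: m.shape_comp > 0.60),
        ("interface H-bonds > 3", lambda m: m.interface_hbonds > 3),
        ("unsatisfied H-bonds < 4", lambda m: m.unsat_hbonds < 4),
    ],
    "solubility and consistency": [
        ("surface hydrophobicity < 35%", lambda m: m.surface_hydrophobicity < 0.35),
        ("bound/unbound RMSD < 3.5 A", lambda m: m.bound_unbound_rmsd < 3.5),
    ],
    "interface composition (assumed grouping)": [
        ("interface Lys/Met limit", lambda m: m.interface_lys_met <= MAX_INTERFACE_LYS_MET),
    ],
}


def failed_checks(m: DesignMetrics) -> dict[str, list[str]]:
    """Failing checks grouped by layer; an empty dict means the design passes every filter."""
    failures: dict[str, list[str]] = {}
    for layer, checks in CHECKS.items():
        failed = [name for name, ok in checks if not ok(m)]
        if failed:
            failures[layer] = failed
    return failures
```

Reporting failures per layer, rather than a single pass/fail bit, shows whether a target is losing designs to AF2 confidence or to interface geometry.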

This is both BindCraft's strength and a caveat for interpretation. The success rate reported in the paper is not a raw hallucination success rate. It is measured against the candidates sent to experiments, after many computational trajectories have gone through early termination, MPNNsol redesign, AF2 monomer recapitulation, and PyRosetta filtering. BindCraft's high hit rate is therefore best read as the result of optimization, filtering, and the handoff to assays fitting well together, not as generator-only performance.
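The denominator point can be made concrete by expressing the same number of hits against each stage of the funnel. The stage names and counts in the usage block are hypothetical and chosen only for illustration. They are not numbers from the paper, and `funnel_report` is an illustrative name.

```python
def funnel_report(stages: list[tuple[str, int]], hits: int) -> list[tuple[str, int, float]]:
    """Express the same number of hits as a rate over each funnel stage's count."""
    report = []
    previous = None
    for name, count in stages:
        if previous is not None and count > previous:
            raise ValueError(f"stage {name!r} is larger than the stage before it")
        if count <= 0 or count < hits:
            raise ValueError(f"stage {name!r} must contain at least as many designs as there are hits")
        report.append((name, count, hits / count))
        previous = count
    return report


if __name__ == "__main__":
    # Hypothetical funnel for illustration only; these are not numbers from the paper.
    funnel = [("trajectories", 1000), ("passed filters", 100), ("tested in wet lab", 20)]
    for name, count, rate in funnel_report(funnel, hits=8):
        print(f"{name:>18}: {count:5d} designs, hit rate {rate:.1%}")
```

In this made-up funnel, the same eight hits read as 0.8%, 8%, or 40% depending on the denominator. This is why headline rates need a common denominator before they can be compared.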

The numbers in Figure 1: 12 targets, 46.3% on average

The paper reports experimental success rates of 10–100% across 12 targets, with an average of 46.3%. For each target, Figure 1 summarizes the number of successful candidates, the number of designs tested, and the best affinity.

Cell-surface receptor targets:
- PD-1: 13 of 53 designs showed a binding signal. The best binder, in a bivalent Fc-fusion format, has a reported apparent Kd* below 1 nM.
- PD-L1: 7 of 9, best Kd* about 615 nM.
- IFNAR2: 3 of 9, best Kd about 260 nM.
- CD45: 4 of 16, best Kd about 14.7 nM.
- CLDN1 soluble analogue: 6 of 7 bound, and the best binder is reported below 200 nM.

Allergen targets are included as well:
- Der f7: 4 of 10, best Kd about 13 nM.
- Der f21: 4 of 7, best Kd about 793 nM.
- Bet v1: 2 of 7, best Kd about 120 nM.

Structural/de novo target:
- BBF-14: 6 of 11, best Kd about 21 nM.

Multi-domain nuclease targets:
- SpCas9: 6 of 6 designs bound.
- CbAgo: 2 of 12 designs bound.

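These denominators are small, and that is exactly BindCraft's message. Instead of sending thousands or tens of thousands of candidates to the wet lab, a small number of heavily filtered candidates was enough to obtain hits on each target. Conversely, the handoff denominator must be matched before comparing headline success rates with other papers. The hit rates of RFdiffusion, AlphaProteo, PXDesign, and Latent-X are not measured in the same units as BindCraft's.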
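The 46.3% figure above comes with no statement of whether it is a mean of per-target rates or a rate pooled over all tested designs. The small sketch below uses only the per-target counts quoted above and shows that the two choices differ noticeably. Only 11 of the 12 targets have counts listed here, so neither number should be expected to reproduce the paper's 46.3%.

```python
# Per-target (hits, tested) counts as listed in this review (11 of the paper's 12 targets).
COUNTS = {
    "PD-1": (13, 53), "PD-L1": (7, 9), "IFNAR2": (3, 9), "CD45": (4, 16),
    "CLDN1 (soluble analogue)": (6, 7), "Der f7": (4, 10), "Der f21": (4, 7),
    "Bet v1": (2, 7), "BBF-14": (6, 11), "SpCas9": (6, 6), "CbAgo": (2, 12),
}


def per_target_rates(counts: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Success rate for each target: hits / designs tested."""
    return {target: hits / tested for target, (hits, tested) in counts.items()}


def macro_average(counts: dict[str, tuple[int, int]]) -> float:
    """Mean of per-target success rates: every target counts equally."""
    rates = per_target_rates(counts)
    return sum(rates.values()) / len(rates)


def pooled_rate(counts: dict[str, tuple[int, int]]) -> float:
    """Total hits over total designs tested: targets with more designs weigh more."""
    hits = sum(h for h, _ in counts.values())
    tested = sum(n for _, n in counts.values())
    return hits / tested
```

With these 11 targets, the per-target mean comes out near 49% and the pooled rate near 39% (57 of 147). The gap arises because large-denominator targets such as PD-1 (13 of 53) pull the pooled rate down.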

Figures 2–5: Beyond binding to function

The BindCraft paper is more interesting than a simple binder benchmark because it shows functional validation in several directions. For cell-surface receptors, it covers PD-1, PD-L1, IFNAR2, CD45, and CLDN1. For IFNAR2, competition with the native cytokine IFNA2 shows that the designed binding mode occupies a functionally meaningful site. The paper also checks for off-target binding to other, structurally similar immunoglobulin-like receptors.

The CLDN1 case illustrates why membrane-protein targets are difficult. The paper designs binders against a soluble analogue. It then measures inhibition of CpE cytotoxicity in a cell assay that uses wild-type CLDN1. Here the result goes beyond simple binding toward a functional effect in a cellular context. A soluble analogue still differs from the native membrane context, though, so this case should also be read with the target presentation and assay context in mind.

On the allergen side, a Bet v1 binder partially blocks IgE binding in patient-derived serum. This hints at therapeutic relevance. However, the paper itself describes the blocking activity of a single binder as moderate and notes that broader epitope coverage may be needed.

For SpCas9 and CbAgo, the designed binders target the nucleic-acid interaction interface and reduce gene-editing or DNA-cleavage activity. The AAV retargeting experiments insert HER2- or PD-L1-specific binders into the AAV capsid. This shifts transduction specificity toward cells that express the target receptor. These are strong examples that BindCraft goes beyond “making a binding protein”: the designed binders can serve as functional modules.

Structural validation and specificity

BindCraft also provides structural validation for some targets. The BBF-14–binder4 complex structure is reported to match the design model with a target-aligned binder backbone RMSD of 1.7 Å. For the allergen binders, validation includes crystal structures and epitope-targeting checks. Such cases show that AF2-guided design can correspond to real complex poses.
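A target-aligned binder RMSD is computed in two steps. First, superimpose the experimental complex onto the design model using target atoms only. Then measure how far the binder atoms deviate, without refitting the binder. The sketch below uses the standard SVD-based Kabsch superposition. It assumes that both structures have matched atom ordering, for example one CA per residue. The atom selection is not specified in the text, and the exact procedure used in the paper may differ.

```python
import numpy as np


def kabsch(mobile: np.ndarray, reference: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Rotation r and translation t minimizing ||(mobile @ r.T + t) - reference||."""
    mu_m = mobile.mean(axis=0)
    mu_r = reference.mean(axis=0)
    h = (mobile - mu_m).T @ (reference - mu_r)     # 3x3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))         # flip one axis if SVD gives a reflection
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    t = mu_r - mu_m @ r.T
    return r, t


def target_aligned_binder_rmsd(design_target: np.ndarray, design_binder: np.ndarray,
                               exp_target: np.ndarray, exp_binder: np.ndarray) -> float:
    """Align the experimental complex on target atoms, then score binder atoms only."""
    r, t = kabsch(exp_target, design_target)
    moved_binder = exp_binder @ r.T + t
    diff = moved_binder - design_binder
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))
```

Aligning on the target rather than on the binder means the score captures both binder shape and binding pose. A binder that folds correctly but sits in the wrong place still scores poorly.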

Specificity is also addressed in part. For the IFNAR2 binders, off-target binding to structurally similar immunoglobulin-like receptors was tested. The paper reports that the AF2 i_pTM metric was useful for distinguishing on-target from off-target interactions. This result is hard to generalize to all targets, however, because specificity depends on the target family, epitope conservation, and assay sensitivity. That BindCraft addresses specificity at all is a strength, but specificity was not examined to the same depth for every target.

How to read “one-shot”

The “one-shot” in the title is a strong claim. Here it means roughly that functional binders came directly from computational design, without directed evolution or large-scale screening. Inside the pipeline, however, there are many computational attempts. The paper and its supplementary text recommend generating at least 100 filtered designs. They also note that the in silico success rate can vary widely with target structure variants and trimming.
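A recommendation of 100+ filtered designs becomes a sampling budget once you know, or guess, the in silico pass rate per trajectory. The sketch below rests on simplifying assumptions. Each trajectory is treated as passing the filters independently with a fixed probability `p`, and as yielding at most one passing design; this ignores multiple MPNN variants per trajectory. `p` is a user-supplied guess, not a number from the paper, and as noted above, real pass rates vary by target and trimming.

```python
import math


def prob_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p); lower-tail terms are summed in log space."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(n + 1)
    below = 0.0
    for i in range(k):
        log_term = (log_n_fact - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * log_p + (n - i) * log_q)
        below += math.exp(log_term)
    return max(0.0, 1.0 - below)


def trajectories_needed(k: int, p: float, confidence: float = 0.95) -> int:
    """Smallest n such that n independent trajectories yield >= k passing designs
    with probability >= confidence, assuming each passes with probability p."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    n = max(k, 0)
    while prob_at_least(k, n, p) < confidence:
        n += 1
    return n
```

Under these assumptions, suppose about 10% of trajectories yield a passing design. Then reaching 100 filtered designs with 95% confidence takes somewhat more than 1,000 trajectories. At a 1% pass rate, it takes more than 10,000.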

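The takeaway for readers is therefore not “draw one design and it binds right away.” It is closer to this: “with enough computational sampling and stringent filtering, a small number of wet-lab candidates can yield binder hits across many targets.” Missing this distinction leads to overestimating what BindCraft achieved.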

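Another important caveat concerns i_pTM. The paper considers i_pTM potentially useful for separating binders from non-binders, but reports that it does not correlate well with affinity. A higher i_pTM does not mean a tighter binder. This is a recurring pitfall in binder design: structural confidence and affinity can be related, but they are not the same thing.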

Assessment: A practical milestone that promotes AF2 from filter to objective

BindCraft's main significance is that it promotes AF2 from a downstream filter to a design objective. It departs from the familiar backbone generator → sequence design → AF2 filtering pipeline. Instead, it differentiates directly through the AF2 network to hallucinate binders that satisfy the target interface. It then narrows the experimental candidates aggressively with MPNNsol, AF2 monomer recapitulation, and PyRosetta metrics.

BindCraft is therefore less a “new foundation model” than a pipeline paper that turns an existing structure predictor into a practical binder design engine. Being open source also matters. Closed systems such as AlphaProteo and Latent-X report high performance, whereas BindCraft offers a reproducible, open workflow.

The limitations are also clear:
- Results are sensitive to target preparation and to the choice of hotspots and trimming.
- Because the method relies on AF2 confidence, model bias and false positives remain.
- Stringent AF2/Rosetta filtering can raise hit rates, but it can also exclude viable binders of other kinds.
- The depth of functional validation varies by target. Binding, competition, structural validation, cell assays, nuclease inhibition, and AAV retargeting are all meaningful, but they are not the same layer of evidence.

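Even so, BindCraft is currently an important reference point for understanding general and miniprotein binder design. The paper shows clearly that binder design is not a single-generator problem. Performance depends on three choices: where model priors are used, which proxies are built into the objective, and which filter a candidate must pass before it reaches the wet lab. BindCraft's real message lies in the pipeline as a whole.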

References

• Pacesa, M. et al. “One-shot design of functional protein binders with BindCraft”, Nature, 2025.
• DOI: https://doi.org/10.1038/s41586-025-09429-6
• Main points of comparison: RFdiffusion, AlphaProteo, PXDesign, Latent-X, BoltzGen.