BoltzDesign1 Paper Review

Introduction

Moving to prediction model inversion

Pairformer distogram and Confidence module

The careful design of the contact loss

Small-molecule benchmark: where to draw the line when comparing with RFdiffusionAA

What Gnina and diversity tell us

General biomolecular target demonstrations

Reading figure by figure

How BoltzDesign1 differs from BoltzGen

Guardrails for reading this paper

Assessment: an early coordinate for open all-atom design methods

References

Introduction

After reading Latent-X and Latent-X2, a natural question remains. Closed platforms are increasingly reporting high post-filter hit rates and developability evidence, so what path is left on the open academic side? BoltzDesign1 is one open-side answer to that question.

BoltzDesign1 is a bioRxiv preprint that asks: "If Boltz-1 is run in reverse, the way BindCraft inverts its predictor, can all-atom biomolecular binder design be done as well?" The authors are an academic collaboration centered on MIT and EPFL/SIB, and they state that the code and data are public on GitHub. This places the paper differently from the Latent-X line. It is closer to a method proposal than a performance report, and it is an open predictor inversion pipeline rather than a closed platform.

One line has to be drawn up front, though. BoltzDesign1 has no wet-lab validation. The success rates in the paper are in silico proxies such as AF3 confidence, Boltz-1/AF3 cross-model RMSD, self-consistency, and Gnina score. The denominator is a computationally scored candidate set, not a count of experimentally tested candidates. The paper therefore should not be placed on the same evidence layer as papers that report experimental hit rates, such as RFdiffusionAA or Latent-X.

Moving to prediction model inversion

Protein design broadly follows two paths. One, as in RFdiffusion, fine-tunes a prediction-model lineage into a generative model that samples backbones or complexes directly. The other, as in BindCraft/ColabDesign, keeps a structure prediction model fixed and optimizes sequence logits or structure variables by backpropagation so that the desired interface forms.

BoltzDesign1 extends the second path to an all-atom biomolecular context. Models such as AlphaFold3, RFAA, and Boltz-1 made it possible to handle proteins, ligands, metals, nucleic acids, and covalent modifications within a single framework. But backpropagating all the way through the diffusion module of an AF3-style model is expensive in memory and compute, and the gradients can be unstable.

BoltzDesign1's choice is not to differentiate through the diffusion module directly. Instead of turning all of Boltz-1 into a generator, it uses the distogram produced by the Pairformer and the Confidence module as design signals. In other words, rather than fitting a single 3D structure, it pushes the atom/residue pairwise distance probability distribution toward the desired contact pattern.

Pairformer distogram and Confidence module

BoltzDesign1 has two modes. The first is Pairformer-only hallucination. It applies an intra-contact loss and an inter-contact loss to the Pairformer distogram and optimizes the sequence logits. This encourages long-range contacts inside the binder and target-binder contacts at the same time.

The second is Pairformer plus Confidence hallucination. The Pairformer output and the structure produced by diffusion are fed into the Confidence module, and a confidence loss is backpropagated as well. A stop-gradient is placed on the diffusion module so that the expensive diffusion steps are never differentiated directly. This is the methodological point of the paper. Rather than using an AF3/Boltz-style diffusion predictor end to end as a differentiable design machine, it uses the intermediate representation and the confidence head as design objectives.

The sequence optimization resembles BindCraft. It starts in a relaxed sequence space and gradually moves toward a one-hot sequence. It is a 4-stage process: warm-up, softmax/logit interpolation, temperature annealing, and straight-through one-hot optimization. Afterwards, LigandMPNN can be used as optional post-processing to fix the interface and redesign the surface/core, or a LigandMPNN sequence can be fed back as the initial value for another round of joint optimization in BoltzDesign1.
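The relaxation from soft logits toward a one-hot sequence can be illustrated with a minimal forward-pass sketch. It assumes the ColabDesign-style transforms that BindCraft builds on; the function names, the linear annealing schedule, and the default temperatures are hypothetical and are not taken from the BoltzDesign1 code. Gradients (including the straight-through estimator) are not shown, since that would require an autodiff framework.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over one sequence position."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def interpolate(logits, lam, temperature=1.0):
    """Blend raw logits (lam=0) with softmax probabilities (lam=1)."""
    probs = softmax(logits, temperature)
    return [(1.0 - lam) * x + lam * p for x, p in zip(logits, probs)]

def annealed_temperature(step, n_steps, t_start=1.0, t_end=0.01):
    """Linear temperature schedule (assumed; the real schedule may differ)."""
    frac = step / max(n_steps - 1, 1)
    return t_start + (t_end - t_start) * frac

def one_hot_argmax(logits):
    """Hard one-hot used in the forward pass of the final stage."""
    best = max(range(len(logits)), key=lambda k: logits[k])
    return [1.0 if k == best else 0.0 for k in range(len(logits))]

# Toy position with three residue types instead of twenty.
logits = [2.0, 1.0, 0.0]
soft = interpolate(logits, lam=0.5)
sharp = softmax(logits, annealed_temperature(9, 10))
hard = one_hot_argmax(logits)
```

As the temperature drops, the softmax output approaches the hard one-hot vector, which is why annealing is a natural bridge between the soft and discrete stages.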

The careful design of the contact loss

BoltzDesign1 induces contacts by lowering the entropy of the low-distance bins of the distogram. The inter-contact loss creates contacts between target and binder, and the intra-contact loss creates long-range contacts inside the binder. The intra-contact loss ignores sequence-neighbor residue pairs, that is, pairs with |i − j| < 9. This is a safeguard against satisfying the loss simply by raising helix confidence.
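A minimal sketch of an intra-contact loss of this kind follows. It assumes a ColabDesign-style categorical cross-entropy between the distogram restricted to low-distance bins and the full distogram; this formulation, the bin edges, the 14 Å default cutoff, and all function names are illustrative assumptions rather than the paper's confirmed implementation. Only the sequence-separation mask (|i − j| < 9 ignored) comes from the text above.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def pair_contact_loss(logits, bin_upper_edges, cutoff, eps=1e-12):
    """Cross-entropy between the distogram renormalized over bins below
    `cutoff` and the full distogram. Small when mass sits in close bins."""
    p = softmax(logits)
    close = [edge <= cutoff for edge in bin_upper_edges]
    mass = sum(pi for pi, c in zip(p, close) if c)
    loss = 0.0
    for pi, c in zip(p, close):
        if c:
            q = pi / max(mass, eps)  # target: p restricted to close bins
            loss -= q * math.log(max(pi, eps))
    return loss

def intra_contact_loss(distogram_logits, bin_upper_edges,
                       cutoff=14.0, min_seq_sep=9):
    """Average pair loss over binder pairs with |i - j| >= min_seq_sep.
    distogram_logits[i][j] is a list of per-bin logits (hypothetical layout)."""
    n = len(distogram_logits)
    losses = []
    for i in range(n):
        for j in range(i + 1, n):
            if j - i < min_seq_sep:
                continue  # skip sequence neighbors (helix-like pairs)
            losses.append(pair_contact_loss(distogram_logits[i][j],
                                            bin_upper_edges, cutoff))
    return sum(losses) / len(losses) if losses else 0.0
```

With a 10-residue toy binder, only the pair (0, 9) survives the mask, so making residues 0 and 1 look close has no effect on the loss, while making residues 0 and 9 look close drives it toward zero.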

Small-molecule design is handled with more care. If the inter-contact loss used for protein-protein interfaces is applied to a ligand as is, the protein can wrap around the ligand excessively and the apo-holo RMSD can grow. So for small-molecule design, the protein fold is first built without the ligand, then the ligand is added, and at each step only the single most confident contact is optimized directly.
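The "single most confident contact" idea can be sketched as a top-1 selection over residue-ligand contact probabilities. This is an illustration only: it assumes that "confidence" means the predicted probability of a residue-atom pair being within a contact cutoff, and that the per-step loss is the negative log of that probability. The function names and the input layout are hypothetical.

```python
import math

def most_confident_contact(contact_prob):
    """Return (residue_index, ligand_atom_index, probability) for the pair
    with the highest contact probability.
    contact_prob[i][a] is the assumed probability that residue i is within
    the contact cutoff of ligand atom a."""
    best = None
    for i, row in enumerate(contact_prob):
        for a, p in enumerate(row):
            if best is None or p > best[2]:
                best = (i, a, p)
    return best

def single_contact_loss(contact_prob, eps=1e-9):
    """Optimize only the top-1 contact instead of every residue-ligand pair,
    so the protein is not pushed to wrap around the whole ligand."""
    _, _, p = most_confident_contact(contact_prob)
    return -math.log(max(p, eps))

# Toy example: 2 residues x 2 ligand atoms.
probs = [[0.1, 0.2],
         [0.7, 0.3]]
picked = most_confident_contact(probs)
loss = single_contact_loss(probs)
```

Restricting the loss to one pair per step is the design choice that distinguishes this from the protein-protein inter-contact loss, which rewards many contacts at once.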

This detail matters. BoltzDesign1 is not a simple story of "differentiate backward through an all-atom predictor and anything works." The loss changes with the target class, and for small molecules, forcing too many contacts can produce an unrealistic pocket. The paper deliberately tries to avoid that failure mode.

Small-molecule benchmark: where to draw the line when comparing with RFdiffusionAA

BoltzDesign1's central benchmark uses the four small-molecule targets from RFAA/RFdiffusionAA: IAI, FAD, SAM, and OQO. For each ligand, 30 structures are generated, LigandMPNN redesigns 5 sequences per structure, and AF3 re-predicts the results.

Evaluation centers on four proxies. The first is AF3 confidence. The strict criterion is complex pLDDT > 0.7 and AF3 ipAE < 10, and the relaxed criterion is ipAE < 15. The second is an RMSD < 2 Å between the Boltz-1 predicted structure and the AF3 predicted structure. The third is Boltz-1/AF3 self-consistency before and after LigandMPNN redesign. The fourth is the Gnina CNN affinity and pose score at the AF3-predicted binding site.
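The thresholds above translate into a simple filter. The sketch below is illustrative: the `DesignMetrics` container and function names are hypothetical, the metric values are made up for the toy example, and two points are assumptions not stated in the text, namely that the relaxed criterion keeps the pLDDT > 0.7 requirement and that criteria can be combined with a logical AND.

```python
from dataclasses import dataclass

@dataclass
class DesignMetrics:
    complex_plddt: float      # AF3 complex pLDDT on a 0-1 scale
    ipae: float               # AF3 interface PAE
    cross_model_rmsd: float   # Boltz-1 vs AF3 predicted structure, in angstroms

def passes_confidence(m, strict=True):
    """Strict: pLDDT > 0.7 and ipAE < 10. Relaxed: ipAE < 15
    (pLDDT cutoff assumed unchanged in the relaxed case)."""
    ipae_cutoff = 10.0 if strict else 15.0
    return m.complex_plddt > 0.7 and m.ipae < ipae_cutoff

def passes_cross_model(m, rmsd_cutoff=2.0):
    return m.cross_model_rmsd < rmsd_cutoff

def success_rate(designs, predicate):
    """Fraction of computationally scored candidates that pass a filter."""
    if not designs:
        return 0.0
    return sum(1 for d in designs if predicate(d)) / len(designs)

# Made-up toy values, not results from the paper.
designs = [
    DesignMetrics(0.85, 8.0, 1.2),
    DesignMetrics(0.75, 12.0, 1.8),
    DesignMetrics(0.65, 6.0, 0.9),
    DesignMetrics(0.90, 9.0, 3.5),
]
strict_rate = success_rate(designs, passes_confidence)
relaxed_rate = success_rate(designs, lambda d: passes_confidence(d, strict=False))
both_rate = success_rate(designs, lambda d: passes_confidence(d) and passes_cross_model(d))
```

The denominator in `success_rate` is the number of scored candidates, which is exactly why these rates are not comparable to experimental hit rates.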

The paper reports that BoltzDesign1 achieved a higher AF3-based in silico success rate than RFdiffusionAA designs on all four ligands. Both Pairformer-only and Pairformer plus Confidence scored higher than RFdiffusionAA, and the Confidence mode outperformed Pairformer-only on 3 of the 4 ligands.

But this comparison has to be read conservatively. RFdiffusionAA is the design fine-tune from the RFAA paper, which presented wet-lab validation on digoxigenin, heme, and bilin. The BoltzDesign1 result is that it passed AF3/Boltz/Gnina proxies more often on the same ligand benchmark. So the accurate statement is not "it makes better binders than RFdiffusionAA" but "it passed AF3-based in silico filters more often on the RFdiffusionAA benchmark targets."

What Gnina and diversity tell us

BoltzDesign1 also presents some positive signal in the Gnina CNN VS score. Most designs score lower than the wild-type PDB complex/ligand, but a fraction score higher than the native ligand binder: roughly 9.3% for SAM, 7.3% for OQO, and 4.0% for IAI.

These numbers are interesting but are not binding evidence. A docking score is a prioritization proxy. In a de novo pocket with a flexible ligand in particular, the calibration of the scoring function is hard to take at face value. Even with a good Gnina score, expression, folding, ligand binding, and specificity each require separate validation.

On diversity, the reported average pairwise TM-score is 0.36 for BoltzDesign1 and 0.46 for RFdiffusionAA. A lower TM-score means more diverse topologies. The paper also shows that applying the helix loss term more strongly increases beta-sheet content. This suggests that BoltzDesign1 is not necessarily confined to simple helical binders. That said, higher diversity does not imply a higher hit rate.
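The diversity metric is an average over all unordered pairs of designs. A minimal sketch, assuming the pairwise TM-scores have already been computed by an external structural alignment tool and stored in a symmetric matrix (the function name and matrix layout are hypothetical, and the toy values are made up):

```python
def mean_pairwise_tm(tm_matrix):
    """Average TM-score over all unordered pairs (i < j).
    The diagonal (self-comparison, TM = 1) is excluded so it does not
    inflate the average. Lower values indicate more diverse topologies."""
    n = len(tm_matrix)
    total = 0.0
    count = 0
    for i in range(n):
        for j in range(i + 1, n):
            total += tm_matrix[i][j]
            count += 1
    return total / count if count else 0.0

# Made-up toy matrix for three designs.
tm = [
    [1.0, 0.3, 0.5],
    [0.3, 1.0, 0.4],
    [0.5, 0.4, 1.0],
]
avg = mean_pairwise_tm(tm)
```

Because the number of pairs grows quadratically with the number of designs, this average describes the spread of the whole set and says nothing about whether any individual design binds.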

General biomolecular target demonstrations

Beyond small molecules, BoltzDesign1 shows examples for metal, DNA, and PTM/covalent modification targets. These include zinc and iron metal binders, a B-DNA binder, a phosphorylation-site binder, and a glycosylation-site binder.

In the metal examples, metal identity and coordination are evaluated with AF3 prediction and AllMetal3D after LigandMPNN redesign. For the iron example, the paper describes the expected tetrahedral/octahedral coordination and known iron-binding residues such as Tyr/Asp/His. The DNA example shows positive charge distribution, phosphate backbone interactions, and base-specific hydrogen bonds.

The PTM/covalent modification examples feature binders for PCNA Tyr-211 phosphorylation, Smad2 Ser-201/Ser-203 phosphorylation, and a CD45 glycosylation site. Pocket contacts and the modification position are supplied as constraints through one-hot token features, and the paper presents designs that form hydrogen bonds and other interactions with the phosphosite or the sugar moiety.

This section shows the scope of BoltzDesign1. The evidence layer, however, is demonstration. There are no binding assays, no functional modulation data, no specificity data, and no high-resolution structural validation. It is therefore safer to read it as "extension to all-atom biomolecular binder design looks feasible."

Reading figure by figure

Figure 1 is the overall pipeline. It shows the flow from hallucinating sequence/structure with the Pairformer distogram and Confidence module, to LigandMPNN redesign, to downstream evaluation. The key takeaway from this figure is that the diffusion module is not differentiated directly.

Figure 2 covers the relationship between Pairformer distogram contacts and confidence metrics. Using 212 successful BindCraft binders across 13 targets, it examines how well the Pairformer distogram agrees with diffusion-predicted contacts. This is the evidence that the distogram contact loss is usable as a design signal.

Figure 3 shows small-molecule binder examples and the AF3 success-rate comparison. The comparison with RFdiffusionAA appears here, and it is safer to read it while keeping in mind that it is a proxy comparison.

Figure 4 covers diversity and secondary-structure control. It shows how beta-sheet content and topology diversity change as the helix loss is adjusted.

Figure 5 shows the metal, B-DNA, phosphorylation, and glycosylation binder demonstrations. This is a scope figure, not a wet-lab validation figure.

Supplementary Figures S1–S4 also matter. They show whether interface residues have higher sequence recovery after LigandMPNN redesign, why the recycle=0 and fixed-interface surface-redesign setup works well, and how cross-model consistency, self-consistency, and Gnina score differ by mode.

How BoltzDesign1 differs from BoltzGen

BoltzDesign1 and BoltzGen are easy to confuse because of their names, but the two papers occupy different positions. BoltzDesign1 is a hallucination/inversion pipeline that differentiates backward through the Boltz-1 predictor. Rather than presenting a new broad wet-lab platform, it shows whether the Pairformer and confidence head of an AF3/Boltz-style all-atom predictor can serve as design objectives.

BoltzGen leads with a design-oriented generative model, a specification language, and multiple wet-lab campaigns. It handles nanobodies, protein binders, peptides, cyclic peptides, and small-molecule binders through a single model/specification interface and presents assays for each campaign.

BoltzDesign1 is therefore closer to an "open all-atom predictor inversion method," and BoltzGen to an "open all-atom binder generation platform." Both sit in the Boltz lineage, but their evidence layers and intended uses differ.

Guardrails for reading this paper

First, there is no wet-lab validation. The success the paper reports is an AF3/Boltz/Gnina/self-consistency proxy. It should not be read as binding hit rate, affinity, specificity, expression, or functional validation.

Second, Boltz-1 is used both as the design machinery and in part of the evaluation/self-consistency. Cross-model AF3 evaluation is included, but predictor inversion still carries model-bias and overfitting risk. The paper itself names overfitting from designing and predicting with a single model as a limitation.

Third, the distogram-confidence correlation is weak for nucleic acid designs. Nucleic acid design appears harder to calibrate than protein-small molecule or protein-protein design.

Fourth, experimental evidence must not be mixed in when comparing with RFdiffusionAA. The BoltzDesign1 claim is higher proxy success on the RFdiffusionAA benchmark, whereas RFdiffusionAA is a system with actual wet-lab validation on some targets.

Fifth, the general biomolecular target examples are scope demonstrations. The metal, DNA, and PTM/glycan binder figures show what may be possible, but they are not yet a validated binder panel.

Assessment: an early coordinate for open all-atom design methods

BoltzDesign1 is closer to a method milestone than a performance milestone. It proposes a way to turn an AF3/Boltz-style all-atom predictor into a design objective, and it shows that design signal can be extracted from the Pairformer distogram and the Confidence module without differentiating through the expensive diffusion module.

The reason this idea matters is clear. While closed platforms lead with experimental performance, the open academic ecosystem needs practical ways to turn prediction models into design tools. BoltzDesign1 is a clean starting point in that direction.

The conclusion still has to be tempered. At this point the paper is closer to "it produced binder-like all-atom design candidates under predictor-based proxies" than to "it made binders." Until wet-lab validation appears, it is hard to place it on the same experimental-performance axis as RFdiffusionAA, AlphaProteo, or Latent-X. Even so, it is well worth recording as one axis of open all-atom design methods.

References

- Paper: “BoltzDesign1: Inverting All-Atom Structure Prediction Model for Generalized Biomolecular Binder Design”
- Authors: Yehlin Cho, Martin Pacesa, Zhidian Zhang, Bruno E. Correia, Sergey Ovchinnikov
- bioRxiv DOI: https://doi.org/10.1101/2025.04.06.647261
- Code/data: https://github.com/yehlincho/BoltzDesign1
- Raw source: `raw/papers/BoltzDesign1/boltzdesign1.pdf`
- Extracted source: `raw/papers/BoltzDesign1/extracted/boltzdesign1.txt`