Boltz-1x Paper Review

Introduction

When the structure looks right but the chemistry breaks

Boltz-steering: inference-time guidance, not retraining

Sequential Monte Carlo and energy gradients

Flat-bottom physical potentials

Relationship to Boltz-1

Benchmark setup

Main result: raising sanity while keeping accuracy

Figure by figure

Reading the evidence layers separately

What it means for design pipelines

Keeping a balanced reading

Before moving on to Boltz-2 and BoltzGen

Assessment: a physical sanity layer on top of an open predictor

References

Introduction

Reading Boltz-1 makes it clear why an open AF3-style structure predictor matters. A structure prediction model scoring well on a benchmark is a different problem from that structure being physically sensible. A structure can have wrong ligand chirality, a bent aromatic ring, overlapping chains, or broken ligand internal geometry. Some geometric metrics may still not penalize it heavily.

Boltz-1x is the key update in the updated Boltz-1 manuscript, and it targets this gap. It is not a new binder generator. It attaches inference-time steering to the Boltz-1-family AF3-style all-atom predictor in order to improve physical validity.

This post does not treat Boltz-1x as "a model that scores a few points higher than Boltz-1." It treats it as a physical sanity layer that lets an open structure predictor serve as real design and filtering infrastructure.

Reading Boltz-1 → Boltz-1x → Boltz-2 in sequence makes the division of labor clearer:

- Boltz-1 is an open complex predictor.
- Boltz-1x adjusts sampling so that the predictor's output better respects chemical sanity.
- Boltz-2 then takes on a separate decision variable, affinity.

Structural accuracy, physical validity, and affinity are connected, but they are not the same layer.

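When the structure looks right but the chemistry breaks

Structure prediction benchmarks usually report metrics such as LDDT, DockQ, ligand RMSD, and LDDT-PLI. These metrics measure three things:

- how close a structure is to the reference,
- whether the interface is roughly right,
- whether the ligand pose in the pocket is similar.

Chemical validity is a separate question. If a chiral center is inverted, it is no longer the same ligand. If double-bond stereochemistry is wrong, the interpretation of the binding pose changes. A structure with steric clashes or overlapping chains is hard to use as input to downstream MD, docking, or an affinity model.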
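
To make "steric clash" concrete, here is a minimal sketch of an inter-chain clash check. It assumes atoms are given as hypothetical `(chain_id, x, y, z)` tuples. A single illustrative 2.0 Å cutoff stands in for per-element van der Waals radii. This is not the check Boltz-1x or PoseBusters actually uses; real tools scale the cutoff per atom pair.

```python
import math
from itertools import combinations

# Hypothetical minimum allowed distance in angstroms
# (real tools use per-element van der Waals radii instead of one number).
CLASH_CUTOFF = 2.0


def interchain_clashes(atoms, cutoff=CLASH_CUTOFF):
    """Return index pairs of atoms on different chains closer than `cutoff`.

    `atoms` is a list of (chain_id, x, y, z) tuples; O(N^2), fine for a sketch.
    """
    clashes = []
    for (i, a), (j, b) in combinations(enumerate(atoms), 2):
        if a[0] == b[0]:
            continue  # same chain: intra-chain contacts are not checked here
        if math.dist(a[1:], b[1:]) < cutoff:
            clashes.append((i, j))
    return clashes


atoms = [
    ("A", 0.0, 0.0, 0.0),
    ("A", 1.5, 0.0, 0.0),    # same chain as atom 0: never counted
    ("B", -1.2, 0.0, 0.0),   # 1.2 Å from atom 0 on chain A: clash
    ("B", 5.0, 5.0, 5.0),    # far from everything: no clash
]
print(interchain_clashes(atoms))  # → [(0, 2)]
```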

This is why Boltz-1x matters. It does not trust a high-confidence prediction as-is. Instead, it builds physical constraints into the sampling trajectory to raise output sanity.

This matters most for candidate filtering. When thousands of candidates are scored with a structure predictor, no one can eyeball every ligand geometry and chain clash. If chemically broken predictions slip through with high confidence scores, the downstream ranking is contaminated. Boltz-1x treats this problem as part of model output quality control.

Boltz-steering: inference-time guidance, not retraining

The core of Boltz-1x is Boltz-steering. It does not retrain the base model. Instead, it guides the sampling process of an already trained diffusion model toward physics-inspired potentials.

The paper frames this with the Feynman-Kac steering framework. The trained model distribution `pθ(x0)` is not used as-is. An energy `E(x0)` that measures physical constraint violations is added to form a tilted distribution:

`p_target(x0) ∝ pθ(x0) exp(-λE(x0))`

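In other words, among the structures the model already favors, samples are pushed toward lower physical energy. This differs from simple post-filtering. Post-filtering draws many candidates and discards the bad ones. Steering instead guides the reverse diffusion trajectory itself toward more plausible structures.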
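
As a worked toy, the tilt can be computed exactly over a handful of discrete candidate structures. The model probabilities, energies, and λ values below are invented for illustration. In Boltz-1x, `x0` is a continuous all-atom structure, and the tilted distribution is approximated by sampling rather than enumerated like this.

```python
import math


def tilted_distribution(p_model, energies, lam):
    """p_target(x) ∝ p_model(x) * exp(-lam * E(x)), normalised over a finite set."""
    unnorm = [p * math.exp(-lam * e) for p, e in zip(p_model, energies)]
    z = sum(unnorm)
    return [u / z for u in unnorm]


# Hypothetical candidates: the model prefers candidate 0, but it violates
# a constraint (E = 3.0); candidate 1 satisfies everything (E = 0.0).
p_model = [0.5, 0.3, 0.2]
energies = [3.0, 0.0, 1.0]

for lam in (0.0, 1.0, 5.0):
    probs = tilted_distribution(p_model, energies, lam)
    print(lam, [round(p, 3) for p in probs])
```

With `lam = 0` the model distribution comes back unchanged. As `lam` grows, probability mass moves onto the low-energy candidates while still being weighted by how much the model liked them.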

Post-filtering works fine when candidates are plentiful and valid samples are common. It becomes inefficient when invalid samples appear often, or when samples that satisfy a particular constraint are rare. Steering changes the trajectory during sampling, so the sampler visits the physically valid region more often within the same sampling budget.
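
A back-of-the-envelope calculation shows why post-filtering degrades when valid samples are rare. Assume, purely for illustration, that each sample is independently valid with probability p. Then the chance that at least one of N samples is valid is 1 − (1 − p)^N, and the number of samples needed grows roughly like 1/p.

```python
import math


def prob_at_least_one_valid(p, n):
    """Probability that at least one of n independent samples is valid (0 < p < 1)."""
    return 1.0 - (1.0 - p) ** n


def samples_needed(p, target=0.95):
    """Smallest n with P(at least one valid) >= target, for 0 < p < 1."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))


for p in (0.5, 0.1, 0.01):
    print(f"p={p}: P(valid among 5)={prob_at_least_one_valid(p, 5):.3f}, "
          f"samples for 95%={samples_needed(p)}")
```

Under this independence assumption, post-filtering can only raise N. Steering instead tries to raise p itself.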

Sequential Monte Carlo and energy gradients

At each reverse diffusion timestep, Boltz-steering computes the energy of the predicted denoised conformer. The procedure has three parts:

- It uses the energy improvement between timesteps as an importance weight.
- It resamples the particles at fixed intervals.
- It adds an energy gradient step to the proposal, moving trajectories toward better constraint satisfaction.

In the paper's implementation, guidance is applied at every timestep. Resampling happens only every 3 timesteps to preserve exploration. This setup is meant to find rare constraint-satisfying samples more actively than post-hoc filtering.
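
The loop above can be sketched as a toy one-dimensional sequential Monte Carlo sampler. Everything here is hypothetical and not Boltz-1x code or hyperparameters: the two-mode "denoiser", the energy, λ, the step size, and the noise schedule are all stand-ins.

Only the overall shape follows the text:

- the energy of the denoised prediction is computed at every step,
- a gradient nudge is added to the proposal,
- importance weights come from the energy change between steps,
- particles are resampled every 3 steps.

The gradient is applied directly to `x` as a simplification. The sketch also omits the proposal-correction terms that a rigorous SMC sampler would include.

```python
import math
import random

# Hypothetical 1-D toy: the "model" has two modes, and only x >= 0 is "physically valid".
MODES = (-1.0, 1.0)


def denoise(x):
    """Toy stand-in for the model's denoised prediction x0_hat (nearest mode)."""
    return min(MODES, key=lambda m: abs(m - x))


def energy(x0):
    """Flat-bottom style violation energy: 0 when x0 >= 0, quadratic below."""
    return max(0.0, -x0) ** 2


def energy_grad(x0):
    return -2.0 * max(0.0, -x0)


def steer(n_particles=200, n_steps=30, lam=5.0, step_size=0.05,
          resample_every=3, seed=0):
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 2.0) for _ in range(n_particles)]
    # Start from zero so the per-step weights telescope to exp(-lam * E_final).
    prev_e = [0.0] * n_particles
    log_w = [0.0] * n_particles
    for t in range(n_steps):
        sigma = 0.5 * (1.0 - t / n_steps)  # toy noise schedule shrinking toward 0
        for i in range(n_particles):
            x = xs[i]
            x0 = denoise(x)
            # Guidance at every step: nudge the proposal down the energy gradient.
            x -= step_size * energy_grad(x0)
            # Toy reverse step: move part-way toward x0_hat, then add noise.
            x += 0.3 * (x0 - x) + sigma * rng.gauss(0.0, 1.0)
            e = energy(denoise(x))
            # Importance weight from the energy change: lower energy -> larger weight.
            log_w[i] += -lam * (e - prev_e[i])
            xs[i], prev_e[i] = x, e
        if (t + 1) % resample_every == 0:
            # Multinomial resampling (numerically stabilised), then reset the weights.
            top = max(log_w)
            weights = [math.exp(lw - top) for lw in log_w]
            idx = rng.choices(range(n_particles), weights=weights, k=n_particles)
            xs = [xs[j] for j in idx]
            prev_e = [prev_e[j] for j in idx]
            log_w = [0.0] * n_particles
    return [denoise(x) for x in xs]


steered = steer()
baseline = steer(lam=0.0, step_size=0.0)  # no reweighting and no guidance
print("valid fraction, steered:", sum(x >= 0 for x in steered) / len(steered))
print("valid fraction, baseline:", sum(x >= 0 for x in baseline) / len(baseline))
```

Setting `lam=0` and `step_size=0` switches off both the reweighting and the guidance. That gives a plain unsteered run to compare against.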

This matters in design pipelines. If physical validity violations are rare, drawing many samples and filtering afterward may be enough. If violations are frequent, or valid samples sit in a low-probability region, adjusting the sampling path itself can be more effective.

Flat-bottom physical potentials

Boltz-1x uses several physical constraint potentials:

- R/S chirality at chiral centers
- E/Z double-bond stereochemistry
- planarity of double bonds and aromatic rings
- ligand internal geometry
- inter-chain steric clashes
- overlapping symmetric chains
- covalent bond geometry
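
Two of these constraints reduce to simple geometry. This is a hedged sketch, not the Boltz-1x potentials themselves. Take the signed volume spanned by three neighbours around a center, which is a triple product. Its sign flips under mirror inversion, so it can flag an inverted chiral center for a fixed neighbour order. The same quantity is close to zero when four atoms are coplanar, which gives a crude planarity check. Mapping a sign to an actual R/S label needs CIP priorities; this sketch assumes they are already encoded in the neighbour order. The tolerances are hypothetical.

```python
def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def signed_volume(p0, p1, p2, p3):
    """Triple product (p1-p0) . ((p2-p0) x (p3-p0)).

    The sign flips under mirror inversion or when two of p1..p3 are swapped;
    the value is ~0 when the four points are coplanar.
    """
    return dot(sub(p1, p0), cross(sub(p2, p0), sub(p3, p0)))


def chirality_ok(center, n1, n2, n3, expected_sign, min_abs=0.5):
    """True if neighbours (in an assumed priority order) have the expected handedness.

    `min_abs` (hypothetical, cubic angstroms) also rejects nearly flat centers.
    """
    return signed_volume(center, n1, n2, n3) * expected_sign >= min_abs


def planar_ok(a, b, c, d, tol=0.1):
    """True if four atoms (e.g. around a double bond) are nearly coplanar."""
    return abs(signed_volume(a, b, c, d)) <= tol


def mirror(p):
    """Reflect a point through the yz-plane (turns a center into its enantiomer)."""
    return (-p[0], p[1], p[2])


center = (0.0, 0.0, 0.0)
n1, n2, n3 = (1.0, 1.0, 1.0), (1.0, -1.0, -1.0), (-1.0, 1.0, -1.0)
print(chirality_ok(center, n1, n2, n3, expected_sign=+1))                          # → True
print(chirality_ok(center, mirror(n1), mirror(n2), mirror(n3), expected_sign=+1))  # → False
print(planar_ok((0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0)))                       # → True
print(planar_ok((0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0.5)))                     # → False
```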

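The key design choice here is the flat-bottom potential. It does not rigidly fit a specific reference conformer. As long as the structure stays within a physically acceptable range, it applies no penalty. Ligands are flexible, and the same molecule can adopt several plausible conformers. Boltz-1x therefore respects validity ranges rather than enforcing exact RMSD matching.

This design reduces the side effects of over-constraining. If a physical potential is too rigid, the model is pulled toward the reference conformer and may miss alternative poses that are genuinely possible. A flat-bottom potential works on the principle "anything within this range is fine." It enforces chemical rules while leaving some of the diffusion model's structural diversity intact.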
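
The contrast between a rigid restraint and a flat-bottom one fits in a few lines. This is a minimal sketch with a made-up bond-length window. The review does not say which wall shape Boltz-1x uses outside the allowed range, so the quadratic walls here are an assumption.

```python
def harmonic(d, d_ref, k=1.0):
    """Rigid restraint: any deviation from the reference value costs energy."""
    return k * (d - d_ref) ** 2


def flat_bottom(d, lower, upper, k=1.0):
    """Zero inside [lower, upper]; quadratic walls outside (wall shape is an assumption)."""
    if d < lower:
        return k * (lower - d) ** 2
    if d > upper:
        return k * (d - upper) ** 2
    return 0.0


# Hypothetical bond-length window: reference 1.50 Å, anything in [1.45, 1.55] is acceptable.
for d in (1.40, 1.48, 1.50, 1.53, 1.60):
    print(f"d={d:.2f}  harmonic={harmonic(d, 1.50):.4f}  "
          f"flat_bottom={flat_bottom(d, 1.45, 1.55):.4f}")
```

Inside the window the flat-bottom term is exactly zero, so it exerts no pull toward the reference value. Only values outside the window are pushed back.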

This design also fits the job of correcting structure prediction output. The model does not need to reproduce the reference ligand conformer exactly. It does need to preserve conditions that must never break, such as chirality and bond geometry.

Relationship to Boltz-1

Boltz-1x is best seen as a physical-validity-aware inference mode of Boltz-1, not as a replacement or a separate lineage. The Boltz-1 contributions remain the background:

- dense MSA pairing
- unified cropping
- robust pocket conditioning
- Kabsch interpolation
- the confidence model redesign

If you looked at the Figure 5 failure modes in the Boltz-1 review, Boltz-1x is the follow-up answer to them. It tries to reduce overlapping chains, ligand hallucination, steric clashes, and ligand geometry errors at the sampling stage.

So Boltz-1x does not change task scope the way Boltz-2 or BoltzGen do. It is neither affinity prediction nor binder generation. It is a layer that makes structure prediction output more physically trustworthy.

Benchmark setup

In the updated manuscript, Boltz-1x is compared with AlphaFold3, Chai-1, and Boltz-1 on the recent PDB test set and the CASP15 benchmark. As with Boltz-1, the evaluation generates 5 samples per target. It reports both the top-1 sample ranked by confidence and the best (oracle) sample.
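
The two selection rules differ only in the key used to pick a sample. Below is a sketch over 5 hypothetical samples. The numbers are invented, and LDDT serves only as an example accuracy metric.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    confidence: float  # the model's own confidence score for this sample
    lddt: float        # accuracy against the reference (higher is better)


def top1_by_confidence(samples):
    """What a user gets in practice: the sample the model is most confident in."""
    return max(samples, key=lambda s: s.confidence)


def oracle_best(samples):
    """Upper bound: the sample that is actually closest to the reference."""
    return max(samples, key=lambda s: s.lddt)


samples = [Sample(0.91, 0.72), Sample(0.88, 0.81), Sample(0.85, 0.64),
           Sample(0.80, 0.79), Sample(0.77, 0.60)]
print(top1_by_confidence(samples).lddt, oracle_best(samples).lddt)  # → 0.72 0.81
```

The gap between the two numbers shows how much accuracy is lost because the confidence model does not always rank the best sample first.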

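The metrics are:

- mean all-atom LDDT
- the fraction of interfaces with DockQ > 0.23
- LDDT-PLI
- the fraction of ligands with pocket-aligned RMSD < 2 Å
- PoseBusters-style physical checks

The distinctive contribution of Boltz-1x shows up in the last of these, the physical validity metrics.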
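
These are reported as aggregate rates over a test set. Here is a minimal sketch of how such rates are computed, with invented per-target values and hypothetical check names. A prediction counts as physically valid here only if it passes every check. That is one common convention, not necessarily the paper's exact rule.

```python
def success_rate(values, threshold, higher_is_better=True):
    """Fraction of values strictly above (or below) a threshold."""
    hits = [v > threshold if higher_is_better else v < threshold for v in values]
    return sum(hits) / len(hits)


def all_checks_pass_rate(check_results):
    """Fraction of predictions that pass every physical check."""
    return sum(all(c.values()) for c in check_results) / len(check_results)


dockq = [0.10, 0.25, 0.60, 0.22, 0.80]   # hypothetical per-interface DockQ
ligand_rmsd = [0.9, 1.8, 2.4, 3.5, 1.1]  # hypothetical pocket-aligned ligand RMSD (Å)
physics_checks = [                        # hypothetical per-prediction check results
    {"chirality": True, "clash": True, "planarity": True},
    {"chirality": False, "clash": True, "planarity": True},
    {"chirality": True, "clash": True, "planarity": True},
    {"chirality": True, "clash": False, "planarity": True},
]

print(success_rate(dockq, 0.23))                                # → 0.6
print(success_rate(ligand_rmsd, 2.0, higher_is_better=False))   # → 0.6
print(all_checks_pass_rate(physics_checks))                     # → 0.5
```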

The central question is whether the pass rate on physical checks can be raised while geometric accuracy is kept. If physical steering badly hurt the geometry metrics, it would be less practical. If it reduces clashes, chirality errors, and stereochemistry errors while keeping geometry, it becomes a safer predictor for downstream pipelines.

Main result: raising sanity while keeping accuracy

The paper summarizes Boltz-1, AlphaFold3, and Chai-1 as broadly comparable on geometric metrics. It claims that Boltz-1x, through Boltz-steering, approaches nearly 100% passing on physical quality checks. It also claims that Boltz-1x keeps geometric accuracy similar to Boltz-1.

Exact physical-validity percentages should still be quoted with care. In the extracted text of the paper, figure labels and plot values are not captured reliably, so this review uses only the qualitative claim. The safe statement is that "Boltz-1x claims to substantially improve physical validity."

This result is not wet-lab validation. It also does not show binding affinity, function, specificity, or developability. It is, however, important infrastructure evidence: it reduces false-positive risk when structure prediction output feeds into a downstream design workflow.

Physical validity metrics are also meaningful only within the scope of the selected checks. PoseBusters-style checks cover important items such as chirality, stereochemistry, clashes, and geometry. They do not guarantee every aspect of chemical correctness. Protein flexibility, protonation, water-mediated interactions, and induced fit remain separate problems.

Figure by figure

Figure 2 explains the difference between Kabsch interpolation and the AF3 reverse diffusion interpolation. It is background on Boltz-1's coordinate diffusion engineering.

Figure 3 shows the Boltz-1 confidence model architecture. When reading it for Boltz-1x, keep in mind that confidence and steering are different layers.

Figures 5 and 6 compare AF3, Chai-1, Boltz-1, and Boltz-1x on the PDB test set and CASP15. Read the geometric metrics and the physical validity results separately.

Algorithm 2 is Boltz-steering, the most important method detail for this review. Algorithms 3–5 cover dense MSA pairing, unified cropping, and robust pocket conditioning. These are base Boltz-1 contributions.

Reading the evidence layers separately

The evidence for Boltz-1x can be split into three layers:

- Geometric prediction accuracy: LDDT, DockQ, LDDT-PLI, and ligand RMSD.
- Physical validity: PoseBusters-style checks for chirality, stereochemistry, ligand geometry, clashes, and chain overlap. This layer is the core contribution of Boltz-1x.
- Downstream utility: more physically plausible predictions can be safer inputs for docking, MD, affinity prediction, expert inspection, and design candidate filtering. This layer gives Boltz-1x its practical significance.

What it means for design pipelines

When generated binder or ligand candidates are evaluated, a structure prediction model often serves as a first-pass filter. High-confidence but physically invalid predictions are risky here. They can cause good candidates to be wrongly discarded or push false positives up the ranking.

Boltz-1x handles this problem through sampling guidance rather than a post-processing filter. In a design pipeline, it is natural to add "physical validity" as a separate gate after "prediction confidence." Boltz-1x moves part of that gate into model inference itself.
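
Here is a minimal sketch of the two-gate ordering as a post-hoc filter. The `Prediction` fields, the check names, and the 0.7 confidence threshold are all hypothetical. It shows the pipeline-side gate. Boltz-1x complements this gate by steering sampling so that fewer candidates fail the second gate in the first place.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    name: str
    confidence: float      # hypothetical model confidence score in [0, 1]
    physics_checks: dict   # check name -> passed? (hypothetical check names)


def gate(predictions, min_confidence=0.7):
    """Apply two separate gates: confidence first, then physical validity."""
    passed, rejected = [], []
    for p in predictions:
        if p.confidence < min_confidence:
            rejected.append((p.name, "low confidence"))
        elif not all(p.physics_checks.values()):
            failed = sorted(k for k, ok in p.physics_checks.items() if not ok)
            rejected.append((p.name, "failed: " + ", ".join(failed)))
        else:
            passed.append(p.name)
    return passed, rejected


preds = [
    Prediction("cand_1", 0.92, {"chirality": True, "clash": True}),
    Prediction("cand_2", 0.95, {"chirality": False, "clash": True}),  # confident but invalid
    Prediction("cand_3", 0.55, {"chirality": True, "clash": True}),
]
passed, rejected = gate(preds)
print(passed)    # → ['cand_1']
print(rejected)  # → [('cand_2', 'failed: chirality'), ('cand_3', 'low confidence')]
```

`cand_2` is the dangerous case discussed above. It has the highest confidence of the three, and only the separate validity gate catches it.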

Passing a physical validity gate does not mean a structure is a real binder. A plausible pose with intact chemistry is only a starting point. Affinity, specificity, solubility, expression, and cell-level function remain separate evidence layers.

Keeping a balanced reading

Boltz-1x is a structure prediction update. It does not shift task scope toward binder generation or affinity prediction. It adjusts sampling so that Boltz-1 output better respects chemical sanity.

Physical validity is a prerequisite for good downstream input. Passing PoseBusters-style checks still leaves affinity, specificity, solubility, expression, and cell-level function as separate evidence layers. Even so, reducing chirality errors, stereochemistry errors, and steric clashes is a meaningful improvement for design workflows in its own right.

Boltz-steering is inference-time guided sampling. It does not mean the base model has internalized every physical constraint. It is better read as an update that attaches the sanity layer an open predictor needs before it can go into real molecular workflows.

Before moving on to Boltz-2 and BoltzGen

Reading Boltz-1x in between helps interpret Boltz-2 and BoltzGen more safely. Affinity prediction and binder generation both use predicted complex structures, either as an input or as an evaluation layer. If a structure with broken ligand geometry or chain clashes goes in, affinity scores and design rankings can be thrown off as well.

Boltz-1x therefore matters for pipeline hygiene. Before you need a better generator, you need to know which failure modes the predictor scoring your candidates has, and how those failures can be reduced.

Assessment: a physical sanity layer on top of an open predictor

The value of Boltz-1x is clearer from a design-infrastructure perspective than from a performance ranking. An open AF3-style predictor already outputs a structure and a confidence. With Boltz-1x, it now also handles physical validity through inference-time steering.

Boltz-1 opened up the predictor. Boltz-1x tries to reduce the chemical sanity problems that appear when that predictor is put into real molecular workflows. The difference looks small but matters. The gap between benchmark geometry and chemical validity can turn directly into false positives and false negatives in downstream design.

In short, Boltz-1x is less "a better binder model" than "an open infrastructure update aimed at safer structure prediction output."

References

- Paper/update: "Boltz-1: Democratizing Biomolecular Interaction Modeling", updated manuscript, Boltz-1x / Boltz-steering section
- Authors: Jeremy Wohlwend, Gabriele Corso, Saro Passaro, Noah Getz, Mateo Reveiz, Ken Leidal, Wojtek Swiderski, Liam Atkinson, Tally Portnoi, Itamar Chinn, Jacob Silterra, Tommi Jaakkola, Regina Barzilay
- bioRxiv DOI: https://doi.org/10.1101/2024.11.19.624167
- GitHub: https://github.com/jwohlwend/boltz
- Raw source: `raw/papers/Boltz-1x/boltz-1x.pdf`
- Extracted source: `raw/papers/Boltz-1x/extracted/boltz-1x.txt`