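Opinion

Code Review May Have Started Changing Earlier Than Most Teams Realize

AI-assisted development makes producing code cheap, and the code review bottleneck is shifting toward context reconstruction and reviewer-oriented context engineering.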

2026-05-11

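A few years ago, the biggest bottleneck in software engineering was usually still implementation.

But now that AI-assisted development is becoming widespread, the real bottleneck is quietly shifting:

It is no longer "writing the code" itself, but "understanding what actually changed."

I felt this most clearly while doing code review.

Not because the code was especially bad.

Nor because the agents were especially unreliable.

Rather, once you start reviewing PRs across many different repos, the whole review experience becomes fundamentally different.

The truly hard part is no longer:

syntax,
implementation detail,
or even business logic.

It is:

context reconstruction.

And the more widespread AI becomes, the more obvious this problem gets.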

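The Moment It Started to Feel "Off"

One day I reviewed PRs from several different repositories within a short span of time.

Taken individually, none of the PRs were bad:

clean structure,
passing checks,
reasonable naming,
and logic that looked fairly consistent.

A lot of the AI-generated code even looked "tidier" on the surface than code people used to write by hand.

But after reviewing across a few repos, I started to notice something:

I was losing my grip on the "real scope" of each PR.

Not because the diffs were large.

But because every repo switch meant rebuilding a mental model from scratch:

architecture assumptions
runtime constraints
historical conventions
deployment implications
ownership boundaries
tracing behavior
SSR/client separation
feature toggle patterns
release safety assumptions

That was when I realized:

The truly expensive thing is no longer the code itself.

It is:

Whether you can rebuild enough operational context, within limited time, to review safely.

And many recent discussions of AI-assisted development point to the same thing:

AI has raised implementation velocity, but human review capacity has not risen with it.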

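AI Has Changed the Cost Structure of Engineering

Code review used to work in large part because implementation itself was expensive.

Humans do not write code that fast.

That naturally limited:

the number of PRs,
PR scope,
architecture spread.

Reviewers could therefore build up repo familiarity gradually.

But AI-assisted development has changed the whole equation.

Now:

implementation has become cheap
scaffolding is almost free
boilerplate is almost instant
iteration cost has dropped sharply

But review attention has not scaled along with it.

Human cognitive bandwidth has barely changed.

The result is a strange imbalance:

Resource	Past	Now
Writing code	High cost	Low cost
Producing a PR	Medium cost	Very low cost
Context switching	Medium cost	High cost
Review attention	Expensive	Still expensive
Architectural reasoning	Expensive	Still expensive

Anthropic has recently even said it directly:

"Code review has started to become a bottleneck."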

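The Biggest Danger of AI Code Is That It "Looks Reasonable"

AI-generated code has one very dangerous property:

It usually "looks like the right answer."

reasonable structure
normal naming
patterns that look consistent
clean formatting

This creates a psychological illusion:

Reviewers slip more easily into a "skim review."

Because visually, the code looks like there is "nothing wrong."

But the truly hard part is operational correctness.

Especially in a multi-repo organization.

The questions that really need asking are:

Does this violate a system assumption?
Will this break tracing consistency?
Will SSR behavior subtly break?
Has an ownership boundary been bypassed?
What is the rollout risk?
Does it conflict with a historical architecture decision?

None of these are syntax-level problems.

They are:

context reconstruction problems.

Some recent research even suggests:

More context does not necessarily improve review quality, and can instead dilute reviewer attention.

This matches my own experience closely.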

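The Real Bottleneck Is No Longer the Code Itself

Later I came to realize something:

Modern code review is, in essence, no longer about reviewing code.

It is about:

Rebuilding enough system understanding, within a limited cognitive budget, to verify change intent.

These two things are very different.

Because the traditional code review process carries an implicit assumption:

The reviewer can reconstruct intent from the diff alone.

But in an AI-assisted environment, this assumption is breaking down fast.

Especially when:

repos are large,
ownership is fragmented,
implementation throughput has risen sharply,
and the number of PRs has surged.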

Why Multi-Repo Review Is Especially Prone to Collapse

Familiarity with a single repo still more or less holds up.

But a multi-repo organization magnifies the problem considerably.

Because every repo has its own hidden context:

deployment workflow
runtime environment
observability assumptions
CI rules
architecture history
business sensitivity
operational risk

Every repo switch is like an engineering cache invalidation.

The reviewer has to reload:

terminology
patterns
dependency graph
risk model
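One way to picture this reload cost is a small per-repo "context card" that a reviewer skims before opening a diff. The sketch below is only an illustration of the idea, not a description of any existing tool: `RepoContext`, its fields, the helper functions, and the `checkout-web` example values are all hypothetical.

```python
from dataclasses import dataclass, field, fields


@dataclass
class RepoContext:
    """Hypothetical per-repo summary a reviewer reloads on each repo switch."""
    name: str
    deployment_workflow: str = ""
    runtime_environment: str = ""
    observability_assumptions: str = ""
    ci_rules: list[str] = field(default_factory=list)
    operational_risks: list[str] = field(default_factory=list)


def missing_context(ctx: RepoContext) -> list[str]:
    """Return the names of fields that are still empty (hidden context)."""
    return [f.name for f in fields(ctx) if not getattr(ctx, f.name)]


def briefing(ctx: RepoContext) -> str:
    """Render the card as a short text briefing, marking gaps as UNKNOWN."""
    lines = [f"Repo: {ctx.name}"]
    for f in fields(ctx):
        if f.name == "name":
            continue
        value = getattr(ctx, f.name)
        if isinstance(value, list):
            value = ", ".join(value)
        lines.append(f"{f.name.replace('_', ' ')}: {value or 'UNKNOWN'}")
    return "\n".join(lines)


# Made-up example repo; none of these values come from a real system.
checkout = RepoContext(
    name="checkout-web",
    deployment_workflow="canary, then full rollout",
    runtime_environment="SSR on Node, hydrated client",
    ci_rules=["contract tests must pass"],
)

print(briefing(checkout))
print(missing_context(checkout))
```

The fields left empty are the interesting part: they are exactly the context a reviewer would otherwise have to rediscover by reading around the diff.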

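Eventually a very dangerous state sets in:

PRs are still being reviewed.

Approvals are still going through.

The pipeline is still green.

But the actual depth of architectural verification has started to drop.

And on the surface, this usually cannot be seen.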

Much Automation Actually Solves the Wrong Layer

Many teams start adding:

lint
static analysis
AI reviewer
security scanner
PR summarizer

These are all useful.

But most of them are optimizing:

mechanical validation.

The truly hard problem remains:

Can a human reconstruct operational intent quickly enough?

That is a problem at an entirely different level.

And many modern review guidelines have also started to emphasize:

smaller bounded PRs
stronger review metadata
explicit architectural scope

Because reviewer cognition has become a scarce resource.
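As a rough illustration of what "explicit architectural scope" could mean mechanically, the sketch below compares the files a PR touches against the directories its author declared as in scope. The function name `out_of_scope` and the example paths are hypothetical, and a real check would need to read the changed files from whatever review platform a team uses.

```python
from pathlib import PurePosixPath


def out_of_scope(changed_files: list[str], declared_scope: list[str]) -> list[str]:
    """Return changed files that fall outside every declared scope directory."""
    scopes = [PurePosixPath(s) for s in declared_scope]
    flagged = []
    for path in changed_files:
        p = PurePosixPath(path)
        # A file is in scope if a declared directory is one of its ancestors.
        if not any(s == p or s in p.parents for s in scopes):
            flagged.append(path)
    return flagged


# Made-up PR: the author declared only the checkout service as in scope.
changed = [
    "services/checkout/api.py",
    "services/billing/db.py",
    "README.md",
]
print(out_of_scope(changed, ["services/checkout"]))
```

Anything the check flags is a prompt for the author to either widen the declared scope or split the PR, which moves that decision out of the reviewer's head.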

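The Next Truly Important Engineering Skill: Context Engineering

I increasingly feel that:

The next important engineering discipline may not be prompt engineering.

But rather:

reviewer-oriented context engineering.

That is:

how to compress architecture understanding
how to lower cognitive warmup cost
how to make PR intent quickly reconstructable
how to reduce ambiguity
how to surface risk explicitly
how to maintain system coherence at high implementation velocity

In other words:

The PR itself is evolving from a "code diff" into a "structured review packet."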

PRs May Need to Evolve into Structured Review Packets

The PR of the past:

title
description
diff
comments

may no longer be quite enough.

The PR of the future will likely need:

## Why this exists
## User impact
## Systems affected
## Runtime risks
## Rollback strategy
## Testing evidence
## Architectural considerations
## AI-generated scope
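A template like this only helps if it is actually filled in. As a minimal sketch, assuming PR descriptions are written with `## ` headings exactly as listed above, a small check could report which sections are missing or empty. The function names and the sample description are hypothetical, not part of any existing tool.

```python
import re

# Section names taken from the template above.
REQUIRED_SECTIONS = [
    "Why this exists",
    "User impact",
    "Systems affected",
    "Runtime risks",
    "Rollback strategy",
    "Testing evidence",
    "Architectural considerations",
    "AI-generated scope",
]


def parse_sections(description: str) -> dict[str, str]:
    """Split a PR description into {heading: body} on '## ' headings."""
    sections: dict[str, str] = {}
    current = None
    for line in description.splitlines():
        match = re.match(r"^##\s+(.*\S)\s*$", line)
        if match:
            current = match.group(1)
            sections[current] = ""
        elif current is not None:
            sections[current] += line + "\n"
    return {name: body.strip() for name, body in sections.items()}


def missing_sections(description: str) -> list[str]:
    """Return required sections that are absent or have an empty body."""
    sections = parse_sections(description)
    return [name for name in REQUIRED_SECTIONS if not sections.get(name)]


# Made-up PR description with most sections left out.
example = """## Why this exists
Fixes a double-charge on retry.

## Rollback strategy
Revert the commit; no schema change.

## Testing evidence
"""

print(missing_sections(example))
```

A heading with nothing under it counts as missing here, since an empty "Testing evidence" section gives the reviewer no more context than no section at all.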

Not because engineers have gotten worse.

But because:

Human attention has become the scarcest resource.

And many modern code review best practices are also starting to lean in this direction.

The Layered Review Model I Increasingly Believe In

Layer 1 - Machines

Machines handle:

formatting
typing
lint
dependency checks
contract validation
tests
policy enforcement

Humans should not waste cognition here.

Layer 2 - AI Review Agents

Agents help with:

duplicated logic
suspicious patterns
missing edge cases
architectural anomaly
consistency violation

But AI review should:

lower cognitive load.

Not replace accountability.

The review capability of current frontier models is still some distance from real architectural review.

Layer 3 - Human Architectural Review

What humans should really focus on:

system coherence
operational safety
release implication
business correctness
maintainability
architectural integrity

The truly high-value decisions.
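The three layers can be read as a routing table: each kind of concern has one layer that should own it. The sketch below encodes the lists above that way. The `Layer` enum, the `LAYER_OF` table, and the `route` function are hypothetical names, and sending unrecognized concerns to the human layer is my own assumption, chosen so that accountability stays with people by default.

```python
from collections import defaultdict
from enum import Enum


class Layer(Enum):
    MACHINE = 1
    AI_AGENT = 2
    HUMAN = 3


# Concern names copied from the three layer lists above.
LAYER_OF = {
    **dict.fromkeys(
        ["formatting", "typing", "lint", "dependency checks",
         "contract validation", "tests", "policy enforcement"],
        Layer.MACHINE,
    ),
    **dict.fromkeys(
        ["duplicated logic", "suspicious patterns", "missing edge cases",
         "architectural anomaly", "consistency violation"],
        Layer.AI_AGENT,
    ),
    **dict.fromkeys(
        ["system coherence", "operational safety", "release implication",
         "business correctness", "maintainability", "architectural integrity"],
        Layer.HUMAN,
    ),
}


def route(concerns: list[str]) -> dict[Layer, list[str]]:
    """Group review concerns by the layer that should own them.

    Assumption: anything not in the table goes to the human layer.
    """
    routed: dict[Layer, list[str]] = defaultdict(list)
    for concern in concerns:
        routed[LAYER_OF.get(concern, Layer.HUMAN)].append(concern)
    return dict(routed)


routed = route(["lint", "missing edge cases", "operational safety", "data residency"])
for layer, items in routed.items():
    print(layer.name, items)
```

In this framing, the human reviewer's queue is only what lands under `Layer.HUMAN`; everything else should already have been handled before the PR reaches them.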

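One Last Important Observation

I think the whole industry is slowly discovering one thing:

AI has not eliminated engineering complexity.

It has only moved the complexity elsewhere.

From:

implementation effort

to:

coordination
review
consistency
observability
architectural clarity

And code review is where the symptoms of this shift showed up first.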

Final Thought

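The most valuable engineers of the future may no longer be just the ones who write code fastest.

But the ones who can:

maintain system coherence
maintain review quality
compress context efficiently
help the organization scale engineering understanding

Because:

Once code generation is cheap enough, consistency is the truly expensive thing.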

References

Anthropic on the code review bottleneck: "Anthropic says code review has become a bottleneck"
Research on AI-assisted code review and reviewer overload: "LLM-Based Code Review Research" (arXiv)
Modern scalable code review practices: "Code Review Best Practices That Scale"
Discussion of AI-assisted review pitfalls: "AI-Assisted Code Review: Opportunities and Pitfalls"
General engineering review workflow guidance: "Best Code Review Practices"