This page draws on a catalogue of 105 documented, observable failure modes across 9 AI video models rather than on an invented pass-rate score. It shows where each model is known to break so you can choose by track record.

By attribute · 9 models · 105 documented failure modes

Which AI video model keeps a consistent face across cuts?

Runway Gen-4 (Scenes mode) is the only documented model built specifically to hold a character across multiple cuts; the others drift after a few cuts.

Last updated June 16, 2026 · Methodology: documented-failure-mode catalogue, not invented scores

Short answer

Runway Gen-4 (Scenes mode) is the only documented AI video model built to hold the same character across multiple cuts; other models visibly drift after a few cuts. For multi-shot work with a recurring face it has the strongest documented identity track record, though no model is reliable on extreme close-ups.

Identity holding is the kind of consistency most people mean when they ask which model is "consistent." Runway's Scenes mode references a locked character, which is why its documented identity failures are narrower than those of its peers. The other models can match a face within a single take but drift across cuts, because each clip is a fresh sample. If your project is one continuous shot, the gap narrows; if it spans several cuts with the same person, Runway is the documented pick.


Full context

Documented failure profile, every model

Model	Documented failure modes	Holds best on	Documented weak spot
Google Veo 3	13	native audio, single-shot photoreal, lighting	long-prompt instruction drop, camera-motion-ignored on locked-off shots
Runway Gen-4	13	character identity across cuts (Scenes mode)	hand anatomy on close-ups, prompt-ignored on dense prompts
OpenAI Sora 2	12	stylized motion (historically)	camera-control failures, multi-character interaction
ByteDance Seedance	12	short stylized clips	style-preset drift, motion drift over long clips
Luma Dream Machine Ray-2	12	lighting realism, atmospheric single takes	identity drift past ~3 cuts, camera-path drift
Vidu	11	reference-to-video character carry	motion plausibility, color drift
Pika 2.0	11	stylized short-form, the closest Sora-style substitute	face distortion on long clips, motion failures
Kling 1.6	11	human motion on simple single-subject shots	motion-blur overload, prompt adherence on complex scenes
Hailuo MiniMax	10	expressive faces on close-ups	camera-shake artifacts, physics collapse
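The profile table above can be kept as plain data so you can query it when choosing a model. This is a minimal Python sketch, assuming a hypothetical `ModelProfile` record and `CATALOGUE` list that simply transcribe the table; the helper names `total_modes` and `models_weak_on` are illustrative, not part of any published tool. It checks that the per-model counts add up to the 105 documented failure modes and lists the models whose documented weak spot mentions a keyword.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelProfile:
    """One row of the documented failure profile table (hypothetical record)."""
    name: str
    documented_modes: int
    holds_best_on: str
    weak_spot: str


# Transcription of the table; edit here if the catalogue is updated.
CATALOGUE = [
    ModelProfile("Google Veo 3", 13, "native audio, single-shot photoreal, lighting",
                 "long-prompt instruction drop, camera-motion-ignored on locked-off shots"),
    ModelProfile("Runway Gen-4", 13, "character identity across cuts (Scenes mode)",
                 "hand anatomy on close-ups, prompt-ignored on dense prompts"),
    ModelProfile("OpenAI Sora 2", 12, "stylized motion (historically)",
                 "camera-control failures, multi-character interaction"),
    ModelProfile("ByteDance Seedance", 12, "short stylized clips",
                 "style-preset drift, motion drift over long clips"),
    ModelProfile("Luma Dream Machine Ray-2", 12, "lighting realism, atmospheric single takes",
                 "identity drift past ~3 cuts, camera-path drift"),
    ModelProfile("Vidu", 11, "reference-to-video character carry",
                 "motion plausibility, color drift"),
    ModelProfile("Pika 2.0", 11, "stylized short-form, the closest Sora-style substitute",
                 "face distortion on long clips, motion failures"),
    ModelProfile("Kling 1.6", 11, "human motion on simple single-subject shots",
                 "motion-blur overload, prompt adherence on complex scenes"),
    ModelProfile("Hailuo MiniMax", 10, "expressive faces on close-ups",
                 "camera-shake artifacts, physics collapse"),
]


def total_modes(catalogue: list[ModelProfile]) -> int:
    """Sum of documented failure modes across all models."""
    return sum(p.documented_modes for p in catalogue)


def models_weak_on(catalogue: list[ModelProfile], keyword: str) -> list[str]:
    """Names of models whose documented weak spot mentions `keyword` (case-insensitive)."""
    kw = keyword.lower()
    return [p.name for p in catalogue if kw in p.weak_spot.lower()]
```

For example, `total_modes(CATALOGUE)` returns 105, and `models_weak_on(CATALOGUE, "identity")` returns only Luma Dream Machine Ray-2, the one model whose listed weak spot is identity drift across cuts. A keyword search is crude; it only finds what the short table text happens to say.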

Which model holds…

Pick by the thing that has to stay consistent

Holds readable on-screen text

No model reliably does

Text rendering is a documented failure mode for every covered model; all of them garble text past roughly six characters. Add text in post-production rather than relying on the model.
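One common way to add text in post is FFmpeg's `drawtext` filter. The Python sketch below only builds the argument list; the function name `drawtext_command` is hypothetical, and it assumes an FFmpeg build that includes `drawtext` with fontconfig support (otherwise add a `fontfile=...` option). To avoid filtergraph escaping rules, it deliberately accepts only letters, digits, and spaces.

```python
import re

# Restricting the title to these characters means no filtergraph escaping is needed.
SAFE_TEXT = re.compile(r"[A-Za-z0-9 ]+")


def drawtext_command(src: str, dst: str, text: str,
                     start_s: float = 0.0, end_s: float = 3.0,
                     fontsize: int = 48) -> list[str]:
    """Build an ffmpeg argument list that burns `text` onto `src` (hypothetical helper).

    The title is centred horizontally, placed 40 px above the bottom edge, and
    shown only between start_s and end_s seconds. Any audio stream is copied.
    """
    if not SAFE_TEXT.fullmatch(text):
        raise ValueError("this sketch only handles letters, digits and spaces")
    drawtext = (
        f"drawtext=text='{text}':fontsize={fontsize}:fontcolor=white"
        ":x=(w-text_w)/2:y=h-text_h-40"
        f":enable='between(t,{start_s},{end_s})'"
    )
    # -y overwrites dst if it already exists.
    return ["ffmpeg", "-y", "-i", src, "-vf", drawtext, "-c:a", "copy", dst]
```

Pass the result to `subprocess.run(cmd, check=True)` to render. Because the text is composited after generation, it stays legible regardless of how the model handles glyphs.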

Holds correct hands in close-up

No model reliably does

Hand-anatomy failure is documented for every model. Keep hands away from the camera or expect to re-roll; no model has solved close-up finger topology.

Holds cinematic lighting in a single take

Luma Ray-2

Luma Ray-2 has the fewest documented lighting-related failures and leads on photoreal cinematic lighting for mood-led single-shot work.

Holds native audio with the video

Veo (Google Veo 3)

Veo is the only covered model with native audio generation; the rest produce silent video that needs audio added separately.
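For the silent models, adding a separately produced track is a muxing step. The sketch below builds an FFmpeg argument list that takes video from the generated clip and audio from a second file; `mux_audio_command` is a hypothetical helper, and the choice of AAC for the audio codec is an assumption that suits MP4 output.

```python
def mux_audio_command(video: str, audio: str, dst: str) -> list[str]:
    """Build an ffmpeg argument list that adds `audio` to a silent `video` (hypothetical helper)."""
    return [
        "ffmpeg", "-y",        # -y overwrites dst if it already exists
        "-i", video,
        "-i", audio,
        "-map", "0:v:0",       # first video stream of the generated clip
        "-map", "1:a:0",       # first audio stream of the separate track
        "-c:v", "copy",        # keep the generated frames untouched (no re-encode)
        "-c:a", "aac",         # encode the audio for an MP4 container
        "-shortest",           # stop at the end of the shorter input
        dst,
    ]
```

Copying the video stream avoids a second lossy encode of frames the model already produced; `-shortest` keeps a longer music bed from extending past the end of the clip.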

Holds long, multi-instruction prompts

No model reliably does

Every model has a documented instruction-drop or prompt-adherence failure that worsens as prompts grow longer. Front-load the must-haves and keep prompts short.
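Front-loading can be made mechanical. This is a minimal sketch, assuming a hypothetical `build_prompt` helper and an arbitrary word budget (`max_words=40` is an illustrative default, not a documented limit of any model): must-haves always go first, and nice-to-haves are appended in priority order until the next one would exceed the budget.

```python
def build_prompt(must_haves: list[str], nice_to_haves: list[str] = (),
                 max_words: int = 40) -> str:
    """Assemble a short prompt with must-haves first (hypothetical helper).

    Nice-to-haves are added in priority order and dropping stops at the first
    one that would push the prompt past `max_words`.
    """
    parts: list[str] = []
    words = 0
    for clause in must_haves:
        clause = clause.strip().rstrip(".")
        parts.append(clause)
        words += len(clause.split())
    if words > max_words:
        raise ValueError(f"must-haves alone use {words} words; budget is {max_words}")
    for clause in nice_to_haves:
        clause = clause.strip().rstrip(".")
        n = len(clause.split())
        if words + n > max_words:
            break  # lower-priority details are the first to go
        parts.append(clause)
        words += n
    return ". ".join(parts) + "."
```

For example, `build_prompt(["a red fox runs left to right", "locked-off camera"], ["golden hour light", "shallow depth of field"], max_words=12)` keeps the two must-haves and the lighting note but drops the depth-of-field clause, which would push the prompt to 16 words.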
