b9509
b9509
View on GitHubView PackagePublished: Jun 4, 2026

Release Notes

server: avoid unnecessary checkpoint restore when new tokens are present (#24110)

  • server: avoid unnecessary checkpoint restore when new tokens are present

The pos_min_thold calculation unconditionally subtracts 1 to ensure at least one token is evaluated for logits when no new tokens exist. However, when the request contains new tokens beyond the cached prefix, this -1 is overly conservative and may trigger an unnecessary checkpoint restore.

Conditionally apply the -1 only when n_past >= task.n_tokens() (no new tokens), avoiding redundant KV state restoration when there is actual work to do.

  • cont : add ref

Co-authored-by: Georgi Gerganov [email protected]

macOS/iOS:

Linux:

Android:

Windows:

openEuler:

  • DISABLED
  • openEuler x86 (310p)
  • openEuler x86 (910b, ACL Graph)
  • openEuler aarch64 (310p)
  • openEuler aarch64 (910b, ACL Graph)

UI: