b9550
b9550
View on GitHubView PackagePublished: Jun 7, 2026

Release Notes

kv-cache: follow the source cache size when sharing cells (#24267)

A fitted target context can end up smaller than the draft default, the oversized assistant views then overflow the shared K/V tensors and trip the ggml_view_4d size assert during graph reserve.

macOS/iOS:

Linux:

Android:

Windows:

openEuler:

  • DISABLED
  • openEuler x86 (310p)
  • openEuler x86 (910b, ACL Graph)
  • openEuler aarch64 (310p)
  • openEuler aarch64 (910b, ACL Graph)

UI: