ggml-org/llama.cpp b9479 | My Release Notes

b9479

Published: Jun 2, 2026

Release Notes

common : fix state save in common_prompt_batch_decode (#23468)

This commit addresses a bug in common_prompt_batch_decode that affects the session state store/restore in completion.cpp and save-load-state.cpp.

Previously, the code saved only n-1 tokens in both session_tokens and the KV cache. When the session tokens were loaded and the prompt matched, the last saved token (n-1) was replayed into the next position, so the same token was decoded a second time in the wrong position.

The fix is to store all n tokens in session_tokens, while the memory state still reflects only the n-1 processed tokens, because the save happens before the last token is decoded in common_prompt_batch_decode.
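The off-by-one can be illustrated with a small toy model. This is a sketch only: it does not call any llama.cpp API, the memory state is reduced to a plain vector of tokens indexed by position, and the names saved_session, save_session, and restore_and_replay are hypothetical helpers introduced for this example.

```cpp
#include <cassert>
#include <vector>

using token = int;

// Toy stand-in for a saved session (hypothetical, not a llama.cpp type).
struct saved_session {
    std::vector<token> session_tokens; // tokens written alongside the state
    std::vector<token> cache;          // cache[i] = token decoded at position i
};

// Models a save that happens before the last prompt token is decoded,
// so the memory state only ever holds the first n-1 tokens.
saved_session save_session(const std::vector<token> & prompt, bool store_all_tokens) {
    assert(!prompt.empty());
    saved_session s;
    s.cache.assign(prompt.begin(), prompt.end() - 1);
    // old behavior: n-1 tokens in session_tokens; fixed behavior: all n tokens
    s.session_tokens = store_all_tokens ? prompt : s.cache;
    return s;
}

// Models a restore where the prompt matches: the last session token is
// replayed at the next free position to regenerate logits.
std::vector<token> restore_and_replay(const saved_session & s) {
    assert(!s.session_tokens.empty());
    std::vector<token> cache = s.cache;
    cache.push_back(s.session_tokens.back());
    return cache;
}
```

In this model, for the prompt {10, 20, 30, 40}, restore_and_replay(save_session(prompt, false)) leaves the cache as {10, 20, 30, 30}: token 30 is decoded again at the position that belongs to 40. With store_all_tokens set to true, the replayed token is 40 and the cache matches the prompt.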

I ran both completion.cpp and save-load-state.cpp with a transformer, a recurrent, and a hybrid model.

Resolves: https://github.com/ggml-org/llama.cpp/issues/23400

Co-authored-by: fairydreaming

macOS/iOS:

macOS Apple Silicon (arm64)
macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED
macOS Intel (x64)
iOS XCFramework

Linux:

Android:

Android arm64 (CPU)

Windows:

openEuler:

DISABLED
openEuler x86 (310p)
openEuler x86 (910b, ACL Graph)
openEuler aarch64 (310p)
openEuler aarch64 (910b, ACL Graph)

UI: