ggml-org/llama.cpp b9670 | My Release Notes

b9670

Published: Jun 16, 2026

Release Notes

Fix and restrict NVFP4 edge-cases in llama-graph (#24331)

Move the post-GEMM MUL required for dequantization before the LoRA and bias add

See https://github.com/ggml-org/llama.cpp/pull/23484:

For LoRA, I would presume we want fully dequantized values before adding the residuals, but this depends on how the LoRAs were generated. The literature indicates that LoRA is applied post-mul but pre-bias add (see https://github.com/ggml-org/llama.cpp/pull/8332).
For ModelOPT, the bias add should happen on fully dequantized values.
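
The ordering described above can be illustrated with a small sketch. This is not llama.cpp code: the function names are hypothetical, the dequantization scale is assumed to be a single per-tensor scalar applied after the GEMM, and the LoRA contribution is assumed to be a precomputed delta vector. It only shows why the MUL has to come before the LoRA and bias adds.

```python
def matvec(w, x):
    """Plain matrix-vector product, standing in for the GEMM."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def linear_scale_first(w_q, scale, x, lora_delta, bias):
    """GEMM -> post-GEMM MUL (dequant) -> LoRA add -> bias add."""
    y = matvec(w_q, x)
    y = [v * scale for v in y]                  # values are now fully dequantized
    y = [v + d for v, d in zip(y, lora_delta)]  # LoRA: post-mul, pre-bias
    return [v + b for v, b in zip(y, bias)]     # bias on dequantized values


def linear_scale_last(w_q, scale, x, lora_delta, bias):
    """Problematic order: LoRA and bias are added before the scale is applied."""
    y = matvec(w_q, x)
    y = [v + d for v, d in zip(y, lora_delta)]
    y = [v + b for v, b in zip(y, bias)]
    return [v * scale for v in y]               # scale also rescales LoRA and bias


# Hypothetical toy values: w_q * scale is the "true" weight [[1, 2], [3, 4]].
w_q = [[2.0, 4.0], [6.0, 8.0]]
scale = 0.5
x = [1.0, 1.0]
lora_delta = [0.25, 0.25]
bias = [1.0, -1.0]

good = linear_scale_first(w_q, scale, x, lora_delta, bias)
bad = linear_scale_last(w_q, scale, x, lora_delta, bias)
print(good)  # [4.25, 6.25], matches W @ x + lora_delta + bias with dequantized W
print(bad)   # [3.625, 6.625], LoRA delta and bias were wrongly scaled by 0.5
```

With the scale applied last, the LoRA delta and the bias are both multiplied by the dequantization scale, so the result no longer matches what the unquantized layer would produce.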

Restrict build_ffn for NVFP4 to supported combinations

macOS/iOS:

macOS Apple Silicon (arm64)
macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED
macOS Intel (x64)
iOS XCFramework

Linux:

Android:

Android arm64 (CPU)

Windows:

openEuler:

DISABLED
openEuler x86 (310p)
openEuler x86 (910b, ACL Graph)
openEuler aarch64 (310p)
openEuler aarch64 (910b, ACL Graph)

UI:

UI