b9670
Published: Jun 16, 2026

Release Notes

Fix and restrict NVFP4 edge-cases in llama-graph (#24331)

  • Move the post-GEMM MUL required for dequantization so that it runs before the LoRA and bias add

See https://github.com/ggml-org/llama.cpp/pull/23484:

  1. For LoRA, I would presume we want fully dequantized values before adding the residuals, but this depends on how the LoRAs were generated. The literature suggests LoRA is applied post-mul but pre-bias add; see https://github.com/ggml-org/llama.cpp/pull/8332
  2. For ModelOPT, bias-add should happen on fully-dequantized values
  • Restrict build_ffn for NVFP4 to supported combinations
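
The ordering described in the first bullet can be illustrated with a minimal sketch. This is not the llama-graph code: it assumes the post-GEMM MUL is a single scalar scale applied to the raw GEMM output, and every name here (`matvec`, `linear_scale_first`, `linear_scale_last`, `lora_delta`) is hypothetical. It only shows why the scale has to be applied before the LoRA delta and the bias are added.

```python
def matvec(w, x):
    """Plain matrix-vector product standing in for the quantized GEMM."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]


def linear_scale_first(w_q, scale, x, lora_delta, bias):
    """GEMM -> scale (finish dequant) -> LoRA delta -> bias add."""
    y = [scale * v for v in matvec(w_q, x)]      # fully dequantized output
    y = [v + d for v, d in zip(y, lora_delta)]   # LoRA added post-mul
    return [v + b for v, b in zip(y, bias)]      # bias added last


def linear_scale_last(w_q, scale, x, lora_delta, bias):
    """For contrast only: scaling at the end also rescales LoRA and bias."""
    y = matvec(w_q, x)
    y = [v + d for v, d in zip(y, lora_delta)]
    y = [v + b for v, b in zip(y, bias)]
    return [scale * v for v in y]


# Toy values chosen for illustration (assumed, not taken from any model).
w_q = [[1.0, 2.0], [3.0, 4.0]]
x = [1.0, 1.0]
scale = 0.5
lora_delta = [0.25, 0.25]
bias = [1.0, -1.0]

print(linear_scale_first(w_q, scale, x, lora_delta, bias))  # [2.75, 2.75]
print(linear_scale_last(w_q, scale, x, lora_delta, bias))   # [2.125, 3.125]
```

The two results differ because in the second ordering the LoRA delta and bias, which are already in full precision, get multiplied by the dequantization scale as well.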

openEuler:

  • DISABLED
  • openEuler x86 (310p)
  • openEuler x86 (910b, ACL Graph)
  • openEuler aarch64 (310p)
  • openEuler aarch64 (910b, ACL Graph)
