b9558
Published: Jun 8, 2026

Release Notes

vulkan: Use cm2 decode_vector for mul_mat_id B matrix loads (#23991)

This allows vec4 loads of the B elements. BK is also increased to 64 when this path is enabled. Neither change alone is consistently faster, but together they give a nice speedup.

In ggml-vulkan.cpp, we need to make sure the B matrix alignment and stride are multiples of 4.
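
As a rough illustration of the constraint above, the sketch below shows one way such a check could be expressed. It is not the code from ggml-vulkan.cpp: the struct `BLoadConfig`, the function `choose_b_load_config`, and the `default_bk` parameter are hypothetical names, and it assumes alignment and stride are counted in B elements.

```cpp
#include <cstdint>

// Hypothetical illustration only; these names do not come from ggml-vulkan.cpp.
struct BLoadConfig {
    bool     use_vec4; // true when B elements can be loaded four at a time
    uint32_t bk;       // tile size along the K dimension
};

// Assumption: b_align_elems and b_stride_elems are measured in elements of B.
// The vec4 path is only chosen when both are multiples of 4, and BK is raised
// to 64 only together with that path. default_bk stands in for whatever BK
// value the non-vec4 path uses, which the release note does not state.
inline BLoadConfig choose_b_load_config(uint32_t b_align_elems,
                                        uint32_t b_stride_elems,
                                        uint32_t default_bk) {
    const bool vec4_ok = (b_align_elems % 4 == 0) && (b_stride_elems % 4 == 0);
    return BLoadConfig{ vec4_ok, vec4_ok ? 64u : default_bk };
}
```

Tying the two settings to a single condition mirrors the note that neither change helps consistently on its own.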

openEuler:

  • DISABLED
  • openEuler x86 (310p)
  • openEuler x86 (910b, ACL Graph)
  • openEuler aarch64 (310p)
  • openEuler aarch64 (910b, ACL Graph)

UI: