b9820
Published: Jun 26, 2026

Release Notes

sched : reintroduce less synchronizations during split compute (#20793)

  • CUDA: Improve performance via fewer synchronizations between tokens (#17795)

  • Adds CPU-to-CUDA copy capability to ggml_backend_cuda_cpy_tensor_async()

  • Adds function to relax sync requirements between input copies on supported backends (CUDA for now)

  • Exchanges synchronous copy with async copy function.

  • Adds macro guards to allow compilation in non-CUDA builds

  • Reworked backend detection in ggml-backend.cpp to avoid linking conflicts

  • Relax requirement of checks in async CUDA copies from backend and buffer type to just buffer type, to avoid linking issues

  • Minor cleanup

  • Generalizes the opt-in for relaxing explicit syncs. Backends such as Vulkan, which require a synchronization between HtoD copies and graph execution, could now adopt this change as well.

  • Reintroduces stricter check for CPU->CUDA backend async copy via GGML_DEVICE_TYPE_CPU.

  • Corrects initialization of ggml_backend_sync_mode in ggml_backend_sched_split initialization

  • Simplifies synchronizations to adhere to saaasg pattern.

  • Apply suggestion from @ggerganov (src->buffer to buf_src)

Co-authored-by: Georgi Gerganov

  • Apply suggestion from @ggerganov (src->buffer to buf_src) v2

Co-authored-by: Georgi Gerganov

  • Apply suggestions from @johannesgaessler code review

Co-authored-by: Johannes Gäßler

  • Adds single-GPU synchronizations to multi-GPU settings to fix pipeline-parallel bugs in the HIP backend.

  • Scheduler Hardening: Excludes HIP/MUSA from the copy_from_host CPU split -> GPU split optimization

  • Scheduler Hardening: Re-adds the original additional synchronizations for non-async backends

  • Adds a disclaimer to the HIP/MUSA exclusion from copy_from_host, noting that the exclusion is a precaution, that no performance impact is visible, and that it can be revisited separately at any time.


Co-authored-by: Georgi Gerganov
Co-authored-by: Johannes Gäßler
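
The effect of the relaxed synchronization described in the notes above can be illustrated with a toy model. This is a minimal sketch, not ggml's scheduler: `ToyStream`, `DeviceType`, `GpuFlavor`, `can_relax_sync` and `copy_inputs_and_compute` are hypothetical names invented for illustration, and the eligibility rule is only a simplified paraphrase of the bullets (CPU source required, HIP/MUSA excluded as a precaution). The model assumes an in-order stream, so copies queued before the graph are guaranteed to finish before the graph runs.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration; none of these names are ggml's API.
enum class DeviceType { CPU, GPU };
enum class GpuFlavor { CUDA, HIP, MUSA };

// Toy in-order stream: queued work only runs when synchronize() is called,
// loosely mimicking work enqueued on a single GPU stream.
struct ToyStream {
    std::queue<std::function<void()>> pending;
    int sync_count = 0;

    void enqueue(std::function<void()> op) { pending.push(std::move(op)); }

    void synchronize() {
        ++sync_count;
        while (!pending.empty()) {
            pending.front()();
            pending.pop();
        }
    }
};

// Assumed, simplified eligibility rule: the source must be a CPU device,
// and only the CUDA flavor opts in (HIP and MUSA are excluded).
bool can_relax_sync(DeviceType src, GpuFlavor dst) {
    return src == DeviceType::CPU && dst == GpuFlavor::CUDA;
}

// Copies every input to a pretend device buffer, then runs a pretend graph
// (a sum over the buffer). The number of syncs ends up in stream.sync_count.
int copy_inputs_and_compute(ToyStream & stream, const std::vector<int> & inputs, bool relaxed) {
    std::vector<int> device(inputs.size(), 0);
    for (std::size_t i = 0; i < inputs.size(); ++i) {
        stream.enqueue([&device, &inputs, i] { device[i] = inputs[i]; });
        if (!relaxed) {
            stream.synchronize(); // strict mode: block after every input copy
        }
    }
    int result = 0;
    stream.enqueue([&device, &result] {
        for (int v : device) {
            result += v;
        }
    });
    stream.synchronize(); // one final sync to obtain the graph result
    return result;
}
```

In this model, strict mode performs one sync per input copy plus one for the graph, while relaxed mode relies on stream ordering and performs a single sync at the end. Both modes produce the same result; only the number of blocking points differs.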

openEuler:

  • DISABLED
  • openEuler x86 (310p)
  • openEuler x86 (910b, ACL Graph)
  • openEuler aarch64 (310p)
  • openEuler aarch64 (910b, ACL Graph)