ggml-org/llama.cpp b9626 | My Release Notes

Add arch support for cohere2-MoE (#24260)

Removed redundant gating_func checks
Changed ffn lookup to prefer prefix_dense_intermediate_size
Renamed arch to cohere2moe
Removed redundant lmhead check and chat template changes
Removed the lm_head.weight check from modify tensors; loading the output tensor is not required, with fallback to token_embd.weight
Changed the shared-expert combination to the average (routed + shared) * 0.5
Fixed the sliding_window_pattern issue and pattern
Fixed transformers crash 'first_k_dense_replace' error
Remove comment
Removed cohere2-moe as a tokenizer type and kept it as tiny_aya. Renamed North-Mini-Code-1.0.
Fixed MTP failure; changed to use iSWA
Fixed remaining TODOs: renamed cohere2moe, changed SWA parsing to use get_key_or_arr, removed an extra get_arr use
Force metadata usage
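
A minimal Python sketch of three ideas from the list above: averaging routed and shared expert outputs as (routed + shared) * 0.5, accepting a sliding-window pattern as either a scalar or a per-layer array, and falling back to token_embd.weight when no separate output tensor exists. This is illustrative only and is not the llama.cpp code. The function names, the "output.weight" key, and the meaning of a scalar pattern (every n-th layer uses full attention) are assumptions.

```python
from typing import Dict, List, Sequence, Union


def combine_expert_outputs(routed: Sequence[float], shared: Sequence[float]) -> List[float]:
    """Element-wise average of routed and shared expert outputs."""
    if len(routed) != len(shared):
        raise ValueError("routed and shared outputs must have the same length")
    return [(r + s) * 0.5 for r, s in zip(routed, shared)]


def expand_swa_pattern(value: Union[int, Sequence[bool]], n_layer: int) -> List[bool]:
    """Return one flag per layer; True marks a sliding-window layer.

    Accepts either a scalar period or an explicit per-layer array,
    in the spirit of a "key or array" metadata lookup.
    """
    if isinstance(value, int):
        if value <= 0:
            raise ValueError("scalar pattern must be positive")
        # Assumption: with period n, the last layer of each group of n
        # uses full attention and the others use a sliding window.
        return [(il % value) < (value - 1) for il in range(n_layer)]
    flags = [bool(v) for v in value]
    if len(flags) != n_layer:
        raise ValueError("per-layer pattern must have n_layer entries")
    return flags


def pick_output_tensor(tensors: Dict[str, object]) -> str:
    """Choose the output projection, falling back to the token embeddings."""
    # "output.weight" is a hypothetical key used for illustration.
    return "output.weight" if "output.weight" in tensors else "token_embd.weight"
```

For example, `expand_swa_pattern(4, 8)` marks layers 3 and 7 as full attention under the stated assumption, while passing an explicit list of eight flags returns that list unchanged.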

Co-authored-by: Sigbjørn Skjæret
