v5.12.0
Release v5.12.0
Published: Jun 12, 2026

Release Notes

New Model additions

MiniMax-M3-VL

MiniMax-M3-VL is the vision-language member of the MiniMax-M3 family. It pairs a CLIP-style vision tower, which uses 3D rotary position embeddings, with the MiniMax-M3 text backbone. The decoder is a mixed dense/sparse Mixture-of-Experts stack with SwiGLU-OAI gated experts and a lightning indexer for block-sparse attention. Images enter the model through a Conv3d patch embedding, and the architecture includes specialized components for efficient multimodal understanding and generation.
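As a rough illustration of what a Conv3d patch embedding does to the input shape, the sketch below computes the patch grid for a non-overlapping 3D convolution (stride equal to kernel size). The function name `patch_grid` and the patch sizes (`temporal_patch=2`, `patch=14`) are assumptions chosen for the example, not values taken from the MiniMax-M3-VL configuration.

```python
from typing import Tuple


def patch_grid(
    frames: int,
    height: int,
    width: int,
    temporal_patch: int = 2,  # hypothetical temporal kernel/stride
    patch: int = 14,          # hypothetical spatial kernel/stride
) -> Tuple[int, int, int]:
    """Return the (time, height, width) patch grid of a non-overlapping Conv3d.

    With stride equal to kernel size, each output position covers exactly one
    `temporal_patch x patch x patch` block of the input.
    """
    if frames % temporal_patch or height % patch or width % patch:
        raise ValueError("input dimensions must be divisible by the patch sizes")
    return frames // temporal_patch, height // patch, width // patch


# A still image is often duplicated along the time axis to fill one temporal patch.
t, h, w = patch_grid(frames=2, height=224, width=224)
num_tokens = t * h * w  # one embedding vector per patch
```

Under these assumed sizes, each patch becomes one vision token, and its (time, height, width) grid index is the kind of coordinate a 3D rotary position embedding can consume.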

Links: Documentation

  • Add minimax m3vl (#46600) by @ArthurZucker in #46600

PP-OCRv6: update documentation and slow tests (#46576)

The official weights for PP-OCRv6 are now available. PP-OCRv6 is a lightweight OCR system that combines architectural innovation with data-centric optimization. It redesigns the backbone, detection neck, and recognition neck around a unified MetaFormer-style building block with structural reparameterization. Three model tiers (medium, small, tiny) share the same block primitives, covering deployment scenarios from server to edge.

  • PP-OCRv6: update documentation and slow tests (#46576) by @zhang-prog

Add Parakeet-RNNT (#46331)

ParakeetForRNNT pairs a Fast Conformer encoder with an RNN-T (RNN Transducer) decoder.

  • RNN-T decoder (standard neural transducer):
    • An LSTM prediction network maintains language context across token predictions.
    • A joint network combines encoder and decoder outputs.
    • Greedy transducer decoding is used for inference: a blank emission advances the encoder frame by one, while a non-blank emission stays on the same frame.
  • Add Parakeet-RNNT (#46331) by @eustlb
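
The greedy transducer loop described above can be sketched in a few lines. This is a minimal illustration, not the ParakeetForRNNT implementation: `step` is a hypothetical stand-in for "prediction network + joint network + argmax", `BLANK = 0` is an assumed blank id, and `max_symbols_per_frame` is an assumed safety cap added so the loop always terminates.

```python
from typing import Callable, List, Sequence

BLANK = 0  # assumed blank token id for this sketch


def greedy_transducer_decode(
    encoder_frames: Sequence[int],
    step: Callable[[int, List[int]], int],
    max_symbols_per_frame: int = 10,  # assumed cap, guards against endless emission
) -> List[int]:
    """Greedy RNN-T decoding over a sequence of encoder frames.

    `step(frame, tokens)` returns the most likely next symbol given the current
    encoder frame and the tokens emitted so far.
    """
    tokens: List[int] = []
    t = 0
    emitted_on_frame = 0
    while t < len(encoder_frames):
        token = step(encoder_frames[t], tokens)
        if token != BLANK:
            # Non-blank: record the token and stay on the same frame.
            tokens.append(token)
            emitted_on_frame += 1
        if token == BLANK or emitted_on_frame >= max_symbols_per_frame:
            # Blank (or cap reached): advance the encoder frame by one.
            t += 1
            emitted_on_frame = 0
    return tokens


def toy_step(frame: int, tokens: List[int]) -> int:
    """Toy model: emit the frame's label once, then emit blank."""
    if frame != BLANK and (not tokens or tokens[-1] != frame):
        return frame
    return BLANK


decoded = greedy_transducer_decode([0, 3, 3, 0, 5], toy_step)
```

In a real model, the prediction network state would be updated only after a non-blank emission, which is why the emitted-token history is what `step` receives here.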

Bugfixes and improvements

  • [CI] don't export OTELs within the tests (#46602) by @tarekziade in #46602
  • [CI] capture checkers output in OTEL (#46601) by @tarekziade in #46601
  • Lfm2: thread seq_idx through ShortConv for packed/varlen inputs (#46588) by @ChangyiYang in #46588
  • put output_hidden_states into filter_output_hidden_states (#46422) by @molbap in #46422
  • a11 for checkers (#46599) by @tarekziade in #46599
  • Fix stop string matching for byte-fragment tokens (#46530) by @Incheonkirin in #46530
  • [DiffusionGemma] better docs and links (#46569) by @gante in #46569
  • Require trust_remote_code to run a local-directory custom_generate (#46483) by @LinZiyuu in #46483
  • Fix torchaudio version not tied to torch version in docker file (#46594) by @ydshieh in #46594
  • [CI] Enable PR CI for all fork PRs via security gate (#46591) by @ydshieh in #46591
  • [CB] [Minor] Add parameter to tune default compile level (#46533) by @remi-or in #46533
  • Make DiffusionGemma trainable (#46568) by @kashif in #46568
  • docs: 🌐 add Turkish translation for README file (#46312) by @onuralpszr in #46312
  • fix-trainer-tests (#46541) by @SunMarc in #46541
  • Remove unnecessary expand_as in get_placeholder_mask across VLMs (#44907) by @syncdoth in #44907
  • [CI] Catch all shell/process execution issues in security gate via Bandit JSON report (#46560) by @ydshieh in #46560
  • Honor a concrete dtype in AutoModel for composite checkpoints (#46514) by @qflen in #46514
  • [CI] Implement real security check in PR CI security gate (#46557) by @ydshieh in #46557
  • [CI] Add 60s delay in security gate for flow observation (#46555) by @ydshieh in #46555
  • [TBC] [CI] Auto-approve PR CI for fork PRs via security gate (#46553) by @ydshieh in #46553
  • [CI] fix and make less flaky (#46543) by @zucchini-nlp in #46543
  • Fix hf_hub_download not placing file in current dir for url_to_local_path (#46545) by @ydshieh in #46545

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @ArthurZucker
    • Add minimax m3vl (#46600)
  • @eustlb
    • Add Parakeet-RNNT (#46331)