You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Enhance Jinja2ChatFormatter to better support HuggingFace-style chat
templates while keeping the formatter lightweight and aligned with
llama-cpp-python's prompt-rendering needs.
This change adds a custom Jinja extension for `{% generation %}` blocks.
HuggingFace Transformers uses this tag to track assistant-token spans for
assistant masks, but llama-cpp-python only needs the final rendered prompt.
The new IgnoreGenerationTags extension therefore treats the tag as a
transparent wrapper: it removes the generation/endgeneration tag pair while
rendering the inner template body normally. This allows templates that contain
`{% generation %}` blocks to render successfully without introducing span
tracking overhead.
The Jinja environment is also expanded to more closely match Transformers'
chat-template runtime behavior. It now enables `jinja2.ext.loopcontrols` for
templates that use `{% break %}` or `{% continue %}`, registers a plain JSON
`tojson` filter that avoids Jinja's HTML escaping behavior, and exposes
`raise_exception` and `strftime_now` as globals instead of passing them on every
render call.
The formatter now accepts an optional `special_tokens_map`, making additional
tokenizer special tokens available to templates. This improves compatibility
with templates that reference variables such as `pad_token`, `unk_token`,
`sep_token`, or model-specific special tokens beyond `bos_token` and
`eos_token`.
This also adds optional `documents` support to `__call__`, allowing RAG-style
or document-aware chat templates to receive a `documents` variable in the
render context.
Finally, static stop fields are precomputed during initialization. Text stop
sequences and token-id stopping criteria are now built once instead of being
recreated for every chat formatting call. The token-id stopping callback also
guards against empty token arrays before reading the last token.
Key changes:
- Add IgnoreGenerationTags Jinja extension for HF `{% generation %}` blocks.
- Enable Jinja loop controls for chat templates using break/continue.
- Register Transformers-compatible `tojson` behavior.
- Register `raise_exception` and `strftime_now` as Jinja globals.
- Add `special_tokens_map` support for additional template variables.
- Add optional `documents` argument for document-aware templates.
- Precompute text stop sequences and token-id stopping criteria.
- Improve type normalization for `stop_token_ids`.
- Expand docstrings for formatter initialization and render-time variables.
Signed-off-by: JamePeng <jame_peng@sina.com>
0 commit comments