VU#915947: SGLang is vulnerable to remote code execution when rendering chat templates from a model file

Overview

A remote code execution vulnerability has been discovered in the SGLang project, specifically in the reranking endpoint (/v1/rerank). CVE-2026-5760 has been assigned to track the vulnerability. An attacker who can supply a malicious model file to an SGLang deployment can achieve remote code execution (RCE). Successful exploitation could allow arbitrary code execution in the context of the SGLang service, potentially leading to host compromise, lateral movement, data exfiltration, or denial-of-service (DoS) attacks. No response was obtained from the project maintainers during coordination.

Description

SGLang is an open-source framework for serving large language models (LLMs) and multimodal AI models, supporting models such as Qwen, DeepSeek, Mistral, and Skywork, and is compatible with OpenAI APIs. A vulnerability, tracked as CVE-2026-5760, has been discovered in the reranking endpoint (/v1/rerank), which uses a cross-encoder model to rerank documents by their relevance to a query.

An attacker exploits this vulnerability by creating a malicious GPT-Generated Unified Format (GGUF) model file whose tokenizer.chat_template parameter contains a Jinja2 server-side template injection (SSTI) payload, guarded by a trigger phrase that activates the vulnerable code path. The tokenizer.chat_template field is model metadata: a Jinja2 template that defines how conversation text is structured before it is processed by the model. The victim then downloads and loads the model in SGLang, and when a request hits the /v1/rerank endpoint, the malicious template is rendered, executing the attacker's arbitrary Python code on the server. This sequence of events enables the attacker to achieve remote code execution (RCE) on the SGLang server.
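To illustrate the class of payload involved, the sketch below renders a crafted chat template with an unsandboxed Jinja2 environment. The template string and trigger condition are illustrative assumptions, not the reporter's actual proof of concept; the point is that a plain string in model metadata can walk Python's object graph once rendered without a sandbox.

```python
from jinja2 import Environment

# Hypothetical SSTI payload of the kind that could be embedded in a GGUF
# file's tokenizer.chat_template field (illustrative only). The "trigger"
# comparison stands in for the trigger phrase described above.
malicious_template = (
    "{% if messages[0]['content'] == 'trigger' %}"
    "{{ ''.__class__.__mro__[1].__subclasses__() | length }}"
    "{% else %}{{ messages[0]['content'] }}{% endif %}"
)

# An unsandboxed Environment resolves dunder attributes freely, giving the
# template a path from a plain string literal to arbitrary Python objects
# (and from there to modules such as os).
env = Environment()
rendered = env.from_string(malicious_template).render(
    messages=[{"content": "trigger"}]
)
print(rendered)  # count of object subclasses: the template reached Python internals
```

Without the trigger phrase, the template behaves like an ordinary chat template, which is what lets a malicious model pass casual inspection.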

The vulnerability arises from the use of jinja2.Environment() without sandboxing in the get_jinja_env() function. This function sets up the environment for rendering Jinja2 templates, but because it lacks proper sandboxing, it fails to restrict the execution of arbitrary Python code. Consequently, when the reranking endpoint is accessed and a malicious model file containing a crafted tokenizer.chat_template is loaded, the template can execute arbitrary commands on the server.
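A minimal sketch of the vulnerable pattern follows. The function and parameter names are illustrative reconstructions, not SGLang's actual code; what matters is that a template taken from untrusted model metadata is compiled in an unrestricted environment.

```python
import jinja2

def get_jinja_env() -> jinja2.Environment:
    # BUG (sketch of the flaw described above): a plain Environment places
    # no restrictions on attribute access, so any template it renders can
    # reach Python internals.
    return jinja2.Environment(trim_blocks=True, lstrip_blocks=True)

def render_chat_template(chat_template: str, messages) -> str:
    # chat_template comes straight from untrusted model metadata
    # (the tokenizer.chat_template field of the GGUF file).
    template = get_jinja_env().from_string(chat_template)
    return template.render(messages=messages)

# A benign template renders as expected...
out = render_chat_template(
    "{% for m in messages %}<{{ m['role'] }}> {{ m['content'] }}{% endfor %}",
    [{"role": "user", "content": "hello"}],
)
print(out)
# ...but a malicious one would execute attacker-controlled expressions
# with the full power of the Python runtime.
```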

Impact

An attacker who can supply a malicious model file to an SGLang deployment can achieve RCE. Successful exploitation could allow arbitrary code execution in the context of the SGLang service, potentially leading to host compromise, lateral movement, data exfiltration, or denial-of-service (DoS) attacks. Deployments that expose the affected interface to untrusted networks are at the highest risk of exploitation.

Solution

To mitigate this vulnerability, it is recommended to render chat templates with jinja2's ImmutableSandboxedEnvironment instead of jinja2.Environment(). The sandboxed environment restricts attribute access and prevents the execution of arbitrary Python code on the server. No response or patch was obtained during the coordination process. Until a patch is available, operators should only load model files from trusted sources.
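The mitigation can be sketched as follows: the same dunder-attribute payload that succeeds against a plain Environment is rejected by ImmutableSandboxedEnvironment with a SecurityError, while legitimate chat templates still render normally. (The payload string is illustrative.)

```python
from jinja2.sandbox import ImmutableSandboxedEnvironment
from jinja2.exceptions import SecurityError

# Sandboxed, immutable environment for rendering untrusted chat templates.
env = ImmutableSandboxedEnvironment(trim_blocks=True, lstrip_blocks=True)

# Illustrative SSTI payload attempting to reach Python internals.
payload = "{{ ''.__class__.__mro__[1].__subclasses__() }}"
try:
    env.from_string(payload).render()
    blocked = False
except SecurityError:
    # The sandbox refuses access to unsafe attributes such as __class__.
    blocked = True
print("payload blocked:", blocked)

# Ordinary chat templates are unaffected by the sandbox:
benign = "{% for m in messages %}{{ m['content'] }}{% endfor %}"
result = env.from_string(benign).render(messages=[{"content": "hi"}])
print(result)
```

ImmutableSandboxedEnvironment additionally forbids in-template mutation of objects (e.g. list.append), which is appropriate here since a chat template has no legitimate need to modify server-side state.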

Acknowledgements

Thanks to the reporter, Stuart Beck. This document was written by Christopher Cullen.