CVE-2026-27940: llama.cpp Integer Overflow Bypasses Earlier Heap Patch
A critical integer overflow vulnerability in llama.cpp, the widely adopted open-source LLM inference engine written in C/C++, has been publicly disclosed under CVE-2026-27940. The flaw resides in the GGUF file parser and allows attackers to craft malicious model files that trigger a heap buffer overflow, potentially enabling remote code execution on affected systems. Scored at CVSS 7.8 (High) with full impact on confidentiality, integrity, and availability, this vulnerability is especially significant because it bypasses the fix for CVE-2025-53630, an earlier patch applied to the same function to address a similar integer overflow. The ggml-org maintainers have released the fix in build b8146; all versions prior to this build are affected. The vulnerability was reported by security researcher @adi0x90 and is classified under CWE-122 (Heap-based Buffer Overflow) and CWE-190 (Integer Overflow or Wraparound).
The Vulnerability and Its Origins
CVE-2026-27940 targets the gguf_init_from_file_impl() function in ggml/src/gguf.cpp, the core parser responsible for reading GGUF model files. GGUF is the binary format used across the llama.cpp ecosystem to distribute and load large language models. During initialization, the function iterates through tensor metadata to calculate the total buffer size needed to hold tensor data. That accumulated size is stored in ctx->size, a size_t variable.
The predecessor vulnerability, CVE-2025-53630, addressed an integer overflow in this same accumulation loop. The patch added checks to prevent ctx->size from overflowing when summing individual tensor sizes. However, this fix was incomplete. It did not protect the final mem_size calculation that occurs after the loop at line ~665, where ctx->size is combined with the tensor overhead to determine the total allocation size. This gap between the patched code path and the unprotected downstream calculation created the exact opening that CVE-2026-27940 exploits.
How the Integer Overflow Bypass Works
The attack constructs a GGUF file containing two I8 tensors, each with a dimension value (ne[0]) set to 0x7FFFFFFFFFFFFFC0. Each tensor individually passes the overflow check from the CVE-2025-53630 patch because the remaining capacity before SIZE_MAX is large enough for either one alone. Combined, however, the two values produce ctx->size = 0xFFFFFFFFFFFFFF80, which equals SIZE_MAX - 127.
The critical failure occurs at the final memory size calculation:
```cpp
const size_t mem_size = (n_tensors + 1) * ggml_tensor_overhead() + ctx->size;
```
When the tensor overhead of 1,104 bytes is added to SIZE_MAX - 127, the result wraps modulo 2^64 to just 976 bytes. The memory allocator then reserves only 976 bytes instead of the exabyte-scale buffer the actual tensor data requires. This is the core of the vulnerability: a seemingly valid arithmetic operation that silently produces a catastrophically small allocation.
From Heap Corruption to Code Execution
With the undersized 976-byte buffer allocated, the data region available for tensor content is only 608 bytes. The subsequent fread() call attempts to read the attacker-supplied tensor data into this space. An attacker providing 1,136 or more bytes of controlled data writes at least 528 bytes beyond the allocated boundary, corrupting adjacent heap metadata and structures.
The security advisory demonstrates a full exploitation chain from initial crash through remote code execution. By tuning tensor dimensions to produce allocations within glibc tcache range (chunk size 992 bytes), the corrupted memory bypasses heap integrity checks during free(). The fast tcache path places the corrupted chunk silently onto the free list. Subsequent malloc() calls return attacker-pre-filled memory, enabling function pointer hijacking. Proof-of-concept exploits have achieved root shell access on macOS ARM64 and Ubuntu 24.04 (x86_64 and ARM64), demonstrating full weaponization of the flaw.
CVE-2025-53630 vs CVE-2026-27940
| Attribute | CVE-2025-53630 | CVE-2026-27940 |
|---|---|---|
| Root Cause | Integer overflow in tensor size accumulation loop | Integer overflow in final mem_size calculation |
| Overflow Location | Line ~642, inside loop | Line ~665, after loop |
| CVSS Score | High | 7.8 (High) |
| CWE | CWE-122, CWE-680 | CWE-122, CWE-190 |
| Exploitability | OOB Read/Write via offset miscalculation | Heap overflow via undersized allocation + fread |
| RCE Demonstrated | Not publicly | Yes, with tcache bypass |
| Fix | Overflow checks in accumulation loop | Overflow checks on final mem_size (build b8146) |
This pattern of an incomplete fix being bypassed highlights a common security engineering mistake: patching only the identified vulnerable code path without auditing adjacent calculations that consume the same attacker-influenced data. A comprehensive fix requires verifying every arithmetic operation on untrusted input, not just the one initially reported.
Affected Components
The vulnerability triggers in any code path calling gguf_init_from_file() with no_alloc=false:
- llama-quantize — model quantization tool (tools/quantize/quantize.cpp)
- llama-imatrix — importance matrix computation (tools/imatrix/imatrix.cpp)
- Control vector loading — via common/common.cpp
- llama-gguf — GGUF file inspection utility (examples/gguf/gguf.cpp)
Critically, the primary model loading path used during inference operates with no_alloc=true and is not affected. Standard inference workloads are safe. However, any workflow involving model conversion, quantization, or inspection of untrusted GGUF files is at risk. Consider a developer downloading a GGUF model from a public repository and running llama-quantize on it — this routine operation becomes the attack vector in a supply chain scenario.
Implications for the AI Ecosystem
llama.cpp has become a foundational component of the open-source AI stack, powering local inference for projects ranging from individual developer tools to enterprise deployment platforms. Model files are routinely shared through repositories like Hugging Face, and the implicit trust placed in GGUF as a passive data container makes supply chain attacks through malicious models a realistic threat vector.
This vulnerability underscores the fragility of arithmetic safety in C/C++ codebases processing untrusted inputs at scale. The fact that a patched vulnerability was bypassed by targeting an unchecked code path just lines away from the original fix raises questions about the completeness of integer overflow mitigations across the broader ggml library and other native AI inference engines.
Remediation and Recommendations
The ggml-org team has addressed CVE-2026-27940 in build b8146. Teams running any prior version should upgrade immediately. Beyond the version update, the following defensive measures are recommended:
- Verify model file provenance — Only load GGUF files from trusted sources. Implement hash verification for models downloaded from public repositories.
- Sandbox model processing tools — Run quantization, conversion, and inspection utilities in isolated environments (containers, VMs) with minimal privileges.
- Monitor for anomalous GGUF files — Files with unusually large tensor dimension values or mismatched declared sizes should trigger alerts in automated model pipelines.
- Audit custom integrations — If your application calls gguf_init_from_file() directly with no_alloc=false, ensure you are using the patched version of ggml.
Full technical details and proof-of-concept materials are available in the GitHub Security Advisory GHSA-3p4r-fq3f-q74v.