CVE-2026-27940: llama.cpp Integer Overflow Bypasses Earlier Heap Patch
A critical integer overflow vulnerability in llama.cpp, the widely adopted open-source LLM inference engine written in C/C++, has been publicly disclosed under CVE-2026-27940. The flaw resides in the GGUF file parser and allows attackers to craft malicious model files that trigger a heap buffer overflow, potentially enabling remote code execution on affected systems. Scored at CVSS 7.8 (High) with full impact on confidentiality, integrity, and availability, this vulnerability is especially significant because it bypasses the fix for CVE-2025-53630, an earlier patch applied to the same function to address a similar integer overflow. The ggml-org maintainers have released the fix in build b8146; all versions prior to this build are affected. The vulnerability was reported by security researcher @adi0x90 and is classified under CWE-122 (Heap-based Buffer Overflow) and CWE-190 (Integer Overflow or Wraparound).
The Vulnerability and Its Origins
CVE-2026-27940 targets the gguf_init_from_file_impl() function in ggml/src/gguf.cpp, the core parser responsible for reading GGUF model files. GGUF is the binary format used across the llama.cpp ecosystem to distribute and load large language models. During initialization, the function iterates through tensor metadata to calculate the total buffer size needed to hold tensor data. That accumulated size is stored in ctx->size, a size_t variable.
The predecessor vulnerability, CVE-2025-53630, addressed an integer overflow in this same accumulation loop. The patch added checks to prevent ctx->size from overflowing when summing individual tensor sizes. However, this fix was incomplete. It did not protect the final mem_size calculation that occurs after the loop at line ~665, where ctx->size is combined with the tensor overhead to determine the total allocation size. This gap between the patched code path and the unprotected downstream calculation created the exact opening that CVE-2026-27940 exploits.
How the Integer Overflow Bypass Works
The attack constructs a GGUF file containing two I8 tensors, each with a dimension value (ne[0]) set to 0x7FFFFFFFFFFFFFC0. Each tensor individually passes the overflow check from the CVE-2025-53630 patch because the remaining capacity before SIZE_MAX is large enough for either one alone. Combined, however, the two values produce ctx->size = 0xFFFFFFFFFFFFFF80, which equals SIZE_MAX - 127.
The critical failure occurs at the final memory size calculation:
```cpp
const size_t mem_size = (n_tensors + 1) * ggml_tensor_overhead() + ctx->size;
```
When the tensor overhead of 1,104 bytes is added to SIZE_MAX - 127, the result wraps modulo 2^64 to just 976 bytes. The memory allocator then reserves only 976 bytes instead of the exabyte-scale buffer the actual tensor data requires. This is the core of the vulnerability: a seemingly valid arithmetic operation that silently produces a catastrophically small allocation.
From Heap Corruption to Code Execution
With the undersized 976-byte buffer allocated, the data region available for tensor content is only 608 bytes. The subsequent fread() call attempts to read the attacker-supplied tensor data into this space. An attacker providing 1,136 or more bytes of controlled data writes at least 528 bytes beyond the allocated boundary, corrupting adjacent heap metadata and structures.
The security advisory demonstrates a full exploitation chain from initial crash through remote code execution. By tuning tensor dimensions to produce allocations within glibc tcache range (chunk size 992 bytes), the corrupted memory bypasses heap integrity checks during free(). The fast tcache path places the corrupted chunk silently onto the free list. Subsequent malloc() calls return attacker-pre-filled memory, enabling function pointer hijacking. Proof-of-concept exploits have achieved root shell access on macOS ARM64 and Ubuntu 24.04 (x86_64 and ARM64), demonstrating full weaponization of the flaw.
CVE-2025-53630 vs CVE-2026-27940
| Attribute | CVE-2025-53630 | CVE-2026-27940 |
|---|---|---|
| Root Cause | Integer overflow in tensor size accumulation loop | Integer overflow in final mem_size calculation |
| Overflow Location | Line ~642, inside loop | Line ~665, after loop |
| CVSS Score | High | 7.8 (High) |
| CWE | CWE-122, CWE-680 | CWE-122, CWE-190 |
| Exploitability | OOB Read/Write via offset miscalculation | Heap overflow via undersized allocation + fread |
| RCE Demonstrated | Not publicly | Yes, with tcache bypass |
| Fix | Overflow checks in accumulation loop | Overflow checks on final mem_size (build b8146) |
This pattern of an incomplete fix being bypassed highlights a common security engineering mistake: patching only the identified vulnerable code path without auditing adjacent calculations that consume the same attacker-influenced data. A comprehensive fix requires verifying every arithmetic operation on untrusted input, not just the one initially reported.
Affected Components
The vulnerability triggers in any code path calling gguf_init_from_file() with no_alloc=false:
- llama-quantize — model quantization tool (tools/quantize/quantize.cpp)
- llama-imatrix — importance matrix computation (tools/imatrix/imatrix.cpp)
- Control vector loading — via common/common.cpp
- llama-gguf — GGUF file inspection utility (examples/gguf/gguf.cpp)
Critically, the primary model loading path used during inference operates with no_alloc=true and is not affected. Standard inference workloads are safe. However, any workflow involving model conversion, quantization, or inspection of untrusted GGUF files is at risk. Consider a developer downloading a GGUF model from a public repository and running llama-quantize on it — this routine operation becomes the attack vector in a supply chain scenario.
Implications for the AI Ecosystem
llama.cpp has become a foundational component of the open-source AI stack, powering local inference for projects ranging from individual developer tools to enterprise deployment platforms. Model files are routinely shared through repositories like Hugging Face, and the implicit trust placed in GGUF as a passive data container makes supply chain attacks through malicious models a realistic threat vector.
This vulnerability underscores the fragility of arithmetic safety in C/C++ codebases processing untrusted inputs at scale. The fact that a patched vulnerability was bypassed by targeting an unchecked code path just lines away from the original fix raises questions about the completeness of integer overflow mitigations across the broader ggml library and other native AI inference engines.
Remediation and Recommendations
The ggml-org team has addressed CVE-2026-27940 in build b8146. Teams running any prior version should upgrade immediately. Beyond the version update, the following defensive measures are recommended:
- Verify model file provenance — Only load GGUF files from trusted sources. Implement hash verification for models downloaded from public repositories.
- Sandbox model processing tools — Run quantization, conversion, and inspection utilities in isolated environments (containers, VMs) with minimal privileges.
- Monitor for anomalous GGUF files — Files with unusually large tensor dimension values or mismatched declared sizes should trigger alerts in automated model pipelines.
- Audit custom integrations — If your application calls gguf_init_from_file() directly with no_alloc=false, ensure you are using the patched version of ggml.
Full technical details and proof-of-concept materials are available in the GitHub Security Advisory GHSA-3p4r-fq3f-q74v.