
Static Analysis for C and C++: Memory Safety, Buffer Overflows, and the Bugs That Won't Die

11 min read | GraphNode Research

On April 7, 2014, the OpenSSL project published an advisory for CVE-2014-0160, the bug the world came to know as Heartbleed. The defect lived inside a forty-line C function that handled the TLS heartbeat extension: the server trusted the length field in the incoming heartbeat packet without checking it against the actual payload length, then called memcpy to copy that many bytes from the request buffer into the response. An attacker who sent a one-byte payload but claimed sixty-five thousand bytes received sixty-five thousand bytes of process memory in return, drawn from whatever happened to live next to that buffer on the heap — private keys, session cookies, passwords still in flight. The patch was a single bounds check. The cleanup was global: every TLS-terminating server on the internet had to be patched, every certificate had to be rotated, and the long tail of embedded devices running old OpenSSL builds is still vulnerable today. Heartbleed is the textbook case for why C and C++ static analysis exists: the bug class is older than the language standard, the fix is mechanical, and the tool that catches it costs less than one breach.
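The shape of that missing check fits in a few lines. What follows is an illustrative sketch, not the OpenSSL source: the record layout (a two-byte big-endian claimed length followed by the payload bytes) and the function name build_heartbeat_reply are assumptions made for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical record layout, for illustration only:
//   bytes 0..1  claimed payload length (big-endian)
//   bytes 2..   payload
// Fills `reply` and returns true only when the claimed length fits inside
// the bytes that actually arrived.
bool build_heartbeat_reply(const std::uint8_t *record, std::size_t record_len,
                           std::vector<std::uint8_t> &reply) {
    constexpr std::size_t kHeader = 2;
    if (record == nullptr || record_len < kHeader) {
        return false;
    }
    const std::size_t claimed =
        (static_cast<std::size_t>(record[0]) << 8) | record[1];
    // The check Heartbleed lacked: never trust the claimed length beyond
    // what was received.
    if (claimed > record_len - kHeader) {
        return false;  // treat as malformed and send nothing back
    }
    reply.resize(claimed);
    if (claimed != 0) {
        std::memcpy(reply.data(), record + kHeader, claimed);
    }
    return true;
}
```

A request that carries one payload byte but claims 65,535 is rejected before any copy runs, while an honest request is echoed back unchanged.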

C and C++ have powered operating systems, browsers, databases, embedded firmware, and the runtime layer of nearly every higher-level language for fifty years, and the ecosystem has accumulated the deepest catalog of memory-safety vulnerabilities in software history. This guide walks through the C and C++ vulnerability landscape that memory-safety static analysis is built to surface, shows vulnerable-to-fixed transformations the way real codebases ship them, and explains where C++ SAST earns its keep on legacy code that cannot be rewritten in Rust tomorrow. If you are evaluating static analysis tooling for C and C++, the patterns below are the floor: any serious engine should catch them on the diff that introduces them.

The C and C++ Vulnerability Landscape

Memory-safety bugs dominate every public CVE database for native code, and the same five classes appear year after year. Stack-based buffer overflows (CWE-121) and heap-based buffer overflows (CWE-122), close relatives of CWE-120 (the classic buffer copy without checking the size of the input), account for the majority of remote code execution findings against C and C++ binaries. Use-after-free (CWE-416) and double-free (CWE-415) emerge from manual lifetime management across long-lived objects, asynchronous callbacks, and exception paths that nobody traced end-to-end. Null pointer dereference (CWE-476) crashes services and, on some platforms, escalates to exploitable state. Integer overflow (CWE-190) silently wraps a size calculation, and the next allocation or copy operates on a value the developer never imagined the variable could hold. A wrong size assumption does the same damage without any wraparound: CVE-2023-4863 in libwebp, the vulnerability that triggered emergency patches across Chrome, Firefox, Safari, and Electron in September 2023, was a heap buffer overflow in the VP8L Huffman decoder, where a crafted image made the decoder build Huffman tables larger than the buffer allocated for them.
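The integer-overflow case can be closed with a checked size computation in front of every allocation. This is a minimal sketch; the helper name checked_array_size is hypothetical, and the division-based test is one portable way to do it (GCC and Clang also offer __builtin_mul_overflow for the same purpose).

```cpp
#include <cstddef>
#include <cstdint>

// Computes count * elem_size into *out. Returns false, leaving *out
// untouched, if the product would not fit in size_t.
bool checked_array_size(std::size_t count, std::size_t elem_size,
                        std::size_t *out) {
    if (out == nullptr) {
        return false;
    }
    // count * elem_size overflows exactly when count > SIZE_MAX / elem_size.
    if (elem_size != 0 && count > SIZE_MAX / elem_size) {
        return false;
    }
    *out = count * elem_size;
    return true;
}
```

The caller allocates only when the helper returns true, so a wrapped product never reaches malloc or memcpy.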

Past the top five, the surface area is wide. Out-of-bounds reads (CWE-125) leak memory the way Heartbleed did and remain common wherever a length field arrives from the network. Format string vulnerabilities (CWE-134) survive in any code path that passes a user-controlled string to printf, fprintf, or syslog without a literal format specifier. Uninitialized memory reads (CWE-908) leak whatever the previous owner of that stack slot or heap chunk left behind. TOCTOU race conditions in filesystem code (CWE-367) let an attacker swap a path between the access check and the open call. Command injection through system and popen reaches the shell when the argument string is built from user input, and exec-family calls such as execl carry the same risk when they are used to launch a shell (for example /bin/sh -c) with that string. The unsafe string family (strcpy, strcat, sprintf, gets, scanf("%s")) has drawn compiler and linter warnings for two decades, and gets was removed from the C standard outright in C11, yet the rest continue to ship because the bounded alternatives require a length argument the developer has to choose. CVE-2021-3711 in the OpenSSL SM2 decryption path was a heap buffer overflow driven by a buffer-size calculation that returned a value smaller than the actual output, and the remediation was a corrected size estimate plus a bounds check the original code had skipped.
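The format string case has the simplest fix of the group: the untrusted string goes in as an argument, never as the format. A minimal sketch follows; the function name format_log_line and the 128-byte line buffer are assumptions made for the example.

```cpp
#include <cstdio>
#include <string>

// Builds a log line from untrusted text. The format string is a literal,
// so any % sequences inside `untrusted` are copied as plain data.
std::string format_log_line(const std::string &untrusted) {
    char line[128];
    // Vulnerable form, shown only as a comment:
    //   std::snprintf(line, sizeof(line), untrusted.c_str());
    std::snprintf(line, sizeof(line), "msg=%s", untrusted.c_str());
    return std::string(line);
}
```

An input such as "%x %n" comes out verbatim in the log line instead of being interpreted, and overlong input is truncated to the buffer size rather than overflowing it.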

Buffer Overflow: The Pattern That Built the Exploit Industry

The vulnerable shape is short enough to memorize, which is part of the reason it has survived since the Morris worm in 1988. A length-unaware copy reads from a source whose size the attacker controls, and the destination buffer overflows into adjacent stack frames, heap chunks, or function pointer tables:

// VULNERABLE
#include <string.h>
#include <stdio.h>

void log_username(const char *input) {
    char buffer[64];
    strcpy(buffer, input);            // no length check
    sprintf(buffer, "user=%s", input); // unbounded write
    printf("%s\n", buffer);
}

Pass an input of sixty-four characters or more and strcpy writes past the end of buffer; a long enough input overwrites the saved frame pointer and return address, and on platforms without stack canaries or with a canary the attacker can bypass, the function returns into attacker-chosen code. The sprintf on the next line is the same bug with a different name, and its five-byte "user=" prefix lowers the overflow threshold further. The fix replaces both calls with a single bounded snprintf and validates the length at the boundary:

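// FIXED
#include <string.h>   // strnlen (POSIX.1-2008)
#include <stdio.h>

#define PREFIX "user="

void log_username(const char *input) {
    char buffer[64];
    // Room left for input after the prefix and the terminating null:
    // sizeof(PREFIX) counts the prefix characters plus one null byte.
    const size_t max_input = sizeof(buffer) - sizeof(PREFIX);
    if (input == NULL || strnlen(input, max_input + 1) > max_input) {
        return;  // reject rather than silently truncate
    }
    snprintf(buffer, sizeof(buffer), PREFIX "%s", input);
    printf("%s\n", buffer);
}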

snprintf takes the destination size as its second argument and writes at most size - 1 bytes plus a terminating null; the strnlen check rejects oversized inputs before the copy ever runs. On platforms that ship strlcpy and strlcat (the BSDs, macOS, and glibc 2.38 and later), prefer those over strncpy, which does not guarantee null termination. SAST data flow analysis traces the input pointer from the function parameter (or further back, from a network read or a file parse) to the strcpy sink, and a bounded-write checker flags every call site where the destination size argument is missing, hardcoded incorrectly, or dependent on attacker-controlled arithmetic. See the A05 Security Misconfiguration guide for the operational neighbors of memory-safety bugs in deployment configuration.
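Where strlcpy is not available, snprintf's return value gives the same truncation signal. The helper below is a sketch under that assumption; the name copy_string_checked is hypothetical.

```cpp
#include <cstddef>
#include <cstdio>

// Copies src into dst (capacity dst_size, including the terminating null).
// Returns true only if the whole string fit. On truncation dst still holds
// a null-terminated prefix, so later reads stay in bounds.
bool copy_string_checked(char *dst, std::size_t dst_size, const char *src) {
    if (dst == nullptr || src == nullptr || dst_size == 0) {
        return false;
    }
    // snprintf returns the length the full output would have had.
    const int needed = std::snprintf(dst, dst_size, "%s", src);
    return needed >= 0 && static_cast<std::size_t>(needed) < dst_size;
}
```

Callers can then choose between rejecting the input and accepting the truncated copy, instead of finding out later that the string was cut.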

Use-After-Free: Manual Lifetime Management at Scale

Use-after-free is the C and C++ vulnerability class that defeats the most reviewers, because the bug appears at the use site while the root cause sits in a free path that may be hundreds of lines away or in a different translation unit. The vulnerable shape in modern C++ is anything that hands out a raw pointer or reference whose lifetime is shorter than the consumer expects:

// VULNERABLE
#include <cstdlib>
#include <cstring>

struct Session {
    char *token;
};

Session* make_session(const char *raw) {
    Session *s = (Session*)malloc(sizeof(Session));
    s->token = strdup(raw);
    return s;
}

void use_session(Session *s) {
    free(s->token);
    if (s->token[0] == 'A') { /* use-after-free */ }
    free(s);
    free(s); /* double-free on retry path */
}

The read of s->token[0] after free dereferences a dangling pointer; on a heap with reuse pressure, the allocator may already have handed that chunk to another caller, and the read returns whatever they wrote. The double-free is its own primitive: most modern allocators detect simple double-frees and abort, but a double-free with other allocations interleaved between the two calls can corrupt the free list and yield a write-what-where primitive in the hands of a competent attacker. The fix moves lifetime management into the type system:

// FIXED
#include <memory>
#include <string>

struct Session {
    std::string token;
};

std::unique_ptr<Session> make_session(const std::string& raw) {
    auto s = std::make_unique<Session>();
    s->token = raw;
    return s;
}

void use_session(const Session& s) {
    if (!s.token.empty() && s.token[0] == 'A') { /* safe */ }
}

std::unique_ptr owns the allocation, its destructor releases the Session exactly once when the owning pointer goes out of scope, and the std::string member manages its own buffer, so there is no manual free to forget or repeat. The C++ Core Guidelines and their lifetime safety profile formalize this discipline; SAST engines built on the Clang Static Analyzer, GCC's -fanalyzer, or commercial inter-procedural data flow encode the lifetime rules as checkers and flag the use-after-free paths the call graph exposes, including the ones that span a free in one translation unit and a use in another.
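The exactly-once guarantee is easy to observe. In the sketch below, the Tracked type, the g_destroyed counter, and both function names are hypothetical, introduced only to count destructor runs.

```cpp
#include <memory>
#include <utility>

int g_destroyed = 0;  // counts destructor runs, for illustration only

struct Tracked {
    ~Tracked() { ++g_destroyed; }
};

// Takes ownership. Whatever `owned` holds is released at the closing brace.
void consume(std::unique_ptr<Tracked> owned) {
    (void)owned;
}

int destructor_runs_after_two_handoffs() {
    g_destroyed = 0;
    auto p = std::make_unique<Tracked>();
    consume(std::move(p));  // ownership moves out; p is left null
    consume(std::move(p));  // hands over an empty pointer: nothing to free
    return g_destroyed;
}
```

The second hand-off is the retry path that caused the double-free in the vulnerable version; here it passes an empty pointer and the destructor still runs only once.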

Detection: Where C and C++ SAST Earns Its Keep

C and C++ are, in many ways, the languages static analysis grew up on. Forty years of buffer overflows produced forty years of detection research, and the result is the deepest tooling tradition in the industry. The open-source baseline starts with cppcheck, which ships fast, low-noise checkers for the common bug classes and integrates with the major build systems. The Clang Static Analyzer, exposed through scan-build and clang --analyze, performs path-sensitive symbolic execution and catches null dereferences, leaks, and uninitialized reads the type system cannot. The GCC static analyzer (-fanalyzer, introduced in GCC 10 and aimed primarily at C code) brings comparable path-sensitive reasoning to the GNU toolchain. Commercial engines such as Coverity, GraphNode, Veracode, Fortify, and PVS-Studio add deeper data flow, broader rule coverage, and the policy frameworks that safety-critical industries require: MISRA C and MISRA C++ for automotive, the AUTOSAR C++14 guidelines for the same domain (since merged into MISRA C++:2023), CERT C and CERT C++ for defense, and DO-178C for aerospace.

A modern memory-safety static analysis engine builds a call graph across the compiled translation units, identifies sources (network reads, file parses, IPC boundaries, command-line arguments), tracks the values forward through pointer arithmetic and length calculations, and flags any path that reaches an unbounded sink (strcpy, memcpy with attacker-controlled length, sprintf without size, system with concatenated input) without traversing a validator. The same engine catches the lifetime bugs (use-after-free, double-free, null deref) that the lifetime profile encodes, and the build-configuration gaps (missing -D_FORTIFY_SOURCE=2, missing stack canaries, missing PIE) that leave the compiled binary unhardened. Rust has emerged as the modern memory-safe alternative, and new systems code should default to it when the choice is open; for the hundreds of millions of lines of C and C++ that already ship in kernels, browsers, embedded firmware, and the runtime of every higher-level language, SAST is the only mechanical defense that scales to the codebase as it stands.
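The source-to-sink rule can be modeled as reachability over a value-flow graph. The sketch below is a toy model under heavy simplifying assumptions, not how any particular engine is implemented: the Node and Kind types and the function unvalidated_sinks are hypothetical, and a real analyzer derives the graph from the program rather than taking it as input.

```cpp
#include <cstddef>
#include <vector>

enum class Kind { Source, Validator, Sink, Plain };

struct Node {
    Kind kind;
    std::vector<std::size_t> next;  // indices of nodes the value flows into
};

// Returns the indices of sinks reachable from any source along a path that
// does not pass through a validator.
std::vector<std::size_t> unvalidated_sinks(const std::vector<Node> &graph) {
    std::vector<bool> tainted(graph.size(), false);
    std::vector<std::size_t> work;
    for (std::size_t i = 0; i < graph.size(); ++i) {
        if (graph[i].kind == Kind::Source) {
            tainted[i] = true;
            work.push_back(i);
        }
    }
    std::vector<std::size_t> findings;
    while (!work.empty()) {
        const std::size_t cur = work.back();
        work.pop_back();
        if (graph[cur].kind == Kind::Sink) {
            findings.push_back(cur);
        }
        for (std::size_t n : graph[cur].next) {
            if (n >= graph.size() || tainted[n]) {
                continue;  // skip bad edges and nodes already visited
            }
            if (graph[n].kind == Kind::Validator) {
                continue;  // taint stops at a validator
            }
            tainted[n] = true;
            work.push_back(n);
        }
    }
    return findings;
}
```

With a source feeding both a validated memcpy and an unvalidated strcpy, only the strcpy node is reported, which is the behavior the paragraph above describes.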

Prevention Checklist for C and C++ Codebases

Six rules close the overwhelming majority of real-world C and C++ vulnerabilities. They assume the team has already wired SAST into the pull-request gate; without that, even the strongest checklist degrades to a coding standard nobody re-reads.

  • Replace unbounded string functions everywhere. Forbid strcpy, strcat, sprintf, gets, and scanf("%s") at code review and gate them in CI; replace with snprintf, strlcpy/strlcat on platforms that ship them, or std::string in C++ code that can use it.
  • Validate every length before every copy. Any memcpy, memmove, or manual loop whose count comes from network bytes, file headers, or untrusted callers must check the count against the destination size before the copy; Heartbleed was one missing check.
  • Move lifetime management into the type system. In C++, default to std::unique_ptr, std::shared_ptr, and RAII containers; reserve raw new/delete and malloc/free for the boundaries with C APIs. The C++ Core Guidelines lifetime profile is the authoritative reference.
  • Treat integer arithmetic on sizes as untrusted. Use size_t consistently, check for overflow on multiplications that feed allocations (__builtin_mul_overflow in GCC and Clang, SafeInt for cross-platform code), and reject negative values before passing them to APIs that take unsigned counts.
  • Pass a literal format string to every printf-family call. Never let user input flow into the format argument of printf, fprintf, syslog, or any wrapper. Compiler warnings (-Wformat-security and -Werror=format-security, which take effect alongside -Wformat or -Wall) catch the basic cases; SAST catches the indirect ones.
  • Compile with the hardening flags the platform offers. Ship with -D_FORTIFY_SOURCE=2 (which needs optimization enabled to take effect), -fstack-protector-strong, -fPIE -pie, and -Wl,-z,relro,-z,now, and run the test suite in CI under -fsanitize=address,undefined; together they catch at runtime what static analysis misses. They are not a substitute for the source fix, but they are the safety net for the source fix you have not made yet.
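The rule about rejecting negative values before they reach an unsigned count deserves a concrete shape, because a signed -1 converted to size_t becomes the largest possible length. A minimal sketch, with the hypothetical helper name to_checked_count:

```cpp
#include <cstddef>
#include <cstdint>

// Converts a signed length parsed from untrusted input into a size_t count.
// Rejects negative values and anything above max_allowed, leaving *out
// untouched on failure.
bool to_checked_count(std::int64_t parsed, std::size_t max_allowed,
                      std::size_t *out) {
    if (out == nullptr || parsed < 0) {
        return false;
    }
    const std::uint64_t value = static_cast<std::uint64_t>(parsed);
    if (value > max_allowed) {
        return false;
    }
    *out = static_cast<std::size_t>(value);
    return true;
}
```

Passing the destination buffer size as max_allowed folds the sign check and the bounds check into one call in front of memcpy or memmove.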

Where GraphNode SAST Fits

GraphNode SAST ships native C and C++ support as first-class languages alongside eleven others, with deep inter-procedural data flow on the patterns this guide describes — tainted length values reaching memcpy, unbounded copies into fixed-size buffers, use-after-free and double-free across translation units, format string sinks with non-literal first arguments, integer overflow feeding allocation size, and the operational hardening flags missing from the build. CI integration gates the pull request before the build leaves the developer's branch, on the diff that introduced the bug. For a broader landscape view, the SAST Tools Buyer's Guide compares ten platforms.

Closing

C and C++ vulnerability classes are old, exhaustively documented, and individually preventable with a one-line fix. The reason they continue to ship is structural: the languages give the developer manual control over memory and trust them to use it correctly, the dangerous APIs are ergonomic enough that nobody pauses on them, and the safe alternatives require a length argument, a smart pointer, or a wholesale rewrite the schedule does not allow. Static analysis works on C and C++ because forty years of research have built the call graph reasoning, the lifetime checkers, and the bounded-write rules that catch the classes the language cannot prevent. Rust closes the door on these bug classes in new systems code; SAST is what holds the line on the hundreds of millions of lines already in production. The teams that stop shipping Heartbleed-shaped bugs are the ones that move detection upstream into the diff and treat the SAST finding as a blocker rather than a backlog ticket.

GraphNode SAST traces memory-safety bugs through C and C++ codebases on the diff that introduced them — request a demo.
