GraphNode
Back to all posts
AppSec

Path Traversal Attack Explained: Why CWE-22 Still Ships in 2026

| 11 min read |GraphNode Research

On January 24, 2024, Jenkins published an advisory for CVE-2024-23897: an arbitrary file read vulnerability in the built-in CLI command parser, scored CVSS 9.8. The bug was a single overlooked feature. The Jenkins CLI accepts arguments prefixed with the @ character and silently expands them by reading the file at that path on the controller, then substituting its contents back into the command line. An unauthenticated attacker who could reach the CLI endpoint could request the contents of arbitrary files on the Jenkins server. Within forty-eight hours, public proof-of-concept exploits were lifting /etc/passwd, controller SSH private keys, and the master encryption key Jenkins uses to wrap stored credentials. Servers with the affected versions began appearing in cryptomining campaigns and ransomware staging within the week.

There was no buffer overflow, no exotic deserialization gadget, no race condition. A user-supplied string was concatenated into a filesystem path, and the filesystem returned what the path pointed to. That is the entire pattern of CWE-22, and twenty-six years after its first formal description, it is still shipping in production every quarter. This article walks through how path traversal works in practice, what fixed code actually looks like in two languages, why mature engineering teams still introduce the bug, and how to detect it before it ships.

What CWE-22 Actually Is

The mechanic is unglamorous: an application accepts user-controlled input, joins it onto a base directory, and passes the result to a filesystem API without verifying that the resolved path remains inside the intended base. The classic payload is ../ (or ..\ on Windows), which instructs the operating system to walk one level up the directory tree. Repeated, it walks all the way to the root: ../../../../../etc/passwd. Naive defenses that strip the literal string ../ from input are routinely bypassed by URL-encoded variants (%2e%2e%2f), double-encoded variants (%252e%252e%252f), backslash variants on Windows, and Unicode normalization tricks. On older C and PHP applications, null byte injection (file.txt%00.jpg) used to terminate the string at the kernel layer while passing length-based extension checks; modern runtimes have largely closed that avenue, but the underlying mistake — trusting input to stay inside a directory — is exactly the same.

The same class also covers absolute path injection. If your code does path.join("/var/uploads", filename) and filename happens to be /etc/shadow, most language standard libraries will discard the base and use the absolute path instead. Validating only for .. sequences while ignoring leading slashes is a common partial fix that misses half the attack surface.

The Vulnerable Code You Have Probably Written

Here is a Node.js Express handler that serves user-uploaded files. It looks reasonable at a glance, ships in countless internal admin tools, and is broken.

// VULNERABLE
const express = require('express');
const fs = require('fs');
const path = require('path');

const app = express();
const UPLOADS_DIR = '/var/app/uploads';

app.get('/files/:filename', (req, res) => {
  const target = path.join(UPLOADS_DIR, req.params.filename);
  fs.readFile(target, (err, data) => {
    if (err) return res.status(404).send('not found');
    res.send(data);
  });
});

A request to /files/..%2F..%2F..%2Fetc%2Fpasswd resolves to /etc/passwd on a Linux host. path.join normalizes the .. segments but does nothing to keep the result inside UPLOADS_DIR. The fix is to canonicalize the resolved path and then verify it is a descendant of the intended root, after canonicalization, not before.

// FIXED
const express = require('express');
const fs = require('fs');
const path = require('path');

const app = express();
const UPLOADS_DIR = path.resolve('/var/app/uploads');

app.get('/files/:filename', (req, res) => {
  const requested = path.resolve(UPLOADS_DIR, req.params.filename);
  if (!requested.startsWith(UPLOADS_DIR + path.sep)) {
    return res.status(400).send('invalid path');
  }
  fs.readFile(requested, (err, data) => {
    if (err) return res.status(404).send('not found');
    res.send(data);
  });
});

The boundary check happens after path.resolve has flattened any .. segments and absolutized the result. The + path.sep on the prefix is load-bearing: without it, a request for a sibling directory like /var/app/uploads-secret would slip through a naive startsWith. Symlinks inside the uploads directory remain a separate concern; if attackers can create them, you need to follow up with fs.realpath and re-check.

Python developers reach for Flask's send_from_directory believing it handles this. It mostly does, but it is regularly defeated by upstream code that forgot to use it. The pattern below, which appears in tutorials and Stack Overflow answers, is the broken version.

# VULNERABLE
from flask import Flask, send_file, request
import os

app = Flask(__name__)
REPORTS_DIR = '/srv/reports'

@app.route('/report')
def report():
    name = request.args.get('name', '')
    return send_file(os.path.join(REPORTS_DIR, name))
# FIXED
from flask import Flask, send_from_directory, abort, request
from pathlib import Path

app = Flask(__name__)
REPORTS_DIR = Path('/srv/reports').resolve()

@app.route('/report')
def report():
    name = request.args.get('name', '')
    candidate = (REPORTS_DIR / name).resolve()
    try:
        candidate.relative_to(REPORTS_DIR)
    except ValueError:
        abort(400)
    if not candidate.is_file():
        abort(404)
    return send_from_directory(REPORTS_DIR, candidate.name)

Path.resolve() handles the canonicalization. relative_to raises if the resolved path escapes the root, which is the whole boundary check expressed as a single exception. Passing only candidate.name back to send_from_directory strips any directory components that survived, so the file is served by basename from the trusted root rather than by attacker-influenced path.

Why It Still Ships in 2026

The pattern is well-documented, the fix is short, and competent engineers introduce it anyway. Three things keep CWE-22 alive. First, framework defaults look safer than they are. path.join in Node, os.path.join in Python, and Paths.get in Java all accept absolute paths as later arguments and silently discard the base; engineers reading the API name reasonably assume "join" means "append," and the bug is invisible until someone tests it. Second, the vulnerability lives at the seam between an HTTP handler and a filesystem call, and the two are often written by different people on different days; the input source no longer looks like an input source by the time it reaches the sink. The Jenkins CVE was exactly this — the CLI argument parser and the file expansion feature lived in code that did not obviously interact.

Third, third-party packages introduce the bug by proxy. CVE-2024-29415, disclosed in July 2024 against the popular ip npm package, exposed a path-adjacent class of vulnerability where attacker-controlled input flowed through a parser into an SSRF/path decision the application thought was already validated. Most of the affected applications had not written a single character of vulnerable code themselves. Path traversal in 2026 is rarely a vulnerability of inattentive juniors; it is a vulnerability of system composition under deadline. The Apache CVE-2021-41773 and its incomplete patch CVE-2021-42013 reinforced the same lesson on a server everybody trusted: a small refactor introduced a path normalization regression, and a fix that did not cover the URL-decoding path required a second emergency release ten days later. If the Apache HTTP Server team can ship this, your application can too.

Detection: Where Each Layer Earns Its Keep

Path traversal sits at the intersection of every major AppSec discipline, and each one catches a different slice. SAST traces user input from the HTTP request handler through every intermediate variable to the filesystem call, flagging cases where no canonicalization or boundary check sits between source and sink — see why data flow analysis matters for the mechanics of inter-procedural taint tracking. DAST attacks the running application with an encoded payload library and watches for file contents in responses; it confirms exploitability but cannot reach a parameter the crawler does not discover. SCA flags vulnerable third-party packages, which is how teams that pinned ip@2.0.0 learned about CVE-2024-29415 without reading the advisory directly. Manual review remains the only layer that catches business-logic variants — for example, a tenant-scoped file system where the path validates correctly against the wrong tenant's root.

SAST is the cheapest of the four because it catches the bug in the diff that introduced it, before the build, before the deployment, before the scanner or the tester needs to confirm it at runtime. The other layers remain necessary; they just shoulder less of the work. Path traversal also overlaps with the access control layer: see A01 Broken Access Control for the broader category of authorization failures that path-based file servers tend to produce.

Prevention Checklist

Six rules cover the overwhelming majority of real-world cases. Apply them in order; each later rule assumes the earlier ones are in place.

  • Canonicalize before you validate. Use path.resolve, Path.toAbsolutePath().normalize(), or realpath first. Validating raw input before normalization is how every encoded-payload bypass works.
  • Reject paths that escape the base after canonicalization. Compare with a trailing path separator on the prefix, or use language-native containment checks like Python's Path.relative_to.
  • Allowlist over blocklist. Where the set of legal files is small and known (templates, locales, configured assets), look up the requested name in an allowlist and serve from a server-controlled mapping. Do not try to enumerate every dangerous string.
  • Use the framework helper, but read its docs. send_from_directory, res.sendFile with the root option, and ServletContext.getResourceAsStream all encode boundary checks. They only help if you actually call them with the right base directory.
  • Never serve files based on raw user input. If a request says "give me the file at this path," map the request to an internal identifier and look up the path server-side. The user names a record; your code names the file.
  • Log failed access attempts. A spike in 400-range responses on a file endpoint is the cheapest possible signal that someone is probing for traversal, and most teams discover this only post-incident.

Where GraphNode SAST Fits

GraphNode SAST performs deep data flow analysis across 13+ languages, tracing user input from HTTP request handlers through intermediate variables and method boundaries to filesystem sinks. CWE-22 is a textbook source-to-sink taint pattern, and the engine flags handlers that concatenate request parameters into filesystem calls without an intervening canonicalization-and-containment check. Findings surface during code review, on the diff that introduced the bug, with the engineer who wrote it. SAST does not replace runtime confirmation, but it is the only layer cheap enough to run on every commit, which is exactly what a vulnerability class this old and this preventable deserves.

Closing

CWE-22 has a documented fix that fits in five lines. Despite that, it shipped in Jenkins, in Apache HTTP Server, in widely-deployed npm packages, and in the next release of a product near you. The pattern is durable not because it is hard to fix in isolation but because it is easy to introduce when nobody is looking specifically for it. The teams that stop shipping path traversal are the ones that move detection upstream — into the diff, into the pull request, into the moment the code is written rather than the moment it is exploited. That is the only place the economics work out in the defender's favor.

GraphNode SAST traces user input from request handler to filesystem sink in 13+ languages -- request a demo.

Request Demo