OWASP A03:2021 Injection: SQL, XSS, Command, LDAP

TL;DR

A03:2021 Injection covers any case where untrusted input is interpreted as code or commands by a downstream interpreter: SQL, NoSQL, OS command, LDAP, XPath, ORM, expression-language, and — new in 2021 — cross-site scripting (XSS), which was previously its own category. Injection dropped from #1 in 2017 to #3 in 2021, but the consolidation with XSS means coverage is broader than ever. The detection technique that actually works is SAST with deep interprocedural data flow analysis — tracing tainted user input from source to sink across function boundaries. Pattern matching catches the trivial cases; data flow catches second-order injection and the multi-hop paths that hide in real codebases.

Injection has been on every OWASP Top 10 list since the original 2003 publication. It held the #1 position for over a decade — through the 2010, 2013, and 2017 editions — before dropping to #3 in 2021. The drop is not because injection went away. It is because OWASP rebalanced its methodology to emphasize incidence rate, and broken access control turned out to have a higher incidence rate across tested applications. Injection itself remains widespread, with the 2021 data showing 94 percent of applications tested for some form of injection and an average incidence rate around 3.37 percent.

The 2021 edition also folded cross-site scripting (XSS), previously A7:2017 in its own right, into A03 because the underlying mechanism is identical: untrusted data reaching a sink that interprets it. That consolidation is the most important structural change to this category since the list began, and it means a serious A03 program now needs to cover both classic injection (SQL, OS command, LDAP) and the entire XSS family (reflected, stored, DOM-based) under one umbrella. The detection technique is the same for both: data flow analysis with taint propagation.

What Injection Means

Injection is a single concept that shows up in many forms. The pattern is always the same: untrusted data (typically supplied by a user, but sometimes pulled from a database, a file, or an environment variable) is concatenated into a string that a downstream interpreter then parses as code, a query, a command, or markup. The interpreter cannot tell the difference between data the developer intended and data the attacker injected. The classic SQL example is "SELECT * FROM users WHERE email='" + email + "'": the interpreter cannot know whether a ' inside the email value is a literal apostrophe that belongs to the address or a SQL string terminator, so it treats it as the terminator.
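The concatenation pattern above can be reproduced in a few lines. The sketch below is illustrative only: it uses Python's built-in sqlite3 module with an in-memory database, and the table layout and function names (find_user_unsafe, find_user_safe, make_demo_db) are assumptions made for the example, not anything from a real application.

```python
import sqlite3

def find_user_unsafe(conn, email):
    # Vulnerable: the input is spliced into the SQL text, so a quote in
    # `email` ends the string literal and the rest is parsed as SQL.
    query = "SELECT id, email FROM users WHERE email='" + email + "'"
    return conn.execute(query).fetchall()

def find_user_safe(conn, email):
    # Parameterized: the value is bound to the placeholder and is never
    # parsed as part of the SQL statement.
    return conn.execute(
        "SELECT id, email FROM users WHERE email = ?", (email,)
    ).fetchall()

def make_demo_db():
    # Hypothetical two-row table used only for this demonstration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
    conn.executemany(
        "INSERT INTO users (email) VALUES (?)",
        [("alice@example.com",), ("bob@example.com",)],
    )
    return conn
```

With the payload x' OR '1'='1, the unsafe version returns every row in the table, while the parameterized version looks for a user whose email is literally that string and returns nothing.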

The 2021 consolidation merged XSS into A03 explicitly because the mechanism is identical. A reflected XSS bug echoes a query parameter into an HTML page without escaping; the browser parses the resulting markup and runs whatever <script> the attacker injected. The interpreter is the browser instead of the database, but the data-becomes-code transition is exactly the same. OWASP's guidance has long argued that injection should be treated as a single class of weakness with subcategories per interpreter, and the 2021 edition is the first list to fully reflect that view.

Injection persists despite more than 20 years of awareness for three reasons. First, the dangerous patterns are often non-obvious in real code: tainted input crosses many method boundaries, passes through builders and ORMs and template engines, and reaches the sink in code that does not look dangerous on inspection. Second, sanitization is context-specific — what neutralizes SQL injection does not neutralize XSS, what neutralizes HTML XSS does not neutralize JavaScript-context XSS — and developers routinely apply the wrong sanitizer. Third, modern frameworks introduce new injection surfaces faster than secure-coding training can keep up: GraphQL injection, NoSQL injection in MongoDB, server-side template injection in Jinja2 and Twig, and expression-language injection in Spring all postdate most developer training programs.

The Eight Major Injection Subcategories

A03 aggregates a broad family of injection types that share the source-to-sink mechanism but differ in interpreter and exploit shape. The eight most common subcategories are summarized below.

Subcategory	Interpreter	Primary CWE
SQL Injection	Relational database engine	CWE-89
NoSQL Injection	MongoDB, CouchDB, etc.	CWE-943
OS Command Injection	Shell or system call	CWE-78
LDAP Injection	Directory service	CWE-90
XPath Injection	XML query engine	CWE-643
ORM Injection	ORM query builder	CWE-564
Expression Language Injection	EL/OGNL/SpEL evaluator	CWE-917
Cross-Site Scripting (XSS)	Browser HTML/JS parser	CWE-79

SQL injection is the original form and remains one of the most prevalent. An attacker manipulates a SQL query to bypass authentication, extract arbitrary records, or in some database engines escalate to remote code execution. NoSQL injection targets document stores like MongoDB; the syntax differs (operator-based rather than string-based) but the principle is identical. OS command injection occurs when user input is passed to a shell, allowing arbitrary command execution with the privileges of the application process, which is frequently a path to full server compromise.

LDAP injection manipulates directory queries to bypass authentication or extract user records. XPath injection targets XML-based data stores and configuration files. ORM injection happens when an ORM exposes a raw-query escape hatch and developers concatenate user input into it; the ORM removes most injection risk for normal usage but does not protect against the unsafe API call. Expression-language injection targets evaluators like OGNL (Apache Struts), SpEL (Spring), and JSP EL; this is the family responsible for the Equifax breach in 2017. XSS — now under A03 — targets the browser, with reflected, stored, and DOM-based variants each requiring slightly different detection patterns.

Real-World Incidents

Injection is the OWASP category with the deepest catalog of public, well-documented breaches. The pattern repeats across two decades of incident reports: a sink with concatenated user input, a missing or wrong sanitizer, and an attacker who finds it before the defenders do.

Equifax (2017) remains the textbook expression-language injection breach. Attackers exploited CVE-2017-5638, a remote code execution vulnerability in the Jakarta Multipart parser of Apache Struts that allowed OGNL expressions in the Content-Type header to be evaluated server-side. A patch was available in March 2017; Equifax did not apply it before the breach window in May. The exposure compromised approximately 147 million records. The vulnerability is technically EL injection (CWE-917) and is one of the most cited examples of why the A03 family extends well beyond SQL.

Heartland Payment Systems (2008) was a SQL injection breach that exposed roughly 130 million card records, one of the largest payment-card breaches in U.S. history at the time. TalkTalk (2015) was another SQL injection incident, exposing personal data on around 157,000 customers and ultimately resulting in what was then a record fine from the U.K. Information Commissioner's Office; the underlying flaw was reportedly a legacy webpage with an unparameterized query. Several incidents at Sony in 2011, including the Sony Pictures and PlayStation Network breaches, were reported to involve SQL injection among other techniques, most clearly in the Sony Pictures case. The Magento ecosystem has seen recurring PHP code-injection vulnerabilities over the years, with multiple critical advisories driving emergency patching cycles for e-commerce operators. The pattern is consistent: long-tail injection bugs in production code, exploited at scale, with high impact.

Relevant CWE Mappings

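A03 aggregates 33 underlying CWEs. The eight most operationally important for detection rule design are summarized below; these are the CWEs that any serious SAST rule pack should cover. Most map directly onto the subcategories above, while CWE-91 (XML Injection) and CWE-94 (Code Injection) cover XML and general code injection, which the subcategory table does not list separately.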

CWE	Title	A03 Subcategory
CWE-89	SQL Injection	SQL
CWE-79	Cross-Site Scripting	XSS
CWE-78	OS Command Injection	Command
CWE-90	LDAP Injection	LDAP
CWE-91	XML Injection	XML
CWE-643	XPath Injection	XPath
CWE-94	Code Injection	Code
CWE-917	Expression Language Injection	EL/OGNL/SpEL

Detection: Why SAST Data Flow Analysis Is the Right Tool

Injection is the canonical use case for static analysis with deep data flow tracking. The reason is structural: every injection vulnerability follows the same shape — a source (HTTP parameter, file read, message queue payload, deserialized object), one or more propagators (assignment, concatenation, method call, field read), and a sink (database query, shell call, HTML render, LDAP search, eval). When a path exists from a source to a sink without passing through a recognized sanitizer, that path is an exploitable injection bug. This source-to-sink analysis is exactly what taint propagation engines compute.

Pattern matching catches the trivial in-line case where source and sink appear in the same function. It breaks down the moment the dangerous flow crosses a method boundary, which it does in essentially every real codebase. A controller method extracts a request parameter and passes it to a service. The service hands it to a repository. The repository builds a query through a builder pattern. None of those individual lines look dangerous. Only the path from source through three methods to the executeQuery sink reveals the vulnerability — and only an interprocedural data flow engine can see that path. For the technical mechanics of why this matters, see why data flow analysis matters.
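The source, propagator, sink, and sanitizer model can be pictured as reachability over a data-flow graph. The sketch below is a deliberately tiny illustration, not how any particular SAST engine is implemented: the graph is written by hand, the node names mirror the controller, service, and repository example above, and a real engine would derive the edges from the program itself.

```python
from collections import deque

# Hypothetical data-flow graph: an edge A -> B means "data flows from A to B".
FLOWS = {
    "controller.request_param": ["service.find_orders(arg)"],
    "service.find_orders(arg)": ["repository.build_query(arg)"],
    "repository.build_query(arg)": ["executeQuery"],
    "controller.page_size": ["int()"],
    "int()": ["executeQuery"],
}
SOURCES = {"controller.request_param", "controller.page_size"}
SINKS = {"executeQuery"}
SANITIZERS = {"int()"}  # assumed safe: a numeric conversion cannot carry SQL syntax

def tainted_paths(flows, sources, sinks, sanitizers):
    """Return every source-to-sink path that avoids all sanitizer nodes."""
    found = []
    queue = deque([src] for src in sorted(sources))
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node in sanitizers:
            continue  # taint is cleared; stop following this path
        if node in sinks:
            found.append(path)
            continue
        for nxt in flows.get(node, []):
            if nxt not in path:  # do not loop forever on cyclic flows
                queue.append(path + [nxt])
    return found
```

Calling tainted_paths(FLOWS, SOURCES, SINKS, SANITIZERS) reports the four-node path from the request parameter to executeQuery and drops the page-size flow, because that one passes through the sanitizer. No single node on the reported path looks dangerous in isolation, which is the point the paragraph above makes.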

Data flow analysis also catches second-order injection, which pattern matchers cannot see at all. In a second-order bug, tainted input is stored in a database without being immediately injected anywhere, then later read back and concatenated into a query or command. The store-and-retrieve cycle hides the source from any analyzer that only looks at one execution path. A taint engine that models database round-trips — treating reads from any table that has a tainted-input write source as themselves tainted — surfaces these correctly. This is the same mechanism that catches stored XSS, where a comment field is written without sanitization and rendered later on another page.
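A second-order flow is easy to miss because the write side looks correct. The following sketch, again using sqlite3 with a made-up schema and hypothetical function names, stores the attacker's value through a parameterized insert and only becomes vulnerable when a later query trusts the stored value.

```python
import sqlite3

def register(conn, username):
    # The write is parameterized, so nothing is injected at this point.
    conn.execute("INSERT INTO users (name) VALUES (?)", (username,))

def _stored_name(conn, user_id):
    row = conn.execute(
        "SELECT name FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return row[0]

def count_orders_unsafe(conn, user_id):
    # The name was read from the database, which makes it look trustworthy,
    # but it is still attacker-controlled and is concatenated here.
    name = _stored_name(conn, user_id)
    query = "SELECT COUNT(*) FROM orders WHERE owner = '" + name + "'"
    return conn.execute(query).fetchone()[0]

def count_orders_safe(conn, user_id):
    # Same lookup, but the stored value is bound as a parameter.
    name = _stored_name(conn, user_id)
    return conn.execute(
        "SELECT COUNT(*) FROM orders WHERE owner = ?", (name,)
    ).fetchone()[0]

def make_orders_db():
    # Hypothetical schema with three orders owned by other users.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, owner TEXT)")
    conn.executemany(
        "INSERT INTO orders (owner) VALUES (?)",
        [("alice",), ("alice",), ("bob",)],
    )
    return conn
```

After registering the name x' OR '1'='1 as user 1, count_orders_unsafe counts every order in the table, while count_orders_safe counts none. An analyzer that treats database reads as clean would see nothing wrong with either function.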

That said, SAST has limits. Some injection bugs require runtime context that no static analyzer can know. HTTP response splitting (CRLF injection) often depends on the specific HTTP server and reverse-proxy configuration in deployment. Server-side template injection requires knowing which template engine is bound at runtime, which may be configured rather than imported. For these cases, DAST and manual fuzzing complement SAST — the static layer flags candidates and the dynamic layer confirms exploitability. GraphNode SAST is built around taint propagation across 13+ languages with 780+ rules, and is paired with DAST in a complete program rather than positioned as a replacement for it.

Prevention

Prevention for A03 is well understood; the gap is consistency in applying it across every sink in a large codebase. The list below covers the core practices, organized by injection subcategory.

SQL injection. Use parameterized queries (prepared statements) for every database call. Bind parameters by placeholder; never concatenate user input into the query string. ORMs (Hibernate, Entity Framework, SQLAlchemy, Active Record) build parameterized queries by default for normal usage — but ORM raw-query escape hatches reintroduce the risk and are themselves a common source of bugs. If dynamic table or column names are required, validate them against an allow-list rather than passing them through.
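Dynamic identifiers are the case that placeholders cannot cover, so the allow-list has to do the work. A minimal sketch, assuming a hypothetical users table and a fixed set of sortable columns (SORTABLE_COLUMNS, list_users, and make_users_db are illustrative names):

```python
import sqlite3

# Hypothetical allow-list: the only columns callers may sort by.
SORTABLE_COLUMNS = {"name", "created_at"}

def list_users(conn, min_id, sort_by):
    if sort_by not in SORTABLE_COLUMNS:
        raise ValueError(f"unsupported sort column: {sort_by!r}")
    # The identifier is now one of a fixed set of known strings, so
    # interpolating it is safe; the value still uses a bound parameter.
    query = f"SELECT id, name FROM users WHERE id >= ? ORDER BY {sort_by}"
    return conn.execute(query, (min_id,)).fetchall()

def make_users_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, created_at TEXT)"
    )
    conn.executemany(
        "INSERT INTO users (id, name, created_at) VALUES (?, ?, ?)",
        [(1, "zoe", "2024-01-02"), (2, "adam", "2024-01-01")],
    )
    return conn
```

A request to sort by name succeeds, while a value such as name; DROP TABLE users is rejected before any SQL is built.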

OS command injection. Avoid passing user input to a shell at all. When a system call is unavoidable, use the argument-array form of the API (subprocess.run(["cmd", arg]) in Python, ProcessBuilder with a string array in Java) rather than a single string passed to a shell. The argument-array form does not invoke /bin/sh and does not interpret shell metacharacters. Validate the command itself against an allow-list of permitted operations.
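The argument-array form can be sketched as follows. To keep the example portable, the child process is the Python interpreter itself (sys.executable), which stands in for whatever fixed program an application would actually run; echo_argument is a hypothetical helper name.

```python
import subprocess
import sys

def echo_argument(user_input):
    # Argument-array form: no shell is started, so characters such as
    # ; | and $() reach the child process as literal text.
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", user_input],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.rstrip("\r\n")
```

Passing report.txt; echo INJECTED returns that exact string instead of running a second command. The array form does not stop the target program from reading a value that begins with - as one of its own options, which is one reason allow-list validation still applies.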

XSS. Apply context-appropriate output encoding wherever user input is rendered into HTML, JavaScript, CSS, or URL contexts. Frameworks like React and Vue auto-escape by default for HTML text content but do not protect against unsafe APIs (v-html, dangerouslySetInnerHTML) or JavaScript-context injection. Deploy a strict Content Security Policy with script-src restricted to nonces or hashes, or at minimum to a narrow set of specific origins; CSP is the layered defense that mitigates XSS even when an output-encoding bug slips through.
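Context-appropriate encoding means the same value is encoded differently depending on where it lands. The sketch below uses only the Python standard library and two hypothetical render helpers; a real application would normally rely on its template engine's auto-escaping rather than building markup by hand.

```python
import html
import json

def render_comment(comment):
    # HTML body context: escape &, <, > and quotes.
    return "<p>" + html.escape(comment, quote=True) + "</p>"

def render_inline_script(value):
    # JavaScript string context inside a <script> block: JSON-encode the
    # value, then escape characters that could close the script element.
    encoded = json.dumps(value)
    for ch, repl in (("<", "\\u003c"), (">", "\\u003e"), ("&", "\\u0026")):
        encoded = encoded.replace(ch, repl)
    return "<script>var q = " + encoded + ";</script>"
```

HTML escaping alone would not be the right encoder for the script context, and JSON encoding alone would let a value containing a closing script tag end the block early, which is the wrong-sanitizer problem described earlier.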

LDAP, XPath, NoSQL. Use prepared-query equivalents where the library supports them. Validate input against an allow-list of expected characters; inputs to LDAP and XPath queries usually draw on a small character set, which makes allow-listing tractable.

Expression-language injection. Disable EL evaluation on user-controlled inputs. Audit any code path where user-supplied strings reach Spring's SpelExpressionParser, OGNL evaluators, or template engines.

Across all subcategories, prefer allow-list validation over deny-list filtering. Block-listing dangerous characters is brittle because new bypass techniques are published continuously, while allow-listing the exact expected character set survives them.
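For LDAP search filters, where no prepared-query form is available, the usual fallback is to escape the filter metacharacters defined in RFC 4515 before building the filter string. The helper below is a hand-written sketch with hypothetical names and an assumed filter shape; LDAP client libraries such as python-ldap ship their own equivalent, which is preferable where available.

```python
# RFC 4515 escapes for characters that are special inside a filter value.
_LDAP_FILTER_ESCAPES = {
    "\\": r"\5c",
    "*": r"\2a",
    "(": r"\28",
    ")": r"\29",
    "\x00": r"\00",
}

def escape_ldap_filter_value(value):
    # Map character by character so an inserted backslash is never re-escaped.
    return "".join(_LDAP_FILTER_ESCAPES.get(ch, ch) for ch in value)

def build_user_filter(username):
    # Hypothetical filter shape; only the value portion is user-controlled.
    return "(&(objectClass=person)(uid=" + escape_ldap_filter_value(username) + "))"
```

With escaping in place, a username of * matches only an entry whose uid is a literal asterisk, rather than acting as a wildcard that matches every user.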

Where GraphNode SAST Fits: Strongest Detection Category

Injection is the area where deep static analysis distinguishes itself most clearly from pattern-only linters. GraphNode SAST is built on context-aware taint propagation across 13+ languages — C#, Java, JavaScript, Python, PHP, Swift, Kotlin, Objective-C, C/C++, VB.NET, HTML, and more. The 780+ rule pack covers OWASP Top 10 and CWE Top 25 with rules for every subcategory of A03: SQL injection, NoSQL injection, OS command injection, LDAP injection, XPath injection, ORM raw-query injection, expression-language injection, and the full XSS family (reflected, stored, DOM-based) with dedicated detection paths.

The engine traces tainted user input from sources (HTTP request parameters, multipart uploads, deserialized payloads, message queue reads) through propagator operations (assignments, method calls, field reads, collection operations, type conversions) to sinks (database execute calls, shell invocations, HTML render points, LDAP search calls, template evaluation) — across function and module boundaries, with sanitizer recognition that clears taint when input passes through a known-safe escaping or parameterization function. The result is findings that correspond to actual exploitable conditions rather than syntactic coincidences.

The honest limit is the same one every SAST engine shares: not every injection finding is reachable in production, and signal-to-noise discipline matters. GraphNode applies AI-assisted triage to suppress patterns that are not exploitable in the deployed configuration. For the techniques that drive false-positive reduction in serious static analysis, see reducing false positives in SAST.

Frequently Asked Questions

Is XSS still in the OWASP Top 10?

Yes, but not as its own category. In the 2017 edition XSS was A7, a standalone Top 10 entry. In the 2021 edition OWASP folded XSS into A03 Injection, on the basis that the underlying mechanism (untrusted data reaching a sink that interprets it) is identical to SQL injection, OS command injection, and the rest of the injection family. So XSS remains a Top 10 risk; it is just now categorized under A03 alongside the other injection subcategories rather than reported separately.

Why did injection drop from #1 to #3 in OWASP 2021?

The drop reflects an OWASP methodology change rather than a real reduction in injection prevalence. The 2021 edition made incidence rate (the percentage of tested applications with at least one finding of the weakness) the primary ranking input, with exploitability and impact factored in but no longer dominant. Broken access control had a higher incidence rate across the 2021 dataset, so it took the #1 position, and cryptographic failures took #2. Injection itself remains widespread — 94 percent of applications in the dataset were tested for it, with an average incidence rate around 3.37 percent. The drop is positional, not substantive.

What is the difference between SQL injection and NoSQL injection?

Both are injection bugs that target a database, but the syntax of the attack differs. SQL injection manipulates string-based query syntax — payloads like ' OR '1'='1 or ; DROP TABLE exploit the SQL parser. NoSQL injection in MongoDB and similar document stores typically manipulates the operator structure: an attacker sends a JSON object like {"$ne": null} in place of a string, exploiting the query API's willingness to interpret operator objects. The detection technique is the same — taint flow from source to query sink — but the dangerous payload shapes are different, and a SAST rule pack needs explicit NoSQL coverage to find them.
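The operator-object payload works because a JSON body can deliver a nested object where the code expected a string. One common mitigation, sketched here without any database driver, is to check types at the boundary before the values go anywhere near a query document. The function name and field names are assumptions for illustration.

```python
import json

def parse_login(body):
    """Parse a JSON login body and insist that both fields are plain strings."""
    data = json.loads(body)
    if not isinstance(data, dict):
        raise ValueError("login body must be a JSON object")
    username = data.get("username")
    password = data.get("password")
    # An operator object such as {"$ne": null} arrives as a dict, not a str,
    # so a strict type check rejects it before any query is built.
    if not isinstance(username, str) or not isinstance(password, str):
        raise ValueError("username and password must be strings")
    return username, password
```

A body of {"username": "admin", "password": {"$ne": null}} is rejected with a ValueError, while an ordinary string pair passes through unchanged.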

Can prepared statements prevent all SQL injection?

Prepared statements prevent SQL injection in the data portion of a query, meaning the values bound as parameters. They do not protect dynamic identifiers (table names, column names, ORDER BY targets) because those cannot be parameterized; if user input controls an identifier and you concatenate it into the query, prepared statements do not help. They also do not, on their own, rule out second-order injection: if tainted data is written to the database through a parameterized query and a later query concatenates the stored value, that later query is still injectable. For full coverage, parameterize every query, including those built from data read back from the database, combine prepared statements with allow-list validation for any dynamic identifiers, and use a SAST engine that models second-order flows.

Can SAST detect all injection vulnerabilities?

SAST with deep data flow analysis is the strongest single tool for injection detection and catches the vast majority of A03 vulnerabilities, including subtle multi-hop and second-order cases that pattern matchers miss. It does not catch every case. Some injection bugs depend on runtime context that static analysis cannot know — HTTP response splitting that depends on the specific HTTP server configuration, server-side template injection where the template engine is bound dynamically, command injection through indirect shell invocation in deployed scripts. For complete coverage, layer SAST with DAST (which confirms exploitability at runtime) and manual penetration testing for the cases neither tool reaches.