exploiting inconsistencies in the esprima AST parser - PingCTF writeup
PingCTF 2023 had an interesting web challenge, called calc. The solution involved a simple logic bug, but there were some unintended workarounds based on inconsistencies in the esprima.js parser that i think are worth sharing
the challenge
This is a classic xss challenge, where you have to find a reflected xss and send the malicious link to a bot to steal its cookies. The vulnerable app is a simple html+js calculator, with the following logic flow:
function runCode(untrusted_code) {
var AST = esprima.parse(untrusted_code);
try {
validateProgram(AST);
var html = `Result from evaluating code <code>${untrusted_code}</code> is ${eval(
untrusted_code
)}`;
document.getElementById("output").innerHTML = html;
} catch (e) {
document.getElementById("output").innerHTML = e;
}
}
- untrusted js code is read from the url (or from the textarea)
- the code is parsed into an AST using the esprima.js library
- the AST is validated against a strict set of rules, to make sure there is no malicious code
- if everything looks right, the code is run with an
eval
validateProgram
is a set of mutually recursive functions, that walk the AST and throw an error every time they find a pattern that does not match their allow-list
function validateProgram(program) {
for (var statement of program.body) {
if (statement.type != "ExpressionStatement") {
console.log(`VALIDATEPROGRAM: ${statement.type} != ExpressionStatement`)
throw "Invalid Program";
}
validateExpression(statement.expression);
}
}
function validateExpression(expression) {
if (expression.type != "AssignmentExpression") {
throw (
"Invalid Expression, expected AssignmentExpression got: " +
expression.type
);
}
validateIdentifier(expression.left);
validateCalculation(expression.right);
}
// ... rest of the validation functions
the bug
The first anomaly is in the runCode
function: both the result of the eval and the validation errors raised by validateProgram
are injected into the dom using innerHTML
, instead of the correct and safe alternative innerText
Coincidentally enough, there is a line of code in validateIdentifier
that partially prints back our input when we provide an Identifier that is not included in the whitelist
// ...
if (!/^[a-zA-Z]$/.test(identifier.name) && identifier.name != "Math") {
throw "Invalid Identifier name: " + identifier.name;
}
This means that if we provide any code with an identifier not in the whitelist, such as TEST = 1 + 1
,
that invalid identifier will be injected into the DOM using innerHTML, with the following result:
<div id="output">Invalid Identifier name: TEST</div>
The catch is that the injected text must be a javascript Identifier, which is never valid HTML… At least according to the standard
esprima js
Esprima is an old Ecmascript 2019 parser.
Browsers today are running newer versions of the standard, which means that this challenge is inspecting an AST that doesn’t fully match the AST that will be executed.
Additionally, over the years the project accumulated hundreds of issues on github related to parsing inconsistencies.
We don’t have to go very far to find this useful issue for our exploit.
According to the report, any identifier character specified using \UnicodeEscapeSequence
is accepted by esprima as part of that identifier.
This is exactly what we need!
We can write an HTML tag that will trigger an xss:
<img src=1 onerror=alert(1)> = 1
Then, we can convert all the characters that would not be valid in a javascript identifier, and convert them into unicode escale sequences:
\u{03c}img\u{020}src\u{3d}1\u{020}onerror\u{3d}alert\u{020}\u{3E} = 1+1
Inject the payload into the input box, and voilà we successfully injected html:
<div id="output">Invalid Identifier name: <img src=1 onerror=alert(1)></div>
There is only one issue: The image is created, but we get a CSP error
The correct solution
Turns out, there is a very strict CSP policy that cannot be bypassed in any way.
content="default-src 'none';
object-src 'none';
style-src-elem 'self';
script-src 'self' 'unsafe-eval';
The policy blocks any kind of attack on the innerHTML
flaw,
including the Identifier trick we covered and a similar one that exploited returned strings and inconsistencies in parsing comments.
In the end, the correct solution involved a logic error that made it possible to bypass the AST validation. A full writeup of the exploit is available here