- Analyzed language: C/C++
If you are attending this workshop at GitHub Satellite, or watching a recording, the facilitators will guide you through the steps below. You can use this document as a written reference.
To take part in the workshop you will need to set up a CodeQL development environment. See the Prerequisites section in the README for full instructions.
When you have completed setup, you should have:
- Installed the Visual Studio Code IDE.
- Installed the CodeQL extension for Visual Studio Code.
- Cloned this repository with `git clone --recursive`.
- Opened this repository in VS Code.
- Downloaded, imported, and selected the `example_db` CodeQL database from within VS Code.
- A `workshop-queries` folder within your workspace, containing an example query.
- A `codeql` folder within your workspace, containing the CodeQL standard libraries for most target languages.
- A copy of this `workshop.md` guide in your workspace.
- Open the query `workshop-queries/example.ql` and try running it!
Use-after-free vulnerabilities occur when a program retains a pointer to memory after that memory has been freed, and then attempts to reference it. Once memory has been freed, the system may allocate it for another purpose. Attempting to reference the freed memory could result in a variety of unsafe behaviour: crashing the program, retrieving an unexpected value, corrupting data used by another part of the program, or executing unsafe code.
The following C code shows a simple example of using memory after it has been freed.
free(s->x);
...
use(s->x);
The code frees the field `x` of a struct `s`, but does not immediately reset the field's value to zero. As a result, the struct now contains a 'dangling' pointer, which creates the potential for a use-after-free vulnerability. This becomes a real vulnerability when the code references `s->x` again, passing it to `use`.
A safer coding practice is to always immediately zero the field after freeing it, like this:
free(s->x);
s->x = 0;
Then, until `s->x` is reassigned, any attempt to reference it will simply obtain the null memory address.
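The free-then-zero practice can be wrapped in a small helper so that the two steps are never separated. The following is a minimal sketch, not code from the workshop repository: the struct `S` and the functions `release_x` and `length_of_x` are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical struct for illustration; the field `x` mirrors the example above.
struct S {
    char *x;
};

// Free the buffer held in `s->x` and immediately clear the field,
// so that no dangling pointer is left behind in the struct.
void release_x(S *s) {
    std::free(s->x);  // std::free(nullptr) is a no-op, so a second call is harmless
    s->x = nullptr;
}

// Return the length of the string in `s->x`, or 0 if the field has been cleared.
// The null check is only meaningful because release_x clears the field.
std::size_t length_of_x(const S *s) {
    return s->x != nullptr ? std::strlen(s->x) : 0;
}
```

A caller that only ever frees the field through `release_x` can test `s->x` against null before each use, which is the property the zeroing step is meant to provide.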
This is a well-known class of vulnerability, documented as CWE-416. A relatively recent example in the curl tool was assigned CVE-2018-16840, and inspired the material here.
In security terminology, a reference to freed memory is considered a source of tainted data, and a pointer that is dereferenced (used) is considered a sink for a use-after-free vulnerability.
If the tainted reference is reassigned (e.g. to zero) before it reaches a use, it is considered safe.
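To make the source, sink, and reassignment roles concrete, here is a toy model written in C++. It is only a sketch under simplifying assumptions: a program is reduced to a straight-line list of events on named pointers, and the types `Op` and `Event` and the function `hasUseAfterFree` are hypothetical names, not part of CodeQL or the workshop code.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical event model, for illustration only: each event acts on a named pointer.
enum class Op { Free, Assign, Use };

struct Event {
    Op op;
    std::string var;
};

// Return true if some pointer is used after being freed,
// with no reassignment of that pointer in between.
bool hasUseAfterFree(const std::vector<Event> &events) {
    std::set<std::string> dangling;  // pointers that currently refer to freed memory
    for (const Event &e : events) {
        switch (e.op) {
        case Op::Free:
            dangling.insert(e.var);  // source: the pointer becomes tainted
            break;
        case Op::Assign:
            dangling.erase(e.var);   // reassignment (e.g. to zero) removes the taint
            break;
        case Op::Use:
            if (dangling.count(e.var) > 0) return true;  // sink reached by a tainted pointer
            break;
        }
    }
    return false;
}
```

In this model the sequence free, use is reported, while free, assign, use is not. Real programs have branches, aliases, and calls, which is why the workshop relies on CodeQL's data flow libraries rather than a linear scan like this one.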
In this workshop, we will use CodeQL to analyze a sample of C++ source code that demonstrates simple variants of use-after-free vulnerabilities, and write a CodeQL query to identify the vulnerable pattern with reasonable precision.
- Use the IDE's autocomplete suggestions (`Ctrl+Space`) and jump-to-definition command (`F12`) to explore the CodeQL libraries.
- To run a query, open the Command Palette (`Cmd+Shift+P` or `Ctrl+Shift+P`), and click CodeQL: Run Query. You can also see this command when right-clicking on a query file in the editor.
- Try this out by running the example query `example.ql` in the workshop repository!
- When the query completes, click on the results to jump to the corresponding location in the source code.
- To run a part of a query, such as a single predicate, open the Command Palette and click CodeQL: Quick Evaluation. You can also see this command when right-clicking on selected query text in the editor.
- To understand how the source code is represented in the CodeQL libraries, use the AST Viewer. You can see this in the left panel of the CodeQL view. Click on a query result to get to a source file, and then click View AST, or run CodeQL: View AST from the Command Palette.
The rest of the workshop is split into several steps. You can write one query per step, or work with a single query that you refine at each step.
Each step has a Hint that describes useful classes and predicates in the CodeQL standard libraries for C/C++ and keywords in CodeQL.
Each step has a Solution that indicates one possible answer. Note that all queries will need to begin with import cpp to use the standard libraries, but for simplicity this may be omitted below.
- Find all calls to the `free` function in the code
- Find all variables which are freed in the course of the program
- Find all references of variables after they are freed
- Find all the variables which are used at any point
- Hint: Variables are dereferenced after they are used at any point
- Wire the results of both of our queries above to find if there's a path between our Source and Sink
- Find all function call expressions, such as `free(x)` and `use(y, z)`.
Hint
After you have run the example query and clicked on a result, look at the AST Viewer for the `example.cpp` source file. A function call is called a `FunctionCall` in the CodeQL C/C++ library.
Solution
from FunctionCall call
select call
- Identify the expression that is used as the first argument for each call, such as `free(<first arg>)` and `use(<first arg>, z)`.
Hint
  - Add another variable to your `from` clause. Declare its type (this can be `Expr`) and give it a name.
  - Add a `where` clause.
  - The AST viewer and autocomplete tell us that `FunctionCall` has a predicate `getArgument(int)` to find the argument at a 0-based index.
Solution
from FunctionCall call, Expr arg
where arg = call.getArgument(0)
select arg
- Filter your results to only those calls to a function named `free`.
Hint
  - `FunctionCall` has a predicate `getTarget()` to find the `Function` being called.
  - A `Function` (and most other named elements) has predicates `getName()` and `hasName(string)` to identify its name as a string.
  - You may also be interested in the predicate `hasGlobalOrStdName(string)`, which identifies named elements from the global or `std` namespaces.
  - Use the `and` keyword to add conditions to your query.
  - If you use `getName()`, use the `=` operator to assert that two values are equal. If you use `has*Name(string)`, passing the name into the predicate makes the assertion.
Solution
from FunctionCall call, Expr arg
where
  arg = call.getArgument(0) and
  call.getTarget().hasGlobalOrStdName("free")
select arg
- (Bonus) What other operations might free memory? Try looking for `delete` expressions using CodeQL. The example for this workshop only uses `free`, but another codebase may use variations of this function name, or use different delete operators.
- Factor out your logic into a predicate: `predicate isSource(Expr arg) { ... }`.
Hint
  - The `predicate` keyword declares a relation that has no explicit result / return value, but asserts a logical property about its variables.
  - The `from` clause of a query allowed you to declare variables, and the `where` clause described conditions on those variables.
    Within a predicate definition, variables are either declared as the parameters of the predicate, or 'locally' using the `exists` keyword. The first part of the `exists` declares some variables, and the body acts like a `where`, enforcing some conditions on the variables.
    exists(<type> <variableName> |
      // some logic about the variable here
    )
  - Use Quick Evaluation to evaluate the predicate on its own.
Solution
predicate isSource(Expr arg) {
  exists(FunctionCall call |
    arg = call.getArgument(0) and
    call.getTarget().hasGlobalOrStdName("free")
  )
}
- We are going to track the flow of information from the pointer that was freed. For this, we will use the CodeQL library for data flow analysis, which helps us answer questions like: does this expression ever hold a value that originates from a particular other place in the program?
We can visualize the data flow analysis problem as one of finding paths through a directed graph, where the nodes of the graph are places in the source code that may have a value, and the edges represent the flow of data between those elements. If a path exists, then the data flows between those two nodes.
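The graph view of data flow can be sketched in a few lines of C++. This is an illustration of path-finding in a directed graph only, not of how CodeQL computes data flow internally: the alias `FlowGraph`, the function `hasFlow`, and the string node labels are all hypothetical.

```cpp
#include <cassert>
#include <map>
#include <queue>
#include <set>
#include <string>
#include <vector>

// Hypothetical graph type: each node label maps to the nodes its value can flow to.
using FlowGraph = std::map<std::string, std::vector<std::string>>;

// Breadth-first search: return true if `sink` is reachable from `source`.
// A node is considered reachable from itself (a path of length zero).
bool hasFlow(const FlowGraph &graph, const std::string &source, const std::string &sink) {
    std::set<std::string> visited{source};
    std::queue<std::string> worklist;
    worklist.push(source);
    while (!worklist.empty()) {
        std::string node = worklist.front();
        worklist.pop();
        if (node == sink) return true;
        auto it = graph.find(node);
        if (it == graph.end()) continue;  // this node has no outgoing edges
        for (const std::string &next : it->second) {
            // insert() reports whether the node was newly added, so each node is queued once
            if (visited.insert(next).second) worklist.push(next);
        }
    }
    return false;
}
```

With edges from a freed pointer to its later occurrences, asking whether a freed value reaches a use becomes a call such as `hasFlow(g, "free(s->x)", "use(s->x)")`. The edges are directed, so the reverse question gives a different answer.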
The class `DataFlow::Node` describes all data flow nodes. These are different from the abstract syntax tree (AST) nodes, which only represent the structure of the source code. `DataFlow::Node` has various subclasses that describe different types of node, depending on the type of program syntax element they correspond to.
You can find out more in the documentation.
Modify your predicate to describe `arg` as a `DataFlow::Node`, not an `Expr`.
Instructions
  - Add `import semmle.code.cpp.dataflow.DataFlow` to your query file.
  - Change your predicate so that the parameter has type `DataFlow::Node`.
  - This will give you a compile error, since the types no longer match. Convert the data flow node back into an `Expr` using the predicate `asExpr()`.
Solution
import semmle.code.cpp.dataflow.DataFlow
predicate isSource(DataFlow::Node arg) {
  exists(FunctionCall call |
    arg.asExpr() = call.getArgument(0) and
    call.getTarget().hasGlobalOrStdName("free")
  )
}
- Let's think about the meaning of the `free` function and the value of its argument.
Before the function runs, the function argument is a pointer to memory, and is passed to the function by reference.
After the function body, the memory that was referenced by the pointer has been freed.
So the one expression for the function call argument in the program syntax actually has two possible values to think about in the data flow graph:
  - the pointer before it was freed
  - the dangling pointer after it was freed.
Expand the Hint to see how to distinguish between these two cases. Modify your predicate so that `arg` describes the memory after it has been freed, not before.
Hint
  - The value before the call is a `DataFlow::ExprNode`, a subtype of `DataFlow::Node`.
  - We can call `asExpr()` on such a node to get the original syntactic expression.
  - The value after the call is a `DataFlow::DefinitionByReferenceNode`.
  - We can call `asDefiningArgument()` on such a node to get the original syntactic expression.
  - Jump to the definition of `DataFlow::Node` to read more.
  - Modify your predicate to describe `arg` using `asDefiningArgument()`.
Solution
predicate isSource(DataFlow::Node arg) {
  exists(FunctionCall call |
    arg.asDefiningArgument() = call.getArgument(0) and
    call.getTarget().hasGlobalOrStdName("free")
  )
}
A dereference is a place in the program that uses the memory referenced by a pointer.
- Write a `predicate isSink(DataFlow::Node sink)` that describes expressions that may be dereferenced.
Hint
  - Think of some examples of operations that might dereference a pointer. The `*` operator? Passing it to a function? Performing pointer arithmetic? Use autocomplete or the AST viewer to explore how these are modelled in CodeQL.
  - Search for `dereference` in autocomplete to find a predicate from the standard library that models all these patterns for you.
Solution
predicate isSink(DataFlow::Node sink) {
  dereferenced(sink.asExpr())
}
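For reference, these are some C++ operations that read memory through a pointer, written as tiny functions so each form can be found in the AST Viewer. This is an illustrative sketch: the struct `Point` and the function names are hypothetical, and the list makes no claim about exactly which forms the standard library's dereference modelling covers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical type for illustration.
struct Point {
    int x;
    int y;
};

// Unary * operator: reads the int that `p` points to.
int viaStar(const int *p) { return *p; }

// Arrow operator: reads a field through a pointer to a struct.
int viaArrow(const Point *p) { return p->y; }

// Array indexing: p[i] is defined as *(p + i).
int viaIndex(const int *p, int i) { return p[i]; }

// Pointer arithmetic followed by a dereference of the resulting pointer.
int viaArithmetic(const int *p) { return *(p + 1); }

// Passing the pointer to a function that reads through it (here std::strlen).
std::size_t viaCall(const char *s) { return std::strlen(s); }
```

Each of these is safe while the pointer refers to live memory, and each becomes a use-after-free if the pointer was freed first.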
We have now identified (a) places in the program which reference freed memory and (b) places in the program which dereference a pointer to memory. We now want to tie these two together to ask: does a pointer to freed memory ever flow to a potentially unsafe dereference operation?
This is a data flow problem. We could approach it using local data flow analysis, whose scope would be limited to a single function. However, it is possible for the free and dereference operations to be in different functions. We call this a global data flow problem, and use CodeQL's libraries for this purpose.
In this section we will create a path-problem query capable of looking for global data flow, by populating this template:
/**
* @name Use after free
* @kind path-problem
* @id cpp/workshop/use-after-free
*/
import cpp
import semmle.code.cpp.dataflow.DataFlow
import DataFlow::PathGraph
class Config extends DataFlow::Configuration {
  Config() { this = "Config: name doesn't matter" }
  /* TODO move over solution from Section 1 */
  override predicate isSource(DataFlow::Node source) {
    exists(/* TODO fill me in from Section 1 */ |
      /* TODO fill me in from Section 1 */
    )
  }
  /* TODO move over solution from Section 2 */
  override predicate isSink(DataFlow::Node sink) {
    /* TODO fill me in from Section 2 */
  }
}
from Config config, DataFlow::PathNode source, DataFlow::PathNode sink
where config.hasFlowPath(source, sink)
select sink, source, sink, "Memory is $@ and $@, causing a potential vulnerability.", source, "freed here", sink, "used here"
- Fill in or move the `isSource` predicate you wrote for Section 1.
- Fill in or move the `isSink` predicate you wrote for Section 2.
- You can now run the completed query. Use the path explorer in the results view to check the results.
Completed query
/**
 * @name Use after free
 * @kind path-problem
 * @id cpp/workshop/use-after-free
 */
import cpp
import semmle.code.cpp.dataflow.DataFlow
import DataFlow::PathGraph
class Config extends DataFlow::Configuration {
  Config() { this = "Config: name doesn't matter" }
  override predicate isSource(DataFlow::Node source) {
    exists(FunctionCall call |
      source.asDefiningArgument() = call.getArgument(0) and
      call.getTarget().hasGlobalOrStdName("free")
    )
  }
  override predicate isSink(DataFlow::Node sink) {
    dereferenced(sink.asExpr())
  }
}
from Config config, DataFlow::PathNode source, DataFlow::PathNode sink
where config.hasFlowPath(source, sink)
select sink, source, sink, "Memory is $@ and $@, causing a potential vulnerability.", source, "freed here", sink, "used here"
- Bonus: Does your query handle the false positives in the example code? How can we expand it to handle more real-world codebases?
- CodeQL overview
- CodeQL for C/C++
- Analyzing data flow in C/C++
- Using the CodeQL extension for VS Code
- CodeQL on GitHub Learning Lab
- CodeQL on GitHub Security Lab
This is a modified version of a Capture-the-Flag challenge devised by @kevinbackhouse, available at https://securitylab.github.com/ctf/eko2020.