Skip to content

C++: Reduce re-evaluation#21390

Merged
MathiasVP merged 19 commits intogithub:mainfrom
MathiasVP:less-reevaluation-4
Mar 3, 2026
Merged

C++: Reduce re-evaluation#21390
MathiasVP merged 19 commits intogithub:mainfrom
MathiasVP:less-reevaluation-4

Conversation

@MathiasVP
Copy link
Contributor

@MathiasVP MathiasVP commented Feb 27, 2026

This PR removes most of the re-evaluation of predicates from previous stages. In particular, the dataflow stage made use of various internal predicates from IR generation which caused some unfortunate re-evaluation.

There were also some other stage-related issues where various predicates in dataflow were cached, but not put inside a cached module, which produced unnecessary stages.

Unfortunately, this was easier said than done. It required a large refactoring of the dataflow files because I we have made some fairly arbitrary choices as to what belonged in DataFlowUtil.qll and DataFlowPrivate.qll which made it hard to unify the stages. I tried my best to make this refactoring commit-by-commit reviewable.

Most of this PR has been guided by the output of Schack's stageoverlap.py script. There should be no changes to any resulting tuples.

DCA looks great. We shave off about 4-5% of the total analysis time 🎉 Additionally, my experience is that this PR removes a lot of recomputation of DataFlowUtil.qll/DataFlowPrivate.qll predicates when I modify an isSource or isSink in a query.

@github-actions github-actions bot added the C++ label Feb 27, 2026
@MathiasVP MathiasVP marked this pull request as ready for review February 27, 2026 17:56
@MathiasVP MathiasVP requested a review from a team as a code owner February 27, 2026 17:56
Copilot AI review requested due to automatic review settings February 27, 2026 17:56
@MathiasVP MathiasVP added the no-change-note-required This PR does not need a change note label Feb 27, 2026
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors the C++ dataflow implementation to reduce re-evaluation of predicates across compilation stages, achieving approximately 4-5% improvement in total analysis time. The changes reorganize how cached predicates are structured and introduce a new DataFlowNodes.qll file to centralize node definitions, thereby reducing stage overlap.

Changes:

  • Introduces DataFlowNodes.qll to centralize dataflow node definitions, moving classes like Node, ParameterNode, VariableNode, etc. from DataFlowUtil.qll
  • Wraps cached predicates in Cached modules with appropriate stage collapsing to prevent unnecessary recomputation
  • Adds new cached predicates in IRConstruction.qll to expose translation information without causing re-evaluation
  • Reorganizes TaintTrackingUtil.qll to wrap predicates in a Cached module for proper stage management
  • Moves several cached predicates from DataFlowPrivate.qll into the Cached module for better stage control

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated no comments.

Show a summary per file
File Description
FlowSummaryNode.ql Adds import of DataFlowNodes for test access to node classes
CaptureModels.qll Updates references from DataFlow::FieldAddress to DataFlowNodes::FieldAddress
IRConstruction.qll Adds cached predicates to expose translation information (allocation extents, transparent conversions, etc.)
TaintTrackingUtil.qll Wraps taint tracking predicates in a Cached module with stage collapsing
SsaImplCommon.qll Moves utility predicates into Cached module for proper stage management
SsaImpl.qll Delegates to IRConstruction for initialization target address
PrintIRUtilities.qll, PrintIRLocalFlow.qll, PrintIRFieldFlowSteps.qll Adds DataFlowNodes imports for access to node classes
ModelUtil.qll Adds DataFlowNodes import
ExprNodes.qll Uses IRConstruction cached predicates instead of direct translation class access
DataFlowUtil.qll Major refactoring: moves node definitions to DataFlowNodes.qll, imports DataFlowNodes::Public
DataFlowPrivate.qll Consolidates cached predicates (toString, getLocation, newtype definitions) into Cached module
DataFlowNodes.qll New file containing node class definitions, Content types, and related predicates
FlowSummaryImpl.qll Updates toString call to use toStringImpl()
ExternalFlow.qll Adds pragma[nomagic] for performance optimization

import cpp
import semmle.code.cpp.rangeanalysis.RangeAnalysisUtils
import semmle.code.cpp.ir.dataflow.DataFlow
private import semmle.code.cpp.ir.dataflow.internal.DataFlowNodes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't look ideal that we now need to import something internal here. Does this query do something odd that requires this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes: 18dea55

I'm not sure why we did it this way, but it imports internal dataflow nodes to find definitions outside a loop. I think we could do this with the proper SSA API we have these days, but I didnt want to change that in this PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For standard queries you still only have to import semmle.code.cpp.dataflow.new.DataFlow

Copy link
Contributor

@jketema jketema left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. As far as I can tell this is really just moving around stuff (and adding some missing comments). DCA numbers look ok.

@MathiasVP MathiasVP merged commit 6a904ed into github:main Mar 3, 2026
39 of 40 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

C++ no-change-note-required This PR does not need a change note

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants