Python: unpin legacy CFG/ESSA from the AST cached stage (DCA measurement)#22116
Draft
yoff wants to merge 1 commit into
Draft
Python: unpin legacy CFG/ESSA from the AST cached stage (DCA measurement)#22116yoff wants to merge 1 commit into
yoff wants to merge 1 commit into
Conversation
The legacy CFG (`Flow.qll`) and legacy ESSA (`Essa`/`SsaCompute`/ `SsaDefinitions`) were pinned into the always-on `Stages::AST` cached stage via `Stages::AST::ref()` and the matching `backref()` disjuncts. Because a cached stage is materialized as a unit once any of its predicates is demanded (and every query demands e.g. `Expr.toString()`), this forced the legacy CFG/ESSA to be computed for *every* query -- including the security/dataflow queries, which after the shared-CFG dataflow flip no longer depend on the legacy CFG at all. Since `Stages::AST::ref()` is `1 = 1`, removing it is result-preserving; it only changes stage scheduling. After this change the legacy CFG/ESSA is no longer materialised for queries that do not genuinely reference it. Verified on the full `python-security-extended` suite and on django: legacy CFG/ESSA families materialised drop from ~165 to 0 with byte-identical results. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Note
Draft for measurement. This PR exists to run DCA and quantify how much of the shared-CFG dataflow flip's overhead is recovered by not computing the legacy CFG. It is based on the flip branch (
yoff/python-shared-cfg-dataflow-flip, #21925), so the diff is exactly the one-commit change and the DCA a/b isolates the effect of the unpin.What
The legacy CFG (
Flow.qll) and legacy ESSA (Essa/SsaCompute/SsaDefinitions) are pinned into the always-onStages::ASTcached stage viaStages::AST::ref()(11 sites) and the matchingStages::AST::backref()disjuncts.backref()is referenced nowhere and optimizes to1 = 1, so theref/backrefpattern only controls stage assignment. Because a cached stage is materialised as a unit once any of its predicates is demanded — and every query demands e.g.Expr.toString()— this forces the legacy CFG/ESSA to be computed for every query.After the shared-CFG dataflow flip, the security/dataflow queries no longer depend on the legacy CFG at all, so on the flip branch this computation is pure dead weight.
This PR removes those pins. Since
Stages::AST::ref()is1 = 1, the change is result-preserving — it only changes stage scheduling.Verification (result-preserving)
python-security-extendedsuite (52 queries) anddjango: legacy CFG/ESSA predicate families materialised drop from ~165 → 0, with byte-identical results.-j4); ~9.5s of legacy predicate self-time removed (≈1.3% of total analysis CPU).Question DCA should answer
The legacy CFG the security queries drag in (~9.5s CPU, 1.3%) is a fraction of the new CFG+SSA the flip adds (~37.6s CPU, 5.3%), so a-priori this recovers only ~a quarter of the new-CFG cost, not the full flip overhead. DCA across the target set will give the authoritative per-project number.
Caveats before this could merge