Revamp the generation of runtime division checks on ARM64 #111543
This is WIP. I've taken a different approach to adding new nodes, instead adding a pass that modifies the HIR. The pass will run through all of the code in the function looking for
The added HIR looks like this for the signed overflow check, for example. This is checking for
Here's the example @kunalspathak mentioned in #64795:
Before the change:
After the change:
The main difference is at label IG04: rather than a fixed sequence of compare and branch instructions chosen at the emit stage, the compiler has decided to build a logical expression for the overflow check and emit a
The approach is working well when:
It seems to have an adverse effect on MinOpts though, because splitting the tree will often spill and there aren't any optimization passes running to clean up these spills.

At the moment I haven't focused on the efficiency of the pass itself, but I believe it could be improved. I could borrow the recursive traversals in the earlier morph phase to build a work-list of where checks need to be added. Then the pass can be linear over a pre-built list of nodes rather than a search in a loop. I would just have to be careful to update all of the locations of the nodes after any trees are split, but I think this should be possible.

I've also had to make a temporary fix for a problem in the tree splitting code, where it wasn't correctly updating the node flags after splitting out side effects. After splitting the tree I traverse it post-order to update all of the flags. There might be a more efficient way of doing this.
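To illustrate the post-order flag update described above, here is a minimal sketch on a toy tree. The `Node` type, the flag constants and `recomputeFlags` are hypothetical stand-ins, not the JIT's `GenTree` or its real flags; the only assumption carried over is that a parent's flags are the union of its own effects and those of its operands.

```cpp
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical, simplified stand-in for a JIT tree node; not the real GenTree.
struct Node
{
    uint32_t ownFlags = 0; // effects this node causes by itself
    uint32_t flags    = 0; // ownFlags plus everything inherited from operands
    std::vector<std::unique_ptr<Node>> operands;
};

constexpr uint32_t FLAG_EXCEPT = 0x1; // hypothetical "may throw" bit
constexpr uint32_t FLAG_CALL   = 0x2; // hypothetical "contains a call" bit

// Post-order walk: operands are finished before their parent, so each parent
// can simply OR together its operands' already-updated flags.
uint32_t recomputeFlags(Node& node)
{
    uint32_t combined = node.ownFlags;
    for (auto& operand : node.operands)
    {
        combined |= recomputeFlags(*operand);
    }
    node.flags = combined;
    return combined;
}
```

Because every node is written exactly once, the walk is linear in the size of the tree; any stale bits left on a parent after a side effect has been split out are overwritten rather than merged.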
I think the build is failing in Release mode due to use of
What do you need this for? Increasing the size of
I need a way of uniquely identifying a
I think many of the regressions are caused by spilling in
My options for continuing this are:
I think it's sensible to allow the compiler to have a view of all of this code being added early on in the pipeline, but this might not make sense in the tiering model. So option 2 could be a good compromise. I'll need to look into individual regression cases for both options regardless of choice. I would be grateful for any opinions on this and the approach in general.
src/coreclr/jit/arithchecks.cpp
Outdated
impCloneExpr(divisor, &divisorCopy, CHECK_SPILL_NONE, nullptr DEBUGARG("cloned for runtime check"));
impCloneExpr(dividend, &dividendCopy, CHECK_SPILL_NONE, nullptr DEBUGARG("cloned for runtime check"));

// (dividend < 0 && divisor == -1)
Shouldn't this be dividend == MinValue and divisor == -1?
Yes, I misinterpreted the exception case and this likely explains some of the issues I'm seeing. Thanks.
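For reference, the two failure cases for signed 32-bit division discussed in this thread can be sketched as standalone C++. This is only an illustration of the conditions, not the JIT code: `DivFault`, `classifyDiv` and `checkedDiv` are hypothetical names, and C++ exceptions stand in for the runtime's throw helpers.

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical helper type: which failure, if any, a signed division hits.
enum class DivFault { None, DivideByZero, Overflow };

DivFault classifyDiv(int32_t dividend, int32_t divisor)
{
    if (divisor == 0)
    {
        return DivFault::DivideByZero;
    }
    // Only MinValue / -1 overflows: the true result (2^31) is not representable.
    // A test of `dividend < 0` would wrongly flag cases such as -5 / -1.
    if (dividend == std::numeric_limits<int32_t>::min() && divisor == -1)
    {
        return DivFault::Overflow;
    }
    return DivFault::None;
}

int32_t checkedDiv(int32_t dividend, int32_t divisor)
{
    switch (classifyDiv(dividend, divisor))
    {
        case DivFault::DivideByZero: throw std::domain_error("divide by zero");
        case DivFault::Overflow:     throw std::overflow_error("division overflow");
        case DivFault::None:         break;
    }
    return dividend / divisor; // safe: both faulting cases were excluded above
}
```

The two conditions are mutually exclusive, so the order in which they are tested does not change the outcome.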
src/coreclr/jit/arithchecks.cpp
Outdated
code.block = divBlock;
code.stmt = divBlock->firstStmt();
code.tree = tree;
}
Seems like something here should be setting GTF_DIV_MOD_NO_BY_ZERO and GTF_DIV_MOD_NO_OVERFLOW on the DIV node.
I'm relying on the settings of these flags in the morph stage, e.g. in fgMorphSmpOp (morph.cpp:8584). When I call GenTree::OperExceptions I'm implicitly checking these flags too. As I'm not doing any further processing of the types of the operands, I don't think I can make that decision in this pass, unless I'm missing something here?
Yes, the address of nodes can be used, see e.g. However, most likely there is no need for any form of "visited" check at all; instead you can shape your pass so that it visits all IR exactly once. See e.g. the various helper expansions in helperexpansion.cpp; those are shaped so that they visit all IR once while allowing for expansion of internal nodes into control flow.
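As a rough illustration of a pass that needs no "visited" set, the sketch below walks a toy IR once and expands nodes in place. It is not modelled on helperexpansion.cpp; the `Ir` alias, the operation names and `expandDivChecks` are all hypothetical, and a flat `std::list` of strings stands in for real statements and blocks.

```cpp
#include <list>
#include <string>

// Hypothetical toy IR: a flat list of operation names, not real JIT IR.
using Ir = std::list<std::string>;

// Walks the list once. When a "div" is found, the checks are inserted in
// front of it. std::list::insert leaves `it` valid and still pointing at the
// "div", so advancing the iterator moves past it: the new nodes are never
// visited and no node is visited twice.
int expandDivChecks(Ir& ir)
{
    int expanded = 0;
    for (auto it = ir.begin(); it != ir.end(); ++it)
    {
        if (*it == "div")
        {
            ir.insert(it, "check_overflow");
            ir.insert(it, "check_div_by_zero");
            ++expanded;
        }
    }
    return expanded;
}
```

The property that makes this work is that expansion only adds IR behind the cursor, so the loop never has to ask whether a node has already been processed.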
I agree that (2) would be most reasonable. You may want to experiment with some alternative simpler and cheaper ways of accomplishing what this pass is doing. One thing that comes to mind is expanding the checks as QMARKs during import. That is, instead of creating a QMARK(dividend == MinValue & divisor == -1, CALL CORINFO_HELP_OVERFLOW, QMARK(divisor == 0, GCALL CORINFO_HELP_THROWDIVBYZERO, DIV dividend, divisor)) (marking the division node with
Fixes dotnet#64795

This patch wraps GT_DIV/GT_UDIV nodes on integral types with GT_QMARK trees that contain the necessary conformance checks (overflow/divide-by-zero) when compiling the code with FullOpts enabled. Currently these checks are added during the Emit phase; this is still the case for MinOpts.

The aim is to allow the compiler to make decisions on code position and instruction selection for these checks. For example, on ARM64 this enables certain scenarios to choose the cbz instruction over cmp/beq, which can lead to more compact code. It also allows some of the comparisons in the checks to be hoisted out of loops.
e22295b to f8928ff
Hi @jakobbotsch, thanks for the help. I've updated the pull request now with the implementation introducing
It's still not fully clear from the diffs exactly how much impact is caused by the
It's much clearer now where the spills are occurring; quite a few are related to trying to clone a
I'll also need to think about a way of making sure the
Fixes #64795
This patch introduces a new compilation phase that passes over the GenTrees looking for GT_DIV/GT_UDIV nodes on integral types, and morphs the code to introduce the necessary conformance checks (overflow/divide-by-zero) early on in the compilation pipeline. Currently these are added during the Emit phase, meaning optimizations don't run on any code introduced.

The aim is to allow the compiler to make decisions on code position and instruction selection for these checks. For example, on ARM64 this enables certain scenarios to choose the cbz instruction over cmp/beq, which can lead to more compact code. It also allows some of the comparisons in the checks to be hoisted out of loops.
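To make the loop-hoisting point concrete, here is a source-level sketch of the transformation that becomes possible once the check is visible to the optimizer. It is written by hand in C++ purely as an illustration; the function names are hypothetical, unsigned division is used so that only the divide-by-zero check applies, and nothing here claims to be the code the JIT actually produces.

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Check left next to the division: the divisor is compared on every iteration.
int64_t sumQuotientsNaive(const std::vector<uint32_t>& values, uint32_t divisor)
{
    int64_t sum = 0;
    for (uint32_t v : values)
    {
        if (divisor == 0)
        {
            throw std::domain_error("divide by zero");
        }
        sum += v / divisor;
    }
    return sum;
}

// Comparison hoisted: the loop-invariant `divisor == 0` is evaluated once.
// The throw itself stays inside the loop, so an empty input still returns 0
// exactly as the naive version does.
int64_t sumQuotientsHoisted(const std::vector<uint32_t>& values, uint32_t divisor)
{
    const bool divisorIsZero = (divisor == 0);
    int64_t sum = 0;
    for (uint32_t v : values)
    {
        if (divisorIsZero)
        {
            throw std::domain_error("divide by zero");
        }
        sum += v / divisor;
    }
    return sum;
}
```

Only the comparison moves out of the loop; keeping the throw where the division happens preserves the observable behaviour for inputs on which the loop body never runs.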