WIP: EDGE3D update #478
32723a7 to 702238f
@pguthrey is this ready for review?
Not just yet. I need to experiment more with different implementations. I will come back to this later.
050da7e to c9b19a6
Here are the results of these changes. Good improvement for CUDA. Impossibly good improvement for HIP. I checked that the results are the same as the previous algorithm... but I might look more into what is going on with HIP.
Perhaps there is still register spilling with CUDA, or something similar, that is making a dramatic difference. We'll have to look at the generated instructions to see what happened.
That makes some sense. If I add the memory needed by the vectors and the matrix together I get
Updated results after fixing a bug.
It actually made CUDA slower?
const rajaperf::Real_type detj_tol,
const rajaperf::Int_type quad_type,
const rajaperf::Int_type quad_order,
rajaperf::Real_type (&matrix)[EB][EB])
There is still a full matrix?
The interface that this is used for requires a full matrix. However, when we are computing the work at each quadrature point, we are using a symmetric matrix.
I can follow up with a study on how beneficial it would be to never create the full matrix. If that is a major impact, that may be enough incentive to rewrite how things are done in the ultimate use case.
Did a quick study measuring the performance of an impl that has no 12x12 matrices. It's indistinguishable from the performance seen in this MR.
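To make the symmetric-versus-full distinction in this thread concrete, here is a minimal sketch of the general idea, not the code in this PR. It assumes `EB = 12` (inferred from the "12x12 matrices" mentioned above), uses plain `double` in place of `rajaperf::Real_type`, and uses a made-up rank-one update `weight * v * v^T` as a stand-in for the real per-quadrature-point work. The names `NSYM`, `sym_index`, `accumulate_point`, and `expand_to_full` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: EB = 12, inferred from the "12x12 matrices" mentioned in the thread.
constexpr std::size_t EB = 12;
// Entries in the upper triangle including the diagonal: 78 instead of 144.
constexpr std::size_t NSYM = EB * (EB + 1) / 2;

// Hypothetical helper: map (i, j) with i <= j < EB to a packed,
// row-major upper-triangle index.
inline std::size_t sym_index(std::size_t i, std::size_t j)
{
  assert(i <= j && j < EB);
  return i * (2 * EB - i + 1) / 2 + (j - i);
}

// Hypothetical stand-in for the work at one quadrature point:
// accumulate weight * v * v^T, touching only the upper triangle.
inline void accumulate_point(const double (&v)[EB], double weight,
                             double (&sym)[NSYM])
{
  for (std::size_t i = 0; i < EB; ++i) {
    for (std::size_t j = i; j < EB; ++j) {
      sym[sym_index(i, j)] += weight * v[i] * v[j];
    }
  }
}

// Expand the packed storage into the full matrix once, after all
// quadrature points are accumulated, to satisfy a full-matrix interface.
inline void expand_to_full(const double (&sym)[NSYM],
                           double (&matrix)[EB][EB])
{
  for (std::size_t i = 0; i < EB; ++i) {
    for (std::size_t j = i; j < EB; ++j) {
      matrix[i][j] = sym[sym_index(i, j)];
      matrix[j][i] = matrix[i][j];
    }
  }
}
```

In this sketch the per-point accumulator holds 78 values rather than 144, and the full matrix is written only once at the end. Whether that changes GPU register pressure in practice depends on the compiler and the real kernel.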
I would not say that is a statistically significant difference. It could be within sampling tolerance.
Summary