-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Randomized allocation sampling #104955
Randomized allocation sampling #104955
Conversation
20be562
to
3a77982
Compare
62716a0
to
8272401
Compare
3551533
to
91b51be
Compare
Functional testing found and fixed an off-by-one error in the RNG code but otherwise things looked fine. I also resynced this PR on top of the latest changes in #104849 and #104851. The last commit, now number 10, remains the interesting one. I also did some performance testing using GCPerfSim as an allocation benchmark. My default configuration was 4 threads, workstation mode, 500GB of allocations entirely with small objects and no survival. It is intended to put maximum stress on the allocation code paths. GCPerfSim command line: EDIT: Don't rely on these numbers, they are misleading. See #104955 (comment) Benchmarks - No Tracing enabled
Benchmarks - Tracing AllocSampling+GC keywords, verbose level
Overall it looks like around ~0.9 additional seconds for the PR to do 500GB of allocations. On a tight microbenchmark its noticeable and then as other GC or non-allocation costs increase it becomes relatively less noticeable. I'm investigating to see if it can be improved at all. |
Continued perf investigation+testing has cast my previous results into doubt. After lots of searching for what could have caused the regression, my best explanation is that it actually had nothing to do with the source changes in this PR and instead it is either non-determinism in the build process or some user error on my part. I reached that conclusion by doing the following: I've had a folder on my machine C:\git\runtime3 that throughout the entire process has been synced here:
This folder has no changes from any of my PRs in it and I've been using the build here for all the baseline measurements. Then I executed the following changes:
I can consistently reproduce the same magnitude perf regression using the coreclr built in the artifacts directory, but the regression doesn't appear using the build in the backup or backup_2 directory. I've done many runs on each binary switching between them in a semi-randomized ordering trying to ensure that the results for each binary are repeatable and robust relative to background noise on the machine. Beyond that I've also got many other builds that include different subsets of the change but there is no clear relationship between the source and the perf results. During one period I progressively added functionality starting from the baseline without triggering the regression to occur, then during another period I was progressively removing functionality from the final PR and the regression would always occur. Even deleting the entirety of the source changes in that folder and syncing it back to the baseline didn't eliminate the perf overhead. Every build was done in a new folder starting without an artifacts folder to remove opportunity for incremental build problems to play a role. The only explanations I have that make sense to me are either: (a) non-deterministic builds are giving bi-modal perf results for the same input source code or (b) I am making some other error in my testing methodology repeatedly I'm going to see if I can get another machine to repeat some of the original experiments but at the moment I no longer have any evidence the PR is causing a regression. |
@jkotas @MichalStrehovsky - Functional and perf testing both looked good now, all outstanding comments on the PRs have been addressed, and CI is green. From my perspective this is ready to be merged unless any further review is planned? I could check in #104849, #104851, then this PR in sequence but I'm not sure that gives any advantage over just checking in this PR alone and closing 104849 and 104851 as no longer needed. |
Sounds good to me: I never found a usage for it :^)
I'll run my tests on monday - the rest of the week is off for me sorry |
Sounds great! Its not a rush :) |
/ba-g remaining issues are deadletters for which an issue isn't appropriate |
@jkotas - Thanks for the earlier feedback. I think that has been covered now but let me know if you have any other feedback. Thanks! |
/ba-g remaining issues are deadletters for which an issue isn't appropriate |
...ts/tracing/eventpipe/randomizedallocationsampling/manual/Allocate/AllocateArraysOfDoubles.cs
Outdated
Show resolved
Hide resolved
I do not have any additional feedback. As I have mentioned in #104955 (review), this should get a review from somebody familiar with tracing. |
@brianrob - would you mind taking a look if you haven't already? |
I'm back from a conference in Belgium and I'm starting to work on it. |
Co-authored-by: Adeel Mujahid <[email protected]>
Thanks all! As a heads up I'll be out for extended period starting the week after next. I'm hoping that review will be finished this week and we can merge this. That gives me all of next week to respond if there is any unexpected post-merge regression discovered. I think we've got a reasonable amount of testing that the feature is working correctly already, but if you did find something @chrisnas in your additional testing we'd still have a long runway to get it fixed before .NET 10 ships next year. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks @noahfalk!
/ba-g Still getting deadletter issues on Chrome debugger tests, but it appears to be unrelated and pre-existing |
I'm seeing the newly added test failing in native AOT outerloop in #109842.
|
* Add Randomized Allocation Sampling This feature allows profilers to do allocation profiling based off randomized samples. It has better theoretical and empirically observed accuracy than our current allocation profiling approaches while also maintaining low performance overhead. It is designed for use in production profiling scenarios. For more information about usage and implementation, see the included doc docs/design/features/RandomizedAllocationSampling.md Much of this code was originally written in dotnet#100356 by @chrisnas, but for logistical reasons we superceded it with this new PR.
This feature allows profilers to do allocation profiling based off randomized samples. It has better theoretical and empirically observed accuracy than our current allocation profiling approaches while also maintaining low performance overhead. It is designed for use in production profiling scenarios. For more information about usage and implementation, see the included doc docs/design/features/RandomizedAllocationSampling.md
[Updated 10/30]
This PR supersedes #100356. Currently it includes 2 commits, the first has the (mostly) rebased changes from July and the 2nd addressed a few things I missed in the rebase and followed up on issues flagged in July.
Most of the testing happened during July but I did re-run some of the functional tests as a smoke test to ensure nothing broke in the few recent changes. We'll also get updated CI regression testing and I'm coordinating with @chrisnas to have him validate it works in a real profiler.