Compare local and weekly benchmarks using Hatchet #1317

Open: wants to merge 36 commits into base: develop

@chapman39 (Collaborator) commented Jan 23, 2025

  • create a script to compare the benchmarks of a local build against the weekly shared benchmarks on LC
  • improve handling of the CMake build type in config-build.py
  • create an optional, manually triggered CI pipeline (ruby-gcc, ruby-clang, lassen-clang) to test the current PR
  • add documentation on how to use this script and run the manual CI pipeline

tmp todo:

  • run the benchmarks in a separate pipeline and have them all feed into a single compare call (since Hatchet cannot run on Lassen)

How the script works

The script pairs two Caliper files (one from the weekly shared location /usr/workspace/smithdev/califiles/serac, one from a specified build location) and creates a Hatchet "graph frame" from the difference between them. If the maximum difference in any node (section) of the graph exceeds X seconds, that benchmark "fails." The script repeats this for each benchmark.
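
To make the pass/fail rule concrete, here is a minimal sketch of the comparison step. It is not the PR's implementation: instead of subtracting two Hatchet graph frames, it assumes each Caliper file has already been reduced to a plain dict mapping a call path (e.g. "main/solve") to seconds. The function names are hypothetical, pairing files by bare file name is an assumption, the difference is taken as an absolute value (a regression-only check would use the signed difference), and call paths present in only one run are ignored.

```python
from pathlib import Path


def match_cali_files(baseline_files, current_files):
    """Pair baseline and current .cali files that share a file name.

    Matching on the bare file name is an assumption of this sketch.
    """
    baseline = {Path(f).name: Path(f) for f in baseline_files}
    current = {Path(f).name: Path(f) for f in current_files}
    return [(baseline[name], current[name]) for name in sorted(baseline.keys() & current.keys())]


def max_difference(baseline_times, current_times):
    """Largest absolute time difference over call paths present in both runs.

    Each argument maps a call path (e.g. "main/solve") to seconds; this stands in
    for subtracting two Hatchet graph frames.
    """
    common = baseline_times.keys() & current_times.keys()
    return max((abs(current_times[p] - baseline_times[p]) for p in common), default=0.0)


def benchmark_passes(baseline_times, current_times, threshold_seconds):
    """A benchmark fails when any node differs by more than threshold_seconds (the X above)."""
    return max_difference(baseline_times, current_times) <= threshold_seconds
```

In practice the file lists could come from something like Path(weekly_dir).glob("*.cali") and Path(build_dir).glob("*.cali"), with benchmark_passes called once per matched pair.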

Example

../scripts/llnl/compare_benchmarks.py --current-cali-dir . --verbose --depth 2 --metric-columns "Max time/rank (inc)"

(Screenshot, 2025-01-29: output of the example command; not all graphs are shown)

The output now shows the baseline and current benchmark times, as well as the difference between the two. You can also choose which "metric column" to display with --metric-columns (defaults to average time per rank) and how deep into the tree to display with --depth. At the moment, only the difference trees are displayed.
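
As a rough illustration of how the flags from the example command could drive the display, here is a sketch, not the script's actual code. It parses the four flags shown in the example and builds depth-limited rows of baseline, current, and difference values for one metric column. Several details are assumptions: the default column name "Avg time/rank (inc)" (the PR only says the default is average time per rank), --depth counting the root as level 1 with no limit by default, --metric-columns accepting one or more names, and the tree being represented as a dict keyed by call-path tuples rather than a Hatchet graph frame. The function names are hypothetical.

```python
import argparse

# Assumption: column name for average time per rank; the PR does not give the exact string.
DEFAULT_METRIC = "Avg time/rank (inc)"


def parse_args(argv=None):
    """Parse the flags used in the example command (defaults are assumptions)."""
    parser = argparse.ArgumentParser(description="Compare local and weekly benchmarks (sketch)")
    parser.add_argument("--current-cali-dir", required=True,
                        help="directory containing the local build's .cali files")
    parser.add_argument("--verbose", action="store_true",
                        help="print the difference trees")
    parser.add_argument("--depth", type=int, default=None,
                        help="deepest tree level to show (root = 1); default: all levels")
    parser.add_argument("--metric-columns", nargs="+", default=[DEFAULT_METRIC],
                        help="metric column(s) to display")
    return parser.parse_args(argv)


def difference_rows(baseline, current, metric, depth=None):
    """Return (indented name, baseline, current, current - baseline) rows.

    baseline and current map a call path (tuple of region names, root first)
    to a dict of metric values. Only paths present in both runs are shown.
    """
    rows = []
    # Sorting the tuples puts each parent before its children (depth-first order).
    for path in sorted(baseline.keys() & current.keys()):
        if depth is not None and len(path) > depth:
            continue
        base = baseline[path][metric]
        cur = current[path][metric]
        rows.append(("  " * (len(path) - 1) + path[-1], base, cur, cur - base))
    return rows
```

Each row could then be printed as a fixed-width table, for example with f"{name:<30}{base:10.3f}{cur:10.3f}{diff:+10.3f}".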

Some problems

LC system performance is inconsistent. You can run the same benchmark multiple times and get wildly different results. My understanding is that this depends on which node(s) you are allocated, how busy the machine is, and other factors. So while this is a nice feature to have, I'm hesitant to make it a required CI check at this time.

Improving config-build.py

Before this PR, if you set -DCMAKE_BUILD_TYPE=Release when configuring Serac, the build directory would incorrectly have "debug" in its name, because the args.buildtype variable remained Debug. This PR sets args.buildtype from -DCMAKE_BUILD_TYPE when it is given, provided the --build-type option hasn't been set to something else.
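
The rule described above can be sketched as a small helper. This is an illustration, not the code in config-build.py: the function name resolve_build_type, the assumption that the extra CMake options arrive as a single string, and the assumption that --build-type defaults to "Debug" (so that "not set to anything else" means "still equal to the default") are all hypothetical. It accepts both the plain -DCMAKE_BUILD_TYPE=Release form and the typed -DCMAKE_BUILD_TYPE:STRING=Release form, and strips surrounding quotes.

```python
import re

# Assumption: the default value of the --build-type option.
DEFAULT_BUILD_TYPE = "Debug"


def resolve_build_type(buildtype, cmake_options):
    """Return the build type to use when naming the build directory.

    If -DCMAKE_BUILD_TYPE appears in cmake_options (a hypothetical string of
    extra CMake arguments) and --build-type was left at its default, the CMake
    value wins; otherwise the --build-type value is kept.
    """
    match = re.search(r"-DCMAKE_BUILD_TYPE(?::STRING)?=(\S+)", cmake_options)
    if match and buildtype == DEFAULT_BUILD_TYPE:
        return match.group(1).strip("\"'")
    return buildtype
```

One consequence of comparing against the default: an explicit --build-type Debug is indistinguishable from leaving the option unset. Giving the option a default of None and falling back to "Debug" afterwards would remove that ambiguity.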

Links

chapman39 added the CI (Continuous Integration) and testing (Related to testing) labels on Jan 23, 2025
chapman39 self-assigned this on Jan 23, 2025
chapman39 mentioned this pull request on Jan 27, 2025
chapman39 marked this pull request as ready for review on January 28, 2025 01:31