Speed up test initialization #14784

Merged
merged 1 commit into from
Jan 11, 2025

Contributor

@bziobrowski bziobrowski commented Jan 9, 2025

This PR is still a work in progress.

Profiling integration test startup revealed that a lot of time is actually spent waiting on ZK connections or sleeping.
This PR reduces test init time by about 5-6 seconds by:

  • reducing sleep times
  • closing ZK connections asynchronously
  • disabling the slow getCloudConfig() check by mocking the static method
  • disabling swagger in the controller (can be overridden via the startControllerWithSwagger() method)

Flame graph:
wall.html.tgz

@codecov-commenter

Codecov Report

Attention: Patch coverage is 78.94737% with 4 lines in your changes missing coverage. Please review.

Project coverage is 63.84%. Comparing base (59551e4) to head (07e734d).
Report is 1557 commits behind head on master.

Files with missing lines Patch % Lines
...va/org/apache/pinot/controller/ControllerConf.java 25.00% 2 Missing and 1 partial ⚠️
.../java/org/apache/pinot/common/utils/ZkStarter.java 88.88% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #14784      +/-   ##
============================================
+ Coverage     61.75%   63.84%   +2.09%     
- Complexity      207     1610    +1403     
============================================
  Files          2436     2704     +268     
  Lines        133233   150989   +17756     
  Branches      20636    23321    +2685     
============================================
+ Hits          82274    96404   +14130     
- Misses        44911    47371    +2460     
- Partials       6048     7214    +1166     
Flag Coverage Δ
custom-integration1 100.00% <ø> (+99.99%) ⬆️
integration 100.00% <ø> (+99.99%) ⬆️
integration1 100.00% <ø> (+99.99%) ⬆️
integration2 0.00% <ø> (ø)
java-11 63.82% <78.94%> (+2.11%) ⬆️
java-21 63.74% <73.68%> (+2.11%) ⬆️
skip-bytebuffers-false 63.84% <78.94%> (+2.09%) ⬆️
skip-bytebuffers-true 63.71% <73.68%> (+35.98%) ⬆️
temurin 63.84% <78.94%> (+2.09%) ⬆️
unittests 63.84% <78.94%> (+2.09%) ⬆️
unittests1 56.28% <0.00%> (+9.39%) ⬆️
unittests2 34.14% <78.94%> (+6.41%) ⬆️

try (MockedStatic<HelixPropertyFactory> mock = Mockito.mockStatic(HelixPropertyFactory.class)) {

// mock helix method to disable slow, but useless, getCloudConfig() call
Mockito.when(HelixPropertyFactory.getCloudConfig(Mockito.anyString(), Mockito.anyString()))
Contributor

Hmm, this defeats the purpose of an integration test, which should mimic a real cluster scenario.
How long does this call take?

Contributor Author

That call always returns an empty CloudConfig in tests.
Please have a look at the attached flame graph.

@@ -191,6 +192,7 @@ public void run() {
LOGGER.warn("Failed to connect to zk server.", e);
throw e;
}
Thread.sleep(50L);
Contributor

Do we need this sleep?
The CI machines running in GitHub Actions have very limited resources, so we usually prefer a conditioned wait over a fixed-time sleep.

Contributor Author

Probably not. It's a replacement for the former up-front 1s sleep, which might have served some purpose.
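
For reference, a conditioned wait of the kind suggested above could look roughly like the sketch below. It is only an illustration: `WaitUtil` and `waitFor` are hypothetical names that do not come from this PR or from Pinot, and the condition passed in (for example "the ZK server accepts connections") is left to the caller.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

// Hypothetical helper (not part of this PR): polls a condition instead of sleeping a fixed time.
final class WaitUtil {
  private WaitUtil() {
  }

  // Returns true as soon as the condition holds, or false once the timeout elapses
  // (or the waiting thread is interrupted).
  static boolean waitFor(BooleanSupplier condition, Duration timeout, Duration pollInterval) {
    long deadline = System.nanoTime() + timeout.toNanos();
    while (!condition.getAsBoolean()) {
      long remainingMs = (deadline - System.nanoTime()) / 1_000_000L;
      if (remainingMs <= 0) {
        return false;
      }
      try {
        // Sleep for one poll interval, but never past the deadline.
        Thread.sleep(Math.max(1L, Math.min(pollInterval.toMillis(), remainingMs)));
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
    }
    return true;
  }
}
```

On a fast machine the wait ends as soon as the condition is met, while a slow CI runner still gets the full timeout, which is the main advantage over a fixed `Thread.sleep`.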

public static void closeAsync(ZkClient client) {
if (client != null) {
ZK_DISCONNECTOR.submit(() -> {
client.close();
Contributor

How long does this take if we run it in sync fashion?

Contributor Author

@bziobrowski bziobrowski Jan 10, 2025

Each close call isn't very slow on its own, but there are tens of such calls during init.
Please have a look at the attached flame graph.
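
To illustrate why moving the close calls off the calling thread helps when there are tens of them, here is a self-contained sketch of the pattern. It is not the PR's code: `AsyncCloser`, `awaitPending`, the use of `AutoCloseable` in place of `ZkClient`, and the single daemon thread are all assumptions made for the example.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical stand-in for the closeAsync idea; AutoCloseable replaces ZkClient here.
final class AsyncCloser {
  // Assumption: one daemon thread, so pending closes never keep the JVM alive.
  private static final ExecutorService DISCONNECTOR = Executors.newSingleThreadExecutor(r -> {
    Thread t = new Thread(r, "async-closer");
    t.setDaemon(true);
    return t;
  });

  private AsyncCloser() {
  }

  // Queues the close and returns immediately; the caller does not pay for the close latency.
  static void closeAsync(AutoCloseable resource) {
    if (resource == null) {
      return;
    }
    DISCONNECTOR.submit(() -> {
      try {
        resource.close();
      } catch (Exception e) {
        System.err.println("Failed to close resource: " + e);
      }
    });
  }

  // Waits until every close queued so far has run (tasks execute in FIFO order on one thread).
  static boolean awaitPending(long timeout, TimeUnit unit) {
    try {
      DISCONNECTOR.submit(() -> { }).get(timeout, unit);
      return true;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    } catch (ExecutionException | TimeoutException e) {
      return false;
    }
  }
}
```

With this shape, twenty closes that each take 10 ms cost the caller only the time needed to enqueue them, and a test teardown can still call `awaitPending` if it has to be sure everything was released.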

@Jackie-Jiang Jackie-Jiang merged commit 6eddacf into apache:master Jan 11, 2025
21 checks passed