
Implement retry for OkHttpGrpcExporter #3936

Merged: anuraaga merged 6 commits into open-telemetry:main from okhttp-grpc-retrier on Dec 1, 2021

Conversation

anuraaga (Contributor, PR author):

No description provided.


codecov bot commented Nov 25, 2021

Codecov Report

Merging #3936 (7f793f6) into main (42e72fe) will increase coverage by 0.08%.
The diff coverage is 95.91%.


@@             Coverage Diff              @@
##               main    #3936      +/-   ##
============================================
+ Coverage     89.71%   89.79%   +0.08%     
- Complexity     4170     4182      +12     
============================================
  Files           498      499       +1     
  Lines         12661    12702      +41     
  Branches       1223     1227       +4     
============================================
+ Hits          11359    11406      +47     
+ Misses          908      904       -4     
+ Partials        394      392       -2     
Impacted Files Coverage Δ
...xporter/otlp/internal/grpc/OkHttpGrpcExporter.java 82.47% <66.66%> (-1.40%) ⬇️
...ry/exporter/otlp/internal/grpc/GrpcStatusUtil.java 100.00% <100.00%> (ø)
...xporter/otlp/internal/grpc/ManagedChannelUtil.java 86.27% <100.00%> (-1.00%) ⬇️
.../otlp/internal/grpc/OkHttpGrpcExporterBuilder.java 88.46% <100.00%> (+2.74%) ⬆️
...xporter/otlp/internal/okhttp/RetryInterceptor.java 100.00% <100.00%> (ø)
...io/opentelemetry/sdk/metrics/SdkMeterProvider.java 79.66% <0.00%> (-3.39%) ⬇️
...ntelemetry/sdk/extension/resources/OsResource.java 90.69% <0.00%> (+4.65%) ⬆️
...ava/io/opentelemetry/sdk/internal/RateLimiter.java 100.00% <0.00%> (+5.88%) ⬆️
...metry/sdk/extension/resources/ProcessResource.java 87.50% <0.00%> (+6.25%) ⬆️
... and 2 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

anuraaga force-pushed the okhttp-grpc-retrier branch 5 times, most recently from 3f7dd41 to 9032b6a, on November 25, 2021 09:08
anuraaga force-pushed the okhttp-grpc-retrier branch from 9032b6a to fc93256 on November 25, 2021 09:16
jack-berg (Member):

FYI, I've been working on something similar for the okhttp http/protobuf otlp exporters.

    sleeper.sleep(backoffNanos);
  } catch (InterruptedException e) {
    Thread.currentThread().interrupt();
    return response;
Member:

Do we want to return here or raise a runtime exception?

anuraaga (Contributor, author):

From what I understand, calling interrupt() and returning is the correct way of handling an InterruptedException, though I've never really seen it differ much from throwing in practice.

Member:

Yeah I think in practice the thread will only be interrupted at shutdown so it shouldn't matter much that we return.
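
For context on the pattern being discussed, here is a minimal, hedged sketch, assuming an OkHttp application interceptor; it is not the PR's actual RetryInterceptor, and the class name, fields, and retry condition are illustrative. It shows a retry loop that, when interrupted during backoff, restores the interrupt flag and returns the last response instead of throwing:

```java
import java.io.IOException;
import java.util.concurrent.TimeUnit;
import okhttp3.Interceptor;
import okhttp3.Response;

/** Illustrative sketch only; not the actual RetryInterceptor added in this PR. */
final class RetryLoopSketch implements Interceptor {

  private final int maxAttempts;
  private final long backoffNanos;

  RetryLoopSketch(int maxAttempts, long backoffNanos) {
    this.maxAttempts = maxAttempts;
    this.backoffNanos = backoffNanos;
  }

  @Override
  public Response intercept(Chain chain) throws IOException {
    Response response = chain.proceed(chain.request());
    for (int attempt = 1; attempt < maxAttempts && isRetryable(response); attempt++) {
      try {
        // Back off before the next attempt (a fixed delay here for brevity).
        TimeUnit.NANOSECONDS.sleep(backoffNanos);
      } catch (InterruptedException e) {
        // Restore the interrupt flag so callers can still observe it, then stop
        // retrying and hand back the last response instead of throwing.
        Thread.currentThread().interrupt();
        return response;
      }
      // Close the previous response before issuing another attempt.
      response.close();
      response = chain.proceed(chain.request());
    }
    return response;
  }

  private static boolean isRetryable(Response response) {
    // Placeholder condition for this sketch; the real check inspects the gRPC status.
    return response.code() >= 500;
  }
}
```

Returning rather than throwing keeps the caller's normal failure path (it still receives a response), while the restored flag lets whatever triggered the interrupt, typically shutdown, proceed.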

anuraaga force-pushed the okhttp-grpc-retrier branch from 491bc52 to 58b3419 on November 29, 2021 23:51
@@ -229,6 +226,18 @@ public CompletableResultCode shutdown() {
return CompletableResultCode.ofSuccess();
}

static boolean isRetryable(Response response) {
// Only retry on gRPC codes which will always come with an HTTP success
anuraaga (Contributor, author):

@jack-berg BTW, this is unlikely to be the best behavior (e.g., 503 seems like an obvious code to retry on): gRPC retry lives in a bubble that misses standard HTTP load balancers. But since the spec doesn't mention HTTP status codes for gRPC retry, I went with this for now.

Member:

Yeah that's a good point. There could be a load balancer in front of the gRPC server that gets in the way of the gRPC bubble.
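
To make that limitation concrete, here is a hedged sketch of a retry check that only looks at the gRPC status carried on an HTTP 200 response; it is not the PR's actual GrpcStatusUtil/OkHttpGrpcExporter code, and the retryable-code set and header lookup are assumptions for illustration. With a check like this, an HTTP-level 503 returned by an intermediate load balancer is never retried.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import okhttp3.Response;

/** Illustrative only; not the exact retry check added in this PR. */
final class GrpcRetryCheckSketch {

  // gRPC status codes commonly treated as retryable for OTLP export
  // (e.g. 14 = UNAVAILABLE). Assumed set for illustration; consult the spec.
  private static final Set<String> RETRYABLE_GRPC_STATUSES =
      new HashSet<>(Arrays.asList("1", "4", "8", "10", "11", "14", "15"));

  static boolean isRetryable(Response response) {
    // gRPC failures arrive with an HTTP 200 plus a grpc-status value, so this
    // check never retries plain HTTP errors such as a load balancer's 503
    // (the gap discussed above).
    if (!response.isSuccessful()) {
      return false;
    }
    String grpcStatus = response.header("grpc-status");
    return grpcStatus != null && RETRYABLE_GRPC_STATUSES.contains(grpcStatus);
  }
}
```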

anuraaga merged commit ae3fe66 into open-telemetry:main on Dec 1, 2021
// Careful - the document says "retry attempt" but our maxAttempts includes the original
// request. The numbers are tricky but the unit test should be more clear.
long upperBoundNanos = Math.min(nextBackoffNanos, retryPolicy.getMaxBackoff().toNanos());
long backoffNanos = randomLong.get(upperBoundNanos);
ayushtkn:

This can return any number from 0 to upperBoundNanos, so in the worst case all of the retries can fire within a nanosecond or two.

Per the doc attached as a comment, the jitter should be 0.2; this code doesn't operate that way.

anuraaga (Contributor, author):

Yeah, it looks like the linked doc was updated since this code was written; it would probably be good to match the code up with it:

grpc/proposal#452
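
For illustration, here is a sketch of an exponential backoff where a 0.2 jitter is applied multiplicatively, so the sleep stays within roughly 20% of the capped backoff rather than being drawn uniformly from [0, upperBound]. This is one possible reading of the updated doc, offered as an assumption rather than the agreed fix, and it is not the PR's RetryInterceptor code.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;

/**
 * Illustrative backoff computation where jitter = 0.2 keeps the sleep within
 * +/-20% of the capped exponential backoff. An assumption about the intended
 * behavior, not this PR's actual code.
 */
final class BackoffSketch {

  static long backoffNanos(
      long initialBackoffNanos, long maxBackoffNanos, double multiplier, double jitter, int attempt) {
    // Exponential growth capped at maxBackoff: initial * multiplier^(attempt - 1).
    double base = initialBackoffNanos * Math.pow(multiplier, attempt - 1);
    double capped = Math.min(base, (double) maxBackoffNanos);
    // Random factor in [1 - jitter, 1 + jitter].
    double factor = 1.0 + ThreadLocalRandom.current().nextDouble(-jitter, jitter);
    return (long) (capped * factor);
  }

  public static void main(String[] args) {
    // e.g. the third attempt with a 1s initial backoff, 1.5x multiplier, 5s cap:
    // roughly 2.25s +/- 20% instead of anywhere in [0, 2.25s].
    long nanos =
        backoffNanos(TimeUnit.SECONDS.toNanos(1), TimeUnit.SECONDS.toNanos(5), 1.5, 0.2, 3);
    System.out.println(TimeUnit.NANOSECONDS.toMillis(nanos) + " ms");
  }
}
```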

Member:

@ayushtkn can you open an issue for this, so your suggestion isn't lost? thanks!

ayushtkn:

Sure, I created one: #7004
