-
Notifications
You must be signed in to change notification settings - Fork 858
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement retry for OkHttpGrpcExporter #3936
Conversation
Codecov Report
@@ Coverage Diff @@
## main #3936 +/- ##
============================================
+ Coverage 89.71% 89.79% +0.08%
- Complexity 4170 4182 +12
============================================
Files 498 499 +1
Lines 12661 12702 +41
Branches 1223 1227 +4
============================================
+ Hits 11359 11406 +47
+ Misses 908 904 -4
+ Partials 394 392 -2
Continue to review full report at Codecov.
|
3f7dd41
to
9032b6a
Compare
9032b6a
to
fc93256
Compare
…nto okhttp-grpc-retrier
FYI, I've been working on something similar for the okhttp http/protobuf otlp exporters. |
...lp/common/src/main/java/io/opentelemetry/exporter/otlp/internal/grpc/OkHttpGrpcExporter.java
Outdated
Show resolved
Hide resolved
...lp/common/src/main/java/io/opentelemetry/exporter/otlp/internal/grpc/OkHttpGrpcExporter.java
Outdated
Show resolved
Hide resolved
.../otlp/common/src/main/java/io/opentelemetry/exporter/otlp/internal/okhttp/OkHttpRetrier.java
Outdated
Show resolved
Hide resolved
.../otlp/common/src/main/java/io/opentelemetry/exporter/otlp/internal/okhttp/OkHttpRetrier.java
Outdated
Show resolved
Hide resolved
sleeper.sleep(backoffNanos); | ||
} catch (InterruptedException e) { | ||
Thread.currentThread().interrupt(); | ||
return response; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to return here or raise a runtime exception?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From what I understand calling interrupt and returning is the correct way of handling an interrupt exception, though I've never really run into it differing much from throwing in practice
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I think in practice the thread will only be interrupted at shutdown so it shouldn't matter much that we return.
.../otlp/common/src/main/java/io/opentelemetry/exporter/otlp/internal/okhttp/OkHttpRetrier.java
Outdated
Show resolved
Hide resolved
491bc52
to
58b3419
Compare
@@ -229,6 +226,18 @@ public CompletableResultCode shutdown() { | |||
return CompletableResultCode.ofSuccess(); | |||
} | |||
|
|||
static boolean isRetryable(Response response) { | |||
// Only retry on gRPC codes which will always come with an HTTP success |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jack-berg BTW, this is unlikely to be the best behavior (i.e., 503 seems like an obvious one to retry on) - gRPC retry lives in a bubble that misses standard HTTP load balancers. But since spec doesn't mention HTTP codes for gRPC retry, went with this for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that's a good point. There could be a load balancer in front of the gRPC server that gets in the way of the gRPC bubble.
// Careful - the document says "retry attempt" but our maxAttempts includes the original | ||
// request. The numbers are tricky but the unit test should be more clear. | ||
long upperBoundNanos = Math.min(nextBackoffNanos, retryPolicy.getMaxBackoff().toNanos()); | ||
long backoffNanos = randomLong.get(upperBoundNanos); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can return any number like from 0 to upperBoundNanos, and in worst case all the retries can go off in like 1 or two nanos.
As per the doc attached as comment, the jitter should be 0.2, this doesn't operate that way
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah looks like the link was updated since this code was written, probably good to match the code up with it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ayushtkn can you open an issue for this, so your suggestion isn't lost? thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure I created one: #7004
No description provided.