What is the issue?

linkerd viz stat-outbound seems to report latencies that are many times larger than actual latencies.

How can it be reproduced?

Get a new cluster, then install Linkerd (with the viz extension) and Faces.

At this point, we need access to the faces-gui Service in the faces namespace. Though I can access it directly via its external IP in my setup, I'll write this repro using kubectl port-forward.
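For example, in a separate terminal (assuming the faces-gui Service listens on port 80; adjust to your setup):

kubectl port-forward -n faces svc/faces-gui 8080:80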
Using --set backend.delayBuckets=1000 when installing Faces means that the backend workloads (smiley and color) will always delay every call 1000ms. We can verify this with curl:
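Something like this (the same endpoint the loop below uses):

curl -s 'http://localhost:8080/face/center/?row=2&col=2' | jq .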
This will take about a second, and its output will include a latency element that will be around 1000ms. We can check this a few times:
for i in 1 2 3 4 5 6 7 8 9 10; do
curl -s 'http://localhost:8080/face/center/?row=2&col=2' | jq .latency
done
This will take about 10 seconds, and should show a stack of numbers around 1000.
Go ahead and open a web browser to http://localhost:8080 and you'll see the Faces GUI, which we'll use as a traffic generator. Then run
watch linkerd viz stat-outbound -n faces deploy/face
and you'll see something like this (possibly after giving it a chance to warm up):
NAME  SERVICE    ROUTE      TYPE  BACKEND    SUCCESS  RPS   LATENCY_P50  LATENCY_P95  LATENCY_P99  TIMEOUTS  RETRIES
face  color:80   [default]                   100.00%  8.00  5500ms       9550ms       9910ms       0.00%     0.00%
      └─────────────────────────► color:80   100.00%  8.00  5500ms       9550ms       9910ms       0.00%
face  smiley:80  [default]                   100.00%  8.00  5500ms       9550ms       9910ms       0.00%     0.00%
      └─────────────────────────► smiley:80  100.00%  8.00  5500ms       9550ms       9910ms       0.00%
Ignore the 100% success rate for color (it's a gRPC service and we have no GRPCRoutes at present) and look at the latencies. All of these numbers should be right at 1000ms, but they're not?
Even more interesting, if we switch the backend latencies to 100ms:
kubectl set env -n faces deploy/smiley DELAY_BUCKETS=100
kubectl set env -n faces deploy/color DELAY_BUCKETS=100
kubectl rollout status -n faces deploy
then the for loop above will run in about one second and show numbers right around 100, but linkerd viz stat-outbound -n faces deploy/face will (after it settles down) show things like this:
NAME  SERVICE    ROUTE      TYPE  BACKEND    SUCCESS  RPS   LATENCY_P50  LATENCY_P95  LATENCY_P99  TIMEOUTS  RETRIES
face  smiley:80  [default]                   100.00%  8.00  275ms        478ms        496ms        0.00%     0.00%
      └─────────────────────────► smiley:80  100.00%  8.00  175ms        242ms        248ms        0.00%
face  color:80   [default]                   100.00%  8.00  275ms        478ms        496ms        0.00%     0.00%
      └─────────────────────────► color:80   100.00%  8.00  175ms        242ms        248ms        0.00%
which is even weirder -- why the distinction between the different rows?
Logs, error output, etc
See above.
output of linkerd check -o short
:; linkerd check -o short
Status check results are √
Environment
macOS 15.1.1
k3d version 5.7.4
k3s version 1.30.4-k3s1
Linkerd edge-24.11.8
Faces 2.0.0-rc.2 😇
Possible solution
No response
Additional context
No response
Would you like to work on fixing this bug?
maybe
In this case, the relevant latency histogram bucket's bounds are [1s, 10s]. When the Linkerd proxy observes a latency that is slightly larger than 1s, it records it into this bucket, where it becomes indistinguishable from any other value between 1s and 10s. Then, when performing latency quantile calculations, we would get a value like 5.5s for p50 (which, as you point out, is much larger than the actual latency value).
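To make that concrete, here's a back-of-the-envelope sketch (not Linkerd's actual code) of Prometheus-style linear interpolation within a single (1s, 10s] bucket, assuming every observation landed in it:

for q in 0.50 0.95 0.99; do
  # interpolate: lower_bound + q * (upper_bound - lower_bound), reported in ms
  awk -v q="$q" 'BEGIN { printf "p%.0f = %.0fms\n", q*100, (1 + q * (10 - 1)) * 1000 }'
done

This prints p50 = 5500ms, p95 = 9550ms, and p99 = 9910ms, which are exactly the numbers in the first table above.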
These types of histogram artifacts can always occur when recorded latency values are near a bucket boundary, and the magnitude of the error depends on the size of the buckets. If Linkerd used finer-grained buckets here, the error would be smaller, but that would come at the cost of more memory and more timeseries storage.
Additionally, this test exhibited "worst case" behavior because the artificially chosen latency of 1s happens to sit right on a bucket boundary. Real-world latencies are less likely to cluster directly on bucket boundaries. Repeating the experiment with backend.delayBuckets="913" gives more accurate (i.e. less inaccurate) percentiles:
linkerd viz stat-outbound -n faces deploy/face
NAME  SERVICE    ROUTE      TYPE  BACKEND    SUCCESS  RPS   LATENCY_P50  LATENCY_P95  LATENCY_P99  TIMEOUTS  RETRIES
face  color:80   [default]                   100.00%  2.37  750ms        975ms        995ms        0.00%     0.00%
      └─────────────────────────► color:80   100.00%  2.37  750ms        975ms        995ms        0.00%
face  smiley:80  [default]                   100.00%  2.37  750ms        975ms        995ms        0.00%     0.00%
      └─────────────────────────► smiley:80  100.00%  2.37  750ms        975ms        995ms        0.00%
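(The same interpolation sketch explains these numbers too: if the 913ms observations land in a bucket spanning roughly 500ms to 1s, the interpolated p50/p95/p99 come out to 750ms, 975ms, and 995ms. The 500ms lower bound is an inference from the numbers here, not a claim about Linkerd's exact bucket layout.)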