[Tracing] received corrupt message of type InvalidContentType - When Collector is not in mesh, or has ports marked inbound skip #13427
Comments
Is there any additional information which would make this easier to troubleshoot?
We ran into the same error message. On a hunch, I thought it might be because the collector was not meshed. After meshing our collector, this error went away.
When we mesh the collector, linkerd traces show up as being generated BY the collector that processes the trace, so it creates a brand-new issue.
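(For concreteness, "meshing the collector" here just means injecting the linkerd-proxy sidecar into the collector's pods. A minimal sketch, where the Deployment name, namespace, and image are illustrative assumptions:)

```yaml
# Minimal sketch of a meshed collector Deployment. The linkerd.io/inject
# annotation on the pod template is the only relevant part; names, image,
# and port are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: alloy
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: alloy
  template:
    metadata:
      labels:
        app: alloy
      annotations:
        linkerd.io/inject: enabled   # injects the linkerd-proxy sidecar
    spec:
      containers:
        - name: alloy
          image: grafana/alloy:latest
          ports:
            - containerPort: 4317   # OTLP/gRPC port assumed from this thread
```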
Huh, what do you mean it's shown as being emitted by the collector? For what it's worth, we now have it working correctly in our cluster, properly correlated with all our other spans.
@wc-s I would love to hear how exactly you've set things up to have everything working – I'm not an Alloy expert and would like to learn more. 🙂
I mean that all linkerd traces contain the metadata of the Grafana Alloy pod which received the trace, which is why we attempted to unmesh the collector.
I'm boarding a flight, but there are open issues on Grafana Alloy, Linkerd, and the OTel collector all referencing similar issues. Trace ports are marked opaque, but it doesn't change the result.
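(For reference, "marked opaque" above refers to the standard Linkerd annotation. A minimal sketch on the collector's Service, with the name, namespace, and port numbers assumed from the rest of this thread:)

```yaml
# Sketch only: mark the trace ports opaque so linkerd-proxy skips protocol
# detection on them. Service name, namespace, and ports are assumptions.
apiVersion: v1
kind: Service
metadata:
  name: alloy
  namespace: monitoring
  annotations:
    config.linkerd.io/opaque-ports: "4317,4318"
spec:
  selector:
    app: alloy
  ports:
    - name: otlp-grpc
      port: 4317
    - name: otlp-http
      port: 4318
```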
OK, I gotta apologize: I didn't read the original post carefully enough and didn't realize you're using Grafana Alloy. We're using the upstream opentelemetry-collector directly. What's more, we don't use the k8sattributes processor and instead populate the k8s metadata some other way, so our experience is largely irrelevant to you 😅

However, examining your Alloy config, have you tried applying the linkerd people's recommended collector config? It's here: https://github.com/linkerd/linkerd2/blob/main/jaeger/charts/linkerd-jaeger/values.yaml#L120 Grafana Alloy is, I think, just a thin wrapper around opentelemetry-collector, so the config format is pretty much the same.

I suspect that the linkerd people also realized that here we cannot rely on the Pod IP to associate spans with pods, so they use a different attribute instead. First they rename that attribute, then they tell the collector to use it to find the right pod: https://github.com/linkerd/linkerd2/blob/main/jaeger/charts/linkerd-jaeger/values.yaml#L124 Hopefully that makes the Collector find the correct Pod.

If the above doesn't work though, for what it's worth, you can probably get like half of the data you want without even using the k8sattributes processor. linkerd-proxy already populates attributes that get you the workload name, the workload namespace, and the pod name. The first two, though, are overridden by your current k8sattributes config, and by removing that override you'd be able to see them. But yeah, this still doesn't get you node.name, pod.uid, pod.start_time, or the other labels and annotations.
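To make the pod-association idea above concrete, here is a rough sketch of a collector config that associates spans with pods via resource attributes rather than the peer IP of the connection. It is written in upstream opentelemetry-collector YAML, and the attribute names (k8s.pod.name, k8s.namespace.name) are assumptions; the linked linkerd-jaeger values.yaml is the authoritative version.

```yaml
# Sketch only: associate spans with pods by resource attribute rather than
# by the connection's peer IP (which a sidecar proxy can obscure).
# Attribute names below are assumptions, not the exact linkerd-jaeger config.
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
processors:
  k8sattributes:
    pod_association:
      - sources:
          - from: resource_attribute
            name: k8s.pod.name
          - from: resource_attribute
            name: k8s.namespace.name
  batch: {}
exporters:
  debug: {}
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [k8sattributes, batch]
      exporters: [debug]
```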
What is the issue?
Linkerd breaks traces when running against the OTLP collector, confusing the collector into thinking the traces come from the collector itself, not from the originating pod.
Example: grafana/alloy#1336 (comment)
As a workaround, we wanted to just remove the collector from the mesh, but that breaks linkerd-proxy's ability to send traces. We then attempted to leave the collector in the mesh but tell it to skip the relevant ports inbound; that also breaks linkerd-proxy's ability to send traffic.
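(For reference, "skip the relevant ports inbound" refers to the standard Linkerd annotation. A minimal sketch, shown here at the namespace level; the namespace name and port numbers are assumptions, and the same config.linkerd.io/skip-inbound-ports annotation can be set on the pod template instead:)

```yaml
# Sketch of the attempted workaround: keep the collector meshed, but have
# linkerd-proxy ignore inbound traffic on the OTLP ports.
# Namespace name and port numbers are assumptions.
apiVersion: v1
kind: Namespace
metadata:
  name: monitoring
  annotations:
    linkerd.io/inject: enabled
    config.linkerd.io/skip-inbound-ports: "4317,4318"
```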
How can it be reproduced?
Logs, error output, etc
This happens with Alloy removed from the mesh, or with Alloy set to skip the ports inbound.
Output of linkerd check -o short
Environment
Kubernetes version - 1.30
Cluster Environment - EKS
Host OS - Bottlerocket
Possible solution
No response
Additional context
I'm not sure why port 4318 ever shows up in the logs; it's configured for 4317.
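(One hedged guess about the 4318: 4317 is the conventional OTLP/gRPC port and 4318 the conventional OTLP/HTTP port, so a client configured for, or falling back to, OTLP over HTTP would target 4318 even though only 4317 was set up. In upstream-collector-style YAML the two receivers look like the fragment below; Alloy's own syntax differs, but the default ports are the same.)

```yaml
# Receiver fragment only (not a full collector config): the two OTLP
# transports listen on different default ports.
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317   # OTLP over gRPC
      http:
        endpoint: 0.0.0.0:4318   # OTLP over HTTP
```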
The pod it's failing to connect to is an Alloy pod.
The linkerd-proxy on the Alloy pod is what logs this error.
There are no actual errors on the Alloy pod itself.
Everything that is not linkerd-proxy is still able to send traces without a problem, including pods that are fully in the mesh themselves.
Would you like to work on fixing this bug?
None