Feature request: Add no-op support for collector lambda layer #1181
Comments
I am not sure that switching to no-op mode when the configuration is invalid is the correct approach. It might be confusing for users and, as far as I know, it doesn't align with how OTel configurations are handled. Instead of no-op, default values could be used for invalid configs, failing fast if there is no default value for the invalid config. WDYT @tylerbenson?
Adding some more context: it's reasonable for otelcol outside of Lambda to fail fast on invalid config, because the only consequence is that the collector doesn't run; it doesn't bring down the entire host. But in Lambda, the otelcol extension failing means the entire Lambda runtime crashes, a bit like crashing the entire VM because otelcol didn't start. To me this is a pretty terrible user experience.
I can see both arguments here, though I'm leaning towards fail fast being the better option. Might be worth discussing in the SIG meeting. Lambda versions are generally immutable, so it's nice to know immediately if you configured something wrong. If a deployment is urgent, the rollback can be as easy as removing the collector layer and redeploying.
BTW, I am really not sure whether the entire Lambda environment crashes if/when an extension fails gracefully (by calling
Also, AWS Lambda encourages extensions to fail fast: https://docs.aws.amazon.com/lambda/latest/dg/runtimes-extensions-api.html#runtimes-extensions-init-error
That's good to know re:
Still, the end result is that the application is unavailable, and I do think it's pretty disruptive even given the recourse available. It goes against the expectation that observability tools strive to cause as little disruption to the application as possible.
I still prefer fail fast when something is not configured properly. Pre-prod environments are there to catch such cases before they happen in production. If the error is silently ignored (even though there are error logs), I am pretty sure that most people and companies will not notice it until they find out, some time later, that they have missing traces. I agree that both approaches have their own pros and cons, but IMO being aware of issues early is more important than suppressing them.
Is your feature request related to a problem? Please describe.
If Config.Validate() of a component returns an error, the collector Lambda layer cannot start in AWS Lambda. As a result, the user's Lambda function is broken.

Describe the solution you'd like
Depending on the component, an invalid component configuration may not need to fail the whole collector Lambda layer. We could let that component run in no-op mode.

Describe alternatives you've considered
I tried removing all config validation logic from the component and moving it to the Start function, so that an invalid config just prints a message instead. However, the opentelemetry-collector-contrib code reviewer would like to check whether there is another way to go.

Additional context
PR review comment
The component PR