This guide provides steps for new users, transitioning users, and those maintaining previous operator CRD configurations:
- New users: No migration for CRDs is required.
- Previous users: Migration may be needed if using operator.enabled=true.
CRD deployment has evolved over chart versions:
- Before 0.110.0: CRDs were deployed via a crds/ directory (upstream default).
- 0.110.0 to 0.113.0: CRDs were deployed using Helm templates (upstream default), which had reported issues.
- 0.116.0 and later: Users must now explicitly configure their preferred CRD deployment method or deploy the CRDs manually to avoid potential issues. Users can deploy CRDs via a crds/ directory again by enabling a newly added value.
New users are advised to deploy CRDs via the crds/ directory. For a fresh installation, use the following Helm values:
operatorcrds:
  install: true
operator:
  enabled: true
To install the chart:
helm install <release-name> splunk-otel-collector-chart/splunk-otel-collector --set operatorcrds.install=true,operator.enabled=true <extra_args>
If you're using chart versions 0.110.0 to 0.113.0, CRDs are likely deployed via Helm templates. To migrate to the recommended crds/ directory deployment:
Remove the chart to prepare for a fresh installation:
helm delete <release-name>
Check if the following CRDs are present and delete them if necessary:
kubectl get crds | grep opentelemetry
kubectl delete crd opentelemetrycollectors.opentelemetry.io
kubectl delete crd opampbridges.opentelemetry.io
kubectl delete crd instrumentations.opentelemetry.io
Reinstall the chart with the updated configuration:
helm install <release-name> splunk-otel-collector-chart/splunk-otel-collector --set operatorcrds.install=true,operator.enabled=true <extra_args>
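The CRD cleanup step above can be scripted. The following is a minimal sketch, assuming bash and a kubectl context that points at the target cluster. The function names are hypothetical and not part of the chart. Deleting a CRD also removes every custom resource of that kind, so review the list before running it.

```shell
#!/usr/bin/env bash
# Sketch only: function names are illustrative, not provided by the chart.

# Print the three operator CRD names listed in the migration steps.
otel_operator_crds() {
  printf '%s\n' \
    opentelemetrycollectors.opentelemetry.io \
    opampbridges.opentelemetry.io \
    instrumentations.opentelemetry.io
}

# Delete each CRD, skipping any that are already gone.
delete_otel_operator_crds() {
  local crd
  for crd in $(otel_operator_crds); do
    # --ignore-not-found makes kubectl succeed when the CRD is absent.
    kubectl delete crd "$crd" --ignore-not-found
  done
}
```

Run `delete_otel_operator_crds` after `helm delete <release-name>` and before reinstalling the chart.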
If you're using chart versions 0.110.0 to 0.113.0 and prefer to continue deploying CRDs via Helm templates (not recommended), you can do so with the following values:
operator:
  enabled: true
  crds:
    create: true
Warning: This method may cause race conditions during installation or upgrades, leading to errors like:
ERROR: INSTALLATION FAILED: failed post-install: warning: Hook post-install splunk-otel-collector/templates/operator/instrumentation.yaml failed: 1 error occurred:
* Internal error occurred: failed calling webhook "minstrumentation.kb.io": failed to call webhook: Post "https://splunk-otel-collector-operator-webhook.default.svc:443/mutate-opentelemetry-io-v1alpha1-instrumentation?timeout=10s": dial tcp X.X.X.X:443: connect: connection refused
We've simplified the Helm chart configuration for operator auto-instrumentation.
The values previously under .Values.operator.instrumentation.spec.* have been moved to .Values.instrumentation.*.
- No Action Needed: If you have no customizations under .Values.operator.instrumentation.spec.*, no migration is required.
- Action Required: Continuing to use the old values path will result in a Helm install or upgrade error, blocking the process.
Migration Steps:
- Find any references to .Values.operator.instrumentation.spec.* in your custom Helm values.
- Migrate them from .Values.operator.instrumentation.spec.* to .Values.instrumentation.*.
Example Migration:
Before (Deprecated Path):
operator:
  instrumentation:
    spec:
      endpoint: XXX
      ...
After (Updated Path):
instrumentation:
  endpoint: XXX
  ...
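To check whether a values file still uses the deprecated path, a rough heuristic like the one below may help. It is a sketch, assuming block-style YAML with `operator:` as a top-level key. The function name is hypothetical, and the script only looks for `operator:` -> `instrumentation:` -> `spec:` nesting rather than parsing YAML, so treat a match as a hint to inspect the file.

```shell
#!/usr/bin/env bash
# Heuristic sketch, not a YAML parser. The function name is illustrative.
# Returns 0 when the file appears to set operator.instrumentation.spec.*.
uses_deprecated_instrumentation_path() {
  awk '
    # Any top-level key resets the state; remember whether it is "operator:".
    /^[^ \t#]/ { in_op = ($0 ~ /^operator:/); in_instr = 0 }
    # An indented "instrumentation:" inside the operator block.
    in_op && /^[ \t]+instrumentation:/ { in_instr = 1; next }
    # An indented "spec:" seen after that marks the deprecated path.
    in_instr && /^[ \t]+spec:/ { found = 1 }
    END { exit !found }
  ' "$1"
}
```

For example, `uses_deprecated_instrumentation_path values.yaml && echo "migration needed"` prints a reminder only when the old nesting is detected.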
The Java instrumentation for Operator auto-instrumentation has been upgraded from v1.32.2 to v2.7.0.
This major update introduces several breaking changes. Below is a migration guide for customers, followed by an outline of the key changes and their impact.
Please refer to the Migration guide for OpenTelemetry Java 2.x to update your custom dashboards, detectors, or alerts using Java application telemetry data.
- Runtime metrics are now enabled by default, which can increase the number of metrics collected.
- The default protocol changed from gRPC to http/protobuf. For custom Java exporter endpoint configurations, verify that you’re sending data to http/protobuf endpoints like this example.
- Span Attribute Name Changes:
Old Attribute (1.x) | New Attribute (2.x) |
---|---|
http.method | http.request.method |
http.status_code | http.response.status_code |
http.request_content_length | http.request.body.size |
http.response_content_length | http.response.body.size |
http.target | url.path and url.query |
http.scheme | url.scheme |
http.client_ip | client.address |
- Metric Name Changes:
Old Metric (1.x) | New Metric (2.x) |
---|---|
db.pool.connections.create_time | db.client.connections.create_time (Histogram, ms) |
db.pool.connections.idle.max | db.client.connections.idle.max |
db.pool.connections.idle.min | db.client.connections.idle.min |
db.pool.connections.max | db.client.connections.max |
db.pool.connections.pending_threads | db.client.connections.pending_requests |
db.pool.connections.timeouts | db.client.connections.timeouts |
db.pool.connections.idle | db.client.connections.usage[state=idle] |
db.pool.connections.active | db.client.connections.usage[state=used] |
db.pool.connections.use_time | db.client.connections.use_time (Histogram, ms) |
db.pool.connections.wait_time | db.client.connections.wait_time (Histogram, ms) |
runtime.jvm.buffer.count | jvm.buffer.count |
runtime.jvm.buffer.total.capacity | jvm.buffer.memory.limit |
runtime.jvm.buffer.memory.used | jvm.buffer.memory.usage |
runtime.jvm.classes.loaded | jvm.class.count |
runtime.jvm.classes.unloaded | jvm.class.unloaded |
runtime.jvm.gc.concurrent.phase.time | jvm.gc.duration (Histogram, ) |
runtime.jvm.gc.pause | jvm.gc.duration () |
runtime.jvm.gc.memory.allocated | process.runtime.jvm.memory.allocated | jvm.memory.allocated* |
runtime.jvm.memory.committed | jvm.memory.committed |
runtime.jvm.memory.max | jvm.memory.limit |
runtime.jvm.gc.max.data.size | jvm.memory.limit{jvm.memory.pool.name=} |
runtime.jvm.memory.used | jvm.memory.used |
runtime.jvm.gc.live.data.size | jvm.memory.used_after_last_gc{jvm.memory.pool.name=} |
runtime.jvm.threads.daemon | runtime.jvm.threads.live | jvm.thread.count |
- Dropped Metrics:
- executor.tasks.completed
- executor.tasks.submitted
- executor.threads
- executor.threads.active
- executor.threads.core
- executor.threads.idle
- executor.threads.max
- runtime.jvm.memory.usage.after.gc
- runtime.jvm.gc.memory.promoted
- runtime.jvm.gc.overhead
- runtime.jvm.threads.peak
- runtime.jvm.threads.states
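When auditing dashboards or detectors, the span attribute renames listed above can be kept in a small lookup helper. This is a minimal sketch, assuming bash. The function name is hypothetical, and the mapping only restates the span attribute table from this guide; the metric renames are not covered.

```shell
#!/usr/bin/env bash
# Sketch only: prints the 2.x name(s) for a 1.x span attribute.
# Unknown names are printed unchanged.
new_span_attribute() {
  case "$1" in
    http.method)                  echo http.request.method ;;
    http.status_code)             echo http.response.status_code ;;
    http.request_content_length)  echo http.request.body.size ;;
    http.response_content_length) echo http.response.body.size ;;
    # http.target is split into two attributes in 2.x.
    http.target)                  echo "url.path url.query" ;;
    http.scheme)                  echo url.scheme ;;
    http.client_ip)               echo client.address ;;
    *)                            echo "$1" ;;
  esac
}
```

For example, `new_span_attribute http.status_code` prints `http.response.status_code`.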
The networkExplorer option is removed.
The networkExplorer option is deprecated now. Please use the upstream OpenTelemetry eBPF Helm chart to collect the network metrics by following these steps:
- Make sure the Splunk OpenTelemetry Collector helm chart is installed with the gateway enabled:
gateway:
  enabled: true
- Disable the network explorer:
networkExplorer:
  enabled: false
- Get the name of the Splunk OpenTelemetry Collector gateway service:
kubectl get svc | grep splunk-otel-collector-gateway
- Install the upstream OpenTelemetry eBPF helm chart pointing to the Splunk OpenTelemetry Collector gateway service:
helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm repo update open-telemetry
helm install my-opentelemetry-ebpf -f ./otel-ebpf-values.yaml open-telemetry/opentelemetry-ebpf
otel-ebpf-values.yaml must at least have the endpoint.address option set to the Splunk OpenTelemetry Collector gateway service name captured in the previous step. Additionally, if you had any custom configurations in the networkExplorer section, you need to move them to the otel-ebpf-values.yaml file.
endpoint:
  address: <my-splunk-otel-collector-gateway>
# additional custom configuration moved from the networkExplorer section in Splunk OpenTelemetry Collector helm chart.
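The service lookup and the endpoint.address setting can be tied together in a script. The following is a sketch, assuming bash and that the gateway service name contains `splunk-otel-collector-gateway`, as the grep above implies. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: reads `kubectl get svc` output on stdin and prints the
# first service whose name contains the gateway suffix.
gateway_service_name() {
  awk '$1 ~ /splunk-otel-collector-gateway/ { print $1; exit }'
}
```

One possible use is `svc=$(kubectl get svc | gateway_service_name)`, followed by writing `$svc` into the `endpoint.address` field of otel-ebpf-values.yaml, or passing `--set endpoint.address="$svc"` to the `helm install` command shown above.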
The default logs collection engine (logsEngine) changed from fluentd to the native OpenTelemetry logs collection (otel).
If you want to keep using the Fluentd sidecar for logs collection, set logsEngine: fluentd in your values.yaml.
The format for defining auto-instrumentation images has been refactored. Previously, the image was defined using the operator.instrumentation.spec.{library}.image format. This has been changed to separate the repository and tag into two distinct fields: operator.instrumentation.spec.{library}.repository and operator.instrumentation.spec.{library}.tag.
If you were defining a custom image under operator.instrumentation.spec.{library}.image, update your values.yaml to accommodate this change.
- Before:
operator:
  instrumentation:
    spec:
      java:
        image: ghcr.io/custom-owner/splunk-otel-java/custom-splunk-otel-java:v1.27.0
- After:
operator:
  instrumentation:
    spec:
      java:
        repository: ghcr.io/custom-owner/splunk-otel-java/custom-splunk-otel-java
        tag: v1.27.0
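Splitting an existing image reference into the two new fields is a simple string operation. The sketch below assumes bash and an image reference that ends in `:<tag>`. Digest references and untagged images on a registry with a port are not handled. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: split "<repository>:<tag>" at the last colon and print
# the two values in the new field layout.
split_image() {
  local image=$1
  # ${image%:*} drops the shortest ":<tag>" suffix;
  # ${image##*:} keeps only the text after the last colon.
  printf 'repository: %s\ntag: %s\n' "${image%:*}" "${image##*:}"
}
```

For the example above, `split_image ghcr.io/custom-owner/splunk-otel-java/custom-splunk-otel-java:v1.27.0` prints the `repository` and `tag` lines shown in the "After" values.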
There is a new receiver, the Kubernetes Objects Receiver, that can pull or watch any object from the Kubernetes API server. It will replace the Kubernetes Events Receiver in the future.
To migrate from the Kubernetes Events Receiver to the Kubernetes Objects Receiver, configure the clusterReceiver section of values.yaml with:
k8sObjects:
  - mode: watch
    name: events
There are differences in log record formatting between the previous k8s_events receiver and the now adopted k8sobjects receiver.
The k8s_events receiver stores event messages in the log body, with the following fields added as attributes:
k8s.object.kind
k8s.object.name
k8s.object.uid
k8s.object.fieldpath
k8s.object.api_version
k8s.object.resource_version
k8s.event.reason
k8s.event.action
k8s.event.start_time
k8s.event.name
k8s.event.uid
k8s.namespace.name
Now with the k8sobjects receiver, the whole payload is stored in the log body and object.message refers to the event message.
You can monitor more Kubernetes objects by configuring clusterReceiver.k8sObjects according to the instructions from the Kubernetes Objects Receiver documentation.
Remember to define rbac.customRules when needed. For example, when configuring:
objectsEnabled: true
k8sObjects:
  - name: events
    mode: watch
    group: events.k8s.io
    namespaces: [default]
You should add the events.k8s.io API group to rbac.customRules:
rbac:
  customRules:
    - apiGroups:
        - "events.k8s.io"
      resources:
        - events
      verbs:
        - get
        - list
        - watch
[receiver/filelogreceiver] The data types of force_flush_period and poll_interval were changed from map to string.
If you are using a custom filelog receiver plugin, you need to change the config from:
filelog:
  poll_interval:
    duration: 200ms
  force_flush_period:
    duration: "0"
to:
filelog:
  poll_interval: 200ms
  force_flush_period: "0"
[receiver/filelogreceiver] The data types of force_flush_period and poll_interval were changed from string to map. Because of that, the default values in the Helm chart were causing problems (#519).
If you are using a custom filelog receiver plugin, you need to change the config from:
filelog:
  poll_interval: 200ms
  force_flush_period: "0"
to:
filelog:
  poll_interval:
    duration: 200ms
  force_flush_period:
    duration: "0"
If you are disabling this feature gate to keep previous functionality, you will have to complete the steps in upgrade guidelines 0.47.0 to 0.47.1 to upgrade since the feature gate no longer exists.
OTel Kubernetes receiver is now used for events collection instead of Signalfx events receiver
Before this change, if clusterReceiver.k8sEventsEnabled=true, Kubernetes events used to be collected by a Signalfx receiver and sent both to Splunk Observability Infrastructure Monitoring and Splunk Observability Log Observer.
Now we utilize a native OpenTelemetry receiver for collecting Kubernetes events.
Therefore the clusterReceiver.k8sEventsEnabled option is now deprecated and replaced by the following two options:
- clusterReceiver.eventsEnabled: to send Kubernetes events in the new OTel format to Splunk Observability Log Observer (if splunkObservability.logsEnabled=true) or to Splunk Platform (if splunkPlatform.logsEnabled=true).
- splunkObservability.infrastructureMonitoringEventsEnabled: to collect Kubernetes events using the Signalfx Kubernetes events receiver and send them to Splunk Observability Infrastructure Monitoring.
If you have clusterReceiver.k8sEventsEnabled set to true to send Kubernetes events to both Splunk Observability Infrastructure Monitoring and Splunk Observability Log Observer, remove clusterReceiver.k8sEventsEnabled from your custom values.yaml and enable both the clusterReceiver.eventsEnabled and splunkObservability.infrastructureMonitoringEventsEnabled options. This will send the Kubernetes events to Splunk Observability Log Observer in the new OpenTelemetry format.
If you want to keep sending Kubernetes events to Splunk Observability Log Observer in the old Signalfx format to keep exactly the same behavior as before, remove clusterReceiver.k8sEventsEnabled from your custom values.yaml and add the following configuration:
splunkObservability:
  logsEnabled: true
  infrastructureMonitoringEventsEnabled: true
clusterReceiver:
  config:
    exporters:
      splunk_hec/events:
        endpoint: https://ingest.<SPLUNK_OBSERVABILITY_REALM>.signalfx.com/v1/log
        log_data_enabled: true
        profiling_data_enabled: false
        source: kubelet
        sourcetype: kube:events
        token: ${SPLUNK_OBSERVABILITY_ACCESS_TOKEN}
    service:
      pipelines:
        logs/events:
          exporters:
            - signalfx
            - splunk_hec/events
where SPLUNK_OBSERVABILITY_REALM must be replaced by the splunkObservability.realm value.
New releases of opentelemetry-log-collection ( v0.29.0, v0.28.0 ) have breaking changes
Several of the logging receivers supported by the Splunk Otel Collector Chart were updated to use v0.29.0 instead of v0.27.2 of opentelemetry-log-collection.
- Check to see if you have any custom log monitoring setup with the extraFileLogs config, the logsCollection.containers.extraOperators config, or any of the affected receivers. If you don't have any custom log monitoring setup, you can stop here.
- Read the documentation for upgrading to opentelemetry-log-collection v0.29.0.
- If opentelemetry-log-collection v0.29.0 or v0.28.0 will break any of your custom log monitoring, update your log monitoring to accommodate the breaking changes.
If you haven't already completed the steps in upgrade guidelines 0.47.0 to 0.47.1 , then complete them.
[receiver/k8sclusterreceiver] Fix k8s node and container cpu metrics not being reported properly
The Splunk Otel Collector added a feature gate to enable a bug fix for three metrics. These metrics have a current and a legacy name; both are listed below as (current, legacy) pairs.
- Affected Metrics
  - k8s.container.cpu_request, kubernetes.container_cpu_request
  - k8s.container.cpu_limit, kubernetes.container_cpu_limit
  - k8s.node.allocatable_cpu, kubernetes.node_allocatable_cpu
- Upgrade Steps
- Check to see if any of your custom monitoring uses the affected metrics. Check for the current and legacy names of the affected metrics. If you don't use the affected metrics in your custom monitoring, you can stop here.
- Read the documentation for the receiver.k8sclusterreceiver.reportCpuMetricsAsDouble feature gate and the bug fix it applies.
- If the bug fix will break any of your custom monitoring for the affected metrics, update your monitoring to accommodate the bug fix.
- Feature Gate Stages and Versions
- Alpha (versions 0.47.1-0.48.0):
  - The feature gate is disabled by default. Use the --set clusterReceiver.featureGates=receiver.k8sclusterreceiver.reportCpuMetricsAsDouble argument with the helm install/upgrade command, or add the following lines to your custom values.yaml to enable the feature gate:
clusterReceiver:
  featureGates: receiver.k8sclusterreceiver.reportCpuMetricsAsDouble
- Beta (versions 0.49.0-0.54.0):
  - The feature gate is enabled by default. Use the --set clusterReceiver.featureGates=-receiver.k8sclusterreceiver.reportCpuMetricsAsDouble argument with the helm install/upgrade command, or add the following lines to your custom values.yaml to disable the feature gate:
clusterReceiver:
  featureGates: -receiver.k8sclusterreceiver.reportCpuMetricsAsDouble
- Generally Available (versions 0.55.0+):
  - The receiver.k8sclusterreceiver.reportCpuMetricsAsDouble feature gate functionality is permanently enabled and the feature gate is no longer available.
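The stage boundaries above can be expressed as a small version check. This is a sketch, assuming bash and a plain numeric chart version of the form `0.<minor>.<patch>`. The function name is hypothetical, and patch releases within a listed minor version are assumed to share that minor version's stage.

```shell
#!/usr/bin/env bash
# Sketch only: print the reportCpuMetricsAsDouble gate stage for a chart version.
# Assumes a "0.<minor>.<patch>" version string with numeric parts.
cpu_gate_stage() {
  local version=$1 rest minor patch
  rest=${version#0.}     # "0.47.1" -> "47.1"
  minor=${rest%%.*}      # "47"
  patch=${rest#*.}       # "1"
  if [ "$minor" -ge 55 ]; then
    echo ga              # permanently enabled, gate removed
  elif [ "$minor" -ge 49 ]; then
    echo beta            # enabled by default
  elif [ "$minor" -eq 48 ] || { [ "$minor" -eq 47 ] && [ "$patch" -ge 1 ]; }; then
    echo alpha           # disabled by default
  else
    echo none            # gate not present yet
  fi
}
```

For example, `cpu_gate_stage 0.49.0` prints `beta`, which means the gate is on unless it is explicitly disabled.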
[receiver/k8sclusterreceiver] Use newer batch and autoscaling APIs
Kubernetes clusters with version 1.20 stopped having active support on 2021-12-28 and had an end of life date on 2022-02-28. The k8s_cluster receiver was refactored to use newer Kubernetes APIs that are available starting in Kubernetes version 1.21. The latest version of the k8s_cluster receiver will no longer be able to collect all the previously available metrics with Kubernetes clusters that have versions below 1.21.
If version 0.45.0 of the chart cannot collect metrics from your Kubernetes cluster because its version is below 1.21, you will see error messages like this in your cluster receiver logs:
Failed to watch *v1.CronJob: failed to list *v1.CronJob: the server could not find the requested resource
To better support users, in a future release we are adding a feature that will allow users to use the last version of the k8s_cluster receiver that supported Kubernetes clusters below version 1.21.
If you still want to keep the previous behavior of the k8s_cluster receiver and upgrade to v0.45.0 of the chart, make sure your Kubernetes cluster uses one of the following versions.
- kubernetes, aks, eks, eks/fargate, gke, gke/autopilot: Use version 1.21 or above
- openshift: Use version 4.8 or above
#375 Resource detection processor is configured to override all host and cloud attributes
If you still want to keep the previous behavior, use the following custom values.yaml configuration:
agent:
  config:
    processors:
      resourcedetection:
        override: false
#357 Double expansion issue in splunk-otel-collector is fixed
If you use OTel native logs collection with any custom log processing operators in the filelog receiver, please replace any occurrences of $$$$ with $$.
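The replacement can be applied mechanically with sed. The sketch below assumes bash and a POSIX sed, and works as a stdin-to-stdout filter. The function name is hypothetical. Review the result before applying it, since it rewrites every run of four dollar signs, not only those inside filelog operators.

```shell
#!/usr/bin/env bash
# Sketch only: replace every "$$$$" with "$$" on stdin.
# The single quotes keep the shell from expanding the dollar signs.
fix_double_expansion() {
  sed 's/\$\$\$\$/$$/g'
}
```

For example, `fix_double_expansion < values.yaml > values.fixed.yaml` writes a corrected copy that can be diffed against the original.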
#325 Logs collection is now disabled by default for Splunk Observability destination
If you send logs to the Splunk Observability destination, make sure to enable logs.
Use the --set="splunkObservability.logsEnabled=true" argument with the helm install/upgrade command, or add the following lines to your custom values.yaml:
splunkObservability:
  logsEnabled: true
#297, #301 Several parameters in values.yaml configuration were renamed according to Splunk GDI Specification
If you use the following parameters in your custom values.yaml, please rename them accordingly:
- provider -> cloudProvider
- distro -> distribution
- otelAgent -> agent
- otelCollector -> gateway
- otelK8sClusterReceiver -> clusterReceiver
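For values files with many overrides, the top-level renames above can be applied with sed. This is a sketch, assuming bash and that the old names appear as top-level keys at the start of a line. The function name is hypothetical. Indented keys with the same names are left untouched.

```shell
#!/usr/bin/env bash
# Sketch only: rename the top-level keys listed above, reading stdin
# and writing stdout. Patterns are anchored to the start of the line.
rename_gdi_keys() {
  sed -e 's/^provider:/cloudProvider:/' \
      -e 's/^distro:/distribution:/' \
      -e 's/^otelAgent:/agent:/' \
      -e 's/^otelCollector:/gateway:/' \
      -e 's/^otelK8sClusterReceiver:/clusterReceiver:/'
}
```

For example, `rename_gdi_keys < values.yaml > values.renamed.yaml` produces a copy to review before use.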
#306 Some parameters under the splunkPlatform group were renamed
If you use the following parameters under the splunkPlatform group, please make sure they are updated:
- metrics_index -> metricsIndex
- max_connections -> maxConnections
- disable_compression -> disableCompression
- insecure_skip_verify -> insecureSkipVerify
#295 Secret names are changed according to the GDI specification
If you provide the access token for Splunk Observability using a custom Kubernetes secret (secret.create=false), please update the secret key from splunk_o11y_access_token to splunk_observability_access_token.
#273 Changed configuration to fetch attributes from labels and annotations of pods and namespaces
The podLabels parameter under the extraAttributes group is now deprecated in favor of fromLabels. Please update your custom values.yaml accordingly.
For example, the following config:
extraAttributes:
  podLabels:
    - app
    - git_sha
Should be changed to:
extraAttributes:
  fromLabels:
    - key: app
    - key: git_sha
#316 Busybox dependency is removed, splunk/fluentd-hec image is used in init container instead
image.fluentd.initContainer is no longer used. Please remove it from your custom values.yaml.
If you have any extra receivers that require access to the node's files or directories that are not mounted by default, you need to set up additional volume mounts.
For example, if you have the following smartagent/docker-container-stats receiver added to your configuration:
agent:
  config:
    receivers:
      smartagent/docker-container-stats:
        type: docker-container-stats
        dockerURL: unix:///hostfs/var/run/docker.sock
You need to mount the docker socket to your container as follows:
extraVolumeMounts:
  - mountPath: /hostfs/var/run/docker.sock
    name: host-var-run-docker
    readOnly: true
extraVolumes:
  - name: host-var-run-docker
    hostPath:
      path: /var/run/docker.sock
#246 Simplify configuration for switching to native OTel logs collection
The config to enable native OTel logs collection was changed from
fluentd:
  enabled: false
logsCollection:
  enabled: true
to
logsEngine: otel
Enabling both engines is not supported anymore. If you need that, you can install fluentd separately.
The following parameters are now deprecated and moved under the splunkObservability group. They need to be updated in your custom values.yaml files before backward compatibility is discontinued.
Required parameters:
- splunkRealm changed to splunkObservability.realm
- splunkAccessToken changed to splunkObservability.accessToken
Optional parameters:
- ingestUrl changed to splunkObservability.ingestUrl
- apiUrl changed to splunkObservability.apiUrl
- metricsEnabled changed to splunkObservability.metricsEnabled
- tracesEnabled changed to splunkObservability.tracesEnabled
- logsEnabled changed to splunkObservability.logsEnabled
#163 Auto-detection of prometheus metrics is disabled by default: If you rely on automatic prometheus endpoints detection to scrape prometheus metrics from pods in your k8s cluster, make sure to add this configuration to your values.yaml:
autodetect:
  prometheus: true