appender-tracing: Include trace ID and span ID in logs when nested in tracing crate's spans #2438

Open · wants to merge 15 commits into main

Conversation

@dylanahsmith (Author)

Fixes #1378

Problem Summary

opentelemetry-appender-tracing was only including trace & span IDs from scopes opened through the OpenTelemetry tracing API. However, since opentelemetry-appender-tracing exists to bridge events from the tracing crate, the spans are much more likely to be created through the tracing crate as well, yet the trace & span IDs from those spans weren't being included in the logs.
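
To make the scenario concrete, here is a minimal sketch (the span name and field are made up for illustration): a log event emitted inside a span created with the tracing macros should end up with the same trace/span IDs as the span exported by tracing-opentelemetry.

use tracing::{info, info_span};

fn main() {
    // A span created through the `tracing` crate rather than the OpenTelemetry tracing API.
    let span = info_span!("checkout", order_id = 42);
    let _guard = span.enter();

    // With this change, the log record emitted by opentelemetry-appender-tracing for this
    // event should carry the trace and span IDs of the surrounding "checkout" span.
    info!("charging card");
}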

Changes

Similar to the previous fix attempt (#1394), the span/trace IDs are obtained from the OtelData span extension that the tracing-opentelemetry crate adds.
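
Roughly, the IDs are read from the span's extensions. The following is only a sketch of that lookup (the helper name is invented here), under the assumption that tracing-opentelemetry keeps the span ID on each span's builder and the trace ID on the root span's builder:

use opentelemetry::trace::{SpanId, TraceId};
use tracing_opentelemetry::OtelData;
use tracing_subscriber::registry::{LookupSpan, SpanRef};

// Hypothetical helper: recover the OpenTelemetry IDs that tracing-opentelemetry
// stored in the span's extensions. The span ID lives on the current span's
// builder; the trace ID is only set on the root span of the scope.
fn otel_ids_for<'a, R>(span: &SpanRef<'a, R>) -> (Option<SpanId>, Option<TraceId>)
where
    R: LookupSpan<'a>,
{
    let span_id = span
        .extensions()
        .get::<OtelData>()
        .and_then(|otd| otd.builder.span_id);

    let trace_id = span.scope().last().and_then(|root| {
        root.extensions()
            .get::<OtelData>()
            .and_then(|otd| otd.builder.trace_id)
    });

    (span_id, trace_id)
}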

Issues resolved from the previous fix attempt (#1394):

Additionally, I've made the following changes:

  • Added a regression test that ensures the trace & span IDs are consistent between the logs and tracing signals
  • I've made the new tracing feature a default feature, since it seems very likely that there will be tracing spans to correlate if opentelemetry-appender-tracing is being used. This should help avoid a lot of confusion, while still allowing users to opt out of this feature.

Merge requirement checklist

  • CONTRIBUTING guidelines followed
  • Unit tests added/updated (if applicable)
  • Appropriate CHANGELOG.md files updated for non-trivial, user-facing changes
  • Changes in public API reviewed (if applicable)

@dylanahsmith requested a review from a team as a code owner on December 17, 2024
@cijothomas (Member) left a comment:

Requesting changes to discuss #2438 (comment)

@@ -17,6 +17,7 @@ tracing = { workspace = true, features = ["std"]}
tracing-core = { workspace = true }
tracing-log = { version = "0.2", optional = true }
tracing-subscriber = { workspace = true, features = ["registry", "std"] }
tracing-opentelemetry = { version = "0.28", optional = true }
Member:

Will there be a conflict if the application uses version 0.29 of tracing-opentelemetry, while the appender depends on version 0.28 as specified in this configuration?

@dylanahsmith (Author):

tracing-opentelemetry v0.29 hasn't been released, but I think I see what you are getting at.

tracing-opentelemetry v0.28 (the latest release) depends on opentelemetry v0.27, so it won't be compatible with opentelemetry v0.28 when that is released. It looks like all the opentelemetry crates are typically bumped together, so there is always a set of versions of these crates that are compatible with each other. However, this means that at the point opentelemetry-appender-tracing v0.28 is released, there wouldn't be a compatible version of tracing-opentelemetry without coordination with that external crate.

Given that this is an experimental feature, I'm guessing it doesn't make sense to wait for tracing-opentelemetry v0.29 to be released before releasing opentelemetry-appender-tracing.

Removing the version constraint would probably work in practice with these unstable versions, given that the constraint on the opentelemetry crates should ensure the correct version is used. However, tracing-opentelemetry could potentially introduce a breaking change that is incompatible with opentelemetry-appender-tracing without changing the opentelemetry v0.x version it depends on, which I think would result in cargo choosing an incompatible version of tracing-opentelemetry even when a compatible version exists.

I think the simplest thing to do for now (while things are unstable) when releasing a new v0.x version of the opentelemetry crates would be to optimistically bump this optional tracing-opentelemetry dependency to its next v0.y version. E.g. when releasing opentelemetry v0.28.0, depend on tracing-opentelemetry v0.29, so that tracing-opentelemetry can be made compatible without any further changes to opentelemetry-appender-tracing. If tracing-opentelemetry v0.29.0 turns out not to be compatible, then a v0.28.1 release of opentelemetry-appender-tracing could fix this feature. This approach also avoids pulling in an incompatible v0.30 of tracing-opentelemetry once v0.29 is already working with this feature.

Functionally, you could think of a release of a new v0.x version of the opentelemetry crates (including opentelemetry-appender-tracing) as removing this feature and then automatically reintroducing it when the next v0.y version of tracing-opentelemetry is released. This could instead be done manually, which would allow for extra testing after the tracing-opentelemetry v0.y release, but that seems like unnecessary work and it complicates the opentelemetry-appender-tracing release process.

Once things stabilize, this could be coordinated more carefully between opentelemetry-appender-tracing & tracing-opentelemetry using RC releases, such that the new major version of opentelemetry-appender-tracing is made compatible with an RC version of tracing-opentelemetry.

Member:

Thanks @dylanahsmith. I agree that optimistically bumping the version of tracing-opentelemetry to the next v0.y (e.g., v0.29 when releasing opentelemetry v0.28) seems like a practical approach for now, given the experimental nature of the feature. If compatibility issues arise, a quick patch release can address them.

@lfdominguez commented on Dec 28, 2024:

Please merge this, it's working perfectly for me:

Using these dependencies in my actix-web app:

# Tracing
tracing = "0.1"
tracing-log = "0.2"
opentelemetry = "0.27"
opentelemetry_sdk = { version = "0.27", features = ["rt-tokio-current-thread"] }
opentelemetry-otlp = "0.27"
opentelemetry-semantic-conventions = "0.27"
opentelemetry-appender-tracing = { version = "0.27", features = ["experimental_use_tracing_span_context"] }
tracing-opentelemetry = "0.28"
tracing-subscriber = { version = "0.3", features = ["registry", "env-filter"] }
tracing-bunyan-formatter = "0.3"
tracing-actix-web = { version = "0.7", features = ["opentelemetry_0_27"] }

[patch.crates-io]
opentelemetry = { package = "opentelemetry", git = "https://github.com/dylanahsmith/opentelemetry-rust.git", branch = "appender-tracing-fix-trace-context" }
opentelemetry-appender-tracing = { package = "opentelemetry-appender-tracing", git = "https://github.com/dylanahsmith/opentelemetry-rust.git", branch = "appender-tracing-fix-trace-context" }
opentelemetry-http = { package = "opentelemetry-http", git = "https://github.com/dylanahsmith/opentelemetry-rust.git", branch = "appender-tracing-fix-trace-context" }
opentelemetry-otlp = { package = "opentelemetry-otlp", git = "https://github.com/dylanahsmith/opentelemetry-rust.git", branch = "appender-tracing-fix-trace-context" }
opentelemetry_sdk = { package = "opentelemetry_sdk", git = "https://github.com/dylanahsmith/opentelemetry-rust.git", branch = "appender-tracing-fix-trace-context" }

And setup:

use std::sync::LazyLock;
use opentelemetry::KeyValue;
use opentelemetry::trace::{TraceError, TracerProvider};
use opentelemetry_appender_tracing::layer::OpenTelemetryTracingBridge;
use opentelemetry_otlp::{LogExporter, MetricExporter, SpanExporter, WithExportConfig};
use opentelemetry_sdk::{logs::{self as sdklogs}, runtime, trace::{self as sdktrace}, Resource};
use opentelemetry_sdk::logs::LoggerProvider;
use opentelemetry_sdk::metrics::{MetricError, SdkMeterProvider};
use opentelemetry_sdk::propagation::TraceContextPropagator;
use opentelemetry_semantic_conventions::resource;
use tracing_bunyan_formatter::{JsonStorageLayer};
use tracing_subscriber::{EnvFilter, Layer};
use tracing_subscriber::layer::SubscriberExt;
use crate::error::CustomError;
use crate::tracing_utils;

const APP_NAME: &str = crate::built_info::PKG_NAME;

static RESOURCE: LazyLock<Resource> =
    // LazyLock::new(|| Resource::new(vec![KeyValue::new(resource::SERVICE_NAME, APP_NAME)]));
    LazyLock::new(|| {
        Resource::builder()
            .with_service_name(APP_NAME)
            .build()
    });

pub fn init_telemetry() -> Result<(), CustomError> {
    opentelemetry::global::set_text_map_propagator(TraceContextPropagator::new());

    let log_tracer = tracing_utils::init_logs()?;
    let span_tracer = tracing_utils::init_traces()?;
    let metric_tracer = tracing_utils::init_metrics()?;

    opentelemetry::global::set_meter_provider(metric_tracer.clone());

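    // Filter for the OTel log bridge: silence hyper/tonic/h2/reqwest and the
    // OpenTelemetry crates themselves so the exporter's own network stack and
    // SDK internals don't produce logs that get exported again (feedback loop).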
    let filter_otel = EnvFilter::new("info")
        .add_directive("hyper=off".parse().unwrap())
        .add_directive("opentelemetry=off".parse().unwrap())
        .add_directive("opentelemetry_sdk=off".parse().unwrap())
        .add_directive("tonic=off".parse().unwrap())
        .add_directive("h2=off".parse().unwrap())
        .add_directive("reqwest=off".parse().unwrap());


    let otel_log_layer = OpenTelemetryTracingBridge::new(&log_tracer).with_filter(filter_otel);
    let otel_span_layer = tracing_opentelemetry::layer().with_tracer(span_tracer.tracer(APP_NAME));

    let env_filter = EnvFilter::try_from_default_env().unwrap_or(EnvFilter::new("info")
        .add_directive("opentelemetry=debug".parse().unwrap())
        .add_directive("opentelemetry_sdk=off".parse().unwrap())
    );

    let format = tracing_subscriber::fmt::format().pretty();
    let formatting_layer = tracing_subscriber::fmt::layer().event_format(format);


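    // `env_filter` is added as its own layer, so it filters events for the whole
    // registry; `filter_otel` above applies only to the OTel log bridge layer.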
    let subscriber = tracing_subscriber::Registry::default()
        .with(env_filter)
        .with(otel_span_layer)
        .with(otel_log_layer)
        .with(JsonStorageLayer)
        .with(formatting_layer);

    tracing::subscriber::set_global_default(subscriber).map_err(|err| {
        CustomError::TracingGenericError(format!("Failed to set up global tracing config: {}", err))
    })?;

    Ok(())
}

fn init_logs() -> Result<sdklogs::LoggerProvider, opentelemetry_sdk::logs::LogError> {
    let exporter = LogExporter::builder()
        .with_tonic()
        .with_endpoint("http://127.0.0.1:44317")
        .build()?;

    Ok(LoggerProvider::builder()
        // .with_batch_exporter(exporter, runtime::TokioCurrentThread)
        .with_batch_exporter(exporter)
        .with_resource(RESOURCE.clone())
        .build())
}

fn init_traces() -> Result<sdktrace::TracerProvider, TraceError> {
    let exporter = SpanExporter::builder()
        .with_tonic()
        .with_endpoint("http://127.0.0.1:44317")
        .build()?;

    Ok(sdktrace::TracerProvider::builder()
        .with_batch_exporter(exporter, runtime::TokioCurrentThread)
        .with_resource(RESOURCE.clone())
        .build())
}

fn init_metrics() -> Result<opentelemetry_sdk::metrics::SdkMeterProvider, MetricError> {
    let exporter = MetricExporter::builder()
        .with_tonic()
        .with_endpoint("http://127.0.0.1:44317")
        .build()?;

    // let reader = opentelemetry_sdk::metrics::PeriodicReader::builder(exporter, runtime::TokioCurrentThread)
    let reader = opentelemetry_sdk::metrics::PeriodicReader::builder(exporter)
        .with_interval(std::time::Duration::from_secs(1))
        .build();

    Ok(SdkMeterProvider::builder()
        .with_reader(reader)
        .with_resource(RESOURCE.clone())
        .build())
}

and set this in the actix-web HttpServer app setup:

.wrap(TracingLogger::default())


@VincentBaijot

This correction is very nice.
I use it in a project and it's awesome to be able to use the instrument macro and see the traces and logs automatically correlated in OpenTelemetry.

I'm waiting for it to be merged.

@cijothomas (Member):

Tagging for consideration in 0.28, though it is not strictly required for 0.28.

@cijothomas added this to the 0.28.0 milestone on Jan 21, 2025
@lalitb (Member) commented on Jan 21, 2025:

@maximebedard - Would it be possible for you to help with the review, as changes are similar to #1394.
@dylanahsmith - Need to fix the CI build as discussed here.

@cijothomas dismissed their stale review on January 22, 2025:

Feature flag is added now, so removing the block.

codecov bot commented on Jan 22, 2025:

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 79.5%. Comparing base (6a8db83) to head (50b549a).

Additional details and impacted files
@@           Coverage Diff           @@
##            main   #2438     +/-   ##
=======================================
+ Coverage   77.8%   79.5%   +1.6%     
=======================================
  Files        122     118      -4     
  Lines      23030   22495    -535     
=======================================
- Hits       17931   17884     -47     
+ Misses      5099    4611    -488     


@cijothomas added the integration tests label on Jan 23, 2025
@cijothomas (Member):

@TommyCpp Could you help review this?

@cijothomas requested a review from bantonsson on January 23, 2025
Cargo.toml (review thread resolved)
@lalitb (Member) left a comment:

LGTM. Looks safe being behind a feature flag and validated by multiple users.
One thing we need to take care of is bumping the tracing-opentelemetry version, as discussed here, as part of every release. But it would be good to have more eyes/approvals.

@bantonsson (Contributor) left a comment:

LGTM.

.get::<OtelData>()
.and_then(|otd| otd.builder.span_id);

let opt_trace_id = span.scope().last().and_then(|root_span| {
Contributor:

Is it really necessary to look for the root span to get the trace_id here? Shouldn't the current span have the trace_id set as well?

@dylanahsmith (Author):

Typically the current span won't have the trace id explicitly set on it, so it won't actually store the trace id itself; instead the trace id is inherited when the span is built for sending.

@dylanahsmith (Author):

I suppose going straight to the root span wouldn't support an explicitly set trace id on a non-root span. Is there a non-contrived example of when this might be done that I should add a test for and support? Or should this just be left as a limitation to avoid extra code and runtime overhead?

Contributor:

I looked at the listener code and yeah, it's only root spans that get the trace_id set in the builder. Someone would have to manually override that and set a trace_id on the builder to get that behavior.

Still, I think that we should pull out the trace_id from the builder and use that if it's available, or fall back to the root one.
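
To make that suggestion concrete, here is a hypothetical sketch (not the actual patch) that prefers a trace_id set on the current span's builder and only then falls back to the root span's builder:

use opentelemetry::trace::TraceId;
use tracing_opentelemetry::OtelData;
use tracing_subscriber::registry::{LookupSpan, SpanRef};

// Hypothetical fallback: use an explicitly set trace_id from the current span's
// builder when present, otherwise take the one stored on the root span.
fn resolve_trace_id<'a, R>(span: &SpanRef<'a, R>) -> Option<TraceId>
where
    R: LookupSpan<'a>,
{
    span.extensions()
        .get::<OtelData>()
        .and_then(|otd| otd.builder.trace_id)
        .or_else(|| {
            span.scope().last().and_then(|root| {
                root.extensions()
                    .get::<OtelData>()
                    .and_then(|otd| otd.builder.trace_id)
            })
        })
}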

@caibirdme

Can't wait for this PR to be merged, it's really helpful.

@TommyCpp (Contributor):

Will take a look tomorrow!

Labels
integration tests (Run integration tests)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature]: opentelemetry-appender-tracing should respect information inside tracing::Span
8 participants