
Add self-diagnostics example #1846

Open

wants to merge 23 commits into main

Conversation

lalitb
Member

@lalitb lalitb commented May 29, 2024

Changes

Example to demonstrate using tracing as a global error handler for errors generated by the OpenTelemetry Metrics SDK. In this example, measurements are recorded so that the cardinality limit is exceeded, which triggers an error to be logged. This error is then emitted to stdout via the opentelemetry-appender-tracing subscriber (see the sketch below).
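
For orientation, a minimal sketch of the measurement loop the description refers to, assuming a counter obtained from the global meter provider (the instrument name, attribute key, and loop bound are illustrative, and whether the builder finishes with .init() or .build() depends on the opentelemetry version used):

use opentelemetry::{global, KeyValue};

fn record_past_cardinality_limit() {
    let meter = global::meter("self-diagnostics-example");
    let counter = meter.u64_counter("requests").init();

    // Each distinct attribute set creates a new time series; emitting more
    // unique values than the SDK's cardinality limit makes the Metrics SDK
    // report an error, which the tracing-based handler then writes to stdout.
    for i in 0..3000 {
        counter.add(1, &[KeyValue::new("user_id", i.to_string())]);
    }
}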

Merge requirement checklist

  • CONTRIBUTING guidelines followed
  • Unit tests added/updated (if applicable)
  • Appropriate CHANGELOG.md files updated for non-trivial, user-facing changes
  • Changes in public API reviewed (if applicable)

@lalitb lalitb requested a review from a team as a code owner May 29, 2024 23:22
@lalitb lalitb marked this pull request as draft May 29, 2024 23:22
@lalitb lalitb changed the title Add self-diagnostics example [WIP] Add self-diagnostics example May 29, 2024

codecov bot commented May 29, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 74.6%. Comparing base (e0fb7fe) to head (3631c8f).
Report is 1 commit behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##            main   #1846     +/-   ##
=======================================
- Coverage   74.6%   74.6%   -0.1%     
=======================================
  Files        122     122             
  Lines      19952   19952             
=======================================
- Hits       14902   14901      -1     
- Misses      5050    5051      +1     


@lalitb lalitb marked this pull request as ready for review May 30, 2024 02:31
@lalitb lalitb changed the title [WIP] Add self-diagnostics example [Add self-diagnostics example May 30, 2024
@lalitb lalitb changed the title [Add self-diagnostics example Add self-diagnostics example May 30, 2024
let provider: LoggerProvider = LoggerProvider::builder()
    .with_simple_exporter(exporter)
    .build();
let filter = EnvFilter::new("error"); // only push errors
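
For context, a hedged sketch of how a provider and filter like these are typically attached to tracing via opentelemetry-appender-tracing (continuing the snippet above; the layer and registry calls follow the crates' public APIs, not necessarily this PR's exact code):

use opentelemetry_appender_tracing::layer::OpenTelemetryTracingBridge;
use tracing_subscriber::{layer::SubscriberExt, util::SubscriberInitExt, Layer};

// Bridge tracing events into the LoggerProvider built above and keep only
// ERROR-level events, so SDK self-diagnostics emitted via tracing::error!
// reach the stdout log exporter without flooding it.
let otel_layer = OpenTelemetryTracingBridge::new(&provider).with_filter(filter);
tracing_subscriber::registry().with(otel_layer).init();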
Member

Can you add a README.md and explain how this works and how infinite logging is prevented here?
(I am not sure we actually prevent that, though. For example, if the OTLP endpoint is not reachable and we use this approach, would we be in an infinite loop?)

Member Author

@lalitb lalitb May 30, 2024

For example, if OTLP Endpoint is not reachable, and we use this approach - would we be in an infinite loop

It shouldn't be an infinite loop with the OTLP Metrics Exporter + stdout LogExporter. Should we update the example to use OTLP exporters for both logs and metrics, since that would need some filters to avoid an infinite loop when the collector is not running?

Member

Yes. That'd be the most common scenario (using OTLP), so showing exactly how to filter would be helpful for the majority of users.

Member Author

There is a potential risk of an infinite loop when using the OTLP Metrics Exporter together with the OTLP Log Exporter as a custom handler: if the OTLP Log Exporter itself generates an error, that error can re-enter the pipeline and trigger the loop. We can manage this in our code; currently, it is handled by the custom handler.
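
One way to break that loop, sketched below under the assumption that the OTLP exporters go through tonic/hyper internally, is to add per-crate directives to the EnvFilter so the exporter's own errors never re-enter the OTLP log pipeline (the exact target names are assumptions, not taken from this PR):

use tracing_subscriber::EnvFilter;

// Allow application-level errors through, but silence the HTTP/gRPC stack
// used by the OTLP exporters, so a failed export cannot produce new log
// events that are themselves exported (and fail) again.
let filter = EnvFilter::new("error")
    .add_directive("hyper=off".parse().unwrap())
    .add_directive("tonic=off".parse().unwrap())
    .add_directive("h2=off".parse().unwrap())
    .add_directive("reqwest=off".parse().unwrap());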

Comment on lines 13 to 22
fn custom_error_handler(err: OtelError) {
    match err {
        OtelError::Metric(err) => error!("OpenTelemetry metrics error occurred: {}", err),
        OtelError::Trace(err) => error!("OpenTelemetry trace error occurred: {}", err),
        OtelError::Log(err) => error!("OpenTelemetry log error occurred: {}", err),
        OtelError::Propagation(err) => error!("OpenTelemetry propagation error occurred: {}", err),
        OtelError::Other(err_msg) => error!("OpenTelemetry error occurred: {}", err_msg),
        _ => error!("An unknown OpenTelemetry error occurred"), // not expected to be reached
    }
}
Contributor

This is pretty much the same as the default handler, but using tracing::error! macros?

Member Author

@lalitb lalitb May 30, 2024

Yes, true. This is to demonstrate using the OpenTelemetry logging pipeline as a custom error handler for self-diagnostics.
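
For readers following along, registering the handler shown above would look roughly like this, assuming the opentelemetry::global::set_error_handler API available at the time of this PR:

use opentelemetry::global;

// Install the custom handler so every SDK-internal error is routed through
// tracing::error! instead of the default handler's stderr output.
global::set_error_handler(custom_error_handler)
    .expect("failed to install OpenTelemetry error handler");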

@ramgdev
Contributor

ramgdev commented May 30, 2024

Is there an easy way to catch from the error handler when spans are being dropped? Ideally, I'd like to use this technique to build a status provider for the export pipeline.

@lalitb
Member Author

lalitb commented May 31, 2024

Moving to draft to address some concerns.

@lalitb lalitb marked this pull request as draft May 31, 2024 16:21
@cijothomas
Member

Is there an easy way to catch from the error handler when spans are being dropped? Ideally, I'd like to use this technique to build a status provider for the export pipeline.

Can you elaborate more, or maybe create a new issue? This PR just shows how to override the error handler to route logs via tracing.

@lalitb lalitb marked this pull request as ready for review June 3, 2024 21:08
// Metrics are exported every 30 seconds by default when using the stdout exporter;
// shutting down the MeterProvider here flushes the metrics immediately
// instead of waiting for the 30-second interval.
meter_provider.shutdown()?;
Contributor

Use force_flush instead of shutdown?

Member Author

This was deliberate: calling shutdown twice triggers the "already shutdown" error.

Contributor

The collector logs added in the README file do not show anything for the "already shutdown" error, though; there is only an entry for exceeding the cardinality limit.

Member Author

Yes, the extra shutdown was meant to demonstrate that the error is triggered but not logged, since the pipeline is already closed by the first shutdown. But it seemed to be causing more confusion, so I have removed the second shutdown :)
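
For completeness, the alternative the reviewer suggested (flush instead of a second shutdown) would look roughly like this; force_flush follows the SdkMeterProvider API and is an assumption about the final example, which ended up keeping a single shutdown:

// Flush buffered metrics immediately instead of waiting for the periodic
// reader's interval, then shut the provider down exactly once.
meter_provider.force_flush()?;
meter_provider.shutdown()?;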
