Add self-diagnostics example #1846
base: main
Conversation
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files

@@           Coverage Diff           @@
##            main   #1846     +/-  ##
=======================================
- Coverage    74.6%   74.6%    -0.1%
=======================================
  Files         122     122
  Lines       19952   19952
=======================================
- Hits        14902   14901       -1
- Misses       5050    5051       +1

☔ View full report in Codecov by Sentry.
let provider: LoggerProvider = LoggerProvider::builder()
    .with_simple_exporter(exporter)
    .build();
let filter = EnvFilter::new("error"); // only push errors
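For context, the provider and filter above are typically wired into the tracing registry through the appender bridge. A minimal sketch of that wiring, assuming the opentelemetry-appender-tracing crate and tracing-subscriber with the env-filter feature (the function name is illustrative):

use opentelemetry_appender_tracing::layer::OpenTelemetryTracingBridge;
use opentelemetry_sdk::logs::LoggerProvider;
use tracing_subscriber::{layer::SubscriberExt, util::SubscriberInitExt, EnvFilter};

// Sketch: route `tracing` events into the OpenTelemetry logging pipeline,
// forwarding only error-level events (matching the filter above).
fn init_self_diagnostics(provider: &LoggerProvider) {
    let otel_layer = OpenTelemetryTracingBridge::new(provider);
    let filter = EnvFilter::new("error");
    tracing_subscriber::registry()
        .with(filter)
        .with(otel_layer)
        .init();
}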
Can you add a README.md and explain how this works and how infinite logging is prevented here?
(I am not sure we actually prevent that, though... For example, if the OTLP endpoint is not reachable and we use this approach, would we be in an infinite loop?)
For example, if the OTLP endpoint is not reachable and we use this approach, would we be in an infinite loop?
It shouldn't be an infinite loop with the OTLP Metrics Exporter + stdout LogExporter. Should we update the example to use OTLP exporters for both logs and metrics, since that would need some filters to avoid an infinite loop if the collector is not running?
Yes. That'd be the most common scenario (using OTLP), so showing exactly how to filter would be helpful for the majority of users.
There is a potential risk of an infinite loop when using the OTLP Metrics Exporter and the OTLP Log Exporter as a custom handler. If the OTLP Log Exporter generates an error, it can trigger this loop. We can manage this in our code; currently, it is being handled by the custom handler.
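One way to reduce that risk, sketched below, is to add filter directives that silence log events coming from the exporter's own transport stack, so an export failure cannot re-enter the OTLP log pipeline. The target names here (hyper, tonic, h2, reqwest) are assumptions and depend on which OTLP transport is used:

use tracing_subscriber::EnvFilter;

// Sketch: keep error-level events but drop events emitted by the exporter's
// own dependencies, preventing a feedback loop when the collector is down.
fn build_self_diagnostics_filter() -> EnvFilter {
    EnvFilter::new("error")
        .add_directive("hyper=off".parse().expect("valid directive"))
        .add_directive("tonic=off".parse().expect("valid directive"))
        .add_directive("h2=off".parse().expect("valid directive"))
        .add_directive("reqwest=off".parse().expect("valid directive"))
}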
fn custom_error_handler(err: OtelError) {
    match err {
        OtelError::Metric(err) => error!("OpenTelemetry metrics error occurred: {}", err),
        OtelError::Trace(err) => error!("OpenTelemetry trace error occurred: {}", err),
        OtelError::Log(err) => error!("OpenTelemetry log error occurred: {}", err),
        OtelError::Propagation(err) => error!("OpenTelemetry propagation error occurred: {}", err),
        OtelError::Other(err_msg) => error!("OpenTelemetry error occurred: {}", err_msg),
        _ => error!("An unknown OpenTelemetry error occurred"), // won't reach here
    }
}
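For reference, a handler like this would be installed through the global error-handler hook. A minimal sketch, assuming OtelError is an alias for opentelemetry::global::Error and that the global::set_error_handler API from the SDK of that era is available:

use opentelemetry::global;

// Sketch: route all SDK-internal errors through the custom handler defined above.
fn install_error_handler() {
    global::set_error_handler(custom_error_handler)
        .expect("failed to set the global error handler");
}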
This is pretty much the same as the default handler, but using the tracing::error! macro?
Yes, true. This is to demonstrate using the OpenTelemetry logging pipeline as a custom error handler for self-diagnostics.
Is there an easy way to catch, from the error handler, when spans are being dropped? Ideally, I'd like to use this technique to build a status provider for the export pipeline.
Moving to draft to address some concerns.
Can you elaborate more? Or maybe create a new issue? This PR is just showing how to override the error handler to route logs via tracing.
// Metrics are exported by default every 30 seconds when using the stdout exporter,
// however shutting down the MeterProvider here instantly flushes
// the metrics, instead of waiting for the 30 sec interval.
meter_provider.shutdown()?;
Use force_flush instead of shutdown?
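A force_flush-based variant of the snippet above might look like the following sketch, assuming meter_provider is an opentelemetry_sdk::metrics::SdkMeterProvider; unlike shutdown, it leaves the pipeline usable for further measurements:

use opentelemetry_sdk::metrics::SdkMeterProvider;

// Sketch: flush pending metrics immediately without closing the pipeline,
// then shut down once at the very end of the program.
fn flush_then_shutdown(meter_provider: &SdkMeterProvider) -> Result<(), Box<dyn std::error::Error>> {
    meter_provider.force_flush()?;
    // ... further measurements could still be recorded here ...
    meter_provider.shutdown()?;
    Ok(())
}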
This was deliberate, to call shutdown twice and trigger the "already shutdown" error.
The collector logs added in the README file do not show anything for the "already shutdown" error, though. There is only an entry for exceeding the cardinality limit.
Yes, I meant the extra shutdown was to demonstrate that the error is triggered but not logged, as the pipeline is already closed by the first shutdown. But it seems it was causing more confusion, so I have removed the second shutdown :)
Co-authored-by: Utkarsh Umesan Pillai <[email protected]>
Changes

Example to demonstrate using tracing as a global error handler for errors generated by the OpenTelemetry Metrics SDK. In this example, measurements are recorded to exceed the cardinality limit, which triggers the error to be logged. This error is then emitted to stdout using the opentelemetry-appender-tracing subscriber.
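As a rough illustration of how the error is triggered, recording more distinct attribute sets than the metrics cardinality limit is enough to make the SDK report an error through the handler. The instrument name, attribute key, and loop bound below are placeholders, and the .init() builder call assumes the SDK API of that era:

use opentelemetry::{global, KeyValue};

// Sketch: each distinct attribute value creates a new metric stream; recording
// more streams than the cardinality limit makes the SDK emit an error, which is
// then routed through the custom error handler and the tracing pipeline.
fn record_past_cardinality_limit() {
    let meter = global::meter("self-diagnostics-example");
    let counter = meter.u64_counter("example_counter").init();
    for i in 0..3_000u64 {
        counter.add(1, &[KeyValue::new("dimension", i.to_string())]);
    }
}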
Merge requirement checklist

- CHANGELOG.md files updated for non-trivial, user-facing changes