Add ability for SpanProcessor to mutate spans on end #4024

JonasKunz · 2024-04-30T08:53:50Z

In addition to the comments on the issue, this was discussed in the spec SIG Meeting on 2024/23/04:

The filtering use-case explained in this comment should rather be solved by the upcoming samplerV2 instead of SpanProcessors due to better conceptual fit
The buffering use-case also explained in this comment seems to be not relevant enough to influence the design decision
Apparently there was a discussion around building the SpanProcessors in a chaining fashion during the initial SDK spec design and it was actively decided against it. However, no one could recall the reason why.

Based on this input, I decided to not move the chaining-based solution forward and stay with the original proposal of adding a new callback to be invoked just before a span is ended.

I also decided to name the new callback OnEnding instead of BeforeEnd as suggested in this comment. The name OnEnding is more precise about when the callback is invoked.

A big discussion point in the SIG Meeting on 2024/23/04 also was the problem of evolving SDK plugin interfaces without breaking backwards compatibility. This PR contains a proposal to clarify how this should be handled: If the language allows it, interfaces should be directly extended. If not possible, implementations will need to introduce new interfaces and accept them in addition to the existing ones for backwards compatibility. I feel like this allow every language to implement changes to interfaces in the best way possible. Of course, changes to interfaces should still be kept to a necessary minimum.

I also wasn't sure whether this change warrants an addition to the spec-compliance-matrix, so I've left it out for now.
Please leave a comment if you think this is required, then I'll add it.

Changes

Adds a new OnEnding callback to SpanProcessor
Add a paragraph on clarifying how languages should deal with interface extensions
Related issues Add BeforeEnd to have a callback where the span is still writeable #1089
~~Related OTEP(s) #~~
Links to the prototypes (when adding or changing features)
- Java (Callback is still called beforeEnd in that PoC)
CHANGELOG.md file updated for non-trivial changes
spec-compliance-matrix.md updated if necessary

specification/versioning-and-stability.md

CHANGELOG.md

specification/trace/sdk.md

specification/versioning-and-stability.md

specification/trace/sdk.md

Co-authored-by: jack-berg <[email protected]>

…ions

specification/trace/sdk.md

…ions

Flarna · 2024-05-08T14:45:43Z

What is the benefit of onEnding() compared to pass the read/write span to onEnd()?
Both are sync, both are called at the same time (within the Span.End() api).

To my understanding both changes are breaking for SpanProcessor implementers. Either they have to add a method or change the type.

jack-berg · 2024-05-08T14:53:59Z

What is the benefit of onEnding() compared to pass the read/write span to onEnd()?

One benefit is that while order of span processor registration matters for onEnding(), it doesn't today and wouldn't with this change with onEnd(). Implementers of onEnd() can continue to rely on semantics where the they get an immutable span which cannot be impacted by other span processors.

To my understanding both changes are breaking for SpanProcessor implementers. Either they have to add a method or change the type.

We're working out the details of that in #4030, but I disagree that adding a new method to an SDK plugin interface is a breaking change. Languages vary in terms of how they expose these plugin interfaces and the language features for evolving interfaces, but there it should be possible for all SDKs to accommodate this type of change, albeit with some creativity in some cases. For example, its simple in java to add a new default method to an interface which existing implementers don't need to implement. This doesn't exist in some language like go, but there you can offer a new ExtendedSpanProcessor interface which includes the new method, and provide a new mechanism for registering that new span processor interface with TracerProvider.

…ions

JonasKunz · 2024-05-22T11:12:11Z

@JonasKunz I am curious to know whether you think it is essential for the on-end handler being discussed in this PR to make changes that are visible to other processors. In my opinion, it would be better if the changes made on-end applied to only the exporter associated with the processor making the change.

@jmacd In my opinion, this would remove the benefits this PR is intending to add: Making it easier for users to enrich spans in span processors before they are exported.

There is already a workaround for doing this with the existing onEnd() callback: Users can decorate/wrap the SimpleSpanProcessor / BatchSpanProcessor and wrap the incoming ReadableSpans in order to pass the mutated ones to the span processor invoking the exporter.
We actually implemented this exact mechanism to provide a span processor which collects stacktraces as attributes for long running spans here. The problem with this approach that it requires a big amount of boilerplate code and doesn't play well with SDK autoconfiguration. This PR was an example use-case for why we wanted to drive #1089 forward for providing a simpler, SDK-native approach for enriching spans on end.

So the problem I see with making changes in onEnding not visible to other users is that it again pushes the responsibility to setup bootstrap pipelines to the user.

My take here is that there are two viable classes of solutions for allowing easy enrichment of spans via the SDK:

Adding a callback which mutates the actual span (like onStart)
Adding the capability to build pipelines which pass around immutable (but wrappable) span data where the terminal stage of the pipeline is some consumer (e.g. a span exporter)

This PR specs out solution category 1.
In this comment I discussed both solution classes and also provided backwards-compatible PoC implementations for Java for both.
However, the arguments for 2.) seemed to not be convincing enough (this was also discussed in a spec-SIG Meeting), I went for 1. in this PR.

We can definitely revisit this decision! However, if we do go for a pipelining approach, I'd propose to rather enhance the onEnd callback for it instead of introducing a new onEnding callback.

However, if we go for the pipeline approach, I'd propose to separate the construction of processors from the assembly of the pipeline by either:

Passing the next pipeline stage to call on invocation as a Function to onEnd
Introducing a separate init callback which receives the next pipeline stage as a function

The reason why I'm suggesting is that if we leave the pipeline assembly to the user via decoration/wrapping, this makes the pipeline structure a blackbox for the SDK.

If instead the pipeline assembly is managed by the SDK, it has more control over it: e.g. it could insert logging or telemetry between the pipeline stages if necessary. It also plays much better with autoconfiguration.

However, I'm wondering if this PR is the right place to have this discussion, as it seems to affect SpanProcessors in a broader way, given that, should we decide in #4010 to switch to processors with independent pipelines, wouldn't it mean that (for SpanProcessors) we should address OnStart() as well? At least if we wanted to keep consistency in the overall SDK behavior anyway.

I'm not sure whether onStart is a good candidate for pipelining. Pipelines make sense if you have some consumer at the end. While onStart might have such consumers (e.g. creating a metric about started spans), it usually is used to mutate spans before they are visible to the user/application code. I don't see how the second aspect fits nicely into a pipeline model vs just invoking onStart on all SpanProcessors in a loop.

specification/trace/sdk.md

Co-authored-by: jack-berg <[email protected]>

jmacd

Thank you @JonasKunz. I understand the motivation and agree with this change. Also, you've helped me understand how the change I'm looking for can be made orthogonally.

I added suggestions because I think OnEnd() becomes less-well specified without adding more text, given the new and similarly-named OnEnding(). Presently, nothing is said about the execution model for OnEnd()--for example whether or not the first call to OnEnd() is required to return before the second processor's OnEnd() is called.

Reading into the OnEnd() text below your changes here, I find that lines 613/614 (in your change, lines 592/593 prior) are problematic, given the question posed in #4010. The OnEnd() callback (which I'd like to be named OnExport()) should not receive a mutable Span, so "modifying is not allowed" is redundant. For us to add pipeline capabilities, the OnEnd() callback should be allowed to change the data, which is distinct from modifying it -- only its own exporter would see the changes.

specification/trace/sdk.md

Co-authored-by: Joshua MacDonald <[email protected]>

JonasKunz · 2024-05-24T11:30:40Z

Thank you @JonasKunz. I understand the motivation and agree with this change. Also, you've helped me understand how the change I'm looking for can be made orthogonally

@jmacd so if my understanding is correct, you are planning to add some kind of pipelining / chaining capabilities to OnEnd (or renamed OnExport), right?

I wonder if the addition of such pipelines would render the OnEnding callback proposed here useless and whether we should flesh out the pipelining first before moving ahead with this PR.

The main differences in terms of capabilities between OnEnding and the pipelining I can think of are the following:

OnEnding mutates the original span, these changes are therefore visible to anyone holding a reference to that span. I don't know if this has any effect though: The user application might be holding onto API-Spans, but those are write-only anyway to my understanding. So there wouldn't be any way for the application to actually observe those changes made in OnEnding
OnEnding is guaranteed to be executed on the application thread ending the span. We might not guarantee this for the pipeline stages
OnEnding is guaranteed to be invoked for every span (unless there is a bug in the instrumentation with dangling spans of course), this might not be the case for pipeline stages if we allow filtering there

So what do you think? Should we move ahead with this PR and get it merged or should we wait and check the overlap with the pipeline approach you are coming up with?

jmacd · 2024-05-24T15:23:02Z

@JonasKunz I assumed that you mean for OnEnding() to receive a "read/write span object" the same as is passed to OnStart(). I think it means that processors can read (the data object) and write (the span object).

OnEnding mutates the original span, these changes are therefore visible to anyone holding a reference to that span.

The requirements, as written, I think ensure that callers are not permitted to use the span reference after End() is called, though the processors are still permitted to via direct access from OnEnding(). There's a bit of conflict with:

It SHOULD be possible to keep a reference to this span object and updates to the span
SHOULD be reflected in it.

I'd probably tack on "SHOULD be reflected in it until End() is called on the Span".

For the last two bullets, I don't see a problem. I expect the OnEnd() callbacks all to execute before the export begins (a.k.a. OnEnd()). Nothing is stated about when the OnEnd happens, but after your change it should be clear that pipelining effects (whatever they are) begin with the OnEnd call. I think OnEnding() makes sense the way you have it -- and there are real use cases so we don't need to block for future design work.

…ions

jmacd · 2024-05-28T15:16:40Z

Discussed this in the context of #4062

pellared

I see this as blocked by #4030.

jmacd · 2024-05-29T22:11:23Z

@pellared Would you say that if #4030 is accepted, this PR should be approached differently? (To me, it seems like the answer is yes.) I've speculated about what the solution might look like in #4062 (review), which is briefly for SDKs to support a "FanoutProcessor" that gives users control over whether mutations are private to their export pipeline and/or visible to the next processor.

specification/trace/sdk.md

pellared · 2024-05-31T16:28:08Z

@pellared Would you say that if #4030 is accepted, this PR should be approached differently?

I think this PR is fine and it is a logical extension. We should only highlight that processor does not need to implement the newly added method for backwards compatibility.

…ions

pellared

Suggestions as #4061 is merged.

specification/trace/sdk.md

CHANGELOG.md

Co-authored-by: Robert Pająk <[email protected]>

specification/trace/sdk.md

pellared · 2024-06-27T07:02:30Z

specification/trace/sdk.md

+The SDK MUST guarantee that the span can no longer be modified by any other thread
+before invoking `OnEnding` of the first `SpanProcessor`. From that point on, modifications
+are only allowed synchronously from within the invoked `OnEnding` callbacks.  All registered SpanProcessor `OnEnding` callbacks are executed before any SpanProcessor's `OnEnd` callback is invoked.


I am not sure how and SDK can provide such guarantee. It looks like more like a hint for the user.

The typical pattern to implement this would be to simply hold the lock for modifications of the span while executing the OnEnding callbacks. After the those are finished, the span is set to an immutable state. So if anyone concurrently tries to modify the span, they will wait on the lock and afterwards fail the modification due to the span now being immutable.

If you want concurrent modification attempts to fail eagerly, you can instead mark the span as immutable before executing the OnEndings. While OnEnding callbacks are executed a second lock is held and mutation are still allowed when a thread is holding that second lock.

I'd expect all languages dealing with concurrency to also have mutexes/locks, so I don't see why any SDK shouldn't be able to provide this guarantee? Everything else is likely to make the OnEnding callbacks prone to race conditions.

they will wait on the lock and afterwards fail the modification due to the span now being immutable

As you mentioned, I think that most SDKs are only able to make such guarantee at runtime. We should have a way to give feedback.

What should be the output when a failed modification occur?

no-op

Log that the modification is ignored and that is an illegal operation. Tt would be good to recommend a log message. (my preference)

Exception (this would be rather against the OTel spec policy to not throw exception at runtime)

Exception (this would be rather against the OTel spec policy to not throw exception at runtime)

I think this would also violate the API-spec, because API methods would suddenly start throwing exceptions.

For the other two options: Do you think that this is required to be part of the spec and should not just be left to the SDK implementors to decide whats best/idiomatic? I don't really see the benefit in specifying this

Do you think that this is required to be part of the spec and should not just be left to the SDK implementors to decide whats best/idiomatic? I don't really see the benefit in specifying this

It makes implementors think on how to do notify about giving feedback to the caller rather than just doing no-op.

Example from the same document:

There SHOULD be a message printed in the SDK's log to indicate to the user that an attribute, event, or link was discarded due to such a limit. To prevent excessive logging, the message MUST be printed at most once per span (i.e., not per discarded attribute, event, or link).

However, this portion of information should be probably out of scope of this PR as this about the behavior of the SDK span itself.

I also want to indicate that there are operations' definitions of the Span interface. Maybe we should update the definition of https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/api.md#updatename from

Updates the Span name.

To e.g.

A Span MUST have the ability to update its name.

All of this should be probably handled in a separate PR.

So the only way for the span to escape to be modified concurrently while any SpanProcessor.OnStart is running, would be to let it escape from a SpanProcessor.OnStart call.

Yes. If it is possible and if we are concerned about it then we should add the same restrictions in OnStart. Changing a span asynchronously is also rather something very uncommon.

I am also concerned that such restrictions on SDK would produce additional computational overhead.

At this point, I would rather add a general guideline that the processors and applications should not modify the spans asynchronously to avoid race conditions and indeterministic behavior.

Yes. If it is possible and if we are concerned about it then we should add the same restrictions in OnStart.

I personally think it an edge case here not important enough. What we could do is state that for OnStart processors must not concurrently modify the spans. So impose a restriction on implementors, which however is not enforced through any code in the SDK. More like don't do this, or bad things may happen.

At this point, I would rather add a general guideline that the processors should not modify the spans asynchronously to avoid race conditions and indeterministic behavior.

For OnStart, I agree to go this route (like stated in my paragraph above) if we deem it important enough. For OnEnding, this doesn't solve the problem of concurrent modifications performed by application code and not SpanProcessors (which is impossible for OnStart).

Moreover, I am concerned that such restrictions on SDK would produce additional computational overhead.

I'm trying to understand what computational overhead you are thinking of: Isn't locking already involved anyway to make spans concurrency safe, so this shouldn't impose additional cost?

Changing a span asynchronously is also rather something very uncommon.

I personally don't agree with that statement. Getting the current span (from the context) and adding an attribute to it or changing the name doesn't seem too uncommon to me. And the current span might have been started and ended on another thread. I think this shouldn't be to uncommen in go too, because you could easily pass the current context/span to a goroutine, right?

concurrent modifications performed by application code

Is not an edge case as well? I never saw any application or instrumentation library doing asynchronous span modification that could run concurrently while ending a span.

Isn't locking already involved anyway to make spans concurrency safe, so this shouldn't impose additional cost?

It would be a more complex locking logic. I guess the overhead would be minimal, but never sure until you measure it.

concurrent modifications performed by application code

Is not an edge case as well? I never saw any application or instrumentation library doing asynchronous span modification that could run concurrently while ending a span.

Looks like we commented at the same time, I think I answered that in my previous comment

Getting the current span (from the context) and adding an attribute to it or changing the name doesn't seem too uncommon to me. And the current span might have been started and ended on another thread. I think this shouldn't be to uncommen in go too, because you could easily pass the current context/span to a goroutine, right?

At least in Java it is very common that the current context (and therefore current span) is passed around to different threads / asynchronous tasks.

…ions

JonasKunz added 6 commits April 30, 2024 10:28

Clarified approach on how to extend SDK plugin interfaces

1722053

Added onEnding method to SpanProcessor

a4b411a

wording

600c721

Added changelog

1d3543d

linting

d60594f

more linting

6bf8266

dmathieu reviewed Apr 30, 2024

View reviewed changes

specification/versioning-and-stability.md Outdated Show resolved Hide resolved

dmathieu reviewed Apr 30, 2024

View reviewed changes

CHANGELOG.md Show resolved Hide resolved

JonasKunz marked this pull request as ready for review April 30, 2024 09:18

JonasKunz requested review from a team as code owners April 30, 2024 09:18

github-actions bot assigned yurishkuro Apr 30, 2024

jack-berg reviewed Apr 30, 2024

View reviewed changes

specification/trace/sdk.md Show resolved Hide resolved

specification/versioning-and-stability.md Outdated Show resolved Hide resolved

jack-berg reviewed Apr 30, 2024

View reviewed changes

specification/trace/sdk.md Show resolved Hide resolved

JonasKunz mentioned this pull request May 2, 2024

Clarify possibility to extend SDK interfaces #4030

Merged

5 tasks

JonasKunz and others added 3 commits May 2, 2024 12:13

Add experimental status

19d5a77

Co-authored-by: jack-berg <[email protected]>

Merge remote-tracking branch 'otel/main' into spanprocessor-end-mutat…

5bcce33

…ions

Removed paragraph about SDK interface extensions

598a9eb

pyohannes reviewed May 2, 2024

View reviewed changes

specification/trace/sdk.md Outdated Show resolved Hide resolved

JonasKunz added 2 commits May 8, 2024 13:32

Merge remote-tracking branch 'otel/main' into spanprocessor-end-mutat…

e34d828

…ions

Remove accidental merge leftover

81bef44

pellared mentioned this pull request May 8, 2024

sdk/log: Define in DESIGN.md how Processor and Exporter interfaces can be extended open-telemetry/opentelemetry-go#5318

Closed

JonasKunz added 4 commits May 14, 2024 12:26

Disallow concurrent updates, specify invocation order

b56fbb9

Merge remote-tracking branch 'otel/main' into spanprocessor-end-mutat…

f54293d

…ions

Linting

990c91a

Merge remote-tracking branch 'otel/main' into spanprocessor-end-mutat…

e38a16d

…ions

jmacd self-assigned this May 20, 2024

jack-berg reviewed May 22, 2024

View reviewed changes

specification/trace/sdk.md Outdated Show resolved Hide resolved

jack-berg approved these changes May 22, 2024

View reviewed changes

Update specification/trace/sdk.md

2cf024a

Co-authored-by: jack-berg <[email protected]>

jmacd requested changes May 23, 2024

View reviewed changes

specification/trace/sdk.md Outdated Show resolved Hide resolved

specification/trace/sdk.md Outdated Show resolved Hide resolved

Apply suggestions from code review

ae572ca

Co-authored-by: Joshua MacDonald <[email protected]>

jmacd approved these changes May 24, 2024

View reviewed changes

Merge remote-tracking branch 'otel/main' into spanprocessor-end-mutat…

303c8c2

…ions

pellared mentioned this pull request May 28, 2024

Add isolating log record processor #4062

Merged

pellared requested changes May 28, 2024

View reviewed changes

pellared reviewed May 31, 2024

View reviewed changes

specification/trace/sdk.md Show resolved Hide resolved

Merge remote-tracking branch 'otel/main' into spanprocessor-end-mutat…

4d8422a

…ions

pellared reviewed Jun 5, 2024

View reviewed changes

specification/trace/sdk.md Outdated Show resolved Hide resolved

CHANGELOG.md Outdated Show resolved Hide resolved

JonasKunz and others added 7 commits June 5, 2024 16:17

Apply suggestions from code review

218783f

Co-authored-by: Robert Pająk <[email protected]>

Merge branch 'main' into spanprocessor-end-mutations

1a385b5

Merge branch 'main' into spanprocessor-end-mutations

b0f776c

Merge branch 'main' into spanprocessor-end-mutations

d680966

Fix changelog after merge

1e5c7fa

Marked SpanProcessor.OnEnding as optional

ffcb614

Add to compliance matrix

7cdc995

JonasKunz requested a review from pellared June 26, 2024 12:33

lint fix

a72ea8f

pellared reviewed Jun 27, 2024

View reviewed changes

JonasKunz added 2 commits June 27, 2024 09:56

Added missing methods, adjust formatting

fe34f57

Merge remote-tracking branch 'otel/main' into spanprocessor-end-mutat…

a8da098

…ions

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add ability for SpanProcessor to mutate spans on end #4024

Add ability for SpanProcessor to mutate spans on end #4024

JonasKunz commented Apr 30, 2024 •

edited

Loading

Flarna commented May 8, 2024

jack-berg commented May 8, 2024

JonasKunz commented May 22, 2024 •

edited

Loading

jmacd left a comment •

edited

Loading

JonasKunz commented May 24, 2024

jmacd commented May 24, 2024

jmacd commented May 28, 2024

pellared left a comment •

edited

Loading

jmacd commented May 29, 2024

pellared commented May 31, 2024

pellared left a comment •

edited

Loading

pellared Jun 27, 2024

JonasKunz Jun 27, 2024 •

edited

Loading

pellared Jun 27, 2024

JonasKunz Jun 27, 2024

pellared Jun 27, 2024 •

edited

Loading

pellared Jun 27, 2024 •

edited

Loading

JonasKunz Jun 27, 2024

JonasKunz Jun 27, 2024 •

edited

Loading

pellared Jun 27, 2024 •

edited

Loading

JonasKunz Jun 27, 2024

Add ability for SpanProcessor to mutate spans on end #4024

Are you sure you want to change the base?

Add ability for SpanProcessor to mutate spans on end #4024

Conversation

JonasKunz commented Apr 30, 2024 • edited Loading

Changes

Flarna commented May 8, 2024

jack-berg commented May 8, 2024

JonasKunz commented May 22, 2024 • edited Loading

jmacd left a comment • edited Loading

Choose a reason for hiding this comment

JonasKunz commented May 24, 2024

jmacd commented May 24, 2024

jmacd commented May 28, 2024

pellared left a comment • edited Loading

Choose a reason for hiding this comment

jmacd commented May 29, 2024

pellared commented May 31, 2024

pellared left a comment • edited Loading

Choose a reason for hiding this comment

pellared Jun 27, 2024

Choose a reason for hiding this comment

JonasKunz Jun 27, 2024 • edited Loading

Choose a reason for hiding this comment

pellared Jun 27, 2024

Choose a reason for hiding this comment

JonasKunz Jun 27, 2024

Choose a reason for hiding this comment

pellared Jun 27, 2024 • edited Loading

Choose a reason for hiding this comment

pellared Jun 27, 2024 • edited Loading

Choose a reason for hiding this comment

JonasKunz Jun 27, 2024

Choose a reason for hiding this comment

JonasKunz Jun 27, 2024 • edited Loading

Choose a reason for hiding this comment

pellared Jun 27, 2024 • edited Loading

Choose a reason for hiding this comment

JonasKunz Jun 27, 2024

Choose a reason for hiding this comment

JonasKunz commented Apr 30, 2024 •

edited

Loading

JonasKunz commented May 22, 2024 •

edited

Loading

jmacd left a comment •

edited

Loading

pellared left a comment •

edited

Loading

pellared left a comment •

edited

Loading

JonasKunz Jun 27, 2024 •

edited

Loading

pellared Jun 27, 2024 •

edited

Loading

pellared Jun 27, 2024 •

edited

Loading

JonasKunz Jun 27, 2024 •

edited

Loading

pellared Jun 27, 2024 •

edited

Loading