Add ability for SpanProcessor to mutate spans on end #4024
What is the benefit of To my understanding both changes are breaking for |
One benefit is that while order of span processor registration matters for
We're working out the details of that in #4030, but I disagree that adding a new method to an SDK plugin interface is a breaking change. Languages vary in terms of how they expose these plugin interfaces and the language features for evolving interfaces, but it should be possible for all SDKs to accommodate this type of change, albeit with some creativity in some cases. For example, it's simple in Java to add a new default method to an interface which existing implementers don't need to implement. This doesn't exist in some languages like Go, but there you can offer a new |
@jmacd In my opinion, this would remove the benefits this PR is intending to add: Making it easier for users to enrich spans in span processors before they are exported. There is already a workaround for doing this with the existing So the problem I see with making changes in My take here is that there are two viable classes of solutions for allowing easy enrichment of spans via the SDK:
This PR specs out solution category 1. We can definitely revisit this decision! However, if we do go for a pipelining approach, I'd propose to rather enhance the However, if we go for the pipeline approach, I'd propose to separate the construction of processors from the assembly of the pipeline by either:
The reason I'm suggesting this is that if we leave the pipeline assembly to the user via decoration/wrapping, the pipeline structure becomes a black box for the SDK. If instead the pipeline assembly is managed by the SDK, it has more control over it: e.g. it could insert logging or telemetry between the pipeline stages if necessary. It also plays much better with autoconfiguration.
I'm not sure whether |
Co-authored-by: jack-berg <[email protected]>
Thank you @JonasKunz. I understand the motivation and agree with this change. Also, you've helped me understand how the change I'm looking for can be made orthogonally.
I added suggestions because I think `OnEnd()` becomes less-well specified without adding more text, given the new and similarly-named `OnEnding()`. Presently, nothing is said about the execution model for `OnEnd()` -- for example whether or not the first call to `OnEnd()` is required to return before the second processor's `OnEnd()` is called.
Reading into the `OnEnd()` text below your changes here, I find that lines 613/614 (in your change, lines 592/593 prior) are problematic, given the question posed in #4010. The `OnEnd()` callback (which I'd like to be named `OnExport()`) should not receive a mutable Span, so "modifying is not allowed" is redundant. For us to add pipeline capabilities, the `OnEnd()` callback should be allowed to change the data, which is distinct from modifying it -- only its own exporter would see the changes.
Co-authored-by: Joshua MacDonald <[email protected]>
@jmacd so if my understanding is correct, you are planning to add some kind of pipelining / chaining capabilities to I wonder if the addition of such pipelines would render the The main differences in terms of capabilities between
So what do you think? Should we move ahead with this PR and get it merged or should we wait and check the overlap with the pipeline approach you are coming up with?
@JonasKunz I assumed that you mean for
The requirements, as written, I think ensure that callers are not permitted to use the span reference after
I'd probably tack on "SHOULD be reflected in it until End() is called on the Span". For the last two bullets, I don't see a problem. I expect the OnEnding() callbacks all to execute before the export begins (a.k.a. OnEnd()). Nothing is stated about when the OnEnd happens, but after your change it should be clear that pipelining effects (whatever they are) begin with the OnEnd call. I think OnEnding() makes sense the way you have it -- and there are real use cases so we don't need to block for future design work.
Discussed this in the context of #4062
@pellared Would you say that if #4030 is accepted, this PR should be approached differently? (To me, it seems like the answer is yes.) I've speculated about what the solution might look like in #4062 (review), which, briefly, is for SDKs to support a "FanoutProcessor" that gives users control over whether mutations are private to their export pipeline and/or visible to the next processor.
Suggestions as #4061 is merged.
Co-authored-by: Robert Pająk <[email protected]>
The SDK MUST guarantee that the span can no longer be modified by any other thread
before invoking `OnEnding` of the first `SpanProcessor`. From that point on, modifications
are only allowed synchronously from within the invoked `OnEnding` callbacks. All registered SpanProcessor `OnEnding` callbacks are executed before any SpanProcessor's `OnEnd` callback is invoked.
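To make the ordering requirement in the proposed text concrete, here is a minimal sketch of a two-phase end sequence. It is not taken from any SDK: `RecordingProcessor`, `end_span`, and the dict-based span are hypothetical stand-ins, and thread safety is deliberately left out.

```python
from typing import List


class RecordingProcessor:
    """Hypothetical processor that records the order in which its callbacks run."""

    def __init__(self, name: str, log: List[str]) -> None:
        self.name = name
        self.log = log

    def on_ending(self, span: dict) -> None:
        # The span is still writable in this phase.
        span[f"seen_by_{self.name}"] = True
        self.log.append(f"{self.name}.on_ending")

    def on_end(self, span: dict) -> None:
        # Treated as read-only by convention in this sketch.
        self.log.append(f"{self.name}.on_end")


def end_span(span: dict, processors: List[RecordingProcessor]) -> None:
    # Phase 1: every OnEnding callback runs, in the order given.
    for p in processors:
        p.on_ending(span)
    # Phase 2: only afterwards is any OnEnd callback invoked.
    for p in processors:
        p.on_end(span)


log: List[str] = []
span: dict = {"name": "GET /users"}
end_span(span, [RecordingProcessor("a", log), RecordingProcessor("b", log)])
print(log)
```

With two processors, both `on_ending` entries appear in the log before either `on_end` entry, so every processor sees the enrichments made by all others by the time `OnEnd` runs.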
I am not sure how an SDK can provide such a guarantee. It looks more like a hint for the user.
The typical pattern to implement this would be to simply hold the lock for modifications of the span while executing the `OnEnding` callbacks. After those are finished, the span is set to an immutable state. So if anyone concurrently tries to modify the span, they will wait on the lock and afterwards fail the modification due to the span now being immutable.
If you want concurrent modification attempts to fail eagerly, you can instead mark the span as immutable before executing the `OnEnding` callbacks. While the `OnEnding` callbacks are executed, a second lock is held, and mutations are still allowed when a thread is holding that second lock.
I'd expect all languages dealing with concurrency to also have mutexes/locks, so I don't see why any SDK shouldn't be able to provide this guarantee. Everything else is likely to make the `OnEnding` callbacks prone to race conditions.
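For illustration, the first pattern (hold the span's lock while the `OnEnding` callbacks run, then flip to immutable) could look roughly like the sketch below. `LockedSpan` and its methods are hypothetical and not part of any OpenTelemetry SDK. A re-entrant lock is assumed so that the callbacks, which run synchronously on the thread holding the lock, can still write.

```python
import threading
from typing import Callable, Dict, List


class LockedSpan:
    """Hypothetical span guarding its state with a single re-entrant lock."""

    def __init__(self) -> None:
        self._lock = threading.RLock()
        self._ended = False
        self.attributes: Dict[str, object] = {}

    def set_attribute(self, key: str, value: object) -> bool:
        """Return True if the write was applied, False if the span is immutable."""
        with self._lock:
            if self._ended:
                return False
            self.attributes[key] = value
            return True

    def end(self, on_ending: List[Callable[["LockedSpan"], None]]) -> None:
        with self._lock:
            if self._ended:
                return
            # Other threads block on the lock while the callbacks run.
            for callback in on_ending:
                callback(self)
            # Once this is set, writers that were waiting fail their modification.
            self._ended = True


results: List[bool] = []
workers: List[threading.Thread] = []


def enrich(span: LockedSpan) -> None:
    # A concurrent writer starts while OnEnding is in progress.
    t = threading.Thread(target=lambda: results.append(span.set_attribute("late", 1)))
    t.start()
    workers.append(t)
    # The callback runs on the lock-holding thread, so its write succeeds.
    span.set_attribute("enriched", True)


span = LockedSpan()
span.end([enrich])
for t in workers:
    t.join()
print(span.attributes, results)
```

Whether the second thread reaches the lock before or after `end()` releases it, its write is rejected, because it can only acquire the lock once the span has been marked as ended.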
they will wait on the lock and afterwards fail the modification due to the span now being immutable
As you mentioned, I think that most SDKs are only able to make such a guarantee at runtime. We should have a way to give feedback.
What should be the output when a failed modification occurs?
- no-op
- Log that the modification is ignored and that it is an illegal operation. It would be good to recommend a log message. (my preference)
- Exception (this would be rather against the OTel spec policy to not throw exceptions at runtime)
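As a rough sketch of the logging option: the span below ignores writes after it has ended and logs a warning instead of raising. Emitting the message at most once per span is an assumption of this sketch, borrowed from how the spec already treats discarded attributes. The class name, logger name, and message text are all hypothetical.

```python
import logging
from typing import Dict

logger = logging.getLogger("sketch.sdk.trace")


class WarnOnceSpan:
    """Hypothetical span that ignores writes after end() and logs once about it."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.attributes: Dict[str, object] = {}
        self._ended = False
        self._warned = False

    def end(self) -> None:
        self._ended = True

    def set_attribute(self, key: str, value: object) -> None:
        if self._ended:
            if not self._warned:
                # Assumed policy: at most one message per span to avoid log spam.
                self._warned = True
                logger.warning(
                    "Ignoring modification of span %r: the span has already ended",
                    self.name,
                )
            return  # never raise from an API method
        self.attributes[key] = value


demo = WarnOnceSpan("checkout")
demo.set_attribute("cart.size", 3)
demo.end()
demo.set_attribute("late", True)   # logged, ignored
demo.set_attribute("later", True)  # ignored silently
print(demo.attributes)
```

The caller never sees an exception, which keeps the API contract intact, while the operator still gets a hint that instrumentation is writing to a finished span.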
Exception (this would be rather against the OTel spec policy to not throw exceptions at runtime)
I think this would also violate the API spec, because API methods would suddenly start throwing exceptions.
For the other two options: Do you think that this is required to be part of the spec and should not just be left to the SDK implementors to decide what's best/idiomatic? I don't really see the benefit in specifying this.
Do you think that this is required to be part of the spec and should not just be left to the SDK implementors to decide what's best/idiomatic? I don't really see the benefit in specifying this.
It makes implementors think about how to give feedback to the caller rather than just doing a no-op.
Example from the same document:
There SHOULD be a message printed in the SDK's log to indicate to the user that an attribute, event, or link was discarded due to such a limit. To prevent excessive logging, the message MUST be printed at most once per span (i.e., not per discarded attribute, event, or link).
However, this should probably be out of scope of this PR, as it is about the behavior of the SDK span itself.
I also want to point out that the Span interface has definitions of its operations. Maybe we should update the definition of https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/api.md#updatename from
Updates the Span name.
To e.g.
A `Span` MUST have the ability to update its name.
All of this should probably be handled in a separate PR.
So the only way for the span to escape to be modified concurrently while any SpanProcessor.OnStart is running, would be to let it escape from a SpanProcessor.OnStart call.
Yes. If it is possible and if we are concerned about it then we should add the same restrictions in `OnStart`. Changing a span asynchronously is also rather something very uncommon.
I am also concerned that such restrictions on the SDK would produce additional computational overhead.
At this point, I would rather add a general guideline that the processors and applications should not modify the spans asynchronously to avoid race conditions and indeterministic behavior.
Yes. If it is possible and if we are concerned about it then we should add the same restrictions in OnStart.
I personally think it is an edge case that is not important enough here. What we could do is state that for `OnStart`, processors must not concurrently modify the spans. So we would impose a restriction on implementors, which however is not enforced through any code in the SDK. More like: don't do this, or bad things may happen.
At this point, I would rather add a general guideline that the processors should not modify the spans asynchronously to avoid race conditions and indeterministic behavior.
For `OnStart`, I agree to go this route (as stated in my paragraph above) if we deem it important enough. For `OnEnding`, this doesn't solve the problem of concurrent modifications performed by application code and not SpanProcessors (which is impossible for `OnStart`).
Moreover, I am concerned that such restrictions on the SDK would produce additional computational overhead.
I'm trying to understand what computational overhead you are thinking of: Isn't locking already involved anyway to make spans concurrency safe, so this shouldn't impose additional cost?
Changing a span asynchronously is also rather something very uncommon.
I personally don't agree with that statement. Getting the current span (from the context) and adding an attribute to it or changing the name doesn't seem too uncommon to me. And the current span might have been started and ended on another thread. I think this shouldn't be too uncommon in Go either, because you could easily pass the current context/span to a goroutine, right?
concurrent modifications performed by application code
Is it not an edge case as well? I never saw any application or instrumentation library doing asynchronous span modification that could run concurrently while ending a span.
Isn't locking already involved anyway to make spans concurrency safe, so this shouldn't impose additional cost?
It would be more complex locking logic. I guess the overhead would be minimal, but you're never sure until you measure it.
concurrent modifications performed by application code
Is it not an edge case as well? I never saw any application or instrumentation library doing asynchronous span modification that could run concurrently while ending a span.
Looks like we commented at the same time. I think I answered that in my previous comment:
Getting the current span (from the context) and adding an attribute to it or changing the name doesn't seem too uncommon to me. And the current span might have been started and ended on another thread. I think this shouldn't be too uncommon in Go either, because you could easily pass the current context/span to a goroutine, right?
At least in Java it is very common that the current context (and therefore current span) is passed around to different threads / asynchronous tasks.
Fixes #1089.
In addition to the comments on the issue, this was discussed in the spec SIG Meeting on 2024/23/04:
- `SpanProcessor`s due to better conceptual fit
- `SpanProcessor`s in a chaining fashion during the initial SDK spec design and it was actively decided against it. However, no one could recall the reason why.
Based on this input, I decided not to move the chaining-based solution forward and to stay with the original proposal of adding a new callback to be invoked just before a span is ended.
I also decided to name the new callback `OnEnding` instead of `BeforeEnd` as suggested in this comment. The name `OnEnding` is more precise about when the callback is invoked.
A big discussion point in the SIG Meeting on 2024/23/04 also was the problem of evolving SDK plugin interfaces without breaking backwards compatibility. This PR contains a proposal to clarify how this should be handled: If the language allows it, interfaces should be directly extended. If not possible, implementations will need to introduce new interfaces and accept them in addition to the existing ones for backwards compatibility. I feel like this allows every language to implement changes to interfaces in the best way possible. Of course, changes to interfaces should still be kept to a necessary minimum.
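The two ways of evolving a plugin interface described above can be sketched as follows. This is only an illustration under assumptions: `OnEndingCapable`, `OldStyleProcessor`, `EnrichingProcessor`, and `run_on_ending` are hypothetical names, and the span is a plain dict. Option 1 extends the existing base type with a no-op default. Option 2 leaves the old interface alone and has the SDK check for a separate, additional interface.

```python
from typing import Iterable, Protocol, runtime_checkable


class SpanProcessor:
    """Option 1: the existing base type gains on_ending with a no-op default."""

    def on_end(self, span: dict) -> None:
        pass

    def on_ending(self, span: dict) -> None:
        pass  # default keeps existing subclasses working unchanged


@runtime_checkable
class OnEndingCapable(Protocol):
    """Option 2: a separate, additional interface (hypothetical name)."""

    def on_ending(self, span: dict) -> None: ...


class OldStyleProcessor:
    """Implements only the pre-existing callback and knows nothing of on_ending."""

    def on_end(self, span: dict) -> None:
        pass


class EnrichingProcessor(SpanProcessor):
    def on_ending(self, span: dict) -> None:
        span["enriched"] = True


def run_on_ending(processors: Iterable[object], span: dict) -> int:
    """Invoke on_ending where supported; return how many processors were called."""
    called = 0
    for p in processors:
        # Old-style processors are accepted and simply skipped.
        if isinstance(p, OnEndingCapable):
            p.on_ending(span)
            called += 1
    return called


span = {"name": "checkout"}
count = run_on_ending([OldStyleProcessor(), SpanProcessor(), EnrichingProcessor()], span)
print(count, span)
```

In both variants a processor written before the new callback existed keeps working without changes, which is the backwards-compatibility property the proposal is after.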
I also wasn't sure whether this change warrants an addition to the spec-compliance-matrix, so I've left it out for now.
Please leave a comment if you think this is required, then I'll add it.
Changes
- Adds a new `OnEnding` callback to `SpanProcessor`
- Adds a paragraph clarifying how languages should deal with interface extensions
- Related issues: Add BeforeEnd to have a callback where the span is still writeable #1089
- Related OTEP(s) #
- Links to the prototypes (when adding or changing features)
- `beforeEnd` in that PoC)
- `CHANGELOG.md` file updated for non-trivial changes
- `spec-compliance-matrix.md` updated if necessary