Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parallel container startup with deferred values #315

Open
wants to merge 17 commits into
base: master
Choose a base branch
from

Conversation

xandris
Copy link

@xandris xandris commented Jan 27, 2022

Hi dig maintainers,

My organization made an internal tool that has many different startup tasks that all produce a unique Go type. Before dig/fx, some tasks wrote to a shared struct, some tasks read from it, and some did both. This meant having to carefully order functions, keeping in mind these implicit dependencies hidden in func bodies. I'm busy removing this shared struct and rewriting each task as a dig/fx provider and letting dig order things automatically. It's wonderful! Can't thank you enough.

However—and sorry for frontloading this so much—our existing code, poor quality as it is, does perform many slow tasks in parallel. Forcing these to happen in serial would push our execution time over our execution frequency. But dependency injection with dig is a perfect match with our codebase, so I would really like to work something out.

I don't believe the core of dig is incompatible with parallel execution. To prove this to myself, as a first pass I made paramList use an Errgroup to call all its children and wait for them to finish, and I made the store maps sync.Map. Where two goroutines reached the same constructorNode, I put a mutex to protect the called field. It worked but felt too simplistic and hard to control.

The approach I settled on lets all dig code run in one goroutine, and user-provided constructors can either run in the same goroutine (default) or in pooled/ephemeral goroutines. This keeps mutexes out of dig's data structures and lets users decide how much concurrency they want.

I tried this several ways, but this pull request is the cleanest way I found. It creates a new deferred type params and constructorNodes return. A deferred can be observed and resolved like a very simple Promise. Since params and constructors pass values indirectly via a Scope, deferred doesn't need much state.

I would very much like to get concurrent initialization into dig by any means; it doesn't have to be this pull request. I'm willing to any changes you deem necessary; let's work together!

Refs GO-1191

@CLAassistant
Copy link

CLAassistant commented Jan 27, 2022

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
4 out of 5 committers have signed the CLA.

✅ xandris
✅ EstebanOlmedo
✅ abhinav
✅ sywhang
❌ Alexandra Parker


Alexandra Parker seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

@xandris xandris force-pushed the parallel-startup-deferred branch 2 times, most recently from cce342d to 9ff2fb4 Compare January 27, 2022 01:30
This commit tries to be as unobtrusive as possible, attaching new behavior to existing types where possible rather than building out new infrastructure.

constructorNode returns a deferred value when called. On the first call, it asks paramList to start building an arg slice, which may also be deferred.

Once the arg slice is resolved, constructorNode schedules its constructor function to be called. Once it's called, it resolves its own deferral.

Multiple paramSingles can observe the same constructorNode before it's ready. If there's an error, they may all see the same error, which is a change in behavior.

There are two schedulers: synchronous and parallel. The synchronous scheduler returns things in the same order as before. The parallel may not (and the tests that rely on shuffle order will fail). The scheduler needs to be flushed after deferred values are created. The synchronous scheduler does nothing on when flushing, but the parallel scheduler runs a pool of goroutines to resolve constructors.

Calls to dig functions always happen on the same goroutine as Scope.Invoke(). Calls to constructor functions can happen on pooled goroutines.

The choice of scheduler is up to the Scope. Whether constructor functions are safe to call in parallel seems most logically to be a property of the scope, and the scope is passed down the constructor/param call chain.
Copy link
Collaborator

@abhinav abhinav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @xandris,

Thanks for the contribution and the explanation! We would like to help find a good solution to your problem. We really appreciate the thoughtfulness you put into the PR.

  • If dig was to support concurrent initialization, I really like the idea of keeping dig's internal state management in its own goroutine. Especially because we've put zero effort into making dig's state management safe for concurrent use.
  • I like how clean the deferred implementation is.

In the past, we've been reluctant to add concurrency to Dig largely because we felt that it would add significant complexity and risk to the code. Our position was that since Dig relies heavily on reflection, there's no expectation of high performance from it, and anything that suggests otherwise should be avoided.

However, we discussed the problem you're experiencing and this PR, and I think we're happy for Dig to support concurrent initialization. This reduces the scope of the problem (and our concerns) to a much safer subset, and introduces minimal new API surface area.

That said, the timing of this change was a bit unfortunate. As you might have noticed, we're deep in the middle of refactoring and cleanup to support decorations and replacements in the container. Apologies for the conflicts, but if you don't mind resolving them, we can work with you to get this merged.

(I have included some preliminary review comments; I haven't had a chance to do a full review yet. I suspect we'll want to move deferred and scheduler to internal/ to unit test them independently.)

CC @sywhang @shirchen

container.go Outdated Show resolved Hide resolved
dig_test.go Outdated Show resolved Hide resolved
dig_test.go Outdated Show resolved Hide resolved
@xandris
Copy link
Author

xandris commented Jan 27, 2022

That said, the timing of this change was a bit unfortunate. As you might have noticed, we're deep in the middle of refactoring and cleanup to support decorations and replacements in the container. Apologies for the conflicts, but if you don't mind resolving them, we can work with you to get this merged.

Roger! I've started working on the merge conflicts and I'll do a force push once those are resolved. I have some training to do today, but after that I'll start running review comments to ground. Very happy we can work on this together!

One comment I have for the decorators impl–not that anyone asked–is that there seems to be a lot of code duplication between constructors and decorators. There are limits to DRY just by Go's nature but I wonder if there's opportunities here...

@abhinav
Copy link
Collaborator

abhinav commented Jan 27, 2022

One comment I have for the decorators impl–not that anyone asked–is that there seems to be a lot of code duplication between constructors and decorators. There are limits to DRY just by Go's nature but I wonder if there's opportunities here...

Feedback is absolutely welcome! Yes, there is room for DRYing up there. The codebase has some amount of historical debt which we're aware of, so we've been trying to find a good balance between "refactor it all until it's nice and clean" and actually getting fx.Replace working. Right now we're leaning towards deferring some of the non-essential clean-ups until afterwards.

@sywhang
Copy link
Contributor

sywhang commented Jan 27, 2022

There are limits to DRY just by Go's nature but I wonder if there's opportunities here...

Definitely! Thanks for pointing that out. As @abhinav said, we're kind of pushing hard to get the fx.Replace feature out ASAP so there are definitely some debts being left around. We'll be revisiting those after we make fx-side changes.

This commit tries to be as unobtrusive as possible, attaching new behavior to existing types where possible rather than building out new infrastructure.

constructorNode returns a deferred value when called. On the first call, it asks paramList to start building an arg slice, which may also be deferred.

Once the arg slice is resolved, constructorNode schedules its constructor function to be called. Once it's called, it resolves its own deferral.

Multiple paramSingles can observe the same constructorNode before it's ready. If there's an error, they may all see the same error, which is a change in behavior.

There are two schedulers: synchronous and parallel. The synchronous scheduler returns things in the same order as before. The parallel may not (and the tests that rely on shuffle order will fail). The scheduler needs to be flushed after deferred values are created. The synchronous scheduler does nothing on when flushing, but the parallel scheduler runs a pool of goroutines to resolve constructors.

Calls to dig functions always happen on the same goroutine as Scope.Invoke(). Calls to constructor functions can happen on pooled goroutines.

The choice of scheduler is up to the Scope. Whether constructor functions are safe to call in parallel seems most logically to be a property of the scope, and the scope is passed down the constructor/param call chain.
@codecov
Copy link

codecov bot commented Jan 27, 2022

Codecov Report

Merging #315 (58a2c64) into master (50965fd) will decrease coverage by 0.26%.
The diff coverage is 95.41%.

@@            Coverage Diff             @@
##           master     #315      +/-   ##
==========================================
- Coverage   98.34%   98.08%   -0.27%     
==========================================
  Files          21       25       +4     
  Lines        1450     1565     +115     
==========================================
+ Hits         1426     1535     +109     
- Misses         15       20       +5     
- Partials        9       10       +1     
Impacted Files Coverage Δ
provide.go 100.00% <ø> (ø)
container.go 95.23% <75.00%> (-4.77%) ⬇️
internal/promise/deferred.go 93.02% <93.02%> (ø)
param.go 97.49% <93.67%> (+0.11%) ⬆️
decorate.go 99.00% <95.83%> (-1.00%) ⬇️
constructor.go 97.61% <100.00%> (+0.64%) ⬆️
internal/scheduler/parallel.go 100.00% <100.00%> (ø)
internal/scheduler/scheduler.go 100.00% <100.00%> (ø)
internal/scheduler/unbounded.go 100.00% <100.00%> (ø)
invoke.go 100.00% <100.00%> (ø)
... and 11 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

xandris and others added 6 commits January 27, 2022 17:36
Move deferred to an internal/promise package.
Rename deferred to Deferred, export methods,
and rename alreadyResolved and failedDeferred to Done and Fail.
Move scheduler.go into internal/scheduler,
rename and export things,
and split the implementations across files.
Copy link
Collaborator

@abhinav abhinav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making the updates @xandris.

We want Dig to have this feature, but we're slightly reluctant to merge the promise-based approach right now; we want to make sure that this is the only clean way to do this (since there is otherwise a maintainability risk associated with it). Going over the code a couple times, my intuition agrees with you that this is the most straightforward way to keep Dig's state management separate from the initialization goroutines. However, we'd like to give this a deeper look than we've been able to.

Unfortunately—and this ties into the other changes you've been seeing in Dig and Fx—we're currently heads-down with getting fx.Replace ready, so can't investigate too much right now.

Further, the new fx.Replace feature comes with a fair amount of risk because of the amount of internal refactoring it required. Adding another thing that significantly alters the core of how Dig operates would further compound the risk of this release.

Would you be willing to pin to your branch of Dig while we make more time for this on our end?

if err := shallowCheckDependencies(c, n.paramList); err != nil {
return errMissingDependencies{
n.deferred.Resolve(errMissingDependencies{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably need to return here and in other error cases?
Or rather, not set calling and deferred until after the check,
with the check itself returning promise.Fail.

@xandris
Copy link
Author

xandris commented Feb 1, 2022

Would you be willing to pin to your branch of Dig while we make more time for this on our end?

The risk assessment makes sense to me. Standing by; we can use module replacements until this is ready. What do you mean by "pin"?

@abhinav
Copy link
Collaborator

abhinav commented Feb 1, 2022

Excellent! Again, thanks for the PR.

What do you mean by "pin"?

Oops, I meant module replacements. "Pin to X" (as in, "pin to branch X" or "pin to version "Y") is what we've been using to generically refer to module replacements. I think maybe the phrase was more accurate before Go modules (with dep and glide) when we had to sometimes pin a package to a specific version to ensure it did not get upgraded.

@sywhang
Copy link
Contributor

sywhang commented May 28, 2022

Hi @xandris , apologies for the delayed review. We had a few other priorities to take care of, and I just found time to get back to review this PR. I'll spend some time this week to test this internally as well in our codebase.

But before I start doing that, I see that this branch has gone out of sync with master by quite a bit, so let me first address the merge conflicts first. Thank you for the patience!

sywhang and others added 9 commits May 28, 2022 16:56
The scope scheduler wasn't being copied upon a new scope being
created. This caused multiple schedulers to exist in the app
each with different settings, causing some funky behavior.

Align the scheduler behavior by copying parents scheduler over
to the child scopes as new scopes are created.
While building the fields of a paramObject struct, there could be the
case where two fields use the same decorator, but one of them isn't
decorated. This fixes that behavior.

This also fixes a bug that created a data race when there was a
decorated value.
This fixes a bug that ocurred while calling paramSingle.build that could
lead in panicking because of invalid memory address or nil pointer
dereference introduced in: 8f06d1d
f.Build has a struct receiver, not a pointer, so that method call copies f anyway.
@xandris
Copy link
Author

xandris commented Jan 9, 2024

It's been a minute. How are things?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants