-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ENH] Cross-Version Collection Migration #2400
base: anton/distributed_param_management
Are you sure you want to change the base?
[ENH] Cross-Version Collection Migration #2400
Conversation
Please tag your PR title with one of: [ENH | BUG | DOC | TST | BLD | PERF | TYP | CLN | CHORE]. See https://docs.trychroma.com/contributing#contributing-code-and-ideas |
Reviewer ChecklistPlease leverage this checklist to ensure your code review is thorough before approving Testing, Bugs, Errors, Logs, Documentation
System Compatibility
Quality
|
Warning This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
This stack of pull requests is managed by Graphite. Learn more about stacking. |
9da85c6
to
876de4c
Compare
4e51a90
to
fb78fc6
Compare
50ecef5
to
c8f99a4
Compare
fb78fc6
to
72962a4
Compare
c8f99a4
to
3781435
Compare
72962a4
to
d121f6c
Compare
3781435
to
9e82022
Compare
d121f6c
to
4e0f924
Compare
9e82022
to
646b55d
Compare
4e0f924
to
d45d0a0
Compare
646b55d
to
7efdcc3
Compare
4f97ea8
to
e17f7ed
Compare
7efdcc3
to
e4b8410
Compare
e17f7ed
to
071ab50
Compare
e4b8410
to
c224667
Compare
071ab50
to
625451c
Compare
c224667
to
5516b7e
Compare
625451c
to
8917034
Compare
5516b7e
to
c1fb6d6
Compare
8917034
to
782f125
Compare
c1fb6d6
to
b96e4aa
Compare
782f125
to
03a9139
Compare
b96e4aa
to
5d4ca28
Compare
03a9139
to
47f0966
Compare
5d4ca28
to
3d27445
Compare
47f0966
to
5ee7c97
Compare
@@ -97,5 +100,23 @@ def docs() -> None: | |||
webbrowser.open("https://docs.trychroma.com") | |||
|
|||
|
|||
@app.command() # type: ignore | |||
def migrate( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think its a bit odd that you have to specify the tenant and database, and would prefer if we added some tooling to migrate all of them. This feels like pushing our constraints (no list tenants) onto our users and delivering a worse UX.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My thought here was that if you are using the tenant and database args, you know what you're doing and what you want to migrate. I could instead crawl the entire sysDB too. Let's discuss.
(the error here is I forget to pass these args, fixed.)
from tqdm import tqdm | ||
|
||
|
||
def migrate_collections( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we want to version these? how does this extend in the future?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's discuss. My thought here was we would wait for the next one to happen and then write the right abstraction, but you make the point elsewhere that we do already have other migrations. Will take a look at those.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should remove and deprecate https://github.com/chroma-core/chroma-migrate since we are adding this
class EmptyConfiguration(Configuration): | ||
definitions = {} | ||
|
||
def patched_from_json_str( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we may be able to run this migration as a sql migration? That would be preferable I think. I'm not really a fan of this UX and introducing a new migration hook that isn't future proof in a nice way - forcing people to run a migration tool for a change like this feels heavy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doing this in SQL is scary because it would require direct string manipulation without validation.
@@ -312,6 +313,7 @@ def test_cycle_versions( | |||
system = config.System(settings) | |||
system.start() | |||
client = ClientCreator.from_system(system) | |||
migrate_collections(client) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we only run this if the old version is one we expect to fail?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My thought is that any migration we run here must be idempotent in the case there's nothing to be done, and doing this test this way helps ensure that.
3d27445
to
c55656f
Compare
5ee7c97
to
8509e67
Compare
c55656f
to
ec7dbb6
Compare
8509e67
to
37f7fc2
Compare
ec7dbb6
to
fafcc3e
Compare
37f7fc2
to
094b6ea
Compare
fafcc3e
to
14643db
Compare
094b6ea
to
7e4c972
Compare
Description of changes
This PR creates a path to migrating from previous versions of Chroma to the new version where we have collection configuration storage. The migration is idempotent and non-destructive.
Since all collections now must have a configuration, old collections would error when loading them - this was reflected in cross-version persistence failures.
With this approach, that doesn't happen. This is a first step to providing user-facing migration tooling. For now it's just this one script, but later as we add more of these, they can be composed in a more intelligent way.
This PR includes a new CLI application as part of the
chroma
CLI,chroma migrate
which will migrate all collections in a specifiedpath
(and optional tenant, and database), with ./chroma being the default.Test plan
Manual Test:
list_collections()
should fail with a JSON parsing error (since configurations don't exist)list_collections()
should work as expected.Automated:
test_cross_version_persist
passes locally and in CI.ALL TESTS Should pass by this point in the stack.
Documentation Changes
The migration and migration tool is documented at https://docs.trychroma.com/deployment/migration
Additionally, when a collection tries and fails to load a CollectionConfiguration from JSON, the error points the user to the same migration documentation.
TODO: