-
Notifications
You must be signed in to change notification settings - Fork 424
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Storage] Add storage translation for newly created buckets #3671
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @romilbhardwaj for catching this! I found that it might make more sense to do the translation locally, instead of skipping the check on controller. See comments below. Wdyt?
sky/data/storage.py
Outdated
@@ -325,6 +326,13 @@ def __deepcopy__(self, memo): | |||
|
|||
def _validate_existing_bucket(self): | |||
"""Validates the storage fields for existing buckets.""" | |||
if controller_utils.is_running_on_jobs_controller(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of skipping the check on the controller, we may want to translate those storage entries into external ones in controller_utils.maybe_translate_local_file_mounts_and_sync_up
before submitting to the controller.
Reason: if the storage entry does not contain store
field, it might be created on one cloud provider locally, but the controller may create another one on a different cloud provider:
file_mounts:
/output_path:
name: my-bucket
mode: MOUNT
…o controller_new_storage_fix
Thanks @Michaelvll - updated the implementation to translate storage entries to externally sourced storages. Updated PR title and description too. Running tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the update @romilbhardwaj! The PR looks mostly good to me. : )
'persistent': storage_obj.persistent, | ||
'mode': storage_lib.StorageMode.MOUNT.value, | ||
}) | ||
updated_mount_storages[storage_path] = new_storage |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to set force_delete
to True
, so that when persistent=False
, the jobs controller can delete the storage, even if it is considered as external storage on the controller?
task.update_storage_mounts(updated_mount_storages) | ||
|
||
|
||
def is_running_on_jobs_controller() -> bool: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we still need this movement? It seems the function is not used at places other than dashboard.py?
Closes #3666.
We skip the storage name validation because it is already done when the job is submitted from the client and the guardrails are not required for the controller.We now translate storage objects which create new buckets to become externally sourced buckets.
E.g.,
is translated to:
Also updated smoke tests to test for writing to an object store.
Tested (run the relevant ones):
pytest -v tests/test_smoke.py::test_managed_jobs_storage --kubernetes
pytest -v tests/test_smoke.py::test_managed_jobs_storage --aws
pytest -v tests/test_smoke.py::test_managed_jobs_storage --gcp
pytest tests/test_smoke.py::TestStorageWithCredentials