feature: on demand docker environments #1796
base: master
Conversation
…anges to conda_environment
…e images inside flows.

    get_pinned_conda_libs,
)

BAKERY_METAFILE = ".imagebakery-cache"
the imagebakery-cache can also be made an optimization detail within docker_environment.py
also, can you help me understand the contents of this cache?
Moved caching to the environment side. What is in the cache at the moment:
- a unique hash derived from the requested environment (Python version, packages, image type, etc.)
- the image tag that was returned from the bakery endpoint for the request
- we also persist the request payload that was sent to the bakery, in order to be able to list details of cached images via
docker cache list
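As a rough sketch of what such a cache entry could look like (the key derivation and field names here are hypothetical, not taken from the PR):

```python
import hashlib
import json


def environment_hash(python_version, packages, image_kind):
    # Hypothetical: derive a stable key from the requested environment by
    # hashing a canonical (sorted-keys) JSON rendering of the spec.
    spec = json.dumps(
        {"python": python_version, "packages": packages, "image_kind": image_kind},
        sort_keys=True,
    )
    return hashlib.sha256(spec.encode("utf-8")).hexdigest()


# One entry per baked image: hash -> image tag plus the original request
# payload, so a "docker cache list" command could display the details later.
key = environment_hash("3.11", {"requests": "2.31.0"}, "esgz-zstd")
cache = {
    key: {
        "image_tag": "registry.example.com/bakery:abc123",  # as returned by the bakery
        "request": {"pythonVersion": "3.11", "pipRequirements": {"requests": "2.31.0"}},
    }
}
```

Because the key is derived only from the requested environment, a second step with identical packages maps to the same entry and skips the bake.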
import hashlib
import json
import requests
Would something of this sort work here? (Note: I moved the botocore exception imports above the try block so the except clause can reference them even if a call fails early, and dropped the unused urllib import.)

import json

import requests


class FastBakery(object):
    def __init__(self, url):
        self.url = url

    def bake(
        self,
        python_ver=None,  # TODO - Ensure the keys are same in the API spec and client
        pypi_packages=None,
        conda_packages=None,
        base_image=None,
        image_kind="esgz-zstd",
    ):
        # Recursively drop None values and empty dicts from the payload.
        deep_clean = (lambda f: lambda d: f(f, d))(
            lambda self, item: {
                k: v
                for k, v in (
                    (k, self(self, v)) if isinstance(v, dict) else (k, v)
                    for k, v in item.items()
                )
                if v is not None and v != {}
            }
        )
        payload = deep_clean(
            {
                "pythonVersion": python_ver,
                "imageKind": image_kind,
                "pipRequirements": pypi_packages,
                "condaMatchspecs": conda_packages,
                "baseImage": {"imageReference": base_image},
            }
        )
        return self.aws_iam_invoker(payload)

    def aws_iam_invoker(self, payload):
        # AWS_IAM requires a signed request to be made. Import lazily so the
        # boto dependency is only needed when this auth path is used.
        import boto3
        from botocore.auth import SigV4Auth
        from botocore.awsrequest import AWSRequest
        from botocore.exceptions import BotoCoreError, NoCredentialsError

        try:
            headers = {"Content-Type": "application/json"}
            payload = json.dumps(payload)
            session = boto3.Session()
            credentials = session.get_credentials().get_frozen_credentials()
            # credits to https://github.com/boto/botocore/issues/1784#issuecomment-659132830,
            # We need to jump through some hoops when calling the endpoint with IAM auth
            # as botocore does not offer a direct utility for signing arbitrary requests
            req = AWSRequest("POST", self.url, headers, payload)
            SigV4Auth(
                credentials, service_name="lambda", region_name=session.region_name
            ).add_auth(req)
            response = requests.post(req.url, data=req.body, headers=req.headers)
            # Manually check the status code and raise an exception if it's an error
            if response.status_code >= 400:
                raise requests.exceptions.HTTPError(
                    f"HTTP {response.status_code} Error: {response.reason}\n"
                    f"Response Body: {response.text}",
                    response=response,
                )
            return response
        except requests.exceptions.HTTPError:
            raise
        except (requests.exceptions.RequestException, BotoCoreError, NoCredentialsError) as e:
            # Handle specific errors related to requests and AWS services
            raise Exception(f"Failed to invoke AWS service due to network or AWS error: {e}")
        except Exception as e:
            # Handle other possible exceptions
            raise Exception(f"An unexpected error occurred: {e}")
    self.bake_image_for_step(step)
echo("Environments are ready!")
if self.steps_to_delegate:
    # TODO: add debug echo to output steps that required a conda environment.
these comments too
    self.delegate.validate_environment(echo, self.datastore_type)
    self.delegate.init_environment(echo, self.steps_to_delegate)

def bake_image_for_step(self, step):
multiple steps might share the same image - maybe we can check if that is the case and only bake unique images?
The images are baked in sequence, so consecutive steps with identical packages should hit the cache at this point and skip the bake. Previously the caching happened inside fast_bakery, but it has moved to the environment side now, so it should be clearer.
rather than baking in sequence, we can perhaps bake them in parallel - that should save precious seconds
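A minimal sketch of that idea (helper names hypothetical, not from this PR): group the steps by their environment hash so identical environments are requested only once, then bake the unique images concurrently.

```python
from concurrent.futures import ThreadPoolExecutor


def bake_unique_images(steps, env_hash, bake):
    # Group steps by their environment hash so identical environments
    # are baked only once.
    unique = {}
    for step in steps:
        unique.setdefault(env_hash(step), []).append(step)
    # Bake each unique environment in parallel; pool.map preserves the
    # order of the hashes it is given.
    with ThreadPoolExecutor() as pool:
        tags = dict(zip(unique, pool.map(bake, unique)))
    # Map every step (including duplicates) to its baked image tag.
    return {step: tags[h] for h, group in unique.items() for step in group}


tags = bake_unique_images(
    ["start", "train", "end"],
    env_hash=lambda s: "h1" if s != "end" else "h2",  # toy hash function
    bake=lambda h: f"image-{h}",                      # toy bakery call
)
```

Here three steps resolve to two unique environments, so only two bakes are issued, concurrently.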
metaflow/plugins/aws/batch/batch.py (Outdated)
@@ -23,6 +24,9 @@
     DEFAULT_SECRETS_BACKEND_TYPE,
     AWS_SECRETS_MANAGER_DEFAULT_REGION,
     S3_SERVER_SIDE_ENCRYPTION,
+    FAST_BAKERY_URL,
+    FAST_BAKERY_AUTH,
+    FAST_BAKERY_TYPE,
We can avoid introducing these as env vars here and in @kubernetes, with the expectation that using fast bakery may involve using @secrets or @environment to configure the URL and auth. The client can try to read from env vars unless explicitly configured with the URL and auth?
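A sketch of that precedence (the env var names below are assumptions mirroring the PR's FAST_BAKERY_* constants, not confirmed): explicit arguments win, otherwise the client falls back to the environment, which @secrets or @environment could populate at runtime.

```python
import os


class FastBakeryConfig:
    # Hypothetical config holder: explicit url/auth arguments take
    # precedence; otherwise read from environment variables.
    def __init__(self, url=None, auth=None):
        self.url = url or os.environ.get("METAFLOW_FAST_BAKERY_URL")
        self.auth = auth or os.environ.get("METAFLOW_FAST_BAKERY_AUTH")
        if not self.url:
            raise ValueError("Fast Bakery URL not configured")
```

This keeps the platform plugins (@batch, @kubernetes) free of bakery-specific settings while still allowing env-var based deployment.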
# a conda decorator always exists alongside pypi so this needs to be accounted for
dependency_deco = pypi_deco if pypi_deco is not None else conda_deco
if dependency_deco is not None:
    python = dependency_deco.attributes["python"]
python comes from conda_deco
@@ -65,7 +65,7 @@ def validate_environment(self, echo, datastore_type):
     micromamba = Micromamba()
     self.solvers = {"conda": micromamba, "pypi": Pip(micromamba)}

-    def init_environment(self, echo):
+    def init_environment(self, echo, only_steps=[]):
@romain-intel I'd like your thoughts on this mechanism. The reason we need something akin to this is to support the environment fallback logic in docker_environment.py:
- the docker env gathers a list of steps that it can not act on (conda/pypi but not executing remotely)
- for the incompatible steps we want to fall back to the conda implementation, and initialize accordingly
- only_steps is meant to skip those steps that we were able to bake a docker image for

Would this be a good addition to the environment init flow? Any better alternatives to this?
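The filtering itself can be isolated into a small helper, sketched here (name and semantics are illustrative, mirroring the diff's default where an empty or absent only_steps means "initialize everything"):

```python
def steps_to_initialize(all_steps, only_steps=None):
    # Mirrors the diff's default: an empty or absent only_steps means
    # "initialize every step"; otherwise restrict to the delegated subset,
    # preserving the original step order.
    if not only_steps:
        return list(all_steps)
    wanted = set(only_steps)
    return [s for s in all_steps if s in wanted]
```

One nit on the diff itself: a mutable default like only_steps=[] is a common Python pitfall; only_steps=None with a falsiness check (as above) behaves the same without sharing a list across calls.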
@@ -64,7 +64,7 @@ def step_init(self, flow, graph, step, decos, environment, flow_datastore, logge
     # one here. We can consider introducing a UX where @pypi is able to consume
     # poetry.lock files in the future.

-    _supported_virtual_envs = ["conda"]  # , "venv"]
+    _supported_virtual_envs = ["conda", "docker"]  # , "venv"]
@romain-intel this relates to discussions on decorator improvements. Should this be reliably extendable through extensions?
Main motivation is that this whole imagebakery feature could be a standalone extension, except for the few additions it needs to make to the OSS PyPI plugin.
_supported_virtual_envs is just an implementation detail here, but what I'm pitching is adding a mechanism where a newly introduced environment could add itself to a whitelist for known decorators.

The changes allow an extension to implement:

def conda_supported_virtual_envs():
    return ["docker"]
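On the plugin side, aggregating those extension hooks could look roughly like this (a sketch, not the actual Metaflow extension machinery):

```python
def supported_virtual_envs(extension_hooks=()):
    # Start from the built-in whitelist and let each loaded extension's
    # conda_supported_virtual_envs-style hook append the environments
    # it implements (e.g. ["docker"]).
    envs = ["conda"]
    for hook in extension_hooks:
        envs.extend(hook())
    return envs
```

With this shape, the imagebakery feature could live entirely in an extension: it registers "docker" via its hook instead of patching the OSS PyPI plugin's hardcoded list.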
It is also possible to fall back to a Conda environment for steps that can not be handled by another env implementation:

def init_environment(self, echo):
    self.delegate = CondaEnvironment(self.flow)
    ...
    if self.steps_to_delegate:
        self.delegate.validate_environment(echo, self.datastore_type)
        self.delegate.init_environment(echo, self.steps_to_delegate)  # skips steps that were already handled
Adds support for on demand Docker images. Open for discussion.

Open TODOs:
- conda environment when executing locally with --environment docker