[FR] Custom Conda Channel #12411

Open
3 of 22 tasks
mparada-suva opened this issue Jun 19, 2024 · 3 comments
Labels
  • area/build: Build and test infrastructure for MLflow
  • area/deployments: MLflow Deployments client APIs, server, and third-party Deployments integrations
  • area/docker: Docker use anywhere, such as MLprojects and MLmodels
  • enhancement: New feature or request

Comments

@mparada-suva

Willingness to contribute

Yes. I would be willing to contribute this feature with guidance from the MLflow community.

Proposal Summary

If I understand the code correctly, the environment information for a model can be inferred from the model code itself, and it defaults to a Conda environment with the conda-forge channel.
I believe here is where it is set.
My feature request is the ability to override this default channel.

Motivation

What is the use case for this feature?

Models can be built with dependencies that are not on conda-forge, or the conda-forge channel may not be accessible at build time.

  • Custom packages could be located in an internal repository and not be publicly available on conda-forge.
  • For security and audit reasons the build step may not have access to the internet. The packages for the build would then be located in an internal repository or a proxy.

Why is this use case valuable to support for MLflow users in general?

I can imagine that many users run their builds without a connection to the internet.

Why is this use case valuable to support for your project(s) or organization?

Security and audit reasons.

Why is it currently difficult to achieve this use case?

Currently we can overwrite the whole environment.yaml file, but that means replacing not only the channels but also all the dependencies. It would be nice if we could rely on MLflow's ability to generate the dependencies in environment.yaml while supplying a predefined list of channels.
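
To illustrate the gap, here is a minimal sketch of the kind of post-processing this would replace: take an environment that has already been generated and swap only its channel list, leaving the dependencies untouched. The helper name `with_channels`, the sample environment dict, and the mirror URL are all hypothetical and are not part of MLflow; the sketch assumes the environment is available as a plain dict.

```python
import copy


def with_channels(conda_env, channels):
    """Return a copy of a conda environment dict with its channels replaced.

    Hypothetical helper for illustration only; not an MLflow API.
    """
    if not channels:
        raise ValueError("at least one channel is required")
    updated = copy.deepcopy(conda_env)  # leave the caller's dict untouched
    updated["channels"] = list(channels)
    return updated


# Stand-in for an environment generated elsewhere (values are made up).
generated_env = {
    "name": "mlflow-env",
    "channels": ["conda-forge"],
    "dependencies": ["python=3.10", "pip", {"pip": ["mlflow"]}],
}

# Hypothetical internal mirror URL.
internal_env = with_channels(
    generated_env, ["https://conda.internal.example/conda-forge"]
)
```

The resulting dict could then be supplied through the existing conda_env argument, but that still requires obtaining the generated dependencies first, which is the part this request would like MLflow to handle.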

Details

No response

What component(s) does this bug affect?

  • [ ] area/artifacts: Artifact stores and artifact logging
  • [x] area/build: Build and test infrastructure for MLflow
  • [x] area/deployments: MLflow Deployments client APIs, server, and third-party Deployments integrations
  • [ ] area/docs: MLflow documentation pages
  • [ ] area/examples: Example code
  • [ ] area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • [ ] area/models: MLmodel format, model serialization/deserialization, flavors
  • [ ] area/recipes: Recipes, Recipe APIs, Recipe configs, Recipe Templates
  • [ ] area/projects: MLproject format, project running backends
  • [ ] area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • [ ] area/server-infra: MLflow Tracking server backend
  • [ ] area/tracking: Tracking Service, tracking client APIs, autologging

What interface(s) does this bug affect?

  • [ ] area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • [x] area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
  • [ ] area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • [ ] area/windows: Windows support

What language(s) does this bug affect?

  • [ ] language/r: R APIs and clients
  • [ ] language/java: Java APIs and clients
  • [ ] language/new: Proposals for new client languages

What integration(s) does this bug affect?

  • [ ] integrations/azure: Azure and Azure ML integrations
  • [ ] integrations/sagemaker: SageMaker integrations
  • [ ] integrations/databricks: Databricks integrations

mparada-suva added the enhancement label on Jun 19, 2024
github-actions bot added the area/build, area/deployments, and area/docker labels on Jun 19, 2024

@harupy
Member

harupy commented Jun 19, 2024

We could add an environment variable which defaults to conda-forge. If you want to use your own channels, you can set the environment variable.
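
A rough sketch of how such a lookup might behave, assuming a comma-separated value. The variable name `MLFLOW_CONDA_CHANNELS` and the function `get_conda_channels` are hypothetical placeholders for illustration; neither exists in MLflow today.

```python
import os

DEFAULT_CHANNEL = "conda-forge"
# Hypothetical variable name; MLflow does not define this today.
CHANNELS_ENV_VAR = "MLFLOW_CONDA_CHANNELS"


def get_conda_channels(environ=None):
    """Return the channels to write into a generated conda environment.

    Reads a comma-separated list from the (hypothetical) environment
    variable and falls back to conda-forge when it is unset or empty.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get(CHANNELS_ENV_VAR, "")
    channels = [part.strip() for part in raw.split(",") if part.strip()]
    return channels or [DEFAULT_CHANNEL]
```

With the variable unset, the behavior stays exactly as it is now, so existing users would not be affected.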

Another option is to add a new argument to log_model to specify conda channels, but I'm not a fan of this option because we already have the conda_env argument and don't want to create another knob.

mlflow.xxx.log_model(..., conda_channels=["..."])


@mlflow/mlflow-team Please assign a maintainer and start triaging this issue.

@mparada-suva
Author

@harupy The environment variable option is what I was thinking about. Since I don't know the code base, I am unsure whether this is the usual way to allow more flexibility here, and whether there are gotchas I don't know about.
