Releases: bentoml/OpenLLM
v0.5.7
Installation
pip install openllm==0.5.7
To upgrade from a previous version, use the following command:
pip install --upgrade openllm==0.5.7
Usage
To start a LLM: python -m openllm start HuggingFaceH4/zephyr-7b-beta
Find more information about this release in the CHANGELOG.md
Full Changelog: v0.5.6...v0.5.7
OpenLLM: v0.5
OpenLLM has undergone a significant upgrade in its v0.5 release to enhance compatibility with the BentoML 1.2 SDK. The CLI has also been streamlined to focus on delivering the most easy-to-use and reliable experience for deploying open-source LLMs to production. However, version 0.5 introduces breaking changes.
Breaking changes, and the reason why.
After releasing version 0.4, we realized that while OpenLLM offers a high degree of flexibility and power to users, they encountered numerous issues when attempting to deploy these models. OpenLLM had been trying to accomplish a lot by providing support for different backends (mainly PyTorch for CPU inference and vLLM for GPU inference) and accelerators. Although this provided users with the option to quickly test on their local machines, we discovered that this brought a lot of confusion when running OpenLLM locally versus the cloud. The difference between local and cloud deployment made it difficult for users to understand and control the packaged Bento to behave correctly on the cloud.
The motivation for 0.5 is to focus on cloud deployment. Cloud deployments often focus on high throughput and high concurrency serving, and GPU is the most common choice of hardware for achieving high throughput and concurrency serving. Therefore, we simplified backend support to just vLLM which is the most suitable and reliable for serving LLM on GPU on the cloud.
Architecture changes and SDK.
For version 0.5, we have decided to reduce the scope and support the backend that yields the most performance (in this case, vLLM). This means that pip install openllm will also depend on vLLM. In other words, we will currently pause our support for CPU going forward.
All interactions with OpenLLM servers going forward should be done through clients (i.e., BentoML's Clients, OpenAI, etc.).
CLI
CLI has now been simplified to openllm start
and openllm build
HuggingFace models
openllm start
openllm start
will continue to accept HuggingFace model id for supported model architectures:
openllm start microsoft/Phi-3-mini-4k-instruct --trust-remote-code
For any models that requires remote code execution, one should pass in
--trust-remote-code
openllm start
will also accept serving from local path directly. Make sure to also pass in --trust-remote-code
if you wish to use with openllm start
openllm start path/to/custom-phi-instruct --trust-remote-code
openllm build
In previous versions, OpenLLM would copy the local cache of the models into the generated Bento store, resulting in having two copies of the models on users’ machine. From v0.5 going forward, models won't be packaged with the Bento and will be downloaded into Hugging Face cache first time on deployment.
openllm build microsoft/Phi-3-mini-4k-instruct --trust-remote-code
Successfully built Bento 'microsoft--phi-3-mini-4k-instruct-service:5fa34190089f0ee40f9cce3cafc396b89b2e5e83'.
██████╗ ██████╗ ███████╗███╗ ██╗██╗ ██╗ ███╗ ███╗
██╔═══██╗██╔══██╗██╔════╝████╗ ██║██║ ██║ ████╗ ████║
██║ ██║██████╔╝█████╗ ██╔██╗ ██║██║ ██║ ██╔████╔██║
██║ ██║██╔═══╝ ██╔══╝ ██║╚██╗██║██║ ██║ ██║╚██╔╝██║
╚██████╔╝██║ ███████╗██║ ╚████║███████╗███████╗██║ ╚═╝ ██║
╚═════╝ ╚═╝ ╚══════╝╚═╝ ╚═══╝╚══════╝╚══════╝╚═╝ ╚═╝.
📖 Next steps:
☁️ Deploy to BentoCloud:
$ bentoml deploy microsoft--phi-3-mini-4k-instruct-service:5fa34190089f0ee40f9cce3cafc396b89b2e5e83 -n ${DEPLOYMENT_NAME}
☁️ Update existing deployment on BentoCloud:
$ bentoml deployment update --bento microsoft--phi-3-mini-4k-instruct-service:5fa34190089f0ee40f9cce3cafc396b89b2e5e83 ${DEPLOYMENT_NAME}
🐳 Containerize BentoLLM:
$ bentoml containerize microsoft--phi-3-mini-4k-instruct-service:5fa34190089f0ee40f9cce3cafc396b89b2e5e83 --opt progress=plain
For quantized models, make sure to also pass in the --quantize
flag during build
openllm build casperhansen/llama-3-70b-instruct-awq --quantize awq
See openllm build --help
for more information
Private models
openllm start
For private models, we recommend users to save it to [BentoML’s Model store](https://docs.bentoml.com/en/latest/guides/model-store.html#model-store) first before using openllm start
:
with bentoml.models.create(name="my-private-models") as model:
PrivateTrainedModel.save_pretrained(model.path)
MyTokenizer.save_pretrained(model.path)
Note: Make sure to also save your tokenizer in this bentomodel
You can then pass in the private model name directly to openllm start
openllm start my-private-models
openllm build
Similar to openllm start
, openllm build
will only accept private models from BentoML’s model store:
openllm build my-private-models
What's next?
Currently, OpenAI's compatibility will only have the /chat/completions
and /models
endpoints supported. We will continue bringing /completions
as well as function calling support soon, so stay tuned.
Thank you for your continued support and trust in us. We would love to hear more of your feedback on the releases.
v0.5.5
Installation
pip install openllm==0.5.5
To upgrade from a previous version, use the following command:
pip install --upgrade openllm==0.5.5
Usage
To start a LLM: python -m openllm start HuggingFaceH4/zephyr-7b-beta
Find more information about this release in the CHANGELOG.md
What's Changed
- feat(models): command-r by @aarnphm in #1005
- ci: pre-commit autoupdate [pre-commit.ci] by @pre-commit-ci in #1007
- chore(deps): bump taiki-e/install-action from 2.33.34 to 2.34.0 by @dependabot in #1006
Full Changelog: v0.5.4...v0.5.5
v0.5.4
Installation
pip install openllm==0.5.4
To upgrade from a previous version, use the following command:
pip install --upgrade openllm==0.5.4
Usage
To start a LLM: python -m openllm start HuggingFaceH4/zephyr-7b-beta
Find more information about this release in the CHANGELOG.md
What's Changed
Full Changelog: v0.5.3...v0.5.4
v0.5.3
Installation
pip install openllm==0.5.3
To upgrade from a previous version, use the following command:
pip install --upgrade openllm==0.5.3
Usage
To start a LLM: python -m openllm start HuggingFaceH4/zephyr-7b-beta
Find more information about this release in the CHANGELOG.md
Full Changelog: v0.5.2...v0.5.3
v0.5.2
Installation
pip install openllm==0.5.2
To upgrade from a previous version, use the following command:
pip install --upgrade openllm==0.5.2
Usage
To start a LLM: python -m openllm start HuggingFaceH4/zephyr-7b-beta
Find more information about this release in the CHANGELOG.md
Full Changelog: v0.5.1...v0.5.2
v0.5.1
Installation
pip install openllm==0.5.1
To upgrade from a previous version, use the following command:
pip install --upgrade openllm==0.5.1
Usage
To start a LLM: python -m openllm start HuggingFaceH4/zephyr-7b-beta
Find more information about this release in the CHANGELOG.md
Full Changelog: v0.5.0...v0.5.1
v0.5.0-alpha.15
Installation
pip install openllm==0.5.0-alpha.15
To upgrade from a previous version, use the following command:
pip install --upgrade openllm==0.5.0-alpha.15
Usage
All available models: openllm models
To start a LLM: python -m openllm start HuggingFaceH4/zephyr-7b-beta
To run OpenLLM within a container environment (requires GPUs): docker run --gpus all -it -P -v $PWD/data:$HOME/.cache/huggingface/ ghcr.io/bentoml/openllm:0.5.0-alpha.15 start HuggingFaceH4/zephyr-7b-beta
Find more information about this release in the CHANGELOG.md
What's Changed
- chore(deps): bump docker/setup-buildx-action from 3.0.0 to 3.2.0 by @dependabot in #941
- chore(deps): bump github/codeql-action from 3.24.3 to 3.24.9 by @dependabot in #939
- ci: pre-commit autoupdate [pre-commit.ci] by @pre-commit-ci in #942
- fix(compat): use annotated type from
typing_compat
by @rudeigerc in #943 - docs: Update high-level messaging by @Sherlock113 in #949
- ci: pre-commit autoupdate [pre-commit.ci] by @pre-commit-ci in #947
- chore(deps): bump aquasecurity/trivy-action from 0.18.0 to 0.19.0 by @dependabot in #946
- chore(deps): bump taiki-e/install-action from 2.27.9 to 2.32.9 by @dependabot in #945
- Update README.md by @parano in #964
- chore(deps): bump taiki-e/install-action from 2.32.9 to 2.33.9 by @dependabot in #970
- chore(deps): bump sigstore/cosign-installer from 3.4.0 to 3.5.0 by @dependabot in #954
- chore(deps): bump docker/metadata-action from 5.5.0 to 5.5.1 by @dependabot in #956
- chore(deps): bump actions/setup-python from 5.0.0 to 5.1.0 by @dependabot in #955
- chore(deps): bump pypa/gh-action-pypi-publish from 1.8.11 to 1.8.14 by @dependabot in #958
- ci: pre-commit autoupdate [pre-commit.ci] by @pre-commit-ci in #959
- fix: update correct CompletionOutput object by @aarnphm in #973
- chore(deps): bump docker/build-push-action from 5.1.0 to 5.3.0 by @dependabot in #979
- chore(deps): bump docker/login-action from 3.0.0 to 3.1.0 by @dependabot in #978
- chore(deps): bump github/codeql-action from 3.24.9 to 3.25.3 by @dependabot in #977
- chore(deps): bump docker/setup-buildx-action from 3.2.0 to 3.3.0 by @dependabot in #975
- fix: make sure to respect additional parameters parse by @aarnphm in #981
- chore(deps): bump peter-evans/create-pull-request from 5.0.2 to 6.0.5 by @dependabot in #976
- ci: pre-commit autoupdate [pre-commit.ci] by @pre-commit-ci in #980
- chore(deps): bump rlespinasse/github-slug-action from 4.4.1 to 4.5.0 by @dependabot in #988
- chore(deps): bump softprops/action-gh-release from 1 to 2 by @dependabot in #987
- ci: pre-commit autoupdate [pre-commit.ci] by @pre-commit-ci in #989
- chore(deps): bump taiki-e/install-action from 2.33.9 to 2.33.22 by @dependabot in #985
- chore(deps): bump actions/checkout from 4.1.1 to 4.1.5 by @dependabot in #984
- chore(deps): bump next from 13.4.8 to 14.1.1 by @dependabot in #983
- ci: pre-commit autoupdate [pre-commit.ci] by @pre-commit-ci in #994
- chore(deps): bump actions/checkout from 4.1.5 to 4.1.6 by @dependabot in #993
- chore(deps): bump github/codeql-action from 3.25.3 to 3.25.5 by @dependabot in #992
- chore(deps): bump aquasecurity/trivy-action from 0.19.0 to 0.20.0 by @dependabot in #991
- fix(docs): update correct BentoML links by @dennisrall in #995
- tests: add additional basic testing by @aarnphm in #982
- infra: prepare 0.5 releases by @aarnphm in #996
- chore(deps): bump actions/upload-artifact from 3.1.3 to 4.3.3 by @dependabot in #986
- chore(deps): bump actions/download-artifact from 3.0.2 to 4.1.7 by @dependabot in #990
- chore(qol): update CLI options and performance upgrade for build cache by @aarnphm in #997
- feat(ci): running CI on paperspace by @aarnphm in #998
- chore(deps): bump taiki-e/install-action from 2.33.22 to 2.33.34 by @dependabot in #1000
New Contributors
- @rudeigerc made their first contribution in #943
- @dennisrall made their first contribution in #995
Full Changelog: v0.5.0-alpha.1...v0.5.0-alpha.15
v0.5.0
Installation
pip install openllm==0.5.0
To upgrade from a previous version, use the following command:
pip install --upgrade openllm==0.5.0
Usage
To start a LLM: python -m openllm start HuggingFaceH4/zephyr-7b-beta
Find more information about this release in the CHANGELOG.md
What's Changed
- ci: pre-commit autoupdate [pre-commit.ci] by @pre-commit-ci in #870
- chore(deps): bump taiki-e/install-action from 2.25.9 to 2.26.18 by @dependabot in #899
- ci: pre-commit autoupdate [pre-commit.ci] by @pre-commit-ci in #909
- chore(deps): bump github/codeql-action from 3.23.1 to 3.24.3 by @dependabot in #908
- chore(deps): bump sigstore/cosign-installer from 3.3.0 to 3.4.0 by @dependabot in #907
- ci: pre-commit autoupdate [pre-commit.ci] by @pre-commit-ci in #931
- feat: 1.2 APIs by @aarnphm in #821
- chore(deps): bump taiki-e/install-action from 2.26.18 to 2.27.9 by @dependabot in #920
- chore(deps): bump next from 13.4.8 to 13.5.1 by @dependabot in #912
- ci: pre-commit autoupdate [pre-commit.ci] by @pre-commit-ci in #935
- chore(deps): bump marocchino/sticky-pull-request-comment from 2.8.0 to 2.9.0 by @dependabot in #933
- chore(deps): bump aquasecurity/trivy-action from 0.16.1 to 0.18.0 by @dependabot in #932
- chore(deps): bump docker/setup-buildx-action from 3.0.0 to 3.2.0 by @dependabot in #941
- chore(deps): bump github/codeql-action from 3.24.3 to 3.24.9 by @dependabot in #939
- ci: pre-commit autoupdate [pre-commit.ci] by @pre-commit-ci in #942
- fix(compat): use annotated type from
typing_compat
by @rudeigerc in #943 - docs: Update high-level messaging by @Sherlock113 in #949
- ci: pre-commit autoupdate [pre-commit.ci] by @pre-commit-ci in #947
- chore(deps): bump aquasecurity/trivy-action from 0.18.0 to 0.19.0 by @dependabot in #946
- chore(deps): bump taiki-e/install-action from 2.27.9 to 2.32.9 by @dependabot in #945
- Update README.md by @parano in #964
- chore(deps): bump taiki-e/install-action from 2.32.9 to 2.33.9 by @dependabot in #970
- chore(deps): bump sigstore/cosign-installer from 3.4.0 to 3.5.0 by @dependabot in #954
- chore(deps): bump docker/metadata-action from 5.5.0 to 5.5.1 by @dependabot in #956
- chore(deps): bump actions/setup-python from 5.0.0 to 5.1.0 by @dependabot in #955
- chore(deps): bump pypa/gh-action-pypi-publish from 1.8.11 to 1.8.14 by @dependabot in #958
- ci: pre-commit autoupdate [pre-commit.ci] by @pre-commit-ci in #959
- fix: update correct CompletionOutput object by @aarnphm in #973
- chore(deps): bump docker/build-push-action from 5.1.0 to 5.3.0 by @dependabot in #979
- chore(deps): bump docker/login-action from 3.0.0 to 3.1.0 by @dependabot in #978
- chore(deps): bump github/codeql-action from 3.24.9 to 3.25.3 by @dependabot in #977
- chore(deps): bump docker/setup-buildx-action from 3.2.0 to 3.3.0 by @dependabot in #975
- fix: make sure to respect additional parameters parse by @aarnphm in #981
- chore(deps): bump peter-evans/create-pull-request from 5.0.2 to 6.0.5 by @dependabot in #976
- ci: pre-commit autoupdate [pre-commit.ci] by @pre-commit-ci in #980
- chore(deps): bump rlespinasse/github-slug-action from 4.4.1 to 4.5.0 by @dependabot in #988
- chore(deps): bump softprops/action-gh-release from 1 to 2 by @dependabot in #987
- ci: pre-commit autoupdate [pre-commit.ci] by @pre-commit-ci in #989
- chore(deps): bump taiki-e/install-action from 2.33.9 to 2.33.22 by @dependabot in #985
- chore(deps): bump actions/checkout from 4.1.1 to 4.1.5 by @dependabot in #984
- chore(deps): bump next from 13.4.8 to 14.1.1 by @dependabot in #983
- ci: pre-commit autoupdate [pre-commit.ci] by @pre-commit-ci in #994
- chore(deps): bump actions/checkout from 4.1.5 to 4.1.6 by @dependabot in #993
- chore(deps): bump github/codeql-action from 3.25.3 to 3.25.5 by @dependabot in #992
- chore(deps): bump aquasecurity/trivy-action from 0.19.0 to 0.20.0 by @dependabot in #991
- fix(docs): update correct BentoML links by @dennisrall in #995
- tests: add additional basic testing by @aarnphm in #982
- infra: prepare 0.5 releases by @aarnphm in #996
- chore(deps): bump actions/upload-artifact from 3.1.3 to 4.3.3 by @dependabot in #986
- chore(deps): bump actions/download-artifact from 3.0.2 to 4.1.7 by @dependabot in #990
- chore(qol): update CLI options and performance upgrade for build cache by @aarnphm in #997
- feat(ci): running CI on paperspace by @aarnphm in #998
- chore(deps): bump taiki-e/install-action from 2.33.22 to 2.33.34 by @dependabot in #1000
- ci: pre-commit autoupdate [pre-commit.ci] by @pre-commit-ci in #1002
New Contributors
- @rudeigerc made their first contribution in #943
- @dennisrall made their first contribution in #995
Full Changelog: v0.4.44...v0.5.0
v0.5.0-alpha.1
Release 0.5.0-alpha.1 [generated by GitHub Actions]