
Releases: bentoml/OpenLLM

v0.5.7

14 Jun 03:42
5ccba02

Installation

pip install openllm==0.5.7

To upgrade from a previous version, use the following command:

pip install --upgrade openllm==0.5.7

Usage

To start an LLM: python -m openllm start HuggingFaceH4/zephyr-7b-beta

Find more information about this release in the CHANGELOG.md

Full Changelog: v0.5.6...v0.5.7

OpenLLM: v0.5

11 Jun 15:36

OpenLLM has undergone a significant upgrade in its v0.5 release to enhance compatibility with the BentoML 1.2 SDK. The CLI has also been streamlined to deliver the easiest and most reliable experience for deploying open-source LLMs to production. Note, however, that v0.5 introduces breaking changes.

Breaking changes, and the reason why.

After releasing version 0.4, we realized that although OpenLLM offers users a high degree of flexibility and power, they ran into numerous issues when deploying these models. OpenLLM had been trying to do a lot, providing support for different backends (mainly PyTorch for CPU inference and vLLM for GPU inference) and accelerators. While this let users test quickly on their local machines, it also created a lot of confusion between running OpenLLM locally and running it in the cloud. That gap made it difficult for users to understand and control how a packaged Bento would behave once deployed to the cloud.

The motivation for 0.5 is to focus on cloud deployment. Cloud deployments typically demand high-throughput, high-concurrency serving, and GPUs are the most common hardware choice for achieving it. We therefore simplified backend support to vLLM alone, which is the most suitable and reliable option for serving LLMs on GPUs in the cloud.

Architecture changes and SDK.

For version 0.5, we decided to reduce the scope and support the backend that yields the best performance (in this case, vLLM). This means that pip install openllm now also depends on vLLM. In other words, we are pausing CPU support going forward.
All interactions with OpenLLM servers should now go through clients (i.e., BentoML's clients, the OpenAI client, etc.).
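As a minimal sketch of such a client, here is a standard-library-only example; the port, model name, and endpoint path below assume a default local openllm start deployment exposing the OpenAI-compatible API, so adjust them for your setup:

```python
import json
import urllib.request

def chat(prompt: str, base_url: str = "http://localhost:3000") -> str:
    """POST a chat request to a running OpenLLM server's
    OpenAI-compatible /v1/chat/completions endpoint."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps({
            "model": "HuggingFaceH4/zephyr-7b-beta",
            "messages": [{"role": "user", "content": prompt}],
        }).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```

The official OpenAI Python client works the same way if you point its base_url at the server.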

CLI

The CLI has been simplified to two commands: openllm start and openllm build

HuggingFace models

openllm start

openllm start will continue to accept a HuggingFace model ID for supported model architectures:

openllm start microsoft/Phi-3-mini-4k-instruct --trust-remote-code

For any model that requires remote code execution, pass in --trust-remote-code.

openllm start will also accept a local path directly. Make sure to also pass in --trust-remote-code if the model requires it:

openllm start path/to/custom-phi-instruct --trust-remote-code

openllm build

In previous versions, OpenLLM copied the local cache of the models into the generated Bento, resulting in two copies of each model on users' machines. From v0.5 onward, models are no longer packaged with the Bento; they are downloaded into the Hugging Face cache on first deployment.

openllm build microsoft/Phi-3-mini-4k-instruct --trust-remote-code

Successfully built Bento 'microsoft--phi-3-mini-4k-instruct-service:5fa34190089f0ee40f9cce3cafc396b89b2e5e83'.

 ██████╗ ██████╗ ███████╗███╗   ██╗██╗     ██╗     ███╗   ███╗
██╔═══██╗██╔══██╗██╔════╝████╗  ██║██║     ██║     ████╗ ████║
██║   ██║██████╔╝█████╗  ██╔██╗ ██║██║     ██║     ██╔████╔██║
██║   ██║██╔═══╝ ██╔══╝  ██║╚██╗██║██║     ██║     ██║╚██╔╝██║
╚██████╔╝██║     ███████╗██║ ╚████║███████╗███████╗██║ ╚═╝ ██║
 ╚═════╝ ╚═╝     ╚══════╝╚═╝  ╚═══╝╚══════╝╚══════╝╚═╝     ╚═╝.

📖 Next steps:
☁️  Deploy to BentoCloud:
  $ bentoml deploy microsoft--phi-3-mini-4k-instruct-service:5fa34190089f0ee40f9cce3cafc396b89b2e5e83 -n ${DEPLOYMENT_NAME}
☁️  Update existing deployment on BentoCloud:
  $ bentoml deployment update --bento microsoft--phi-3-mini-4k-instruct-service:5fa34190089f0ee40f9cce3cafc396b89b2e5e83 ${DEPLOYMENT_NAME}
🐳 Containerize BentoLLM:
  $ bentoml containerize microsoft--phi-3-mini-4k-instruct-service:5fa34190089f0ee40f9cce3cafc396b89b2e5e83 --opt progress=plain

For quantized models, make sure to also pass the --quantize flag during build:

openllm build casperhansen/llama-3-70b-instruct-awq --quantize awq

See openllm build --help for more information

Private models

openllm start

For private models, we recommend saving them to [BentoML’s Model Store](https://docs.bentoml.com/en/latest/guides/model-store.html#model-store) first before using openllm start:

import bentoml

with bentoml.models.create(name="my-private-models") as model:
    # Save both the model weights and the tokenizer into the same Bento model
    PrivateTrainedModel.save_pretrained(model.path)
    MyTokenizer.save_pretrained(model.path)

Note: Make sure to also save your tokenizer in this Bento model.

You can then pass the private model name directly to openllm start:

openllm start my-private-models

openllm build

Similar to openllm start, openllm build will only accept private models from BentoML’s model store:

openllm build my-private-models

What's next?

Currently, the OpenAI-compatible API only supports the /chat/completions and /models endpoints. We will bring support for /completions as well as function calling soon, so stay tuned.
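For reference, a request body for the /chat/completions endpoint looks like the following sketch; the field names follow the OpenAI API, and the model name is just an example:

```python
import json

# Request body for the OpenAI-compatible /chat/completions endpoint.
payload = {
    "model": "HuggingFaceH4/zephyr-7b-beta",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
}
body = json.dumps(payload)
```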

Thank you for your continued support and trust in us. We would love to hear your feedback on these releases.

v0.5.5

03 Jun 22:22

Installation

pip install openllm==0.5.5

To upgrade from a previous version, use the following command:

pip install --upgrade openllm==0.5.5

Usage

To start an LLM: python -m openllm start HuggingFaceH4/zephyr-7b-beta

Find more information about this release in the CHANGELOG.md

Full Changelog: v0.5.4...v0.5.5

v0.5.4

01 Jun 00:45

Installation

pip install openllm==0.5.4

To upgrade from a previous version, use the following command:

pip install --upgrade openllm==0.5.4

Usage

To start an LLM: python -m openllm start HuggingFaceH4/zephyr-7b-beta

Find more information about this release in the CHANGELOG.md

What's Changed

  • feat(API): add light support for batch inference by @aarnphm in #1004

Full Changelog: v0.5.3...v0.5.4

v0.5.3

30 May 21:43

Installation

pip install openllm==0.5.3

To upgrade from a previous version, use the following command:

pip install --upgrade openllm==0.5.3

Usage

To start an LLM: python -m openllm start HuggingFaceH4/zephyr-7b-beta

Find more information about this release in the CHANGELOG.md

Full Changelog: v0.5.2...v0.5.3

v0.5.2

29 May 04:51

Installation

pip install openllm==0.5.2

To upgrade from a previous version, use the following command:

pip install --upgrade openllm==0.5.2

Usage

To start an LLM: python -m openllm start HuggingFaceH4/zephyr-7b-beta

Find more information about this release in the CHANGELOG.md

Full Changelog: v0.5.1...v0.5.2

v0.5.1

29 May 02:54

Installation

pip install openllm==0.5.1

To upgrade from a previous version, use the following command:

pip install --upgrade openllm==0.5.1

Usage

To start an LLM: python -m openllm start HuggingFaceH4/zephyr-7b-beta

Find more information about this release in the CHANGELOG.md

Full Changelog: v0.5.0...v0.5.1

v0.5.0-alpha.15

27 May 18:00
Pre-release

Installation

pip install openllm==0.5.0-alpha.15

To upgrade from a previous version, use the following command:

pip install --upgrade openllm==0.5.0-alpha.15

Usage

All available models: openllm models

To start an LLM: python -m openllm start HuggingFaceH4/zephyr-7b-beta

To run OpenLLM within a container environment (requires GPUs): docker run --gpus all -it -P -v $PWD/data:$HOME/.cache/huggingface/ ghcr.io/bentoml/openllm:0.5.0-alpha.15 start HuggingFaceH4/zephyr-7b-beta

Find more information about this release in the CHANGELOG.md

What's Changed

  • chore(deps): bump docker/setup-buildx-action from 3.0.0 to 3.2.0 by @dependabot in #941
  • chore(deps): bump github/codeql-action from 3.24.3 to 3.24.9 by @dependabot in #939
  • ci: pre-commit autoupdate [pre-commit.ci] by @pre-commit-ci in #942
  • fix(compat): use annotated type from typing_compat by @rudeigerc in #943
  • docs: Update high-level messaging by @Sherlock113 in #949
  • ci: pre-commit autoupdate [pre-commit.ci] by @pre-commit-ci in #947
  • chore(deps): bump aquasecurity/trivy-action from 0.18.0 to 0.19.0 by @dependabot in #946
  • chore(deps): bump taiki-e/install-action from 2.27.9 to 2.32.9 by @dependabot in #945
  • Update README.md by @parano in #964
  • chore(deps): bump taiki-e/install-action from 2.32.9 to 2.33.9 by @dependabot in #970
  • chore(deps): bump sigstore/cosign-installer from 3.4.0 to 3.5.0 by @dependabot in #954
  • chore(deps): bump docker/metadata-action from 5.5.0 to 5.5.1 by @dependabot in #956
  • chore(deps): bump actions/setup-python from 5.0.0 to 5.1.0 by @dependabot in #955
  • chore(deps): bump pypa/gh-action-pypi-publish from 1.8.11 to 1.8.14 by @dependabot in #958
  • ci: pre-commit autoupdate [pre-commit.ci] by @pre-commit-ci in #959
  • fix: update correct CompletionOutput object by @aarnphm in #973
  • chore(deps): bump docker/build-push-action from 5.1.0 to 5.3.0 by @dependabot in #979
  • chore(deps): bump docker/login-action from 3.0.0 to 3.1.0 by @dependabot in #978
  • chore(deps): bump github/codeql-action from 3.24.9 to 3.25.3 by @dependabot in #977
  • chore(deps): bump docker/setup-buildx-action from 3.2.0 to 3.3.0 by @dependabot in #975
  • fix: make sure to respect additional parameters parse by @aarnphm in #981
  • chore(deps): bump peter-evans/create-pull-request from 5.0.2 to 6.0.5 by @dependabot in #976
  • ci: pre-commit autoupdate [pre-commit.ci] by @pre-commit-ci in #980
  • chore(deps): bump rlespinasse/github-slug-action from 4.4.1 to 4.5.0 by @dependabot in #988
  • chore(deps): bump softprops/action-gh-release from 1 to 2 by @dependabot in #987
  • ci: pre-commit autoupdate [pre-commit.ci] by @pre-commit-ci in #989
  • chore(deps): bump taiki-e/install-action from 2.33.9 to 2.33.22 by @dependabot in #985
  • chore(deps): bump actions/checkout from 4.1.1 to 4.1.5 by @dependabot in #984
  • chore(deps): bump next from 13.4.8 to 14.1.1 by @dependabot in #983
  • ci: pre-commit autoupdate [pre-commit.ci] by @pre-commit-ci in #994
  • chore(deps): bump actions/checkout from 4.1.5 to 4.1.6 by @dependabot in #993
  • chore(deps): bump github/codeql-action from 3.25.3 to 3.25.5 by @dependabot in #992
  • chore(deps): bump aquasecurity/trivy-action from 0.19.0 to 0.20.0 by @dependabot in #991
  • fix(docs): update correct BentoML links by @dennisrall in #995
  • tests: add additional basic testing by @aarnphm in #982
  • infra: prepare 0.5 releases by @aarnphm in #996
  • chore(deps): bump actions/upload-artifact from 3.1.3 to 4.3.3 by @dependabot in #986
  • chore(deps): bump actions/download-artifact from 3.0.2 to 4.1.7 by @dependabot in #990
  • chore(qol): update CLI options and performance upgrade for build cache by @aarnphm in #997
  • feat(ci): running CI on paperspace by @aarnphm in #998
  • chore(deps): bump taiki-e/install-action from 2.33.22 to 2.33.34 by @dependabot in #1000

Full Changelog: v0.5.0-alpha.1...v0.5.0-alpha.15

v0.5.0

27 May 18:29
Pre-release

Installation

pip install openllm==0.5.0

To upgrade from a previous version, use the following command:

pip install --upgrade openllm==0.5.0

Usage

To start an LLM: python -m openllm start HuggingFaceH4/zephyr-7b-beta

Find more information about this release in the CHANGELOG.md

What's Changed

  • ci: pre-commit autoupdate [pre-commit.ci] by @pre-commit-ci in #870
  • chore(deps): bump taiki-e/install-action from 2.25.9 to 2.26.18 by @dependabot in #899
  • ci: pre-commit autoupdate [pre-commit.ci] by @pre-commit-ci in #909
  • chore(deps): bump github/codeql-action from 3.23.1 to 3.24.3 by @dependabot in #908
  • chore(deps): bump sigstore/cosign-installer from 3.3.0 to 3.4.0 by @dependabot in #907
  • ci: pre-commit autoupdate [pre-commit.ci] by @pre-commit-ci in #931
  • feat: 1.2 APIs by @aarnphm in #821
  • chore(deps): bump taiki-e/install-action from 2.26.18 to 2.27.9 by @dependabot in #920
  • chore(deps): bump next from 13.4.8 to 13.5.1 by @dependabot in #912
  • ci: pre-commit autoupdate [pre-commit.ci] by @pre-commit-ci in #935
  • chore(deps): bump marocchino/sticky-pull-request-comment from 2.8.0 to 2.9.0 by @dependabot in #933
  • chore(deps): bump aquasecurity/trivy-action from 0.16.1 to 0.18.0 by @dependabot in #932
  • chore(deps): bump docker/setup-buildx-action from 3.0.0 to 3.2.0 by @dependabot in #941
  • chore(deps): bump github/codeql-action from 3.24.3 to 3.24.9 by @dependabot in #939
  • ci: pre-commit autoupdate [pre-commit.ci] by @pre-commit-ci in #942
  • fix(compat): use annotated type from typing_compat by @rudeigerc in #943
  • docs: Update high-level messaging by @Sherlock113 in #949
  • ci: pre-commit autoupdate [pre-commit.ci] by @pre-commit-ci in #947
  • chore(deps): bump aquasecurity/trivy-action from 0.18.0 to 0.19.0 by @dependabot in #946
  • chore(deps): bump taiki-e/install-action from 2.27.9 to 2.32.9 by @dependabot in #945
  • Update README.md by @parano in #964
  • chore(deps): bump taiki-e/install-action from 2.32.9 to 2.33.9 by @dependabot in #970
  • chore(deps): bump sigstore/cosign-installer from 3.4.0 to 3.5.0 by @dependabot in #954
  • chore(deps): bump docker/metadata-action from 5.5.0 to 5.5.1 by @dependabot in #956
  • chore(deps): bump actions/setup-python from 5.0.0 to 5.1.0 by @dependabot in #955
  • chore(deps): bump pypa/gh-action-pypi-publish from 1.8.11 to 1.8.14 by @dependabot in #958
  • ci: pre-commit autoupdate [pre-commit.ci] by @pre-commit-ci in #959
  • fix: update correct CompletionOutput object by @aarnphm in #973
  • chore(deps): bump docker/build-push-action from 5.1.0 to 5.3.0 by @dependabot in #979
  • chore(deps): bump docker/login-action from 3.0.0 to 3.1.0 by @dependabot in #978
  • chore(deps): bump github/codeql-action from 3.24.9 to 3.25.3 by @dependabot in #977
  • chore(deps): bump docker/setup-buildx-action from 3.2.0 to 3.3.0 by @dependabot in #975
  • fix: make sure to respect additional parameters parse by @aarnphm in #981
  • chore(deps): bump peter-evans/create-pull-request from 5.0.2 to 6.0.5 by @dependabot in #976
  • ci: pre-commit autoupdate [pre-commit.ci] by @pre-commit-ci in #980
  • chore(deps): bump rlespinasse/github-slug-action from 4.4.1 to 4.5.0 by @dependabot in #988
  • chore(deps): bump softprops/action-gh-release from 1 to 2 by @dependabot in #987
  • ci: pre-commit autoupdate [pre-commit.ci] by @pre-commit-ci in #989
  • chore(deps): bump taiki-e/install-action from 2.33.9 to 2.33.22 by @dependabot in #985
  • chore(deps): bump actions/checkout from 4.1.1 to 4.1.5 by @dependabot in #984
  • chore(deps): bump next from 13.4.8 to 14.1.1 by @dependabot in #983
  • ci: pre-commit autoupdate [pre-commit.ci] by @pre-commit-ci in #994
  • chore(deps): bump actions/checkout from 4.1.5 to 4.1.6 by @dependabot in #993
  • chore(deps): bump github/codeql-action from 3.25.3 to 3.25.5 by @dependabot in #992
  • chore(deps): bump aquasecurity/trivy-action from 0.19.0 to 0.20.0 by @dependabot in #991
  • fix(docs): update correct BentoML links by @dennisrall in #995
  • tests: add additional basic testing by @aarnphm in #982
  • infra: prepare 0.5 releases by @aarnphm in #996
  • chore(deps): bump actions/upload-artifact from 3.1.3 to 4.3.3 by @dependabot in #986
  • chore(deps): bump actions/download-artifact from 3.0.2 to 4.1.7 by @dependabot in #990
  • chore(qol): update CLI options and performance upgrade for build cache by @aarnphm in #997
  • feat(ci): running CI on paperspace by @aarnphm in #998
  • chore(deps): bump taiki-e/install-action from 2.33.22 to 2.33.34 by @dependabot in #1000
  • ci: pre-commit autoupdate [pre-commit.ci] by @pre-commit-ci in #1002

Full Changelog: v0.4.44...v0.5.0

v0.5.0-alpha.1

21 Mar 01:46
Pre-release
Release 0.5.0-alpha.1 [generated by GitHub Actions]