🔥 News:
- Upgraded Python version to 3.12
- Added the HUGGING_FACE_HUB_TOKEN variable for private models on HuggingFace
- New architecture for more flexibility!
vLLM is one of the state-of-the-art libraries for deploying a Large Language Model (LLM) and its API with high generation performance. However, vLLM does not currently support embeddings models on the /v1/embeddings endpoint, although it can be used to deploy an API following OpenAI conventions (see this discussion).
This repository makes it easy to add the /v1/embeddings endpoint by deploying an embedding model with HuggingFace Text Embeddings Inference (TEI), and serves everything on a single port. The aim of this repository is to provide a complete API that is very light and easy to use and maintain!
The API offers the following OpenAI endpoints:
- /health
- /v1/models
- /v1/chat/completions
- /v1/completions
- /v1/embeddings
You can access the other vLLM API endpoints:
- /vllm/health
- /vllm/version
And the HuggingFace Text Embeddings Inference API endpoints:
- /tei/decode
- /tei/embed
- /tei/health
- /tei/embed_all
- /tei/embed_sparse
- /tei/embeddings
- /tei/info
- /tei/metrics
- /tei/predict
- /tei/rerank
- /tei/tokenize
- /tei/vertex
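To illustrate the single-port layout described above, here is a minimal routing sketch (an illustration only, not the repository's actual code): paths starting with /vllm/ or /tei/ are forwarded to the corresponding backend with the prefix stripped, and everything else goes to the OpenAI-compatible layer.

```python
# Sketch of the single-port prefix routing; backend names are illustrative.

def route(path: str) -> tuple[str, str]:
    """Map an incoming request path to (backend, upstream_path)."""
    if path.startswith("/vllm/"):
        return "vllm", path[len("/vllm"):]   # e.g. /vllm/health -> /health
    if path.startswith("/tei/"):
        return "tei", path[len("/tei"):]     # e.g. /tei/embed -> /embed
    return "api", path                       # OpenAI-compatible endpoints

print(route("/tei/embed"))        # ('tei', '/embed')
print(route("/vllm/version"))     # ('vllm', '/version')
print(route("/v1/embeddings"))    # ('api', '/v1/embeddings')
```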
Currently, this architecture supports almost all LLMs and embeddings models. The response of the /v1/models endpoint adds a new "type" key, which takes the value "text-generation" or "text-embeddings-inference" depending on the nature of the model (language or embeddings). These values correspond to the labels given to models on HuggingFace. Example:
{
  "object": "list",
  "data": [
    {
      "model": < language model >,
      "type": "text-generation",
      ...
    },
    {
      "model": < embeddings model >,
      "type": "text-embeddings-inference",
      ...
    }
  ]
}
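A client can use this "type" key to pick the right model for each task. A minimal sketch, where the response dict mirrors the example above and the model IDs are placeholders:

```python
# Sketch: split a /v1/models response by the added "type" key.
# The model IDs below are placeholders, not real repositories.

models_response = {
    "object": "list",
    "data": [
        {"model": "my-org/my-llm", "type": "text-generation"},
        {"model": "my-org/my-embeddings", "type": "text-embeddings-inference"},
    ],
}

def models_of_type(response: dict, model_type: str) -> list[str]:
    """Return the model IDs whose "type" matches model_type."""
    return [m["model"] for m in response["data"] if m["type"] == model_type]

print(models_of_type(models_response, "text-generation"))
# ['my-org/my-llm']
```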
- First, configure a .env file or modify the .env.example file in this repository. For more information about the configuration, please refer to the configuration section.
- Then, run the containers with Docker Compose:
docker compose --env-file .env.example up --detach
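Once the containers are up, all endpoints are reachable on a single port (8080 is assumed here; adjust it to your compose file). The following sketch only builds an authenticated OpenAI-style request with the standard library, without sending it; the model ID is the embeddings model used in the test command at the end of this README.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # assumption: the compose file exposes port 8080
API_KEY = "my-secret-key"           # the API_KEY configured in your .env file

def build_request(endpoint: str, payload: dict) -> urllib.request.Request:
    """Prepare an authenticated POST request for an OpenAI-style endpoint."""
    return urllib.request.Request(
        BASE_URL + endpoint,
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
    )

req = build_request("/v1/embeddings", {"model": "intfloat/e5-small", "input": "hello"})
print(req.full_url)  # http://localhost:8080/v1/embeddings
# urllib.request.urlopen(req) would send it once the containers are running.
```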
| Variable | Description |
|---|---|
| EMBEDDINGS_HF_REPO_ID | HuggingFace repository ID of the embeddings model. Please refer to the HuggingFace Text Embeddings Inference documentation to find supported models. |
| LLM_HF_REPO_ID | HuggingFace repository ID of the LLM. Please refer to the vLLM documentation to find supported models. |
| TEI_ARGS | Arguments for Text Embeddings Inference (format: `--arg1 --arg2`). Please refer to the HuggingFace Text Embeddings Inference documentation for more information. |
| VLLM_ARGS | Arguments for vLLM (format: `--arg1 --arg2`). Please refer to the vLLM documentation for more information. |
| HUGGING_FACE_HUB_TOKEN | HuggingFace API token for private models on the HuggingFace Hub. |
| API_KEY | API key to protect your API. |
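For orientation, here is a hypothetical .env sketch combining the variables above. The model IDs are the ones used in the test command at the end of this README; the argument values, token, and key are placeholders to adapt to your deployment.

```shell
# Hypothetical .env sketch; adapt every value to your own deployment.
LLM_HF_REPO_ID=TheBloke/OpenHermes-2.5-Mistral-7B-GPTQ
EMBEDDINGS_HF_REPO_ID=intfloat/e5-small
VLLM_ARGS=--max-model-len 4096
TEI_ARGS=--max-client-batch-size 32
HUGGING_FACE_HUB_TOKEN=<your HuggingFace token, for private models>
API_KEY=<your API key>
```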
You can use the deployed API with LangChain to create embedding vectors for your vector store. For example:
from langchain_community.embeddings import HuggingFaceHubEmbeddings

embeddings = HuggingFaceHubEmbeddings(model="http://localhost:8080/tei/")
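The returned vectors can then be compared for similarity search in your vector store. A minimal cosine-similarity sketch in pure Python — the vectors here are dummy values standing in for real embed_query results:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

# With a running API: v1 = embeddings.embed_query("a cat"), etc.
v1, v2 = [1.0, 0.0], [1.0, 1.0]  # dummy vectors for illustration
print(round(cosine_similarity(v1, v2), 3))  # 0.707
```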
You can test the deployment with the provided test script:
python tests.py --llm-hf-repo-id TheBloke/OpenHermes-2.5-Mistral-7B-GPTQ --embeddings-hf-repo-id intfloat/e5-small --debug