
Ask Poddy

A screenshot of the Ask Poddy web app showing a chat between the user and the AI

Ask Poddy (named after "Poddy", the RunPod bot on Discord) is a user-friendly RAG (Retrieval-Augmented Generation) web application designed to showcase how easy it is to set up OpenAI-compatible APIs using open-source models running serverless on RunPod. Built with Next.js, React, Tailwind, the Vercel AI SDK, and LangChain, it uses Meta-Llama-3-8B-Instruct as the LLM and multilingual-e5-large-instruct for text embeddings.

This tutorial guides you through deploying Ask Poddy in your own environment so that it can answer RunPod-related questions effectively by leveraging the open-source workers worker-vllm and worker-infinity-embedding.





Concept

Ask Poddy is designed to demonstrate the integration of serverless OpenAI-compatible APIs with open-source models. The application runs locally (but it could also be deployed into the cloud), while the computational heavy lifting is handled by serverless endpoints on RunPod. This architecture allows seamless use of existing OpenAI-compatible tools and frameworks without needing to develop custom APIs.
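
Concretely, any OpenAI-compatible client can be pointed straight at a RunPod serverless endpoint. Here is a minimal TypeScript sketch using the openai npm package; the base URL assumes RunPod's OpenAI-compatible route (https://api.runpod.ai/v2/&lt;endpoint_id&gt;/openai/v1), and the API key and endpoint ID are the ones you create later in this tutorial:

import OpenAI from "openai";

// Point the stock OpenAI client at the RunPod serverless endpoint instead of api.openai.com.
const openai = new OpenAI({
  apiKey: process.env.RUNPOD_API_KEY,
  baseURL: `https://api.runpod.ai/v2/${process.env.RUNPOD_ENDPOINT_ID_VLLM}/openai/v1`,
});

const completion = await openai.chat.completions.create({
  model: "meta-llama/Meta-Llama-3-8B-Instruct",
  messages: [{ role: "user", content: "What is RunPod?" }],
});

console.log(completion.choices[0].message.content);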

Here's how RAG works in Ask Poddy (a rough code sketch follows the steps below):

Diagram showing how the RAG process works

  1. User: Asks a question.
  2. Vector Store: The question is sent to LangChain, which uses the worker-infinity-embedding endpoint to convert the question into an embedding using the multilingual-e5-large-instruct model.
  3. Vector Store: Performs a similarity search to find relevant documents based on the question.
  4. AI SDK: The retrieved documents and the user's question are sent to the worker-vllm endpoint.
  5. worker-vllm: Generates an answer using the Meta-Llama-3-8B-Instruct model.
  6. User: Receives the answer.
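
In code, this round trip can be sketched roughly as follows. This is an illustration rather than the exact Ask Poddy source; the ./vectorstore directory, the prompt wording, and the OpenAI-compatible base URLs are assumptions:

import OpenAI from "openai";
import { OpenAIEmbeddings } from "@langchain/openai";
import { HNSWLib } from "@langchain/community/vectorstores/hnswlib";

// Steps 2-3: embed the question via the worker-infinity-embedding endpoint and
// retrieve similar chunks from the local vector store.
const embeddings = new OpenAIEmbeddings({
  model: "intfloat/multilingual-e5-large-instruct",
  apiKey: process.env.RUNPOD_API_KEY,
  configuration: {
    baseURL: `https://api.runpod.ai/v2/${process.env.RUNPOD_ENDPOINT_ID_EMBEDDING}/openai/v1`,
  },
});
const vectorStore = await HNSWLib.load("./vectorstore", embeddings);
const question = "How do I create a serverless endpoint?";
const documents = await vectorStore.similaritySearch(question, 4);

// Steps 4-5: send the retrieved context plus the question to the worker-vllm endpoint.
const llm = new OpenAI({
  apiKey: process.env.RUNPOD_API_KEY,
  baseURL: `https://api.runpod.ai/v2/${process.env.RUNPOD_ENDPOINT_ID_VLLM}/openai/v1`,
});
const answer = await llm.chat.completions.create({
  model: "meta-llama/Meta-Llama-3-8B-Instruct",
  messages: [
    {
      role: "system",
      content: `Answer the question using this context:\n${documents.map((d) => d.pageContent).join("\n---\n")}`,
    },
    { role: "user", content: question },
  ],
});
console.log(answer.choices[0].message.content);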

Tip

You can choose any of the models supported by vLLM.



Tutorial: Setting Up "Ask Poddy" in Your Environment

Prerequisites

You need a RunPod account with an API key, Node.js with npm, and git installed locally.

1. Clone the Repository

  1. Clone the Ask Poddy repository and go into the cloned directory:
git clone https://github.com/blib-la/ask-poddy.git
cd ask-poddy
  2. Clone the RunPod docs repository into ask-poddy/data/runpod-docs:
git clone https://github.com/runpod/docs.git ./data/runpod-docs

Note

The RunPod docs repository contains the RunPod documentation that Ask Poddy will use to answer questions.

  3. Copy the img folder from ./data/runpod-docs/static/img to ./public.

Note

This makes it possible for Ask Poddy to include images from the RunPod documentation.


2. Install Dependencies

Navigate to the ask-poddy directory and install the dependencies:

npm install

3. Set Up RunPod Serverless Endpoints

3.1 Network Volumes

  1. Create two network volumes with 15 GB of storage each in the data center where you will deploy the serverless endpoints:
    • Volume for embeddings: infinity_embeddings
    • Volume for LLM: vllm_llama3

Note

Using network volumes ensures that the models and embeddings are stored persistently, allowing for faster subsequent requests as the data does not need to be downloaded or recreated each time.

3.2 Worker-vLLM Endpoint

  1. Follow the guide for setting up the vLLM endpoint, but use the meta-llama/Meta-Llama-3-8B-Instruct model instead of the one mentioned in the guide, and select the network volume vllm_llama3 when creating the endpoint.

Tip

This endpoint is powered by worker-vllm.

3.3 Worker-Infinity-Embedding Endpoint

  1. Create a new template.
  2. Use the Docker image runpod/worker-infinity-embedding:stable-cuda12.1.0 from worker-infinity-embedding and set the environment variable MODEL_NAMES to intfloat/multilingual-e5-large-instruct.
  3. Create a serverless endpoint and make sure to select the network volume infinity_embeddings (a quick test call is sketched below).
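
Once the endpoint is deployed, it can be exercised with any OpenAI-compatible client. A quick smoke test in TypeScript, assuming the worker exposes the same /openai/v1 route as the vLLM worker and using the API key and endpoint ID from step 4:

import OpenAI from "openai";

// Request one embedding from the worker-infinity-embedding endpoint.
const client = new OpenAI({
  apiKey: process.env.RUNPOD_API_KEY,
  baseURL: `https://api.runpod.ai/v2/${process.env.RUNPOD_ENDPOINT_ID_EMBEDDING}/openai/v1`,
});

const response = await client.embeddings.create({
  model: "intfloat/multilingual-e5-large-instruct",
  input: "What is a network volume?",
});

console.log(response.data[0].embedding.length); // dimensionality of the returned vector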

4. Configure Environment Variables

  1. Generate your RunPod API key
  2. Find the endpoint IDs underneath the deployed serverless endpoints.

Screenshot showing the ID of the worker underneath the title

  3. Create your .env.local based on .env.local.example, or create the file with the following variables:
RUNPOD_API_KEY=your_runpod_api_key
RUNPOD_ENDPOINT_ID_VLLM=your_vllm_endpoint_id
RUNPOD_ENDPOINT_ID_EMBEDDING=your_embedding_endpoint_id
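
These three values are all the app needs to reach both workers. Assuming RunPod's OpenAI-compatible routing, the endpoint IDs typically translate into base URLs like this (a hypothetical helper for illustration, not part of the repository):

// Hypothetical helper: build the OpenAI-compatible base URL for a RunPod serverless endpoint.
const runpodBaseUrl = (endpointId: string) =>
  `https://api.runpod.ai/v2/${endpointId}/openai/v1`;

const vllmBaseUrl = runpodBaseUrl(process.env.RUNPOD_ENDPOINT_ID_VLLM ?? "");
const embeddingBaseUrl = runpodBaseUrl(process.env.RUNPOD_ENDPOINT_ID_EMBEDDING ?? "");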

5. Populate the Vector Store

To populate the vector store, run the following command:

npm run populate

Note

The first run will take some time as the worker downloads the embeddings model (multilingual-e5-large-instruct). Subsequent requests will use the downloaded model stored in the network volume.

This command reads all markdown documents from the ask-poddy/data/runpod-docs/ folder, creates embeddings using the embedding endpoint running on RunPod, and stores these embeddings in the local vector store (a sketch of such a script follows the steps below):

Diagram showing how the vector store gets populated with documents

  1. Documents: The markdown documents from the ask-poddy/data/runpod-docs/ folder are read by LangChain.
  2. Chunks: LangChain converts the documents into smaller chunks, which are then sent to the worker-infinity-embedding endpoint.
  3. worker-infinity-embedding: Receives chunks, generates embeddings using the multilingual-e5-large-instruct model, and sends them back.
  4. Vector Store: LangChain saves these embeddings in the local vector store (HNSWlib).
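
A hedged sketch of what such a populate script can look like with LangChain JS; the chunk sizes, file extensions, and the ./vectorstore output directory are assumptions, not the exact script from this repository:

import { DirectoryLoader } from "langchain/document_loaders/fs/directory";
import { TextLoader } from "langchain/document_loaders/fs/text";
import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";
import { OpenAIEmbeddings } from "@langchain/openai";
import { HNSWLib } from "@langchain/community/vectorstores/hnswlib";

// 1. Read the markdown documents from the cloned RunPod docs.
const loader = new DirectoryLoader("./data/runpod-docs", {
  ".md": (path) => new TextLoader(path),
});
const documents = await loader.load();

// 2. Split the documents into smaller chunks.
const splitter = new RecursiveCharacterTextSplitter({ chunkSize: 1000, chunkOverlap: 100 });
const chunks = await splitter.splitDocuments(documents);

// 3. Embed the chunks via the worker-infinity-embedding endpoint.
const embeddings = new OpenAIEmbeddings({
  model: "intfloat/multilingual-e5-large-instruct",
  apiKey: process.env.RUNPOD_API_KEY,
  configuration: {
    baseURL: `https://api.runpod.ai/v2/${process.env.RUNPOD_ENDPOINT_ID_EMBEDDING}/openai/v1`,
  },
});

// 4. Store the embeddings in a local HNSWLib vector store.
const vectorStore = await HNSWLib.fromDocuments(chunks, embeddings);
await vectorStore.save("./vectorstore");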

Tip

A vector store is a database that stores embeddings (vector representations of text) to enable efficient similarity search. This is crucial for the RAG process as it allows the system to quickly retrieve relevant documents based on the user's question.


6. Start the Local Web Server

  1. Start the local web server:
npm run dev
  2. Open http://localhost:3000 to access the UI.

7. Ask Poddy

Now that everything is running, you can ask RunPod-related questions, such as:

  • What is RunPod?
  • How do I create a serverless endpoint?
  • What are the benefits of using a network volume?
  • How can I become a host for the community cloud?
  • Can RunPod help my startup to get going?

Note

The first run will take some time as the worker downloads the LLM (Meta-Llama-3-8B-Instruct). Subsequent requests will use the downloaded model stored in the network volume.
