Adding RAG #23

4binas · 2024-03-22T09:33:39Z

Following the conversation in the Discord channel, I'm planning to begin working on Retrieval-Augmented Generation (RAG). My approach will use FAISS as the vector database and PyPDF to extract text from PDF files.

Please let me know if these new dependencies are ok.

I am also testing image description (captioning) models, so that descriptions of the images in a PDF can be included alongside the text extracted by PyPDF.

For the embeddings, I am planning to use mixedbread-ai/mxbai-embed-large-v1, based on its model size and MTEB performance. Please let me know if another model is preferred.
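For reference, the retrieval flow described above (chunk the extracted text, embed each chunk, index the vectors, look up the nearest chunks for a question) could look roughly like the sketch below. It is only an illustration under stated assumptions: `toy_embed` is a hashed bag-of-words stand-in for the embedding model, `BruteForceIndex` is an exact-search stand-in for the FAISS index, and the chunk sizes, names, and prompt format are all hypothetical rather than taken from the project.

```python
import hashlib
import math
from collections import Counter

DIM = 64  # toy dimensionality; a real embedding model produces much larger vectors


def chunk_text(text, max_words=40, overlap=10):
    """Split text into overlapping word windows (hypothetical chunking policy)."""
    words = text.split()
    if not words:
        return []
    step = max_words - overlap
    stop = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + max_words]) for i in range(0, stop, step)]


def toy_embed(text):
    """Hashed bag-of-words vector, L2-normalised (stand-in for a real embedding model)."""
    vec = [0.0] * DIM
    for token, count in Counter(text.lower().split()).items():
        bucket = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % DIM
        vec[bucket] += count
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


class BruteForceIndex:
    """Exact inner-product search; plays the role the vector database would."""

    def __init__(self):
        self.vectors = []
        self.chunks = []

    def add(self, chunks):
        for chunk in chunks:
            self.vectors.append(toy_embed(chunk))
            self.chunks.append(chunk)

    def search(self, query, k=3):
        """Return the k highest-scoring (score, chunk) pairs for the query."""
        q = toy_embed(query)
        scored = [
            (sum(a * b for a, b in zip(q, v)), chunk)
            for v, chunk in zip(self.vectors, self.chunks)
        ]
        # sort is stable, so ties keep insertion order
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


def build_prompt(question, hits):
    """Prepend the retrieved chunks to the question (hypothetical prompt format)."""
    context = "\n".join(chunk for _, chunk in hits)
    return f"Context:\n{context}\n\nQuestion: {question}"


index = BruteForceIndex()
index.add(["alpha beta gamma", "delta epsilon zeta"])
hits = index.search("alpha beta gamma", k=1)
prompt = build_prompt("What comes after alpha?", hits)
```

In the actual implementation, the text passed to `chunk_text` would come from PyPDF, and the two stand-ins would be swapped for the chosen embedding model and a FAISS index. Normalising the vectors means the inner product equals cosine similarity.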

braunale · 2024-03-25T14:32:51Z

Hey @4binas, that sounds great! I am totally fine with your choices. Whenever you have a first version, feel free to open a pull request, and then we can discuss the details of merging it into the existing project. If you like, please keep me up to date, and if you have any questions about the current code base, just let me know.

Thank you very much for contributing!
