Adding RAG #23

Open
4binas opened this issue Mar 22, 2024 · 1 comment

4binas commented Mar 22, 2024

Following the conversation in the Discord channel, I'm planning to begin working on Retrieval-Augmented Generation (RAG). My approach is to use FAISS as the vector database and PyPDF to extract text from PDF files.

Please let me know whether these new dependencies are OK.

I am also testing image-description (image-to-text) models, so that descriptions of the images in a PDF can be included alongside the text extracted by PyPDF.

For the embeddings, I am planning to use mixedbread-ai/mxbai-embed-large-v1, based on its model size and MTEB performance. Please let me know if another model is preferred.
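
To make the plan more concrete, here is a rough, dependency-free sketch of the retrieval flow: split extracted text into chunks, embed each chunk, store the vectors, and return the chunks closest to a query. Everything in it is a placeholder rather than the actual implementation. `chunk_text`, `toy_embed`, and `FlatIndex` are hypothetical names; `toy_embed` is a hashed bag-of-words stand-in for mxbai-embed-large-v1, and `FlatIndex` is a brute-force stand-in for a flat inner-product FAISS index. The chunk size and overlap are arbitrary assumptions.

```python
import math
import re
import zlib


def chunk_text(text, max_words=200, overlap=40):
    """Split text into overlapping windows of words (sizes are assumptions)."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be >= 0 and smaller than max_words")
    words = text.split()
    if not words:
        return []
    step = max_words - overlap
    last_start = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + max_words]) for i in range(0, last_start, step)]


def toy_embed(text, dim=512):
    """Placeholder embedding: hashed bag-of-words, L2-normalised.

    A real embedding model would replace this function.
    """
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


class FlatIndex:
    """Brute-force inner-product search; on unit vectors this is cosine similarity."""

    def __init__(self, embed=toy_embed):
        self.embed = embed
        self.vectors = []
        self.chunks = []

    def add(self, chunks):
        for chunk in chunks:
            self.vectors.append(self.embed(chunk))
            self.chunks.append(chunk)

    def search(self, query, k=3):
        """Return up to k (score, chunk) pairs, best match first."""
        q = self.embed(query)
        scored = [
            (sum(a * b for a, b in zip(q, v)), chunk)
            for v, chunk in zip(self.vectors, self.chunks)
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


# Example usage with made-up page text standing in for PDF extraction output.
pages = [
    "PyPDF extracts text from PDF files page by page.",
    "FAISS stores embedding vectors and answers nearest-neighbour queries.",
]
index = FlatIndex()
for page in pages:
    index.add(chunk_text(page))
top = index.search("How do I get text from PDF files?", k=1)
```

In a real version, the page text would come from PyPDF, the vectors from the embedding model, and the search from FAISS; the retrieved chunks would then be added to the prompt sent to the language model.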

@braunale
Collaborator

Hey @4binas, that sounds great! I'm totally fine with your choices. Whenever you have a first version, feel free to open a pull request, and we can discuss the details of merging it into the existing project. If you like, keep me up to date, and if you have any questions about the current codebase, just let me know.

Thank you very much for contributing!
