Pinecone vector db #178

Open
2 tasks done
MuhammadIshaq-AI opened this issue Jan 2, 2024 · 0 comments
Labels
bug Something isn't working

Is this a new bug?

  • I believe this is a new bug
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

I ingested files into a Pinecone index, then deleted all the vectors from the index. When I query it, results are still fetched from the database. How is that possible?
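For context, before querying I check what Pinecone itself reports for the namespace, so I can tell whether the vectors are really gone or the query is hitting a different index/namespace. This is only a minimal sketch assuming a recent @pinecone-database/pinecone client (older clients expose the stats call under a slightly different shape); checkIndexStats is a hypothetical helper, not part of my repo:

import { pinecone } from '@/utils/pinecone-client';
import { PINECONE_INDEX_NAME, PINECONE_NAME_SPACE } from '@/config/pinecone';

// Hypothetical helper: print per-namespace stats so an "emptied" namespace
// that still reports vectors is visible before any query runs.
export const checkIndexStats = async () => {
  const index = pinecone.Index(PINECONE_INDEX_NAME);
  // Recent clients expose describeIndexStats(); the exact response fields
  // (e.g. record vs. vector counts) differ between client versions.
  const stats = await index.describeIndexStats();
  console.log('index stats:', JSON.stringify(stats, null, 2));
  console.log('my namespace:', stats.namespaces?.[PINECONE_NAME_SPACE]);
};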

Expected Behavior

After the vectors are deleted, queries should return nothing from the index, but I am still fetching vectors from it.

Steps To Reproduce

1- First, I create a test index.
2- After creating the index, I ingest my data with the script below:

import { RecursiveCharacterTextSplitter } from 'langchain/text_splitter';
import { OpenAIEmbeddings } from 'langchain/embeddings/openai';
import { PineconeStore } from 'langchain/vectorstores/pinecone';
import { pinecone } from '@/utils/pinecone-client';
import { CustomPDFLoader } from '@/utils/customPDFLoader';
import { PINECONE_INDEX_NAME, PINECONE_NAME_SPACE } from '@/config/pinecone';
import { DirectoryLoader } from 'langchain/document_loaders/fs/directory';

/* Name of directory to retrieve your files from */
const filePath = 'new docs';

export const run = async () => {
  try {
    /* Load raw docs from all files in the directory */
    const directoryLoader = new DirectoryLoader(filePath, {
      '.pdf': (path) => new CustomPDFLoader(path),
    });

    const rawDocs = await directoryLoader.load();

    // Extract the file name using a regular expression and update the metadata
    const processedDocs = rawDocs.map((doc) => {
      const fileName = doc.metadata.source.match(/[^\\\/]+$/)?.[0] || doc.metadata.source;
      const modifiedMetadata = { ...doc.metadata, source: fileName };
      return { ...doc, metadata: modifiedMetadata };
    });

    /* Split text into chunks */
    const textSplitter = new RecursiveCharacterTextSplitter({
      chunkSize: 1000,
      chunkOverlap: 200,
    });

    const docs = await textSplitter.splitDocuments(processedDocs);
    console.log('split docs', docs);

    console.log('creating vector store...');
    /* Create and store the embeddings in the vectorStore */
    const embeddings = new OpenAIEmbeddings();
    const index = pinecone.Index(PINECONE_INDEX_NAME); // Change to your own index name

    // Embed the PDF documents
    await PineconeStore.fromDocuments(docs, embeddings, {
      pineconeIndex: index,
      namespace: PINECONE_NAME_SPACE,
      textKey: 'text',
    });
  } catch (error) {
    console.log('error', error);
    throw new Error('Failed to ingest your data');
  }
};

(async () => {
  await run();
  console.log('ingestion complete');
})();
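The deletion step itself is not part of the script above. A minimal sketch of what I mean by "delete all the vectors", assuming a recent @pinecone-database/pinecone client where deletion can be scoped to a namespace (clearNamespace is a hypothetical helper; older clients take { deleteAll: true, namespace } on the delete call instead):

import { pinecone } from '@/utils/pinecone-client';
import { PINECONE_INDEX_NAME, PINECONE_NAME_SPACE } from '@/config/pinecone';

// Hypothetical helper: remove every vector from the same namespace the
// ingestion script wrote to, so later queries should find nothing there.
export const clearNamespace = async () => {
  const index = pinecone.Index(PINECONE_INDEX_NAME);
  await index.namespace(PINECONE_NAME_SPACE).deleteAll();
};

The namespace argument matters here because the ingestion above writes into PINECONE_NAME_SPACE; as far as I understand, a delete that targets the default namespace or a different index would leave those vectors in place.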

3- Once I ingest my data, it creates the vector store and stores the document chunks and their file-name metadata in it.

4- When I delete this index and then call the query, it still fetches data from the vector DB. This is my chat.ts code that runs the query:

import type { NextApiRequest, NextApiResponse } from 'next';
import { OpenAIEmbeddings } from 'langchain/embeddings/openai';
import { PineconeStore } from 'langchain/vectorstores/pinecone';
import { makeChain } from '@/utils/makechain';
import { pinecone } from '@/utils/pinecone-client';
import { PINECONE_INDEX_NAME, PINECONE_NAME_SPACE } from '@/config/pinecone';

interface SourceDocument {
  pageContent: string;
  metadata: {
    'loc.lines.from': number;
    'loc.lines.to': number;
    pdf_numpages: number;
    source: string;
  };
}

export default async function handler(
  req: NextApiRequest,
  res: NextApiResponse,
) {
  const { question, history, username } = req.body;
  let chatbotResponse = '';
  let globalSourceDocs: SourceDocument[] = [];
  const { file_url, flaskservice } = require('../../public/config.json');

  console.log('question', question);
  console.log('Your history is', history);
  console.log('Your username is', username);

  // Only accept POST requests
  if (req.method !== 'POST') {
    res.status(405).json({ error: 'Method not allowed' });
    return;
  }

  if (!question && (!history || history.length === 0)) {
    // Start a new chat by clearing the history
    return res.status(200).json({
      text: 'What can I help you with now? ',
      sourceDocuments: [],
    });
  }

  if (!question) {
    return res.status(400).json({ message: 'No question in the request' });
  }

  // OpenAI recommends replacing newlines with spaces for best results
  const sanitizedQuestion = question.trim().replaceAll('\n', ' ');

  try {
    const index = pinecone.Index(PINECONE_INDEX_NAME);

    /* Create the vector store */
    const vectorStore = await PineconeStore.fromExistingIndex(
      new OpenAIEmbeddings({}),
      {
        pineconeIndex: index,
        textKey: 'text',
        namespace: PINECONE_NAME_SPACE,
      },
    );

    // Create chain
    const chain = makeChain(vectorStore);

    // Ask a question using chat history
    const response = await chain.call({
      question: sanitizedQuestion,
      chat_history: history || [],
    });

    chatbotResponse = response.text;

    // Store the exchange in the Flask service (fire-and-forget)
    fetch(flaskservice, {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        question: sanitizedQuestion,
        history: history,
        username: username,
        chatbot_response: chatbotResponse,
      }),
    })
      .then((response) => response.json())
      .then((response) => {
        console.log('data stored in flask', response);
      })
      .catch((error) => {
        console.error('Error storing in flask:', error);
      });

    globalSourceDocs = response.sourceDocuments;
    chatbotResponse = response.text;
    /* console.log('Your response is ', chatbotResponse); */

    if (globalSourceDocs.length > 0) {
      const fullPath = globalSourceDocs[0].metadata.source;
      const filename = fullPath.split('\\').pop();
      /* console.log('Your filenames are', filename); */

      const fetchFileUrl = fetch(file_url, {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          file_name: filename,
        }),
      })
        .then((response2) => response2.json())
        .then((response2) => {
          console.log('URL RECEIVED IS', response2.fileurl);
          return response2.fileurl; // Return the fileUrl value
        })
        .catch((error) => {
          console.error('Error fetching file URL:', error); // Handle the error
        });

      fetchFileUrl
        .then((fileUrl) => {
          // Update sourceDocuments with filename and fileUrl
          globalSourceDocs = globalSourceDocs.map((sourceDoc) => ({
            ...sourceDoc,
            metadata: {
              ...sourceDoc.metadata,
              source: filename, // Change source to filename
              fileUrl, // Add fileUrl to metadata
            },
          }));
          /* console.log('Updated sourceDocuments:', globalSourceDocs); */

          // Update the response with the modified sourceDocuments
          const modifiedResponse = { ...response, sourceDocuments: globalSourceDocs };
          /* console.log('Modified response:', modifiedResponse); */

          res.status(200).json(modifiedResponse); // Send the modified response
        })
        .catch((error) => {
          console.error('Error:', error); // Print errors, if any
          res.status(500).json({ error: 'Something went wrong' }); // Return error response
        });
    } else {
      // If no source documents were found, return the original response
      console.log('No source documents found in the response.');
      res.status(200).json(response);
    }
  } catch (error: any) {
    console.log('error', error);
    res.status(500).json({ error: error.message || 'Something went wrong' });
  }
}
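As a sanity check on this query path, the vector store can also be probed directly, without makeChain, to see what the index returns after the deletion. This is a minimal sketch using the same configuration as chat.ts (probeIndex is a hypothetical helper; similaritySearch is the generic LangChain vector-store method, not something specific to my setup):

import { OpenAIEmbeddings } from 'langchain/embeddings/openai';
import { PineconeStore } from 'langchain/vectorstores/pinecone';
import { pinecone } from '@/utils/pinecone-client';
import { PINECONE_INDEX_NAME, PINECONE_NAME_SPACE } from '@/config/pinecone';

// Hypothetical helper: query the index directly; an empty result here means
// the index/namespace really is empty, while matches mean the vectors survived.
export const probeIndex = async (question: string) => {
  const index = pinecone.Index(PINECONE_INDEX_NAME);
  const vectorStore = await PineconeStore.fromExistingIndex(new OpenAIEmbeddings(), {
    pineconeIndex: index,
    textKey: 'text',
    namespace: PINECONE_NAME_SPACE,
  });
  const matches = await vectorStore.similaritySearch(question, 4);
  console.log('raw matches:', matches.map((m) => m.metadata.source));
};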

Relevant log output

No response

Environment

Node.js

Additional Context

None
