Pinecone vector db #178

Open
2 tasks done
MuhammadIshaq-AI opened this issue Jan 2, 2024 · 0 comments
Labels
bug Something isn't working

Is this a new bug?

  • I believe this is a new bug
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

I ingested files into a Pinecone index, then deleted all the vectors from the index. When I query it, results are still fetched from the database. How is that possible?
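For context, before querying I check what Pinecone itself reports for the namespace, so I can tell whether the vectors are really gone or the query is hitting a different index/namespace. This is only a minimal sketch assuming a recent @pinecone-database/pinecone client (older clients expose the stats call under a slightly different shape); checkIndexStats is a hypothetical helper, not part of my repo:

import { pinecone } from '@/utils/pinecone-client';
import { PINECONE_INDEX_NAME, PINECONE_NAME_SPACE } from '@/config/pinecone';

// Hypothetical helper: print per-namespace stats so an "emptied" namespace
// that still reports vectors is visible before any query runs.
export const checkIndexStats = async () => {
  const index = pinecone.Index(PINECONE_INDEX_NAME);
  // Recent clients expose describeIndexStats(); the exact response fields
  // (e.g. record vs. vector counts) differ between client versions.
  const stats = await index.describeIndexStats();
  console.log('index stats:', JSON.stringify(stats, null, 2));
  console.log('my namespace:', stats.namespaces?.[PINECONE_NAME_SPACE]);
};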

Expected Behavior

After the vectors are deleted, queries should return nothing from the index, but I am still fetching vectors from it.

Steps To Reproduce

1- First, I create a test index.
2- After creating the index, I ingest my data with the script below:

import { RecursiveCharacterTextSplitter } from 'langchain/text_splitter';
import { OpenAIEmbeddings } from 'langchain/embeddings/openai';
import { PineconeStore } from 'langchain/vectorstores/pinecone';
import { pinecone } from '@/utils/pinecone-client';
import { CustomPDFLoader } from '@/utils/customPDFLoader';
import { PINECONE_INDEX_NAME, PINECONE_NAME_SPACE } from '@/config/pinecone';
import { DirectoryLoader } from 'langchain/document_loaders/fs/directory';

/* Name of directory to retrieve your files from */
const filePath = 'new docs';

export const run = async () => {
  try {
    /* Load raw docs from all files in the directory */
    const directoryLoader = new DirectoryLoader(filePath, {
      '.pdf': (path) => new CustomPDFLoader(path),
    });

    const rawDocs = await directoryLoader.load();

    // Extract the file name using a regular expression and update the metadata
    const processedDocs = rawDocs.map((doc) => {
      const fileName = doc.metadata.source.match(/[^\\\/]+$/)?.[0] || doc.metadata.source;
      const modifiedMetadata = { ...doc.metadata, source: fileName };
      return { ...doc, metadata: modifiedMetadata };
    });

    /* Split text into chunks */
    const textSplitter = new RecursiveCharacterTextSplitter({
      chunkSize: 1000,
      chunkOverlap: 200,
    });

    const docs = await textSplitter.splitDocuments(processedDocs);
    console.log('split docs', docs);

    console.log('creating vector store...');
    /* Create and store the embeddings in the vectorStore */
    const embeddings = new OpenAIEmbeddings();
    const index = pinecone.Index(PINECONE_INDEX_NAME); // Change to your own index name

    // Embed the PDF documents
    await PineconeStore.fromDocuments(docs, embeddings, {
      pineconeIndex: index,
      namespace: PINECONE_NAME_SPACE,
      textKey: 'text',
    });
  } catch (error) {
    console.log('error', error);
    throw new Error('Failed to ingest your data');
  }
};

(async () => {
  await run();
  console.log('ingestion complete');
})();
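The deletion step itself is not part of the script above. A minimal sketch of what I mean by "delete all the vectors", assuming a recent @pinecone-database/pinecone client where deletion can be scoped to a namespace (clearNamespace is a hypothetical helper; older clients take { deleteAll: true, namespace } on the delete call instead):

import { pinecone } from '@/utils/pinecone-client';
import { PINECONE_INDEX_NAME, PINECONE_NAME_SPACE } from '@/config/pinecone';

// Hypothetical helper: remove every vector from the same namespace the
// ingestion script wrote to, so later queries should find nothing there.
export const clearNamespace = async () => {
  const index = pinecone.Index(PINECONE_INDEX_NAME);
  await index.namespace(PINECONE_NAME_SPACE).deleteAll();
};

The namespace argument matters here because the ingestion above writes into PINECONE_NAME_SPACE; as far as I understand, a delete that targets the default namespace or a different index would leave those vectors in place.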

3- Once I ingest my data, it creates the vector store and stores the document chunks and their file-name metadata in it.

4- When I delete this index and then call the query, it still fetches data from the vector DB. This is my chat.ts code that runs the query:

import type { NextApiRequest, NextApiResponse } from 'next';
import { OpenAIEmbeddings } from 'langchain/embeddings/openai';
import { PineconeStore } from 'langchain/vectorstores/pinecone';
import { makeChain } from '@/utils/makechain';
import { pinecone } from '@/utils/pinecone-client';
import { PINECONE_INDEX_NAME, PINECONE_NAME_SPACE } from '@/config/pinecone';

interface SourceDocument {
  pageContent: string;
  metadata: {
    'loc.lines.from': number;
    'loc.lines.to': number;
    pdf_numpages: number;
    source: string;
  };
}

export default async function handler(
  req: NextApiRequest,
  res: NextApiResponse,
) {
  const { question, history, username } = req.body;
  let chatbotResponse = '';
  let globalSourceDocs: SourceDocument[] = [];
  const { file_url, flaskservice } = require('../../public/config.json');

  console.log('question', question);
  console.log('Your history is', history);
  console.log('Your username is', username);

  // Only accept POST requests
  if (req.method !== 'POST') {
    res.status(405).json({ error: 'Method not allowed' });
    return;
  }

  if (!question && (!history || history.length === 0)) {
    // Start a new chat by clearing the history
    return res.status(200).json({
      text: 'What can I help you with now? ',
      sourceDocuments: [],
    });
  }

  if (!question) {
    return res.status(400).json({ message: 'No question in the request' });
  }

  // OpenAI recommends replacing newlines with spaces for best results
  const sanitizedQuestion = question.trim().replaceAll('\n', ' ');

  try {
    const index = pinecone.Index(PINECONE_INDEX_NAME);

    /* Create the vector store */
    const vectorStore = await PineconeStore.fromExistingIndex(
      new OpenAIEmbeddings({}),
      {
        pineconeIndex: index,
        textKey: 'text',
        namespace: PINECONE_NAME_SPACE,
      },
    );

    // Create chain
    const chain = makeChain(vectorStore);

    // Ask a question using chat history
    const response = await chain.call({
      question: sanitizedQuestion,
      chat_history: history || [],
    });

    chatbotResponse = response.text;

    // Store the exchange in the Flask service (fire-and-forget)
    fetch(flaskservice, {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        question: sanitizedQuestion,
        history: history,
        username: username,
        chatbot_response: chatbotResponse,
      }),
    })
      .then((response) => response.json())
      .then((response) => {
        console.log('data stored in flask', response);
      })
      .catch((error) => {
        console.error('Error storing in flask:', error);
      });

    globalSourceDocs = response.sourceDocuments;
    chatbotResponse = response.text;
    /* console.log('Your response is ', chatbotResponse); */

    if (globalSourceDocs.length > 0) {
      const fullPath = globalSourceDocs[0].metadata.source;
      const filename = fullPath.split('\\').pop();
      /* console.log('Your filenames are', filename); */

      const fetchFileUrl = fetch(file_url, {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          file_name: filename,
        }),
      })
        .then((response2) => response2.json())
        .then((response2) => {
          console.log('URL RECEIVED IS', response2.fileurl);
          return response2.fileurl; // Return the fileUrl value
        })
        .catch((error) => {
          console.error('Error fetching file URL:', error); // Handle the error
        });

      fetchFileUrl
        .then((fileUrl) => {
          // Update sourceDocuments with filename and fileUrl
          globalSourceDocs = globalSourceDocs.map((sourceDoc) => ({
            ...sourceDoc,
            metadata: {
              ...sourceDoc.metadata,
              source: filename, // Change source to filename
              fileUrl, // Add fileUrl to metadata
            },
          }));
          /* console.log('Updated sourceDocuments:', globalSourceDocs); */

          // Update the response with the modified sourceDocuments
          const modifiedResponse = { ...response, sourceDocuments: globalSourceDocs };
          /* console.log('Modified response:', modifiedResponse); */

          res.status(200).json(modifiedResponse); // Send the modified response
        })
        .catch((error) => {
          console.error('Error:', error); // Print errors, if any
          res.status(500).json({ error: 'Something went wrong' }); // Return error response
        });
    } else {
      // If no source documents were found, return the original response
      console.log('No source documents found in the response.');
      res.status(200).json(response);
    }
  } catch (error: any) {
    console.log('error', error);
    res.status(500).json({ error: error.message || 'Something went wrong' });
  }
}
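As a sanity check on this query path, the vector store can also be probed directly, without makeChain, to see what the index returns after the deletion. This is a minimal sketch using the same configuration as chat.ts (probeIndex is a hypothetical helper; similaritySearch is the generic LangChain vector-store method, not something specific to my setup):

import { OpenAIEmbeddings } from 'langchain/embeddings/openai';
import { PineconeStore } from 'langchain/vectorstores/pinecone';
import { pinecone } from '@/utils/pinecone-client';
import { PINECONE_INDEX_NAME, PINECONE_NAME_SPACE } from '@/config/pinecone';

// Hypothetical helper: query the index directly; an empty result here means
// the index/namespace really is empty, while matches mean the vectors survived.
export const probeIndex = async (question: string) => {
  const index = pinecone.Index(PINECONE_INDEX_NAME);
  const vectorStore = await PineconeStore.fromExistingIndex(new OpenAIEmbeddings(), {
    pineconeIndex: index,
    textKey: 'text',
    namespace: PINECONE_NAME_SPACE,
  });
  const matches = await vectorStore.similaritySearch(question, 4);
  console.log('raw matches:', matches.map((m) => m.metadata.source));
};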

Relevant log output

No response

Environment

Node.js

Additional Context

None
