[Feature] Utility Functions for Pinecone 1.0.1 #117

Open
2 tasks done
glody007 opened this issue Sep 14, 2023 · 2 comments
Labels
enhancement New feature or request

Comments

@glody007
Contributor

Is this your first time submitting a feature request?

  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing functionality

Describe the feature

I'd like to raise a specific concern regarding our project's compatibility with Pinecone 1.0. In the latest Pinecone release, functions such as chunkedUpsert and createIndexIfNotExists have been deprecated but are retained for backward compatibility.

My question is simple: Are there plans to implement updated utility functions to replace these deprecated ones within our project, or is it expected that users will need to implement these changes independently to ensure Pinecone 1.0 compatibility?

Your insights on this matter would be greatly appreciated.

Thank you,

Describe alternatives you've considered

These functions have been deprecated in Pinecone 1.0, which prompted me to take the initiative to reimplement them as utilities that are compatible with Pinecone 1.0.

Who will this benefit?

It's worth noting that implementing updated utility functions to replace these deprecated ones could have significant benefits. Not only are these utilities very simple to use, but they also have the potential to expedite the development process for all users, enhancing productivity and ensuring a smooth transition to Pinecone 1.0.

Are you interested in contributing this feature?

I'm more than willing to contribute to the integration process and help with any necessary modifications.

Anything else?

No response

@glody007 glody007 added the enhancement New feature or request label Sep 14, 2023
@jhamon
Collaborator

jhamon commented Sep 14, 2023

Thanks for the question. We're definitely interested in making the experience as smooth as possible.

My general feeling about the legacy utils is that although they are useful, especially when repeatedly doing those actions in integration testing (which is what they were originally used for), the fact they were/are needed reflects shortcomings in the client itself. I think ideally you should have the client do whatever it is you need without resorting to additional utility wrappers.

waitUntilIndexIsReady

This utility handled polling for status to see if the index is ready to handle data operations. This should be covered with the newly added waitUntilReady boolean option accepted by createIndex.

import { Pinecone } from '@pinecone-database/pinecone'

const pinecone = new Pinecone()
await pinecone.createIndex({ 
  name: 'my-index', 
  dimension: 1586, 
  waitUntilReady: true  // yay, this exists today
})
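
For anyone curious what that polling amounts to, here is a minimal sketch of the general pattern, not the legacy utility's actual code. The names `pollUntilReady`, `ReadyCheck`, and `PollOptions` are hypothetical, and the readiness check is supplied by the caller so the sketch does not assume anything about the client's response shape.

```typescript
// Hypothetical helper: calls a caller-supplied readiness check repeatedly
// until it resolves to true, or gives up once the timeout would be exceeded.
type ReadyCheck = () => Promise<boolean>;

interface PollOptions {
  intervalMs?: number; // delay between checks
  timeoutMs?: number;  // overall time budget
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Resolves to the number of checks made before the resource was ready.
async function pollUntilReady(
  isReady: ReadyCheck,
  { intervalMs = 1000, timeoutMs = 60000 }: PollOptions = {}
): Promise<number> {
  const deadline = Date.now() + timeoutMs;
  let attempts = 0;
  while (true) {
    attempts += 1;
    if (await isReady()) return attempts;
    // Stop if waiting one more interval would pass the deadline.
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`Not ready after ${timeoutMs} ms (${attempts} checks)`);
    }
    await sleep(intervalMs);
  }
}
```

With the real client, `isReady` would wrap whatever status lookup you use for the index. With `waitUntilReady: true` on createIndex, none of this should be necessary.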

createIndexIfNotExists

The legacy utility implementation makes 2 API calls:

  • Calls to get the list of indexes
  • Calls to create the index if missing from list

I think a better way to handle this with a single call would be to expose an option in createIndex to suppress the 409 conflict error. I.e. just try to create but don't throw if the result of the call is 409 Conflict. So something along the lines of

import { Pinecone } from '@pinecone-database/pinecone'

const pinecone = new Pinecone()
await pinecone.createIndex({ 
  name: 'index-name', 
  dimension: 1586, 
  suppressConflicts: true // proposed option, open to ideas for better names
})

I don't have a specific timeframe in mind for this, but it's something I plan to do at some point. If you wanted to PR this change, I'd be happy to review it.
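
Until an option like that exists, the same "try to create, ignore 409" behavior can be approximated on the caller's side. This is only a rough sketch: `createIgnoringConflict` and `isConflict` are hypothetical names, and it assumes the thrown error carries the HTTP status code on a `status` property, which may not match what the client actually throws.

```typescript
// Assumption: the error exposes an HTTP status code as `status`.
// Adjust isConflict to whatever the client's errors really look like.
interface HttpLikeError {
  status?: number;
}

function isConflict(err: unknown): boolean {
  return (
    typeof err === 'object' &&
    err !== null &&
    (err as HttpLikeError).status === 409
  );
}

// Runs `create` once. Resolves to true if it succeeded, false if it failed
// with a 409 Conflict (the resource already exists). Other errors are rethrown.
async function createIgnoringConflict(
  create: () => Promise<unknown>
): Promise<boolean> {
  try {
    await create();
    return true;
  } catch (err) {
    if (isConflict(err)) return false;
    throw err;
  }
}
```

Usage would be something like `await createIgnoringConflict(() => pinecone.createIndex({ name: 'index-name', dimension: 1586 }))`. It still makes a single API call, which is the point of the proposal.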

chunkedUpsert

Where the legacy utility falls down is when things go wrong with one of the upserts. If it fails, what do you do? You end up running it again and re-upserting everything, which can be a huge waste of time for large datasets. This wasn't a concern in the narrow integration testing and sample app scenarios this util was made for, but in serious production work we need to have a better answer than "retry everything again".

There was a point during the pre-release development when we had a batchSize option you could pass to upsert to get this behavior handled in basically the same way as the legacy util. I ended up ripping it back out before the release after testing in one of our sample apps showed a need to think through error handling a lot more. It seemed better to launch without this feature than to ship something that would cause a lot of headaches down the road.

Let's say you have a scenario like this:

import { Pinecone } from '@pinecone-database/pinecone'

const pinecone = new Pinecone()
const index = pinecone.index('my-index')

const records = [
  // you use an embedding model to make, let's say 100k vectors
]

await index.upsert({ 
  records,
  batchSize: 100 // not available in the current release
})

In that case, upsert needs to make 1000 API calls (100k records / 100 batchSize) on your behalf. If a handful of these many calls fail, what should it do? How can the method expose to the caller what the failures were? Should it just automatically retry with exponential backoff? How many retries? Timeouts?

I do think implementing it in the upsert itself makes sense, but thinking about what to do when things go wrong is not a simple matter. I'm open to proposals on how we can do this in a way that is robust and flexible enough to cover a variety of scenarios.
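
To make the discussion concrete, here is one possible shape, offered as a sketch rather than a proposal for the actual API. Everything in it is hypothetical (`chunk`, `upsertInBatches`, `BatchReport`, the retry defaults). The actual upsert call is passed in by the caller, each batch is retried with exponential backoff, and batches that still fail are reported back instead of aborting the whole run, so the caller can retry only those.

```typescript
// Splits `items` into consecutive slices of at most `size` elements.
function chunk<T>(items: T[], size: number): T[][] {
  if (size < 1 || Math.floor(size) !== size) {
    throw new RangeError('size must be a positive integer');
  }
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

interface BatchFailure<T> {
  batchIndex: number; // position of the failed batch
  records: T[];       // the records that were not upserted
  error: unknown;     // last error seen for this batch
}

interface BatchReport<T> {
  succeeded: number;  // count of records upserted successfully
  failures: BatchFailure<T>[];
}

// Upserts batches one at a time. Each batch gets up to `maxRetries` retries
// with exponential backoff before it is recorded as a failure.
async function upsertInBatches<T>(
  records: T[],
  batchSize: number,
  upsertBatch: (batch: T[]) => Promise<void>,
  maxRetries = 3,
  baseDelayMs = 100
): Promise<BatchReport<T>> {
  const report: BatchReport<T> = { succeeded: 0, failures: [] };
  const batches = chunk(records, batchSize);
  for (let batchIndex = 0; batchIndex < batches.length; batchIndex++) {
    const batch = batches[batchIndex];
    for (let attempt = 0; ; attempt++) {
      try {
        await upsertBatch(batch);
        report.succeeded += batch.length;
        break;
      } catch (error) {
        if (attempt >= maxRetries) {
          report.failures.push({ batchIndex, records: batch, error });
          break;
        }
        // Backoff delays: baseDelayMs, then 2x, 4x, ...
        const delay = baseDelayMs * Math.pow(2, attempt);
        await new Promise<void>((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  return report;
}
```

This deliberately leaves the harder questions open: it runs batches sequentially, has no overall timeout, and retries every error, including ones that will never succeed.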

@glody007
Contributor Author

Thank you for your thorough response and insights. It's clear that you're committed to making the integration smoother and more efficient.

I appreciate the explanation of the legacy utilities, and your approach to addressing their shortcomings makes sense. Your insights and input are invaluable.

1. Suppress Conflicts Option:
I will create a pull request for this change soon.

2. Upsert Error Handling and Retry Strategies (Future Discussion):
Additionally, I plan to delve into the challenge of enhancing error handling and retry strategies for the upsert function. Handling failures during data upserts, especially with large datasets, is a critical aspect. While this is a more complex task that requires careful consideration, I am committed to exploring robust solutions and will return to the team with ideas and proposals in the near future.

Thank you once again for your responsiveness and dedication to improving Pinecone.

@glody007 glody007 changed the title [Feature] Utility Functions for Pinecone 1.0 [Feature] Utility Functions for Pinecone 1.0.1 Sep 22, 2023