Support lightweight ingesting optional index #9036

Open
JaySon-Huang opened this issue May 11, 2024 · 0 comments
Labels
type/enhancement Issue or PR for enhancement


JaySon-Huang (Contributor) commented May 11, 2024

Enhancement

In some scenarios, we need to create new indexes on existing data to speed up queries. Typically this is done by triggering a major compaction (Segment DeltaMerge), which generates the new index data while it rewrites the segment. However, a major compaction rewrites all the columns, so index creation is slow and puts a heavy load on system I/O and CPU.

To reduce the impact of index creation on the system, we can build new indexes for the stable layer only, in a more lightweight way. Because the stable layer contains about 95% of the whole dataset, indexing the stable layer alone usually brings a sufficient performance boost.

The overall procedure:

  1. Prepare - create a background task that reads only the columns relevant to index generation, builds the index, and writes the generated index file to persistent storage.
  2. Ingest - ingest the index file into the stable layer's DMFile. For a DMFile that uses meta v2, this only requires updating the ExtendColumnStat block of the meta v2 file: DMFile::ingestIndex writes a new meta v2 file and atomically replaces the old one.
  3. Apply - update the epoch value of the Segment's StableValueSpace. Under storage/compute separation, the ComputeNode uses this value to decide whether it must drop its cached copy of the old DMFile meta file and re-download the file from S3.
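The Ingest and Apply steps can be illustrated with a minimal sketch. Only `DMFile::ingestIndex`, ExtendColumnStat, `StableValueSpace` and the epoch come from this issue; everything else (`MetaV2`, `readMeta`, `serialize`, `applyIngestedIndex`, `CachedMeta`, `needsRefresh`, the plain-text meta format and the `.tmp` naming) is hypothetical and is not TiFlash's actual code or file format. It assumes the usual write-then-rename pattern for the atomic replacement and a simple epoch comparison on the compute side.

```cpp
#include <cassert>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

namespace fs = std::filesystem;

// Hypothetical, heavily simplified stand-in for meta v2: only the
// ExtendColumnStat-like part (column id -> index file name) is modelled.
struct MetaV2
{
    std::map<int64_t, std::string> extend_column_stat;
};

// Assumed text format: one "col_id index_file" pair per line.
std::string serialize(const MetaV2 & meta)
{
    std::ostringstream out;
    for (const auto & [col_id, index_file] : meta.extend_column_stat)
        out << col_id << ' ' << index_file << '\n';
    return out.str();
}

MetaV2 readMeta(const fs::path & path)
{
    MetaV2 meta;
    std::ifstream in(path); // a missing file simply yields an empty meta
    int64_t col_id = 0;
    std::string index_file;
    while (in >> col_id >> index_file)
        meta.extend_column_stat[col_id] = index_file;
    return meta;
}

// Ingest (hypothetical): write a complete new meta file beside the old one,
// then rename it over the old path. With POSIX rename semantics the target is
// replaced atomically, so readers see either the old meta or the new one.
void ingestIndex(const fs::path & meta_path, int64_t col_id, const std::string & index_file)
{
    assert(!index_file.empty());
    MetaV2 meta = readMeta(meta_path);
    meta.extend_column_stat[col_id] = index_file;

    const fs::path tmp_path = meta_path.string() + ".tmp";
    {
        std::ofstream out(tmp_path, std::ios::trunc);
        out << serialize(meta);
        out.flush();
        if (!out)
            throw std::runtime_error("failed to write " + tmp_path.string());
    } // a production version would also fsync the file and its directory here
    fs::rename(tmp_path, meta_path);
}

// Apply (hypothetical): bump the stable layer's epoch after a successful ingest.
struct StableValueSpace
{
    uint64_t meta_epoch = 0;
};

void applyIngestedIndex(StableValueSpace & stable)
{
    ++stable.meta_epoch;
}

// Compute-node side (hypothetical): a cached meta is stale once the epoch moves.
struct CachedMeta
{
    uint64_t epoch = 0;
    MetaV2 meta;
};

bool needsRefresh(const CachedMeta & cached, const StableValueSpace & stable)
{
    return cached.epoch != stable.meta_epoch;
}
```

In this sketch, a compute node that cached the meta at epoch 0 sees `needsRefresh` return true after `applyIngestedIndex`, which is the point at which it would re-download the meta file. Rewriting only the small meta file, rather than the column data, is what keeps the ingest cheap.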

This index-ingestion mechanism can serve the indexes needed by Vector Search (#9032) and Full-Text Search, as well as other optional indexes.

JaySon-Huang added the type/enhancement label May 11, 2024
JaySon-Huang mentioned this issue May 28, 2024
ti-chi-bot bot pushed a commit that referenced this issue May 28, 2024
ref #6233, ref #9036

Signed-off-by: Lloyd-Pottiger