In some scenarios, we need to create new indexes for existing data to speed up queries. Typically this is done by triggering a major compaction (Segment DeltaMerge), which builds the new index data as part of the compaction. However, a major compaction rewrites all the columns, which makes indexing slow and has a large impact on system IO and CPU usage.
To mitigate the impact of index creation on the system, we can create new indexes only for the stable layer data in a more lightweight way. Because the stable layer typically contains around 95% of the whole dataset, indexing only the stable layer usually brings a sufficient performance boost.
The overall procedure:
Prepare - create a background task that reads only the columns relevant to index generation, then stores the generated index file in persistent storage.
Ingest - ingest the index file into the stable layer's DMFile. In a DMFile that uses meta v2, this only requires modifying the information in the ExtendColumnStat block in meta v2. DMFile::ingestIndex creates a new meta v2 file and atomically replaces the old meta v2 file.
Apply - update the epoch value of the StableValueSpace in the Segment. Under store-compute separation, the ComputeNode uses this value to decide whether to clean up its cached copy of the old DMFile meta file and re-download the file from S3.
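The Ingest and Apply steps above can be modeled roughly as follows. This is a minimal, language-agnostic sketch in Python, not TiFlash code: the JSON meta layout, `ingest_index`, `StableLayer`, and `ComputeNodeCache` are all hypothetical stand-ins chosen for illustration, and a local file read stands in for the S3 download. It assumes the atomic replacement is done by writing a new meta file and renaming it over the old one, and that the reader's cache is keyed by epoch.

```python
import json
import os
import tempfile
from dataclasses import dataclass, field


def ingest_index(meta_path: str, column_id: int, index_file: str) -> None:
    """Hypothetical Ingest step: record an index file in the meta, then swap the meta atomically."""
    with open(meta_path) as f:
        meta = json.load(f)
    # Only the per-column stat entry changes; column data files are never rewritten.
    stat = meta["column_stats"].setdefault(str(column_id), {})
    stat["index_file"] = index_file
    # Write the new meta next to the old one so the rename stays on one filesystem.
    dir_name = os.path.dirname(os.path.abspath(meta_path))
    fd, tmp_path = tempfile.mkstemp(dir=dir_name, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(meta, f)
        f.flush()
        os.fsync(f.fileno())
    # os.replace is atomic on POSIX when source and destination share a filesystem.
    os.replace(tmp_path, meta_path)


@dataclass
class StableLayer:
    """Hypothetical stand-in for the stable layer: a meta file plus an epoch."""
    meta_path: str
    epoch: int = 0


@dataclass
class ComputeNodeCache:
    """Hypothetical reader-side cache, invalidated when the epoch changes."""
    cached: dict = field(default_factory=dict)  # meta_path -> (epoch, meta)

    def get_meta(self, stable: StableLayer) -> dict:
        entry = self.cached.get(stable.meta_path)
        if entry is None or entry[0] != stable.epoch:
            # Stale or missing: re-read the meta (stands in for the S3 download).
            with open(stable.meta_path) as f:
                entry = (stable.epoch, json.load(f))
            self.cached[stable.meta_path] = entry
        return entry[1]
```

In this model a reader keeps serving its cached meta after `ingest_index` runs and only observes the new index once the epoch is bumped, which is why Apply is a separate step from Ingest.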
This index ingestion mechanism can help us handle the indexes needed for Vector Search (#9032) and Full Text Search, as well as other optional indexes.
Enhancement