feat: update sort algorithm using loser tree for multi sort merge #15869

Open
wants to merge 22 commits into
base: main

Conversation

forsaken628
Contributor

@forsaken628 forsaken628 commented Jun 23, 2024

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

Fixes #11604

Tests

  • Unit Test
  • Logic Test
  • Benchmark Test
  • No Test - Explain why

Type of change

  • Bug Fix (non-breaking change which fixes an issue)
  • New Feature (non-breaking change which adds functionality)
  • Breaking Change (fix or feature that could cause existing functionality not to work as expected)
  • Documentation Update
  • Refactoring
  • Performance Improvement
  • Other (please describe):


Signed-off-by: coldWater <[email protected]>
@github-actions github-actions bot added the pr-feature this PR introduces a new feature to the codebase label Jun 23, 2024
@sundy-li
Member

Is there any performance comparison data for this PR?

@forsaken628
Contributor Author

Is there any performance comparison data for this PR?

Not yet. Do I need to add an algorithm-level benchmark, or is the CI benchmark enough?

@sundy-li
Member

You can benchmark it locally, e.g.:

create table t(a int, b string, d float, e date);
create table t_random like t engine = random;

-- generate data
insert into t select * from t_random limit 5000000;
insert into t select * from t_random limit 5000000;
insert into t select * from t_random limit 5000000;
insert into t select * from t_random limit 5000000;
...

-- sort perf

select * from t order by a,b ignore_result;
...

@forsaken628
Contributor Author

No noticeable change. The flamegraph suggests a slight improvement, but it is not obvious, and it could also be due to the distribution of the data.

flamegraph

data set

read rows: 25000000
read size: 202.54 MiB
partitions total: 24
partitions scanned: 24
pruning stats: [segments: <range pruning: 1 to 1>, blocks: <range pruning: 24 to 24>]
push downs: [filters: [], limit: NONE]
estimated rows: 25000000.00

sql

sort

select * from t order by a,b ignore_result;
pprof.cpu.heap.sort.pb.gz
pprof.cpu.loser.sort.pb.gz

window

SELECT MAX(d) OVER (PARTITION BY a) FROM t ignore_result;
pprof.cpu.heap.window.pb.gz
pprof.cpu.loser.window.pb.gz
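
For context on what is being compared with the heap-based merger, here is a minimal sketch of a loser-tree k-way merge. It is not the code from this PR: the function names (head, beats, loser_tree_merge) and the use of plain Vec<i32> inputs are assumptions made for illustration. Each internal node stores the loser of its match and slot 0 stores the overall winner, so after emitting a row only the path from one leaf to the root is replayed, with one comparison per level.

```rust
/// Current head of input `i`, or None once that input is exhausted.
fn head(inputs: &[Vec<i32>], pos: &[usize], i: usize) -> Option<i32> {
    inputs[i].get(pos[i]).copied()
}

/// True if input `a` should be emitted before input `b`.
/// Exhausted inputs always lose; ties are broken by input index.
fn beats(inputs: &[Vec<i32>], pos: &[usize], a: usize, b: usize) -> bool {
    match (head(inputs, pos, a), head(inputs, pos, b)) {
        (Some(x), Some(y)) => (x, a) < (y, b),
        (Some(_), None) => true,
        (None, _) => false,
    }
}

/// Merge `k` already-sorted inputs with a loser tree (illustrative sketch).
fn loser_tree_merge(inputs: &[Vec<i32>]) -> Vec<i32> {
    let k = inputs.len();
    if k == 0 {
        return Vec::new();
    }
    let mut pos = vec![0usize; k]; // read cursor per input

    // Build: leaves live at k..2k, internal nodes at 1..k.
    // `winners` is scratch space used only during construction.
    let mut winners = vec![0usize; 2 * k];
    for i in 0..k {
        winners[k + i] = i;
    }
    // tree[1..k] hold the loser of each match; tree[0] holds the overall winner.
    let mut tree = vec![0usize; k];
    for n in (1..k).rev() {
        let (l, r) = (winners[2 * n], winners[2 * n + 1]);
        if beats(inputs, &pos, l, r) {
            winners[n] = l;
            tree[n] = r;
        } else {
            winners[n] = r;
            tree[n] = l;
        }
    }
    tree[0] = winners[1];

    let mut out = Vec::with_capacity(inputs.iter().map(Vec::len).sum());
    // The winner is exhausted only when every input is exhausted.
    while let Some(v) = head(inputs, &pos, tree[0]) {
        out.push(v);
        let mut winner = tree[0];
        pos[winner] += 1;
        // Replay only the path from the winner's leaf up to the root.
        let mut node = (k + winner) / 2;
        while node >= 1 {
            if beats(inputs, &pos, tree[node], winner) {
                std::mem::swap(&mut tree[node], &mut winner);
            }
            node /= 2;
        }
        tree[0] = winner;
    }
    out
}
```

Calling loser_tree_merge on three sorted vectors returns a single sorted vector. A binary heap typically needs up to two comparisons per level when sifting down, which is the usual argument for a loser tree. Whether that shows up in practice depends on the data and on how expensive each comparison is.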

@sundy-li
Member

The code LGTM. Some comments on this PR:

It would be better to add a new setting named enable_loser_tree_merge_sort, defaulting to 1.

Then we can create the merger in MultiSortMergeProcessor according to the setting at runtime. If any bug occurs, we can switch to the other implementation.
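
A rough sketch of that runtime switch, under stated assumptions: only the setting name enable_loser_tree_merge_sort and its default of 1 come from the comment above. The Settings struct, the MergeAlgorithm enum, and choose_merge_algorithm are hypothetical names for illustration and are not Databend's actual API.

```rust
/// Hypothetical stand-in for the session settings.
/// Only the field name and its default of 1 come from the review comment.
struct Settings {
    enable_loser_tree_merge_sort: u64,
}

impl Default for Settings {
    fn default() -> Self {
        // Loser tree is on by default, as proposed in the review.
        Settings { enable_loser_tree_merge_sort: 1 }
    }
}

/// Hypothetical tag for which merger implementation to construct.
#[derive(Debug, Clone, Copy, PartialEq)]
enum MergeAlgorithm {
    Heap,
    LoserTree,
}

/// Pick the merger once, when the processor is built, based on the setting.
fn choose_merge_algorithm(settings: &Settings) -> MergeAlgorithm {
    if settings.enable_loser_tree_merge_sort != 0 {
        MergeAlgorithm::LoserTree
    } else {
        MergeAlgorithm::Heap
    }
}
```

Keeping both implementations behind one flag means a regression in the new merger can be worked around by changing the setting rather than by reverting the change.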

Development

Successfully merging this pull request may close these issues.

Feature: Update sort algorithm using Loser Tree
2 participants