Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: realtime aggregation select planning and execution time is poor #7015

Open
pohmelie opened this issue Jun 7, 2024 · 4 comments
Open

Comments

@pohmelie
Copy link

pohmelie commented Jun 7, 2024

What type of bug is this?

Performance issue

What subsystems and features are affected?

Query executor, Query planner

What happened?

We faced problem that realtime aggregation select request is much heavier than request to the internal materialized view. In terms of execution it is more than 1000 times.

explain analyze select * from table_data_1m 
where timestamp >= '2024-01-01' 
order by timestamp asc limit 5;

explain analyze select * from _timescaledb_internal._materialized_hypertable_2 
where timestamp >= '2024-01-01'
order by timestamp asc limit 5;
Limit  (cost=1695.27..1695.33 rows=5 width=31) (actual time=259.585..259.590 rows=5 loops=1)
  ->  Merge Append  (cost=1695.27..1750.13 rows=4387 width=31) (actual time=259.584..259.587 rows=5 loops=1)
        Sort Key: _hyper_2_2_chunk."timestamp"
        ->  Sort  (cost=1693.69..1704.66 rows=4386 width=31) (actual time=259.512..259.514 rows=5 loops=1)
              Sort Key: _hyper_2_2_chunk."timestamp"
              Sort Method: top-N heapsort  Memory: 25kB
              ->  Append  (cost=0.00..1620.84 rows=4386 width=31) (actual time=0.016..179.104 rows=538387 loops=1)
                    ->  Seq Scan on _hyper_2_2_chunk  (cost=0.00..55.41 rows=2560 width=19) (actual time=0.015..0.448 rows=2561 loops=1)
                          Filter: (("timestamp" < '2024-06-14 21:35:00+04'::timestamp with time zone) AND ("timestamp" >= '2024-01-01 00:00:00+04'::timestamp with time zone))
                    ->  Bitmap Heap Scan on _hyper_2_5_chunk  (cost=24.64..1543.50 rows=1826 width=48) (actual time=17.811..111.441 rows=535826 loops=1)
                          Recheck Cond: (("timestamp" < '2024-06-14 21:35:00+04'::timestamp with time zone) AND ("timestamp" >= '2024-01-01 00:00:00+04'::timestamp with time zone))
                          Heap Blocks: exact=3413
                          ->  Bitmap Index Scan on _hyper_2_5_chunk__materialized_hypertable_2_timestamp_idx  (cost=0.00..24.18 rows=1826 width=0) (actual time=17.148..17.148 rows=535826 loops=1)
                                Index Cond: (("timestamp" < '2024-06-14 21:35:00+04'::timestamp with time zone) AND ("timestamp" >= '2024-01-01 00:00:00+04'::timestamp with time zone))
        ->  GroupAggregate  (cost=1.57..1.59 rows=1 width=19) (actual time=0.069..0.070 rows=0 loops=1)
              Group Key: (time_bucket('00:01:00'::interval, _hyper_1_4_chunk."timestamp")), _hyper_1_4_chunk.prefix
              ->  Sort  (cost=1.57..1.57 rows=1 width=27) (actual time=0.068..0.068 rows=0 loops=1)
                    Sort Key: (time_bucket('00:01:00'::interval, _hyper_1_4_chunk."timestamp")), _hyper_1_4_chunk.prefix
                    Sort Method: quicksort  Memory: 25kB
                    ->  Result  (cost=0.42..1.56 rows=1 width=27) (actual time=0.028..0.029 rows=0 loops=1)
                          ->  Index Scan using _hyper_1_4_chunk_table_data_timestamp_idx on _hyper_1_4_chunk  (cost=0.42..1.54 rows=1 width=19) (actual time=0.028..0.028 rows=0 loops=1)
                                Index Cond: (("timestamp" >= '2024-06-14 21:35:00+04'::timestamp with time zone) AND ("timestamp" >= '2024-01-01 00:00:00+04'::timestamp with time zone))
                                Filter: (time_bucket('00:01:00'::interval, "timestamp") >= '2024-01-01 00:00:00+04'::timestamp with time zone)
Planning Time: 2.372 ms
Execution Time: 259.737 ms
Limit  (cost=0.28..0.41 rows=5 width=47) (actual time=0.014..0.016 rows=5 loops=1)
  ->  Custom Scan (ChunkAppend) on _materialized_hypertable_2  (cost=0.28..68.80 rows=2561 width=47) (actual time=0.013..0.015 rows=5 loops=1)
        Order: _materialized_hypertable_2."timestamp"
        ->  Index Scan Backward using _hyper_2_2_chunk__materialized_hypertable_2_timestamp_idx on _hyper_2_2_chunk  (cost=0.28..68.80 rows=2561 width=19) (actual time=0.013..0.014 rows=5 loops=1)
              Index Cond: ("timestamp" >= '2024-01-01 00:00:00+04'::timestamp with time zone)
        ->  Index Scan Backward using _hyper_2_5_chunk__materialized_hypertable_2_timestamp_idx on _hyper_2_5_chunk  (cost=0.42..6193.00 rows=121730 width=48) (never executed)
              Index Cond: ("timestamp" >= '2024-01-01 00:00:00+04'::timestamp with time zone)
Planning Time: 0.305 ms
Execution Time: 0.051 ms

I read some other issues (#6976, #6321), but it looks like they solved and this one is irrelevant.

TimescaleDB version affected

2.15.1

PostgreSQL version used

16.3

What operating system did you use?

ubuntu 22.04

What installation method did you use?

Docker

What platform did you run on?

Not applicable

Relevant log output and stack trace

No response

How can we reproduce the bug?

create table table_data (
    prefix text not null,
    "timestamp" timestamp with time zone not null,
    value float,
    constraint unique_table_data_timestamp_prefix unique (timestamp, prefix)
);

select create_hypertable('table_data', by_range('timestamp'));

create materialized view table_data_1m
    with (timescaledb.continuous, timescaledb.materialized_only = false) as
    select
        prefix,
        time_bucket('1 minute', timestamp) as timestamp,
        last(value, "timestamp") as value
    from table_data
    group by time_bucket('1 minute', timestamp), prefix;
    
create index table_data_1m_prefix_timestamp
on table_data_1m (prefix, timestamp);

call refresh_continuous_aggregate('table_data_1m', null, null);

insert into table_data (prefix, timestamp, value)
select 
	left(md5(random()::text), 2),
	'2024-01-01'::timestamptz + make_interval(secs => i / 1000.0),
	random()
from generate_series(1, 10 * 60 * 1000) s(i);

call refresh_continuous_aggregate('table_data_1m', null, null);

insert into table_data (prefix, timestamp, value)
select 
	left(md5(random()::text), 2),
	now() + make_interval(secs => i),
	random()
from generate_series(1, 10 * 60 * 1000) s(i);

call refresh_continuous_aggregate('table_data_1m', '2024-06-01', null);

explain analyze select * from table_data_1m 
where timestamp >= '2024-01-01' 
order by timestamp asc limit 5;

explain analyze select * from _timescaledb_internal._materialized_hypertable_2 
where timestamp >= '2024-01-01'
order by timestamp asc limit 5;
@pohmelie pohmelie added the bug label Jun 7, 2024
@nikkhils
Copy link
Contributor

@pohmelie is your timescaledb version 2.5.1? That seems very old or did you want to mention 2.15.1?

@pohmelie
Copy link
Author

@nikkhils yes, my bad. It is 2.15.1.

@nikkhils
Copy link
Contributor

@pohmelie the two queries

explain analyze select * from table_data_1m 
where timestamp >= '2024-01-01' 
order by timestamp asc limit 5;

AND

explain analyze select * from _timescaledb_internal._materialized_hypertable_2 
where timestamp >= '2024-01-01'
order by timestamp asc limit 5;

are NOT the same. In the first query due to real time aggregation, the query that runs on the _timescaledb_internal._materialized_hypertable_2 table has a plan of:

->  Sort  (cost=1693.69..1704.66 rows=4386 width=31) (actual time=259.512..259.514 rows=5 loops=1)
              Sort Key: _hyper_2_2_chunk."timestamp"
              Sort Method: top-N heapsort  Memory: 25kB
              ->  Append  (cost=0.00..1620.84 rows=4386 width=31) (actual time=0.016..179.104 rows=538387 loops=1)
                    ->  Seq Scan on _hyper_2_2_chunk  (cost=0.00..55.41 rows=2560 width=19) (actual time=0.015..0.448 rows=2561 loops=1)
                          Filter: (("timestamp" < '2024-06-14 21:35:00+04'::timestamp with time zone) AND ("timestamp" >= '2024-01-01 00:00:00+04'::timestamp with time zone))
                    ->  Bitmap Heap Scan on _hyper_2_5_chunk  (cost=24.64..1543.50 rows=1826 width=48) (actual time=17.811..111.441 rows=535826 loops=1)
                          Recheck Cond: (("timestamp" < '2024-06-14 21:35:00+04'::timestamp with time zone) AND ("timestamp" >= '2024-01-01 00:00:00+04'::timestamp with time zone))
                          Heap Blocks: exact=3413
                          ->  Bitmap Index Scan on _hyper_2_5_chunk__materialized_hypertable_2_timestamp_idx  (cost=0.00..24.18 rows=1826 width=0) (actual time=17.148..17.148 rows=535826 loops=1)
                                Index Cond: (("timestamp" < '2024-06-14 21:35:00+04'::timestamp with time zone) AND ("timestamp" >= '2024-01-01 00:00:00+04'::timestamp with time zone))

The expected rows and actual number of rows are out of sync.

(cost=24.64..1543.50 rows=1826 width=48) (actual time=17.811..111.441 rows=535826 loops=1)

Please run ANALYZE _timescaledb_internal._materialized_hypertable_2 to get the statistics up-to-date and then retry the query.

@pohmelie
Copy link
Author

Actually it is explain analyze in original script, so it will not change anything. Anyway.

create table table_data (
    prefix text not null,
    "timestamp" timestamp with time zone not null,
    value float,
    constraint unique_table_data_timestamp_prefix unique (timestamp, prefix)
);

select create_hypertable('table_data', by_range('timestamp'));

create materialized view table_data_1m
    with (timescaledb.continuous, timescaledb.materialized_only = false) as
    select
        prefix,
        time_bucket('1 minute', timestamp) as timestamp,
        last(value, "timestamp") as value
    from table_data
    group by time_bucket('1 minute', timestamp), prefix;
    
create index table_data_1m_prefix_timestamp
on table_data_1m (prefix, timestamp);

call refresh_continuous_aggregate('table_data_1m', null, null);

insert into table_data (prefix, timestamp, value)
select 
	left(md5(random()::text), 2),
	'2024-01-01'::timestamptz + make_interval(secs => i / 1000.0),
	random()
from generate_series(1, 10 * 60 * 1000) s(i);

call refresh_continuous_aggregate('table_data_1m', null, null);

insert into table_data (prefix, timestamp, value)
select 
	left(md5(random()::text), 2),
	now() + make_interval(secs => i),
	random()
from generate_series(1, 10 * 60 * 1000) s(i);

call refresh_continuous_aggregate('table_data_1m', '2024-06-01', null);

explain analyze select * from table_data_1m 
where timestamp >= '2024-01-01' 
order by timestamp asc limit 5;

ANALYZE _timescaledb_internal._materialized_hypertable_2;

explain analyze select * from _timescaledb_internal._materialized_hypertable_2 
where timestamp >= '2024-01-01'
order by timestamp asc limit 5;
Limit  (cost=1694.08..1694.14 rows=5 width=31) (actual time=239.235..239.240 rows=5 loops=1)
  ->  Merge Append  (cost=1694.08..1748.93 rows=4386 width=31) (actual time=239.233..239.236 rows=5 loops=1)
        Sort Key: _hyper_2_2_chunk."timestamp"
        ->  Sort  (cost=1692.63..1703.59 rows=4385 width=31) (actual time=239.179..239.181 rows=5 loops=1)
              Sort Key: _hyper_2_2_chunk."timestamp"
              Sort Method: top-N heapsort  Memory: 25kB
              ->  Append  (cost=0.00..1619.79 rows=4385 width=31) (actual time=0.008..163.374 rows=538155 loops=1)
                    ->  Seq Scan on _hyper_2_2_chunk  (cost=0.00..55.41 rows=2560 width=19) (actual time=0.007..0.556 rows=2561 loops=1)
                          Filter: (("timestamp" < '2024-06-28 01:26:00+04'::timestamp with time zone) AND ("timestamp" >= '2024-01-01 00:00:00+04'::timestamp with time zone))
                    ->  Bitmap Heap Scan on _hyper_2_5_chunk  (cost=24.63..1542.45 rows=1825 width=48) (actual time=18.866..102.341 rows=535594 loops=1)
                          Recheck Cond: (("timestamp" < '2024-06-28 01:26:00+04'::timestamp with time zone) AND ("timestamp" >= '2024-01-01 00:00:00+04'::timestamp with time zone))
                          Heap Blocks: exact=3412
                          ->  Bitmap Index Scan on _hyper_2_5_chunk__materialized_hypertable_2_timestamp_idx  (cost=0.00..24.17 rows=1825 width=0) (actual time=18.251..18.251 rows=535594 loops=1)
                                Index Cond: (("timestamp" < '2024-06-28 01:26:00+04'::timestamp with time zone) AND ("timestamp" >= '2024-01-01 00:00:00+04'::timestamp with time zone))
        ->  GroupAggregate  (cost=1.44..1.47 rows=1 width=19) (actual time=0.052..0.053 rows=0 loops=1)
              Group Key: (time_bucket('00:01:00'::interval, _hyper_1_4_chunk."timestamp")), _hyper_1_4_chunk.prefix
              ->  Sort  (cost=1.44..1.44 rows=1 width=27) (actual time=0.051..0.051 rows=0 loops=1)
                    Sort Key: (time_bucket('00:01:00'::interval, _hyper_1_4_chunk."timestamp")), _hyper_1_4_chunk.prefix
                    Sort Method: quicksort  Memory: 25kB
                    ->  Result  (cost=0.29..1.43 rows=1 width=27) (actual time=0.022..0.022 rows=0 loops=1)
                          ->  Index Scan using _hyper_1_4_chunk_table_data_timestamp_idx on _hyper_1_4_chunk  (cost=0.29..1.42 rows=1 width=19) (actual time=0.021..0.021 rows=0 loops=1)
                                Index Cond: (("timestamp" >= '2024-06-28 01:26:00+04'::timestamp with time zone) AND ("timestamp" >= '2024-01-01 00:00:00+04'::timestamp with time zone))
                                Filter: (time_bucket('00:01:00'::interval, "timestamp") >= '2024-01-01 00:00:00+04'::timestamp with time zone)
Planning Time: 2.398 ms
Execution Time: 239.378 ms
Limit  (cost=0.28..0.41 rows=5 width=19) (actual time=0.017..0.020 rows=5 loops=1)
  ->  Custom Scan (ChunkAppend) on _materialized_hypertable_2  (cost=0.28..68.80 rows=2561 width=19) (actual time=0.016..0.019 rows=5 loops=1)
        Order: _materialized_hypertable_2."timestamp"
        ->  Index Scan Backward using _hyper_2_2_chunk__materialized_hypertable_2_timestamp_idx on _hyper_2_2_chunk  (cost=0.28..68.80 rows=2561 width=19) (actual time=0.016..0.017 rows=5 loops=1)
              Index Cond: ("timestamp" >= '2024-01-01 00:00:00+04'::timestamp with time zone)
        ->  Index Scan Backward using _hyper_2_5_chunk__materialized_hypertable_2_timestamp_idx on _hyper_2_5_chunk  (cost=0.42..13709.42 rows=535594 width=19) (never executed)
              Index Cond: ("timestamp" >= '2024-01-01 00:00:00+04'::timestamp with time zone)
Planning Time: 0.966 ms
Execution Time: 0.048 ms

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants