Dear pgsql-general, I am currently working in
PostgreSQL 13.7, compiled by Visual C++
build 1914, 64-bit and encountered the following: I have a table with two columns, which is created as
follows: CREATE TABLE IF NOT EXISTS bhload_nohyp_noin ( t BIGINT NOT NULL, v REAL NULL ); The table is 32 million rows long.
When I run the following query:
SELECT t/(1000*3600*24) as time, avg(v) FROM bhload_nohyp_noin GROUP BY time ORDER BY time Postgres executes the following plan:
"QUERY PLAN" "Finalize GroupAggregate (cost=295097.60..295226.53 rows=200 width=16)" " Output: ((t / 86400000)), avg(v)" " Group Key: ((bhload_nohyp_noin.t / 86400000))" " -> Gather Merge (cost=295097.60..295218.53 rows=1000 width=40)" " Output: ((t / 86400000)), (PARTIAL avg(v))" " Workers Planned: 5" " -> Sort (cost=295097.52..295098.02 rows=200 width=40)" " Output: ((t / 86400000)), (PARTIAL avg(v))" " Sort Key: ((bhload_nohyp_noin.t / 86400000))" " -> Partial HashAggregate (cost=295087.38..295089.88 rows=200 width=40)" " Output: ((t / 86400000)), PARTIAL avg(v)" " Group Key: (bhload_nohyp_noin.t / 86400000)" " -> Parallel Seq Scan on public.bhload_nohyp_noin (cost=0.00..259966.13 rows=7024250 width=12)" " Output: (t / 86400000), v" "Settings: effective_cache_size = '128GB', force_parallel_mode = 'on', max_parallel_workers = '40',
max_parallel_workers_per_gather = '10', parallel_setup_cost = '0', random_page_cost = '1.1',
temp_buffers = '200MB', work_mem = '1GB'" "Planning:" " Buffers: shared hit=14" "Planning Time: 0.747 ms" This plan takes about 2 seconds.
However, when postgres learns the table statistics, either after running the
query once, performing vacuum analyse on the table or assigning a unique index, postgres refuses to perform partial aggregates. The reason is that the column
t is unique and ordered which causes the query plan to change.
This results in the following query plan: "QUERY PLAN" "GroupAggregate (cost=971690.95..5460155.25 rows=31850064 width=16)" " Output: ((t / 86400000)), avg(v)" " Group Key: ((bhload_nohyp_noin.t / 86400000))" " -> Gather Merge (cost=971690.95..4823153.97 rows=31850064 width=12)" " Output: ((t / 86400000)), v" " Workers Planned: 5" " -> Sort (cost=971690.88..987615.91 rows=6370013 width=12)" " Output: ((t / 86400000)), v" " Sort Key: ((bhload_nohyp_noin.t / 86400000))" " -> Parallel Seq Scan on public.bhload_nohyp_noin (cost=0.00..251788.16 rows=6370013 width=12)" " Output: (t / 86400000), v" "Settings: effective_cache_size = '128GB', force_parallel_mode = 'on', max_parallel_workers = '40',
max_parallel_workers_per_gather = '10', parallel_setup_cost = '0', random_page_cost = '1.1',
temp_buffers = '200MB', work_mem = '1GB'" "Planning:" " Buffers: shared hit=6" "Planning Time: 0.180 ms" This takes about 18 seconds, which is much slower. When swapping the columns t
and v, so grouping by v, postgres decides to do partial aggregation again
which makes the operation 2 seconds.
Some extra notes: - I noticed that the parallel and partial hash aggregation gives the speed up. - I tried to trigger it by using the parallel costs, but I could not trigger it. - The table statistics before vacuum analyze is empty. - Here are the table statistics after vacuum analyze: schemaname tablename attname inherited null_frac avg_width n_distinct correlation most_common_elem_freqs elem_count_histogram public bhload_nohyp_noin t FALSE 0 8 -1 1 NULL NULL public bhload_nohyp_noin v FALSE 0 4 3937 -0.20693 NULL NULL I expected the query to become faster
with a unique index or column, so why does the query planner decide on group
aggregation instead of partial aggregation? Regards, Fred Bos ---------------------------------------------------------------------------
|