Hi all,
I'm trying to optimize the following query on postgres 11.6 (running on Aurora)
select distinct
c1,
first_value(c2) OVER (PARTITION BY c1 order by c2) AS c2,
first_value(c3) OVER (PARTITION BY c1 order by c3) AS c3,
first_value(c4) OVER (PARTITION BY c1 order by c4) AS c4
from
t;
select distinct
c1,
first_value(c2) OVER (PARTITION BY c1 order by c2) AS c2,
first_value(c3) OVER (PARTITION BY c1 order by c3) AS c3,
first_value(c4) OVER (PARTITION BY c1 order by c4) AS c4
from
t;
From the explain plan (attached at the end of the email) I see that postgresql is doing several sorts one per window function and one
for the distinct that seems ok.
However all the window functions being on the same partition I would have expected postgresql to "share" a preliminary sort on c1 that would then be useful to reduce the work on all window functions but it doesn't.
I even created an index on c1 hoping that postgresql would be able to use it in order to minimize the cost of the sorts but I couldn't make it use it.
Is there something I am missing?
You can find below a script to set up a table and data to reproduce as well as the explain plan.
Setup Script
create table t(
pk varchar(200) PRIMARY key,
c1 varchar(200),
c2 varchar(200),
c3 varchar(200),
c4 varchar(200)
);
create index i1 on t (c1);
insert into t
(pk, c1, c2, c3, c4 )
select
generate_series::text pk,
'Grp' ||(generate_series / 4)::text c1,
generate_series::text c2,
generate_series::text c3,
generate_series::text c4
from generate_series(0, 1000000);
pk varchar(200) PRIMARY key,
c1 varchar(200),
c2 varchar(200),
c3 varchar(200),
c4 varchar(200)
);
create index i1 on t (c1);
insert into t
(pk, c1, c2, c3, c4 )
select
generate_series::text pk,
'Grp' ||(generate_series / 4)::text c1,
generate_series::text c2,
generate_series::text c3,
generate_series::text c4
from generate_series(0, 1000000);
Explain Plan
Unique (cost=808480.87..820980.88 rows=1000001 width=123) (actual time=7131.675..7781.082 rows=250001 loops=1)
-> Sort (cost=808480.87..810980.87 rows=1000001 width=123) (actual time=7131.673..7603.926 rows=1000001 loops=1)
Sort Key: c1, (first_value(c2) OVER (?)), (first_value(c3) OVER (?)), (first_value(c4) OVER (?))
Sort Method: external merge Disk: 59640kB
-> WindowAgg (cost=558937.90..578937.92 rows=1000001 width=123) (actual time=5179.374..6268.937 rows=1000001 loops=1)
-> Sort (cost=558937.90..561437.90 rows=1000001 width=91) (actual time=5179.355..5679.136 rows=1000001 loops=1)
Sort Key: c1, c4
Sort Method: external merge Disk: 52912kB
-> WindowAgg (cost=336736.93..356736.95 rows=1000001 width=91) (actual time=3260.950..4389.116 rows=1000001 loops=1)
-> Sort (cost=336736.93..339236.93 rows=1000001 width=59) (actual time=3260.934..3778.385 rows=1000001 loops=1)
Sort Key: c1, c3
Sort Method: external merge Disk: 46176kB
-> WindowAgg (cost=141877.96..161877.98 rows=1000001 width=59) (actual time=1444.692..2477.284 rows=1000001 loops=1)
-> Sort (cost=141877.96..144377.96 rows=1000001 width=27) (actual time=1444.669..1906.993 rows=1000001 loops=1)
Sort Key: c1, c2
Sort Method: external merge Disk: 39424kB
-> Seq Scan on t (cost=0.00..18294.01 rows=1000001 width=27) (actual time=0.011..177.815 rows=1000001 loops=1)
Planning Time: 0.214 ms
Execution Time: 7839.646 ms
-> Sort (cost=808480.87..810980.87 rows=1000001 width=123) (actual time=7131.673..7603.926 rows=1000001 loops=1)
Sort Key: c1, (first_value(c2) OVER (?)), (first_value(c3) OVER (?)), (first_value(c4) OVER (?))
Sort Method: external merge Disk: 59640kB
-> WindowAgg (cost=558937.90..578937.92 rows=1000001 width=123) (actual time=5179.374..6268.937 rows=1000001 loops=1)
-> Sort (cost=558937.90..561437.90 rows=1000001 width=91) (actual time=5179.355..5679.136 rows=1000001 loops=1)
Sort Key: c1, c4
Sort Method: external merge Disk: 52912kB
-> WindowAgg (cost=336736.93..356736.95 rows=1000001 width=91) (actual time=3260.950..4389.116 rows=1000001 loops=1)
-> Sort (cost=336736.93..339236.93 rows=1000001 width=59) (actual time=3260.934..3778.385 rows=1000001 loops=1)
Sort Key: c1, c3
Sort Method: external merge Disk: 46176kB
-> WindowAgg (cost=141877.96..161877.98 rows=1000001 width=59) (actual time=1444.692..2477.284 rows=1000001 loops=1)
-> Sort (cost=141877.96..144377.96 rows=1000001 width=27) (actual time=1444.669..1906.993 rows=1000001 loops=1)
Sort Key: c1, c2
Sort Method: external merge Disk: 39424kB
-> Seq Scan on t (cost=0.00..18294.01 rows=1000001 width=27) (actual time=0.011..177.815 rows=1000001 loops=1)
Planning Time: 0.214 ms
Execution Time: 7839.646 ms