Michael, David thanks for your quick replies.
@Michael
first_value(c2) OVER (PARTITION BY c1),
first_value(c2) OVER (PARTITION BY c1 order by c2)
I initially dismissed writing this query using joins or subselects because the real query has about 80 columns and I was afraid that having 80 joins/subselect would cause issues with postgresql including planner that would fallback to GEQO.
I'll test it anyway with real data and see how it behaves.
@David About Subsequent sorts
From my experiments the time of subsequent sorts is not reduced by previous sort but is in fact higher.
In the following example in Query 1
Sort node on key
c1, c2 has a time of ~1893 excluding child nodes
In the following example in Query 2
- Sort node on key c1 has a time of ~1558 excluding child nodes
- Sort node on key c1, c2 has
a time of ~2964 excluding child nodes which is higher than the time for the same sort key in Query 1 but in Query 2 data postgresql could have benefited from a preliminary sort on c1.
I can't figure why it would be slower.
Query1
explain analyze select
first_value(c2) OVER (PARTITION BY c1 order by c2)
from t;
first_value(c2) OVER (PARTITION BY c1 order by c2)
from t;
Explain Plan
Query1
WindowAgg (cost=135042.46..155042.48 rows=1000001 width=47) (actual time=1555.005..2639.659 rows=1000001 loops=1)
-> Sort (cost=135042.46..137542.46 rows=1000001 width=15) (actual time=1554.989..2070.462 rows=1000001 loops=1)
Sort Key: c1, c2
Sort Method: external merge Disk: 25960kB
-> Seq Scan on t (cost=0.00..18294.01 rows=1000001 width=15) (actual time=0.013..177.037 rows=1000001 loops=1)
Planning Time: 0.115 ms
Execution Time: 2693.935 ms
-> Sort (cost=135042.46..137542.46 rows=1000001 width=15) (actual time=1554.989..2070.462 rows=1000001 loops=1)
Sort Key: c1, c2
Sort Method: external merge Disk: 25960kB
-> Seq Scan on t (cost=0.00..18294.01 rows=1000001 width=15) (actual time=0.013..177.037 rows=1000001 loops=1)
Planning Time: 0.115 ms
Execution Time: 2693.935 ms
Query2
explain analyze select
first_value(c2) OVER (PARTITION BY c1),
first_value(c2) OVER (PARTITION BY c1 order by c2)
from t;
Explain Plan
Query2
WindowAgg (cost=313730.43..333730.45 rows=1000001 width=79) (actual time=4692.257..5702.226 rows=1000001 loops=1)
-> Sort (cost=313730.43..316230.43 rows=1000001 width=47) (actual time=4692.152..5174.928 rows=1000001 loops=1)
Sort Key: c1, c2
Sort Method: external merge Disk: 32688kB
-> WindowAgg (cost=135042.46..152542.48 rows=1000001 width=47) (actual time=1188.109..2210.845 rows=1000001 loops=1)
-> Sort (cost=135042.46..137542.46 rows=1000001 width=15) (actual time=1188.099..1709.253 rows=1000001 loops=1)
Sort Key: c1
Sort Method: external merge Disk: 25960kB
-> Seq Scan on t (cost=0.00..18294.01 rows=1000001 width=15) (actual time=0.014..150.435 rows=1000001 loops=1)
Planning Time: 0.059 ms
Execution Time: 5756.591 ms
-> Sort (cost=313730.43..316230.43 rows=1000001 width=47) (actual time=4692.152..5174.928 rows=1000001 loops=1)
Sort Key: c1, c2
Sort Method: external merge Disk: 32688kB
-> WindowAgg (cost=135042.46..152542.48 rows=1000001 width=47) (actual time=1188.109..2210.845 rows=1000001 loops=1)
-> Sort (cost=135042.46..137542.46 rows=1000001 width=15) (actual time=1188.099..1709.253 rows=1000001 loops=1)
Sort Key: c1
Sort Method: external merge Disk: 25960kB
-> Seq Scan on t (cost=0.00..18294.01 rows=1000001 width=15) (actual time=0.014..150.435 rows=1000001 loops=1)
Planning Time: 0.059 ms
Execution Time: 5756.591 ms
@David About using an index
You are correct that a simple index cannot be leveraged efficiently.
However a covering index including all columns and starting with the partition column c1 could be used to avoid sorting on c1 right?
But in my tests it is not used. In the following example
covering_index could be used to avoid filtering on c1 but it doesn't:
Covering Index
create index covering_index on t (c1,c2, c3,c4);
Query
explain analyze select
c1,
first_value(c2) OVER (PARTITION BY c1 order by c2, c4) AS c2,
first_value(c3) OVER (PARTITION BY c1 order by coalesce(c4, '000'), c3) AS c3
from
t;
c1,
first_value(c2) OVER (PARTITION BY c1 order by c2, c4) AS c2,
first_value(c3) OVER (PARTITION BY c1 order by coalesce(c4, '000'), c3) AS c3
from
t;
Plan
WindowAgg (cost=417853.93..440353.95 rows=1000001 width=123) (actual time=3071.980..4184.030 rows=1000001 loops=1)
-> Sort (cost=417853.93..420353.93 rows=1000001 width=91) (actual time=3071.962..3575.641 rows=1000001 loops=1)
Sort Key: c1, (COALESCE(c4, '000'::character varying)), c3
Sort Method: external merge Disk: 52912kB
-> WindowAgg (cost=193152.96..215652.98 rows=1000001 width=91) (actual time=1268.373..2271.463 rows=1000001 loops=1)
-> Sort (cost=193152.96..195652.96 rows=1000001 width=59) (actual time=1268.357..1711.526 rows=1000001 loops=1)
Sort Key: c1, c2, c4
Sort Method: external merge Disk: 46168kB
-> Seq Scan on t (cost=0.00..18294.01 rows=1000001 width=59) (actual time=0.014..163.347 rows=1000001 loops=1)
Planning Time: 0.081 ms
Execution Time: 4250.367 ms
WindowAgg (cost=417853.93..440353.95 rows=1000001 width=123) (actual time=3071.980..4184.030 rows=1000001 loops=1)
-> Sort (cost=417853.93..420353.93 rows=1000001 width=91) (actual time=3071.962..3575.641 rows=1000001 loops=1)
Sort Key: c1, (COALESCE(c4, '000'::character varying)), c3
Sort Method: external merge Disk: 52912kB
-> WindowAgg (cost=193152.96..215652.98 rows=1000001 width=91) (actual time=1268.373..2271.463 rows=1000001 loops=1)
-> Sort (cost=193152.96..195652.96 rows=1000001 width=59) (actual time=1268.357..1711.526 rows=1000001 loops=1)
Sort Key: c1, c2, c4
Sort Method: external merge Disk: 46168kB
-> Seq Scan on t (cost=0.00..18294.01 rows=1000001 width=59) (actual time=0.014..163.347 rows=1000001 loops=1)
Planning Time: 0.081 ms
Execution Time: 4250.367 ms
However the index is used if one of the partition by + order by matches entirely the beginning of the index. For instance query:
select
c1,
first_value(c2) OVER (PARTITION BY c1 order by c2) AS c2,
first_value(c3) OVER (PARTITION BY c1 order by coalesce(c4, '000'), c3) AS c3
from
t;
c1,
first_value(c2) OVER (PARTITION BY c1 order by c2) AS c2,
first_value(c3) OVER (PARTITION BY c1 order by coalesce(c4, '000'), c3) AS c3
from
t;
Uses covering_index to avoid the sort on c1, c2
On Tue, Jul 7, 2020 at 12:48 AM David G. Johnston <david.g.johnston@xxxxxxxxx> wrote:
On Monday, July 6, 2020, Sebastien Arod <sebastien.arod@xxxxxxxxx> wrote:I would have expected postgresql to "share" a preliminary sort on c1 that would then be useful to reduce the work on all window functions but it doesn't.The plan shown does share - the output of one sort goes into another. Subsequent sorts still have to happen but they should be faster as the first field is already grouped. Doesn’t change the plan though.I even created an index on c1 hoping that postgresql would be able to use it in order to minimize the cost of the sorts but I couldn't make it use it.Use it how? You are still evaluating 250k groups so doing anything piece-wise seems like an expected loss compared to sequential scan.David J.