Okay,
I have some converted code that uses this syntax.
For 20 Million rows it was taking 15-20 minutes! (versus 3 minutes) on live data.
See here: https://explain.depesz.com/s/VQFJ [There are 2 optimizations, removing the ORDER BY NULL, and just using a sequence]
(The above is a live data run) The IO with the ORDER BY NULL is crazy high. 10x the READ in GB!
The solution is to remove the ORDER BY NULL. [since that is not sortable, should it be ignored?]
This does NOT SHOW UP with 1 million rows.
simple example (slow):
explain (verbose, buffers, analyze) select row_number() over(ORDER BY NULL) as rn, x, random() from generate_series(1,20000000) x order by x desc, random();
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=3365975.67..3415975.67 rows=20000000 width=52) (actual time=816050.580..817984.447 rows=20000000 loops=1)
Output: (row_number() OVER (?)), x, (random()), NULL::text
Sort Key: x.x DESC, (random())
Sort Method: external merge Disk: 665376kB
Buffers: temp read=20136158 written=161301
-> WindowAgg (cost=0.00..550000.00 rows=20000000 width=52) (actual time=12585.463..791440.272 rows=20000000 loops=1)
Output: row_number() OVER (?), x, random(), (NULL::text)
Buffers: temp read=20052986 written=78126
-> Function Scan on pg_catalog.generate_series x (cost=0.00..200000.00 rows=20000000 width=36) (actual time=3282.882..6343.470 rows=20000000 loops=1)
Output: NULL::text, x
Function Call: generate_series(1, 20000000)
Buffers: temp read=34180 written=34180
Query Identifier: 7415410443092007025
Planning Time: 0.036 ms
JIT:
Functions: 7
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 0.231 ms, Inlining 5.021 ms, Optimization 18.850 ms, Emission 12.934 ms, Total 37.036 ms
Execution Time: 819003.415 ms
(19 rows)
Time: 819004.041 ms (13:39.004)
explain (verbose, buffers, analyze) select row_number() over(ORDER BY NULL) as rn, x, random() from generate_series(1,20000000) x order by x desc, random();
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=3365975.67..3415975.67 rows=20000000 width=52) (actual time=816050.580..817984.447 rows=20000000 loops=1)
Output: (row_number() OVER (?)), x, (random()), NULL::text
Sort Key: x.x DESC, (random())
Sort Method: external merge Disk: 665376kB
Buffers: temp read=20136158 written=161301
-> WindowAgg (cost=0.00..550000.00 rows=20000000 width=52) (actual time=12585.463..791440.272 rows=20000000 loops=1)
Output: row_number() OVER (?), x, random(), (NULL::text)
Buffers: temp read=20052986 written=78126
-> Function Scan on pg_catalog.generate_series x (cost=0.00..200000.00 rows=20000000 width=36) (actual time=3282.882..6343.470 rows=20000000 loops=1)
Output: NULL::text, x
Function Call: generate_series(1, 20000000)
Buffers: temp read=34180 written=34180
Query Identifier: 7415410443092007025
Planning Time: 0.036 ms
JIT:
Functions: 7
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 0.231 ms, Inlining 5.021 ms, Optimization 18.850 ms, Emission 12.934 ms, Total 37.036 ms
Execution Time: 819003.415 ms
(19 rows)
Time: 819004.041 ms (13:39.004)
the fixed version:
explain (verbose, buffers, analyze) select row_number() over() as rn, x, random() from generate_series(1,20000000) x order by x desc, random();
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=3159725.67..3209725.67 rows=20000000 width=20) (actual time=23361.306..25466.728 rows=20000000 loops=1)
Output: (row_number() OVER (?)), x, (random())
Sort Key: x.x DESC, (random())
Sort Method: external merge Disk: 665376kB
Buffers: temp read=117352 written=117355
-> WindowAgg (cost=0.00..500000.00 rows=20000000 width=20) (actual time=3409.882..9789.217 rows=20000000 loops=1)
Output: row_number() OVER (?), x, random()
Buffers: temp read=34180 written=34180
-> Function Scan on pg_catalog.generate_series x (cost=0.00..200000.00 rows=20000000 width=4) (actual time=3409.853..6768.737 rows=20000000 loops=1)
Output: x
Function Call: generate_series(1, 20000000)
Buffers: temp read=34180 written=34180
Query Identifier: -8739706385719272689
Planning Time: 0.067 ms
JIT:
Functions: 4
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 0.180 ms, Inlining 32.829 ms, Optimization 5.890 ms, Emission 3.899 ms, Total 42.798 ms
Execution Time: 26070.514 ms
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=3159725.67..3209725.67 rows=20000000 width=20) (actual time=23361.306..25466.728 rows=20000000 loops=1)
Output: (row_number() OVER (?)), x, (random())
Sort Key: x.x DESC, (random())
Sort Method: external merge Disk: 665376kB
Buffers: temp read=117352 written=117355
-> WindowAgg (cost=0.00..500000.00 rows=20000000 width=20) (actual time=3409.882..9789.217 rows=20000000 loops=1)
Output: row_number() OVER (?), x, random()
Buffers: temp read=34180 written=34180
-> Function Scan on pg_catalog.generate_series x (cost=0.00..200000.00 rows=20000000 width=4) (actual time=3409.853..6768.737 rows=20000000 loops=1)
Output: x
Function Call: generate_series(1, 20000000)
Buffers: temp read=34180 written=34180
Query Identifier: -8739706385719272689
Planning Time: 0.067 ms
JIT:
Functions: 4
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 0.180 ms, Inlining 32.829 ms, Optimization 5.890 ms, Emission 3.899 ms, Total 42.798 ms
Execution Time: 26070.514 ms
Time: 26071.112 ms (00:26.071)