On Tue, Sep 1, 2020 at 1:22 AM Kyotaro Horiguchi <horikyota.ntt@xxxxxxxxx> wrote:
FWIW, the attached is the dusted-off version of a part of a stalled
development of mine, which unconditionally(!) creates on-the-fly
statistics on VALUES list. It seems to work for certain cases,
although the planning time increases significantly.
=$ CREATE TABLE t1 AS SELECT a, a * 2 AS b FROM generate_series(0, 99999) a;
=$ CREATE INDEX ON t1 (a);
> perl q.pl(*) | psql
*: q.pl:
> print "explain analyze select b from t1 join (values ";
> foreach $i (0..10000) {
> print ", " if ($i > 0);
> printf("(%d)", $i/10 + 1000);
> }
> print ") as v(v) on (v.v = t1.a);";
patched:
Merge Join (cost=824.25..1005.19 rows=10001 width=4) (actual time=13.513..24.285 rows=10001 loops=1)
Merge Cond: (t1.a = "*VALUES*".column1)
-> Index Scan using t1_a_idx on t1 (cost=0.29..3050.29 rows=100000 width=8) (actual time=0.033..1.629 rows=2002 loops=1)
-> Sort (cost=789.47..814.47 rows=10001 width=4) (actual time=12.557..14.546 rows=10001 loops=1)
Sort Key: "*VALUES*".column1
Sort Method: quicksort Memory: 931kB
-> Values Scan on "*VALUES*" (cost=0.00..125.01 rows=10001 width=4) (actual time=0.002..8.271 rows=10001 loops=1)
Planning Time: 17.290 ms
Execution Time: 26.344 ms
(9 rows)
master:
Hash Join (cost=250.03..2168.03 rows=10001 width=4) (actual time=14.482..77.205 rows=10001 loops=1)
Hash Cond: (t1.a = "*VALUES*".column1)
-> Seq Scan on t1 (cost=0.00..1443.00 rows=100000 width=8) (actual time=0.017..23.540 rows=100000 loops=1)
-> Hash (cost=125.01..125.01 rows=10001 width=4) (actual time=13.786..13.788 rows=10001 loops=1)
Buckets: 16384 Batches: 1 Memory Usage: 480kB
-> Values Scan on "*VALUES*" (cost=0.00..125.01 rows=10001 width=4) (actual time=0.002..8.503 rows=10001 loops=1)
Planning Time: 12.365 ms
Execution Time: 78.567 ms
(8 rows)
We end up abusing the option of creating temp tables and analyzing them to get around the pain of queries going off the rails because of bad stats or lack of stats on values. I believe most/all of the core team and perhaps most contributors are against query hints in general (with some very good reasons) but it might be amazing to have the option to incur the planning time cost in some cases at least.
For my case, I think the changes coming in PG v13 or maybe v14 for joins being helped by multivariate statistics will mitigate the pain point.