Hello list,

I'm getting an inefficient query plan for a SELECT ... EXCEPT ... query, where the left side is a very short table (often zero rows, though rarely it can be as long as 200K rows) and the right side is a table with 10M UNIQUE NOT NULL rows:
\d test_datatags
Table "public.test_datatags" Column | Type | Collation | Nullable | Default ----------------+---------+-----------+----------+---------------------------------- test_datatag_n | integer | | not null | generated by default as identity test_datatag | text | | not null | Indexes: "test_datatags_pkey" PRIMARY KEY, btree (test_datatag_n) "test_datatags_test_datatag_key" UNIQUE CONSTRAINT, btree (test_datatag) Follows a simplified and pathological case, that takes 1.7s even though the left side is empty:
Below is a simplified and pathological case that takes 1.7 s even though the left side is empty:

BEGIN;
CREATE TEMPORARY TABLE tmp_table(d1 TEXT) ON COMMIT DROP;
ANALYZE VERBOSE tmp_table;
INFO:  analyzing "pg_temp_9.tmp_table"
INFO:  "tmp_table": scanned 0 of 0 pages, containing 0 live rows and 0 dead rows; 0 rows in sample, 0 estimated total rows
EXPLAIN (ANALYSE,VERBOSE,BUFFERS,SETTINGS)
SELECT DISTINCT d1 FROM tmp_table WHERE d1 IS NOT NULL EXCEPT SELECT test_datatag FROM test_datatags;

                                                            QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 HashSetOp Except  (cost=0.00..299854.08 rows=1 width=36) (actual time=1726.470..1726.472 rows=0 loops=1)
   Output: "*SELECT* 1".d1, (0)
   Buffers: shared hit=1259 read=58800
   I/O Timings: shared/local read=77.713
   ->  Append  (cost=0.00..278054.53 rows=8719821 width=36) (actual time=3.754..1287.901 rows=8702840 loops=1)
         Buffers: shared hit=1259 read=58800
         I/O Timings: shared/local read=77.713
         ->  Subquery Scan on "*SELECT* 1"  (cost=0.00..0.02 rows=1 width=36) (actual time=0.003..0.003 rows=0 loops=1)
               Output: "*SELECT* 1".d1, 0
               ->  HashAggregate  (cost=0.00..0.01 rows=1 width=32) (actual time=0.002..0.003 rows=0 loops=1)
                     Output: tmp_table.d1
                     Group Key: tmp_table.d1
                     Batches: 1  Memory Usage: 24kB
                     ->  Seq Scan on pg_temp.tmp_table  (cost=0.00..0.00 rows=1 width=32) (actual time=0.001..0.001 rows=0 loops=1)
                           Output: tmp_table.d1
                           Filter: (tmp_table.d1 IS NOT NULL)
         ->  Subquery Scan on "*SELECT* 2"  (cost=0.00..234455.40 rows=8719820 width=26) (actual time=3.751..943.850 rows=8702840 loops=1)
               Output: "*SELECT* 2".test_datatag, 1
               Buffers: shared hit=1259 read=58800
               I/O Timings: shared/local read=77.713
               ->  Seq Scan on public.test_datatags  (cost=0.00..147257.20 rows=8719820 width=22) (actual time=3.747..515.420 rows=8702840 loops=1)
                     Output: test_datatags.test_datatag
                     Buffers: shared hit=1259 read=58800
                     I/O Timings: shared/local read=77.713
 Settings: effective_io_concurrency = '0', enable_partitionwise_aggregate = 'on', enable_partitionwise_join = 'on', hash_mem_multiplier = '1', maintenance_io_concurrency = '0', max_parallel_workers = '12', max_parallel_workers_per_gather = '0', temp_buffers = '64MB'
 Planning:
   Buffers: shared hit=2
 Planning Time: 0.055 ms
 JIT:
   Functions: 15
   Options: Inlining false, Optimization false, Expressions true, Deforming true
   Timing: Generation 0.317 ms, Inlining 0.000 ms, Optimization 0.179 ms, Emission 3.542 ms, Total 4.037 ms
 Execution Time: 1726.835 ms
(33 rows)

I'm wondering why the planner doesn't see that the left table is very small and follow a different path.
From an abstract computer science POV, I would:

1. sort the left table (the right one is already indexed),
2. "merge" the two tables, by walking them in order in parallel and excluding the matches,
3. stop when the left table is exhausted, which would happen very early.

Is this worth a bug report? I can file one if the issue is not known. Or am I misunderstanding the implications of the SELECT-EXCEPT query?

In the meantime I have replaced the query with a LEFT OUTER JOIN (see the sketch in the postscript below), which performs much better and which I believe is equivalent. I find it less readable than the query in question, though. Plus, I have a bunch of SELECT-EXCEPT queries (with smaller right-side tables) in my application that I would hate to rewrite into the uglier equivalent.

Under what conditions would the above query plan perform well?

Thanks in advance,
Dimitris
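P.S. The LEFT OUTER JOIN rewrite I mention above is roughly along these lines (a sketch, not the exact query from my application; the d1 IS NOT NULL filter plus the NOT NULL constraint on test_datatag should make it equivalent to the EXCEPT form):

-- anti-join formulation: keep only left rows with no match on the right
SELECT DISTINCT t.d1
FROM tmp_table t
LEFT OUTER JOIN test_datatags dt ON dt.test_datatag = t.d1
WHERE t.d1 IS NOT NULL
  AND dt.test_datatag IS NULL;

With a tiny left side, the planner can then choose a nested loop that probes the unique index on test_datatag once per left row, instead of reading all of test_datatags.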