Hello list,

I have a very simple NATURAL JOIN that does not fit in work_mem. Why does the query planner prefer a hash join that takes 361 s, while a sort plus merge join takes only 13 s?

The server is an old Mac Mini with a hard disk drive and only 4 GB of RAM. Postgres version info:

PostgreSQL 15.0 on x86_64-apple-darwin20.6.0, compiled by Apple clang version 12.0.0 (clang-1200.0.32.29), 64-bit

The low work_mem and the disabled memoization are set on purpose, in order to simplify a complex query while reproducing the same problem that I experienced there. The query below is the simplest one I could find where the optimizer does not go for the faster merge join.
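For reference, the non-default settings I used to reproduce this (they are also visible in the "Settings" line of the plans below) were set per session, roughly like this:

```sql
-- Session settings used to reproduce the problem:
SET work_mem = '1MB';                     -- deliberately low
SET hash_mem_multiplier = 1;
SET enable_memoize = off;                 -- deliberately disabled
SET max_parallel_workers_per_gather = 1;
SET effective_cache_size = '500MB';

-- For the second plan only, to force the merge join:
-- SET enable_hashjoin = off;
```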
From my point of view a merge join is clearly faster, because the hash table does not fit in memory and I expect a hash join to do a lot of random I/O. But the query planner does not see that, and increasing random_page_cost does not help either. In fact the opposite happens: the cost gap between the merge join and the hash join grows as I increase the random page cost!

# EXPLAIN (ANALYZE,VERBOSE,BUFFERS,SETTINGS) SELECT * FROM tasks_mm_workitems NATURAL JOIN workitem_ids;

                                                                      QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=121222.68..257633.01 rows=3702994 width=241) (actual time=184498.464..360606.257 rows=3702994 loops=1)
   Output: tasks_mm_workitems.workitem_n, tasks_mm_workitems.task_n, workitem_ids.workitem_id
   Inner Unique: true
   Hash Cond: (tasks_mm_workitems.workitem_n = workitem_ids.workitem_n)
   Buffers: shared hit=15068 read=47434, temp read=56309 written=56309
   ->  Seq Scan on public.tasks_mm_workitems  (cost=0.00..53488.94 rows=3702994 width=8) (actual time=0.040..1376.084 rows=3702994 loops=1)
         Output: tasks_mm_workitems.workitem_n, tasks_mm_workitems.task_n
         Buffers: shared read=16459
   ->  Hash  (cost=59780.19..59780.19 rows=1373719 width=237) (actual time=184361.874..184361.875 rows=1373737 loops=1)
         Output: workitem_ids.workitem_id, workitem_ids.workitem_n
         Buckets: 4096  Batches: 512  Memory Usage: 759kB
         Buffers: shared hit=15068 read=30975, temp written=43092
         ->  Seq Scan on public.workitem_ids  (cost=0.00..59780.19 rows=1373719 width=237) (actual time=0.026..1912.312 rows=1373737 loops=1)
               Output: workitem_ids.workitem_id, workitem_ids.workitem_n
               Buffers: shared hit=15068 read=30975
 Settings: effective_cache_size = '500MB', enable_memoize = 'off', hash_mem_multiplier = '1', max_parallel_workers_per_gather = '1', work_mem = '1MB'
 Planning:
   Buffers: shared hit=2 read=6
 Planning Time: 0.568 ms
 Execution Time: 361106.876 ms
(20 rows)

# EXPLAIN (ANALYZE,VERBOSE,BUFFERS,SETTINGS) SELECT * FROM tasks_mm_workitems
NATURAL JOIN workitem_ids;

                                                                           QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=609453.49..759407.78 rows=3702994 width=241) (actual time=5062.513..10866.313 rows=3702994 loops=1)
   Output: tasks_mm_workitems.workitem_n, tasks_mm_workitems.task_n, workitem_ids.workitem_id
   Merge Cond: (workitem_ids.workitem_n = tasks_mm_workitems.workitem_n)
   Buffers: shared hit=5343 read=66053, temp read=32621 written=32894
   ->  Index Scan using workitem_ids_pkey on public.workitem_ids  (cost=0.43..81815.86 rows=1373719 width=237) (actual time=0.111..1218.363 rows=1373737 loops=1)
         Output: workitem_ids.workitem_n, workitem_ids.workitem_id
         Buffers: shared hit=5310 read=49627
   ->  Materialize  (cost=609372.91..627887.88 rows=3702994 width=8) (actual time=5062.389..7392.640 rows=3702994 loops=1)
         Output: tasks_mm_workitems.workitem_n, tasks_mm_workitems.task_n
         Buffers: shared hit=33 read=16426, temp read=32621 written=32894
         ->  Sort  (cost=609372.91..618630.40 rows=3702994 width=8) (actual time=5062.378..6068.703 rows=3702994 loops=1)
               Output: tasks_mm_workitems.workitem_n, tasks_mm_workitems.task_n
               Sort Key: tasks_mm_workitems.workitem_n
               Sort Method: external merge  Disk: 65256kB
               Buffers: shared hit=33 read=16426, temp read=32621 written=32894
               ->  Seq Scan on public.tasks_mm_workitems  (cost=0.00..53488.94 rows=3702994 width=8) (actual time=0.045..1177.202 rows=3702994 loops=1)
                     Output: tasks_mm_workitems.workitem_n, tasks_mm_workitems.task_n
                     Buffers: shared hit=33 read=16426
 Settings: effective_cache_size = '500MB', enable_hashjoin = 'off', enable_memoize = 'off', hash_mem_multiplier = '1', max_parallel_workers_per_gather = '1', work_mem = '1MB'
 Planning:
   Buffers: shared hit=8
 Planning Time: 0.677 ms
 Execution Time: 13364.545 ms
(23 rows)

Thank you in advance,
Dimitris