I am having problems with a join where the planner picks a merge join and an index scan on one of the tables. Manually disabling merge joins and running the query both ways shows the merge join takes over 10 seconds while a hash join takes less than 100ms. The planner total cost estimate favors the merge join, but the cost estimate for the index scan part is greater than the total cost estimate by a factor of 300x. My understanding of how this can occur is that it expects it won't actually have to scan all the rows, because using the histogram distribution stats it can know that all the relevant rows of the join column will be at the beginning of the scan. But in practice it appears to actually be index scanning all the rows, showing massive amounts of page hits. What is also confusing is that the planner estimate of the number of rows that match the second join condition is accurate and very low, so I would expect it to index scan on that column's index instead. Pasted at the bottom is the explain plan for the query and some other variations I think might be relevant. The table/index names are obfuscated. I ran ANALYZE on all the tables in the query first. All the pages are cached in the explain plans but we wouldn't expect that to be true in the production system. There are btree indexes on all the columns in both the join conditions and the filters. Searching, I found this thread http://postgresql.nabble.com/merge-join-killing-performance-td2076433.html which sounds kind of similar, but there are no Nulls in this table. Thanks for your help. Postgres version info: PostgreSQL 9.1.13 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, 64-bit ----------------------- Original Query The estimated cost for Index Scan is 898k but the total cost estimate is 2.6k. The planner has a good estimate of the number of rows, 1335, for the index scan, but by the number of page hits (8M) it appears it actually scanned the entire table which has about 8M rows. ----------------------- EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM vehicles v LEFT JOIN usagestats ON v.id = tid AND type = 'vehicle'; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Merge Right Join (cost=593.28..2634.10 rows=4155 width=619) (actual time=9.150..11464.949 rows=4155 loops=1) Merge Cond: (usagestats.tid = s.id) Buffers: shared hit=8063988 -> Index Scan using usagestats_tid_idx on usagestats (cost=0.00..898911.91 rows=1335 width=37) (actual time=0.027..11448.789 rows=2979 loops=1) Filter: ((type)::text = 'vehicle'::text) Buffers: shared hit=8063686 -> Sort (cost=593.28..603.67 rows=4155 width=582) (actual time=9.108..10.429 rows=4155 loops=1) Sort Key: s.id Sort Method: quicksort Memory: 1657kB Buffers: shared hit=302 -> Seq Scan on vehicles v (cost=0.00..343.55 rows=4155 width=582) (actual time=0.014..2.917 rows=4155 loops=1) Buffers: shared hit=302 Total runtime: 11466.122 ms (13 rows) ------------------------ Change the type='vehicle' condition to an always true condition If we change the filter from "type = 'vehicle'" (True for a small fraction of the rows) to "freq > -1" (True for all rows) then the plan is the same, but the actual time and page hits are much less and the query returns is fast. ------------------------ EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM vehicle v LEFT JOIN usagestats ON (v.id = tid AND freq > -1); QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Merge Right Join (cost=593.28..2434.79 rows=7733 width=619) (actual time=5.635..59.852 rows=17096 loops=1) Merge Cond: (usagestats.tid = v.id) Buffers: shared hit=17653 -> Index Scan using usagestats_tid_idx on usagestats (cost=0.00..898914.00 rows=8006976 width=37) (actual time=0.010..34.075 rows=17225 loops=1) Filter: (freq > (-1)) Buffers: shared hit=17351 -> Sort (cost=593.28..603.67 rows=4155 width=582) (actual time=5.617..9.351 rows=17094 loops=1) Sort Key: v.id Sort Method: quicksort Memory: 1657kB Buffers: shared hit=302 -> Seq Scan on vehicle v (cost=0.00..343.55 rows=4155 width=582) (actual time=0.009..1.803 rows=4157 loops=1) Buffers: shared hit=302 Total runtime: 62.868 ms (13 rows) ---------------------- Original Query with merge joins disabled If we manually disable merge joins and run the original query we get a hash join with what seems like a more reasonable index scan on the more selective type column. The total cost estimate is higher than the merge join plan, but lower than the cost estimate for the index scan in the merge join query. --------------------- BEGIN; SET LOCAL enable_mergejoin = off; EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM vehicle v LEFT JOIN usagestats ON v.id = tid AND type = 'vehicle'; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Hash Right Join (cost=395.49..5158.10 rows=4155 width=619) (actual time=8.038..20.886 rows=4155 loops=1) Hash Cond: (usagestats.tid = v.id) Buffers: shared hit=3250 -> Index Scan using usagestats_type_idx on usagestats (cost=0.00..4752.59 rows=1335 width=37) (actual time=0.100..6.770 rows=2979 loops=1) Index Cond: ((type)::text = 'vehicle'::text) Buffers: shared hit=2948 -> Hash (cost=343.55..343.55 rows=4155 width=582) (actual time=7.908..7.908 rows=4155 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 1088kB Buffers: shared hit=302 -> Seq Scan on vehicle v (cost=0.00..343.55 rows=4155 width=582) (actual time=0.021..3.068 rows=4155 loops=1) Buffers: shared hit=302 Total runtime: 21.936 ms (12 rows) ----------------------- Miscellaneous stats ----------------------- SELECT COUNT(*) FROM vehicle; count ------- 4155 (1 row) SELECT COUNT(*) FROM usagestats; count --------- 8007015 (1 row) The usagestats table has 501 histogram buckets for the tid column. The max id in the vehicle table is 4155 and the first two buckets of the histogram cover all the values between 1 and 4500. -- View this message in context: http://postgresql.nabble.com/Merge-Join-chooses-very-slow-index-scan-tp5842523.html Sent from the PostgreSQL - performance mailing list archive at Nabble.com. -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance