Merge Join chooses very slow index scan

Jake Magner <jakemagner90@xxxxxxxxx> · Wed, 18 Mar 2015 23:23:34 -0700 (MST)

I am having problems with a join where the planner picks a merge join and an
index scan on one of the tables. Manually disabling merge joins and running
the query both ways shows the merge join takes over 10 seconds while a hash
join takes less than 100ms. The planner total cost estimate favors the merge
join, but the cost estimate for the index scan part is greater than the
total cost estimate by a factor of 300x. My understanding of how this can
occur is that it expects it won't actually have to scan all the rows,
because using the histogram distribution stats it can know that all the
relevant rows of the join column will be at the beginning of the scan. But
in practice it appears to actually be index scanning all the rows, showing
massive amounts of page hits. What is also confusing is that the planner
estimate of the number of rows that match the second join condition is
accurate and very low, so I would expect it to index scan on that column's
index instead. Pasted at the bottom is the explain plan for the query and
some other variations I think might be relevant. The table/index names are
obfuscated. I ran ANALYZE on all the tables in the query first. All  the
pages are cached in the explain plans but we wouldn't expect that to be true
in the production system. There are btree indexes on all the columns in both
the join conditions and the filters.

Searching, I found this thread
http://postgresql.nabble.com/merge-join-killing-performance-td2076433.html
which sounds kind of similar, but there are no Nulls in this table.

Thanks for your help.

Postgres version info: PostgreSQL 9.1.13 on x86_64-unknown-linux-gnu,
compiled by gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, 64-bit

-----------------------
Original Query

The estimated cost for Index Scan is 898k but the total cost estimate is
2.6k. The planner has a good estimate of the number of rows, 1335, for the
index scan, but by the number of page hits (8M) it appears it actually
scanned the entire table which has about 8M rows.
-----------------------
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM vehicles v LEFT JOIN usagestats ON
v.id = tid AND type = 'vehicle';

QUERY PLAN                                                                                   
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Right Join  (cost=593.28..2634.10 rows=4155 width=619) (actual
time=9.150..11464.949 rows=4155 loops=1)
   Merge Cond: (usagestats.tid = s.id)
   Buffers: shared hit=8063988
   ->  Index Scan using usagestats_tid_idx on usagestats 
(cost=0.00..898911.91 rows=1335 width=37) (actual time=0.027..11448.789
rows=2979 loops=1)
         Filter: ((type)::text = 'vehicle'::text)
         Buffers: shared hit=8063686
   ->  Sort  (cost=593.28..603.67 rows=4155 width=582) (actual
time=9.108..10.429 rows=4155 loops=1)
         Sort Key: s.id
         Sort Method: quicksort  Memory: 1657kB
         Buffers: shared hit=302
         ->  Seq Scan on vehicles v  (cost=0.00..343.55 rows=4155 width=582)
(actual time=0.014..2.917 rows=4155 loops=1)
               Buffers: shared hit=302
 Total runtime: 11466.122 ms
(13 rows)

------------------------
Change the type='vehicle' condition to an always true condition

If we change the filter from "type = 'vehicle'" (True for a small fraction
of the rows) to "freq > -1" (True for all rows) then the plan is the same,
but the actual time and page hits are much less and the query returns is
fast.
------------------------
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM vehicle v LEFT JOIN usagestats ON
(v.id = tid AND freq > -1);

QUERY PLAN                                                                                    

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

 Merge Right Join  (cost=593.28..2434.79 rows=7733 width=619) (actual
time=5.635..59.852 rows=17096 loops=1)

   Merge Cond: (usagestats.tid = v.id)

   Buffers: shared hit=17653

   ->  Index Scan using usagestats_tid_idx on usagestats 
(cost=0.00..898914.00 rows=8006976 width=37) (actual time=0.010..34.075
rows=17225 loops=1)

         Filter: (freq > (-1))

         Buffers: shared hit=17351

   ->  Sort  (cost=593.28..603.67 rows=4155 width=582) (actual
time=5.617..9.351 rows=17094 loops=1)

         Sort Key: v.id

         Sort Method: quicksort  Memory: 1657kB

         Buffers: shared hit=302

         ->  Seq Scan on vehicle v  (cost=0.00..343.55 rows=4155 width=582)
(actual time=0.009..1.803 rows=4157 loops=1)

               Buffers: shared hit=302

 Total runtime: 62.868 ms

(13 rows)

----------------------
Original Query with merge joins disabled

If we manually disable merge joins and run the original query we get a hash
join with what seems like
a more reasonable index scan on the more selective type column. The total
cost estimate is higher than the merge join plan, but lower than the cost
estimate for the index scan in the merge join query.
---------------------
BEGIN;
SET LOCAL enable_mergejoin = off;

EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM vehicle v LEFT JOIN usagestats ON
v.id = tid AND type = 'vehicle';

QUERY PLAN                                                                                 
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Right Join  (cost=395.49..5158.10 rows=4155 width=619) (actual
time=8.038..20.886 rows=4155 loops=1)
   Hash Cond: (usagestats.tid = v.id)
   Buffers: shared hit=3250
   ->  Index Scan using usagestats_type_idx on usagestats 
(cost=0.00..4752.59 rows=1335 width=37) (actual time=0.100..6.770 rows=2979
loops=1)
         Index Cond: ((type)::text = 'vehicle'::text)
         Buffers: shared hit=2948
   ->  Hash  (cost=343.55..343.55 rows=4155 width=582) (actual
time=7.908..7.908 rows=4155 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 1088kB
         Buffers: shared hit=302
         ->  Seq Scan on vehicle v  (cost=0.00..343.55 rows=4155 width=582)
(actual time=0.021..3.068 rows=4155 loops=1)
               Buffers: shared hit=302
 Total runtime: 21.936 ms
(12 rows)

-----------------------
Miscellaneous stats
-----------------------

SELECT COUNT(*) FROM vehicle;
 count 
-------
  4155
(1 row)

SELECT COUNT(*) FROM usagestats;
  count  
---------
 8007015
(1 row)

The usagestats table has 501 histogram buckets for the tid column. The max
id in the vehicle table is 4155 and the first two buckets of the histogram
cover all the values between 1 and 4500.

--
View this message in context: http://postgresql.nabble.com/Merge-Join-chooses-very-slow-index-scan-tp5842523.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance