In investigating a slow query, I distiled the code below from a larger
query:
SELECT
*
FROM
/* SUBQUERY banners */ (
SELECT
*
FROM
/* SUBQUERY banners_links */ (
SELECT
*
FROM
banners_links
WHERE
merchant_id = 5631
) as banners_links
WHERE
merchant_id = 5631 AND
banners_links.status = 0
) AS banners
LEFT OUTER JOIN
/* SUBQUERY types */ (
SELECT
banner_types.id AS type_id,
banner_types.type AS type,
banners_banner_types.banner_id AS id
FROM
banner_types,banners_banner_types
WHERE
banners_banner_types.banner_id IN /* SUBQUERY */ (
SELECT
id
FROM
banners_links
WHERE
merchant_id = 5631
) AND
banners_banner_types.type_id = banner_types.id
) AS types
USING (id)
Obviously, the subquery "banners_links" is redundant. The query however is a
generated one, and this redundancy probably wasn't noted before. Logically
you would say it shouldn't hurt, but in fact it does. The above query
executes painfully slow. The left outer join is killing the performance, as
witnessed by the plan:
"Nested Loop Left Join (cost=964.12..1480.67 rows=1 width=714) (actual
time=20.801..8233.410 rows=553 loops=1)"
" Join Filter: (public.banners_links.id = banners_banner_types.banner_id)"
" -> Bitmap Heap Scan on banners_links (cost=4.35..42.12 rows=1
width=671) (actual time=0.127..0.690 rows=359 loops=1)"
" Recheck Cond: ((merchant_id = 5631) AND (merchant_id = 5631))"
" Filter: ((status)::text = '0'::text)"
" -> Bitmap Index Scan on banners_links_merchant_id_idx
(cost=0.00..4.35 rows=10 width=0) (actual time=0.092..0.092 rows=424
loops=1)"
" Index Cond: ((merchant_id = 5631) AND (merchant_id = 5631))"
" -> Hash Join (cost=959.77..1432.13 rows=514 width=51) (actual
time=0.896..22.611 rows=658 loops=359)"
" Hash Cond: (banners_banner_types.type_id = banner_types.id)"
" -> Hash IN Join (cost=957.32..1422.52 rows=540 width=16) (actual
time=0.894..21.878 rows=658 loops=359)"
" Hash Cond: (banners_banner_types.banner_id =
public.banners_links.id)"
" -> Seq Scan on banners_banner_types (cost=0.00..376.40
rows=22240 width=16) (actual time=0.003..10.149 rows=22240 loops=359)"
" -> Hash (cost=952.02..952.02 rows=424 width=8) (actual
time=0.779..0.779 rows=424 loops=1)"
" -> Bitmap Heap Scan on banners_links
(cost=11.54..952.02 rows=424 width=8) (actual time=0.108..0.513 rows=424
loops=1)"
" Recheck Cond: (merchant_id = 5631)"
" -> Bitmap Index Scan on
banners_links_merchant_id_idx (cost=0.00..11.43 rows=424 width=0) (actual
time=0.078..0.078 rows=424 loops=1)"
" Index Cond: (merchant_id = 5631)"
" -> Hash (cost=2.20..2.20 rows=20 width=43) (actual
time=0.033..0.033 rows=20 loops=1)"
" -> Seq Scan on banner_types (cost=0.00..2.20 rows=20
width=43) (actual time=0.004..0.017 rows=20 loops=1)"
"Total runtime: 8233.710 ms"
I noticed that the recheck condition looks a bit weird:
Recheck Cond: ((merchant_id = 5631) AND (merchant_id = 5631))
You would think that PG (8.2.3) would be smart enough to optimize this away.
Also the estimate of the nested loop left join and the actual results are
way off. I tried increasing the statistics of both public.banners_links.id
and banners_banner_types.banner_id (to the highest value 1000), analyzed,
vacuum analyzed and did a vacuum full, but without any improvements.
Anyway, when I remove the redundant sub query the code becomes:
SELECT
*
FROM
/* SUBQUERY banners */ (
SELECT
*
FROM
banners_links
WHERE
merchant_id = 5631 AND
banners_links.status = 0
) AS banners
LEFT OUTER JOIN
/* SUBQUERY types */ (
SELECT
banner_types.id AS type_id,
banner_types.type AS type,
banners_banner_types.banner_id AS id
FROM
banner_types,banners_banner_types
WHERE
banners_banner_types.banner_id IN /* SUBQUERY */ (
SELECT
id
FROM
banners_links
WHERE
merchant_id = 5631
) AND
banners_banner_types.type_id = banner_types.id
) AS types
USING (id)
With this query, the execution time drops from 8 seconds to a mere 297 ms!
The plan now looks as follows:
"Hash Left Join (cost=1449.99..2392.68 rows=2 width=714) (actual
time=24.257..25.292 rows=553 loops=1)"
" Hash Cond: (public.banners_links.id = banners_banner_types.banner_id)"
" -> Bitmap Heap Scan on banners_links (cost=11.43..954.03 rows=2
width=671) (actual time=0.122..0.563 rows=359 loops=1)"
" Recheck Cond: (merchant_id = 5631)"
" Filter: ((status)::text = '0'::text)"
" -> Bitmap Index Scan on banners_links_merchant_id_idx
(cost=0.00..11.43 rows=424 width=0) (actual time=0.086..0.086 rows=424
loops=1)"
" Index Cond: (merchant_id = 5631)"
" -> Hash (cost=1432.13..1432.13 rows=514 width=51) (actual
time=24.128..24.128 rows=658 loops=1)"
" -> Hash Join (cost=959.77..1432.13 rows=514 width=51) (actual
time=1.714..23.606 rows=658 loops=1)"
" Hash Cond: (banners_banner_types.type_id = banner_types.id)"
" -> Hash IN Join (cost=957.32..1422.52 rows=540 width=16)
(actual time=1.677..22.811 rows=658 loops=1)"
" Hash Cond: (banners_banner_types.banner_id =
public.banners_links.id)"
" -> Seq Scan on banners_banner_types
(cost=0.00..376.40 rows=22240 width=16) (actual time=0.005..10.306
rows=22240 loops=1)"
" -> Hash (cost=952.02..952.02 rows=424 width=8)
(actual time=0.772..0.772 rows=424 loops=1)"
" -> Bitmap Heap Scan on banners_links
(cost=11.54..952.02 rows=424 width=8) (actual time=0.105..0.510 rows=424
loops=1)"
" Recheck Cond: (merchant_id = 5631)"
" -> Bitmap Index Scan on
banners_links_merchant_id_idx (cost=0.00..11.43 rows=424 width=0) (actual
time=0.077..0.077 rows=424 loops=1)"
" Index Cond: (merchant_id = 5631)"
" -> Hash (cost=2.20..2.20 rows=20 width=43) (actual
time=0.032..0.032 rows=20 loops=1)"
" -> Seq Scan on banner_types (cost=0.00..2.20 rows=20
width=43) (actual time=0.004..0.018 rows=20 loops=1)"
"Total runtime: 25.602 ms"
We see that instead of a nested loop left join the planner now chooses a
Hash Left Join. I'm not really an expert in this matter and would like some
more insight into what's happening here. Naively I would say that a planner
would have to be smart enough to see this by itself?
Thanks in advance for all hints.
_________________________________________________________________
Talk with your online friends with Messenger
http://www.join.msn.com/messenger/overview