IN not handled very well?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



I've got this query with an IN clause:

select count(*),public.album.gid,public.album.name,public.album.id from public.album,public.albumjoin,public.puid,public.puidjoin where albumjoin.album = public.album.id and public.puidjoin.track = public.albumjoin.track and public.puid.id = public.puidjoin.puid and public.puid.puid IN (select umdb.puid.name from umdb.puid,umdb.node where umdb.puid.id = umdb.node.puid and umdb.node.dir=5886) group by gid,name,public.album.id having count(*) >= 6 order by count(*) desc;

It gives me a rather expensive plan:

 Sort  (cost=35729.07..35729.75 rows=272 width=69)
   Sort Key: count(*)
   ->  HashAggregate  (cost=35713.31..35718.07 rows=272 width=69)
         Filter: (count(*) >= 6)
         ->  Nested Loop  (cost=51.67..35709.91 rows=272 width=69)
               ->  Nested Loop  (cost=51.67..34216.30 rows=272 width=4)
-> Nested Loop (cost=51.67..33338.04 rows=272 width=4) -> Hash IN Join (cost=51.67..31794.72 rows=218 width=4) Hash Cond: (("outer".puid)::text = "inner".name) -> Seq Scan on puid (cost=0.00..23495.21 rows=1099421 width=44) -> Hash (cost=51.63..51.63 rows=15 width=40) -> Nested Loop (cost=0.00..51.63 rows=15 width=40) -> Index Scan using node_dir on node (cost=0.00..3.22 rows=16 width=4) Index Cond: (dir = 5886) -> Index Scan using puid_pkey on puid (cost=0.00..3.01 rows=1 width=44) Index Cond: (puid.id = "outer".puid) -> Index Scan using puidjoin_puidtrack on puidjoin (cost=0.00..7.05 rows=2 width=8) Index Cond: ("outer".id = puidjoin.puid) -> Index Scan using albumjoin_trackindex on albumjoin (cost=0.00..3.22 rows=1 width=8) Index Cond: ("outer".track = albumjoin.track) -> Index Scan using album_pkey on album (cost=0.00..5.48 rows=1 width=69)
                     Index Cond: ("outer".album = album.id)

If I'm reading this right, it looks like it's expensive because it's doing a sequential scan on public.puid.puid to satisfy the IN clause. (Although why it's doing that I'm not sure, given that there's a recently analyzed index on public.puid.puid.) Interestingly, if I replace that IN subselect with the 15 values it will return, my plan improves by two orders of magnitude:

 Sort  (cost=235.53..235.56 rows=12 width=69)
   Sort Key: count(*)
   ->  HashAggregate  (cost=235.11..235.32 rows=12 width=69)
         Filter: (count(*) >= 6)
         ->  Nested Loop  (cost=20.03..234.96 rows=12 width=69)
               ->  Nested Loop  (cost=20.03..169.06 rows=12 width=4)
-> Nested Loop (cost=20.03..130.32 rows=12 width=4) -> Bitmap Heap Scan on puid (cost=20.03..59.52 rows=10 width=4) Recheck Cond: ((puid = 'f68dcf86-992c-2e4a-21fb-2fc8c56edfeb'::bpchar) OR (puid = '7716dbcf-56ab-623b-ab33-3b2e67a0727c'::bpchar) OR (puid = '724d6a39-0d15-a296-2dd2-127c34f13809'::bpchar) OR (puid = '02f0cd9f-9fa5-abda-06cd-5dbb13826243'::bpchar) OR (puid = '165d5bea- b21f-9302-b991-0927f491787b'::bpchar) OR (puid = '4223dbc8-85af-a92e- b63d-72a726475e2c'::bpchar) OR (puid = '2d43ef9a- c7ee-2425-7fac-8b937cbed178'::bpchar) OR (puid = '9ff81c2f-04b7- cf5d-705f-7b944a5ae093'::bpchar) OR (puid = 'deaddddd-dfaf-18dd-6d4d- c483e8ba60f7'::bpchar) OR (puid = '20939b69- ff98-770a-1444-3b0e9892712f'::bpchar)) -> BitmapOr (cost=20.03..20.03 rows=10 width=0) -> Bitmap Index Scan on puid_puidindex (cost=0.00..2.00 rows=1 width=0) Index Cond: (puid = 'f68dcf86-992c-2e4a-21fb-2fc8c56edfeb'::bpchar) -> Bitmap Index Scan on puid_puidindex (cost=0.00..2.00 rows=1 width=0) Index Cond: (puid = '7716dbcf-56ab-623b-ab33-3b2e67a0727c'::bpchar) -> Bitmap Index Scan on puid_puidindex (cost=0.00..2.00 rows=1 width=0) Index Cond: (puid = '724d6a39-0d15-a296-2dd2-127c34f13809'::bpchar) -> Bitmap Index Scan on puid_puidindex (cost=0.00..2.00 rows=1 width=0) Index Cond: (puid = '02f0cd9f-9fa5-abda-06cd-5dbb13826243'::bpchar) -> Bitmap Index Scan on puid_puidindex (cost=0.00..2.00 rows=1 width=0) Index Cond: (puid = '165d5bea-b21f-9302-b991-0927f491787b'::bpchar) -> Bitmap Index Scan on puid_puidindex (cost=0.00..2.00 rows=1 width=0) Index Cond: (puid = '4223dbc8-85af-a92e-b63d-72a726475e2c'::bpchar) -> Bitmap Index Scan on puid_puidindex (cost=0.00..2.00 rows=1 width=0) Index Cond: (puid = '2d43ef9a-c7ee-2425-7fac-8b937cbed178'::bpchar) -> Bitmap Index Scan on puid_puidindex (cost=0.00..2.00 rows=1 width=0) Index Cond: (puid = '9ff81c2f-04b7-cf5d-705f-7b944a5ae093'::bpchar) -> Bitmap Index Scan on puid_puidindex (cost=0.00..2.00 rows=1 width=0) Index Cond: (puid = 'deaddddd-dfaf-18dd-6d4d-c483e8ba60f7'::bpchar) -> Bitmap Index Scan on puid_puidindex (cost=0.00..2.00 rows=1 width=0) Index Cond: (puid = '20939b69-ff98-770a-1444-3b0e9892712f'::bpchar) -> Index Scan using puidjoin_puidtrack on puidjoin (cost=0.00..7.05 rows=2 width=8) Index Cond: ("outer".id = puidjoin.puid) -> Index Scan using albumjoin_trackindex on albumjoin (cost=0.00..3.22 rows=1 width=8) Index Cond: ("outer".track = albumjoin.track) -> Index Scan using album_pkey on album (cost=0.00..5.48 rows=1 width=69)
                     Index Cond: ("outer".album = album.id)

I guess my question is: if postgres is (correctly) estimating that only 15 rows will come out of the subselect, and it knows it can choose a much better plan with bitmap index scans, should it be able to choose the bitmap plan over the sequential scan? Or should I run the subselect myself and then rewrite my query to push in the constant values?


[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux