IN not handled very well?

Ben <bench@xxxxxxxxxxxxxxx> · Sun, 24 Sep 2006 10:12:23 -0700

I've got this query with an IN clause:

select count(*),public.album.gid,public.album.name,public.album.id  
from public.album,public.albumjoin,public.puid,public.puidjoin where  
albumjoin.album = public.album.id and public.puidjoin.track =  
public.albumjoin.track and public.puid.id = public.puidjoin.puid and  
public.puid.puid
IN (select umdb.puid.name from umdb.puid,umdb.node where umdb.puid.id  
= umdb.node.puid and umdb.node.dir=5886)
 group by gid,name,public.album.id having count(*) >= 6 order by  
count(*) desc;

It gives me a rather expensive plan:

 Sort  (cost=35729.07..35729.75 rows=272 width=69)
   Sort Key: count(*)
   ->  HashAggregate  (cost=35713.31..35718.07 rows=272 width=69)
         Filter: (count(*) >= 6)
         ->  Nested Loop  (cost=51.67..35709.91 rows=272 width=69)
               ->  Nested Loop  (cost=51.67..34216.30 rows=272 width=4)
                     ->  Nested Loop  (cost=51.67..33338.04 rows=272  
width=4)
                           ->  Hash IN Join  (cost=51.67..31794.72  
rows=218 width=4)
                                 Hash Cond: (("outer".puid)::text =  
"inner".name)
                                 ->  Seq Scan on puid   
(cost=0.00..23495.21 rows=1099421 width=44)
                                 ->  Hash  (cost=51.63..51.63  
rows=15 width=40)
                                       ->  Nested Loop   
(cost=0.00..51.63 rows=15 width=40)
                                             ->  Index Scan using  
node_dir on node  (cost=0.00..3.22 rows=16 width=4)
                                                   Index Cond: (dir  
= 5886)
                                             ->  Index Scan using  
puid_pkey on puid  (cost=0.00..3.01 rows=1 width=44)
                                                   Index Cond:  
(puid.id = "outer".puid)
                           ->  Index Scan using puidjoin_puidtrack  
on puidjoin  (cost=0.00..7.05 rows=2 width=8)
                                 Index Cond: ("outer".id =  
puidjoin.puid)
                     ->  Index Scan using albumjoin_trackindex on  
albumjoin  (cost=0.00..3.22 rows=1 width=8)
                           Index Cond: ("outer".track =  
albumjoin.track)
               ->  Index Scan using album_pkey on album   
(cost=0.00..5.48 rows=1 width=69)
                     Index Cond: ("outer".album = album.id)

If I'm reading this right, it looks like it's expensive because it's  
doing a sequential scan on public.puid.puid to satisfy the IN clause.  
(Although why it's doing that I'm not sure, given that there's a  
recently analyzed index on public.puid.puid.) Interestingly, if I  
replace that IN subselect with the 15 values it will return, my plan  
improves by two orders of magnitude:

 Sort  (cost=235.53..235.56 rows=12 width=69)
   Sort Key: count(*)
   ->  HashAggregate  (cost=235.11..235.32 rows=12 width=69)
         Filter: (count(*) >= 6)
         ->  Nested Loop  (cost=20.03..234.96 rows=12 width=69)
               ->  Nested Loop  (cost=20.03..169.06 rows=12 width=4)
                     ->  Nested Loop  (cost=20.03..130.32 rows=12  
width=4)
                           ->  Bitmap Heap Scan on puid   
(cost=20.03..59.52 rows=10 width=4)
                                 Recheck Cond: ((puid =  
'f68dcf86-992c-2e4a-21fb-2fc8c56edfeb'::bpchar) OR (puid =  
'7716dbcf-56ab-623b-ab33-3b2e67a0727c'::bpchar) OR (puid =  
'724d6a39-0d15-a296-2dd2-127c34f13809'::bpchar) OR (puid =  
'02f0cd9f-9fa5-abda-06cd-5dbb13826243'::bpchar) OR (puid = '165d5bea- 
b21f-9302-b991-0927f491787b'::bpchar) OR (puid = '4223dbc8-85af-a92e- 
b63d-72a726475e2c'::bpchar) OR (puid = '2d43ef9a- 
c7ee-2425-7fac-8b937cbed178'::bpchar) OR (puid = '9ff81c2f-04b7- 
cf5d-705f-7b944a5ae093'::bpchar) OR (puid = 'deaddddd-dfaf-18dd-6d4d- 
c483e8ba60f7'::bpchar) OR (puid = '20939b69- 
ff98-770a-1444-3b0e9892712f'::bpchar))
                                 ->  BitmapOr  (cost=20.03..20.03  
rows=10 width=0)
                                       ->  Bitmap Index Scan on  
puid_puidindex  (cost=0.00..2.00 rows=1 width=0)
                                             Index Cond: (puid =  
'f68dcf86-992c-2e4a-21fb-2fc8c56edfeb'::bpchar)
                                       ->  Bitmap Index Scan on  
puid_puidindex  (cost=0.00..2.00 rows=1 width=0)
                                             Index Cond: (puid =  
'7716dbcf-56ab-623b-ab33-3b2e67a0727c'::bpchar)
                                       ->  Bitmap Index Scan on  
puid_puidindex  (cost=0.00..2.00 rows=1 width=0)
                                             Index Cond: (puid =  
'724d6a39-0d15-a296-2dd2-127c34f13809'::bpchar)
                                       ->  Bitmap Index Scan on  
puid_puidindex  (cost=0.00..2.00 rows=1 width=0)
                                             Index Cond: (puid =  
'02f0cd9f-9fa5-abda-06cd-5dbb13826243'::bpchar)
                                       ->  Bitmap Index Scan on  
puid_puidindex  (cost=0.00..2.00 rows=1 width=0)
                                             Index Cond: (puid =  
'165d5bea-b21f-9302-b991-0927f491787b'::bpchar)
                                       ->  Bitmap Index Scan on  
puid_puidindex  (cost=0.00..2.00 rows=1 width=0)
                                             Index Cond: (puid =  
'4223dbc8-85af-a92e-b63d-72a726475e2c'::bpchar)
                                       ->  Bitmap Index Scan on  
puid_puidindex  (cost=0.00..2.00 rows=1 width=0)
                                             Index Cond: (puid =  
'2d43ef9a-c7ee-2425-7fac-8b937cbed178'::bpchar)
                                       ->  Bitmap Index Scan on  
puid_puidindex  (cost=0.00..2.00 rows=1 width=0)
                                             Index Cond: (puid =  
'9ff81c2f-04b7-cf5d-705f-7b944a5ae093'::bpchar)
                                       ->  Bitmap Index Scan on  
puid_puidindex  (cost=0.00..2.00 rows=1 width=0)
                                             Index Cond: (puid =  
'deaddddd-dfaf-18dd-6d4d-c483e8ba60f7'::bpchar)
                                       ->  Bitmap Index Scan on  
puid_puidindex  (cost=0.00..2.00 rows=1 width=0)
                                             Index Cond: (puid =  
'20939b69-ff98-770a-1444-3b0e9892712f'::bpchar)
                           ->  Index Scan using puidjoin_puidtrack  
on puidjoin  (cost=0.00..7.05 rows=2 width=8)
                                 Index Cond: ("outer".id =  
puidjoin.puid)
                     ->  Index Scan using albumjoin_trackindex on  
albumjoin  (cost=0.00..3.22 rows=1 width=8)
                           Index Cond: ("outer".track =  
albumjoin.track)
               ->  Index Scan using album_pkey on album   
(cost=0.00..5.48 rows=1 width=69)
                     Index Cond: ("outer".album = album.id)

I guess my question is: if postgres is (correctly) estimating that  
only 15 rows will come out of the subselect, and it knows it can  
choose a much better plan with bitmap index scans, should it be able  
to choose the bitmap plan over the sequential scan? Or should I run  
the subselect myself and then rewrite my query to push in the  
constant values?