pg_trgm indexes giving bad estimations?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



I've got another query I'm trying to optimize:

select aj.album from
public.track t
join public.albumjoin aj
on (aj.track = t.id)
join (select id from public.albummeta am where tracks between 10 and 14) lam
on (lam.id = aj.album)
where (t.name % '01New OrderEvil Dust' or t.name % '04OrbitalOpen Mind')
group by aj.album having count(*) >= 9.6;

This gives an expensive (but still reasonable) plan of:

HashAggregate  (cost=76523.64..76602.25 rows=4492 width=4)
   Filter: ((count(*))::numeric >= 9.6)
   ->  Hash Join  (cost=63109.73..76501.18 rows=4492 width=4)
         Hash Cond: ("outer".id = "inner".album)
-> Bitmap Heap Scan on albummeta am (cost=1810.10..9995.34 rows=187683 width=4)
               Recheck Cond: ((tracks >= 10) AND (tracks <= 14))
-> Bitmap Index Scan on albummeta_tracks_index (cost=0.00..1810.10 rows=187683 width=0)
                     Index Cond: ((tracks >= 10) AND (tracks <= 14))
         ->  Hash  (cost=61274.03..61274.03 rows=10243 width=4)
-> Nested Loop (cost=163.87..61274.03 rows=10243 width=4) -> Bitmap Heap Scan on track t (cost=163.87..28551.33 rows=10243 width=4) Recheck Cond: (((name)::text % '01New OrderEvil Dust'::text) OR ((name)::text % '04OrbitalOpen Mind'::text)) -> BitmapOr (cost=163.87..163.87 rows=10248 width=0) -> Bitmap Index Scan on track_name_trgm_idx (cost=0.00..81.93 rows=5124 width=0) Index Cond: ((name)::text % '01New OrderEvil Dust'::text) -> Bitmap Index Scan on track_name_trgm_idx (cost=0.00..81.93 rows=5124 width=0) Index Cond: ((name)::text % '04OrbitalOpen Mind'::text) -> Index Scan using albumjoin_trackindex on albumjoin aj (cost=0.00..3.18 rows=1 width=8)
                           Index Cond: (aj.track = "outer".id)
(19 rows)

Unfortunately, when I modify this example to use a more typical number of trigram searches or'd together (anywhere from 10 to 20), the planner thinks the bitmap heap scan on track t will return a lot of rows, and so reverts to doing a sequential scan of albumjoin for the next table join. That would make sense.... IF there were a lot of rows returned by the bitmap index scans. But here is where the planner gets it really wrong, if I'm reading it right.

It seems to think both my index scans will return 5124 rows, when, in reality, it's a lot less:

select count(*) from public.track where name % '01New OrderEvil Dust';
count
-------
    20
(1 row)

select count(*) from public.track where name % '04OrbitalOpen Mind';
count
-------
   123
(1 row)


How can I get the planner to not expect so many rows to be returned? A possibly related question is: because pg_tgrm lets me set the matching threshold of the % operator, how does that affect the planner?


[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux