Re: Why is PostgreSQL not using my index?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



"Christian Roche" <Christian.Roche@xxxxxxxxxxxxx> writes:
> Now when I select a subset of the possible event IDs in the big table, PG uses the appropriate index:

> select *
>   from mixpanel_events_201409
>  where event_id in (3, 4, 5, 6, 7, 8, 9, 10, 11, 373, 375, 376, 318);

> Bitmap Heap Scan on mixpanel_events_201409  (cost=7663.36..1102862.70 rows=410022 width=949)
>   Recheck Cond: (event_id = ANY ('{3,4,5,6,7,8,9,10,11,373,375,376,318}'::integer[]))
>   ->  Bitmap Index Scan on mixpanel_idx_event_201409  (cost=0.00..7560.85 rows=410022 width=0)
>         Index Cond: (event_id = ANY ('{3,4,5,6,7,8,9,10,11,373,375,376,318}'::integer[]))

> But when I try to join the lookup table and select from it, the index is dismissed for a full table scan with a catastrophic effect on performance:

> select *
> from mixpanel_events_201409 mp
>   inner join mixpanel_event_list ev on ( ev.id = mp.event_id )
> where ev.id in (3, 4, 5, 6, 7, 8, 9, 10, 11, 373, 375, 376, 318);

> Hash Join  (cost=20.73..2892183.32 rows=487288 width=1000)
>   Hash Cond: (mp.event_id = ev.id)
>   ->  Seq Scan on mixpanel_events_201409 mp  (cost=0.00..2809276.70 rows=20803470 width=949)
>   ->  Hash  (cost=20.57..20.57 rows=13 width=51)
>         ->  Seq Scan on mixpanel_event_list ev  (cost=0.00..20.57 rows=13 width=51)
>               Filter: (id = ANY ('{3,4,5,6,7,8,9,10,11,373,375,376,318}'::integer[]))

Given the estimated costs and rowcounts here, I'm far from convinced that
the planner made the wrong decision.  You seem to be expecting that it
will go for a nestloop plan that would require 13 separate indexscans of
the large table.  Those are unlikely to be only 1/13th the cost of the
unified bitmap scan with =ANY; there's going to be overhead from repeated
work.  If there's say a factor of 2 penalty for the repeated scans, that'd
be plenty enough to push the cost of that plan to be more than the
hashjoin.

If, indeed, the hashjoin is slower, that may suggest that you need to dial
down random_page_cost to better represent your environment.  But you
should be wary of making such an adjustment on the basis of a single
example; you might find that it makes other plan choices worse.

			regards, tom lane


-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance




[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux