Re: Any way to favor index scans, but not bitmap index scans?

On 3:37 pm 07/23/08 Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
> "Francisco Reyes" <lists@xxxxxxxxxxxxxxxx> writes:
> >  SET ENABLE_SEQSCAN TO OFF;
> >  SET ENABLE_BITMAPSCAN TO OFF;
> >  Aggregate  (cost=25665216.10..25665216.11 rows=1 width=12) (actual
> >  time=3088.894..3088.896 rows=1 loops=1)
> >     ->  Nested Loop  (cost=0.00..25662307.70 rows=387785 width=12)
> >               (actual time=0.264..2624.680 rows=194734 loops=1)
> >           ->  Index Scan using join_ids_join_id on join_ids
> >               (cost=0.00..2867051.21 rows=5020 width=4) (actual
> >  time=0.237..1236.019 rows=4437 loops=1)
> >                 Filter: (customer_id = ANY
> >                 ('{1014,2485,4636,4635,1255,547,374,580}'::integer[]))
> >           ->  Index Scan using historical_join_id_date on historical
> >               (cost=0.00..4522.43 rows=1477 width=16) (actual
> >  time=0.010..0.153 rows=44 loops=4437)
> >                 Index Cond: ((historical.join_id =
> >  join_ids.join_id) AND (historical.date > '2007-04-01'::date)
> >                 AND (historical.date < '2008-05-01'::date))
> >                 Filter: (trans.f5 > 0::numeric)
> >   Total runtime: 3091.227 ms --> 3 seconds
>
> You might be more likely to get a sane plan if you had an index on
> join_ids.customer_id.

There is an index on join_ids:
"joinids_customerids_joinid" btree (customer_id, joinid) WITH (fillfactor=98)
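
For reference, that index roughly corresponds to this DDL (just a sketch,
spelling the column joinid the way psql reports it above):

CREATE INDEX joinids_customerids_joinid
    ON join_ids (customer_id, joinid)
    WITH (fillfactor = 98);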

Also, that plan runs in only 3 seconds, which is as good as it is going to get.
Or were you saying that the other plans would have been better?

> The first indexscan above is really a
> completely silly choice, and would never have been used if you
> weren't holding a gun to the planner's head.

I have much to learn about how to properly read an explain analyze, but as
silly as that plan may look, it outperforms the other plans by orders of
magnitude; 3 seconds vs. 12 minutes is a very big difference. It was so fast
that I even compared the result (which happens to be a single row) to make
sure I was getting the correct value.

> The index isn't contributing any selectivity at all.

Which index scan? Explain analyze plans are read bottom up, right?
If it is this one you are referring to:

->  Index Scan using historical_join_id_date on historical
      (cost=0.00..4522.43 rows=1477 width=16)
      (actual time=0.010..0.153 rows=44 loops=4437)
        Index Cond: ((historical.join_id = join_ids.join_id) AND
          (historical.date > '2007-04-01'::date) AND
          (historical.date < '2008-05-01'::date))
        Filter: (trans.f5 > 0::numeric)

I believe that is the reason performance is good with that plan.
The number of rows that need to be returned from historical is less than 1%.

> The other part of the problem is the factor-of-thirty overestimate of
> the number of rows that the inner indexscan will produce (which means
> also a factor-of-thirty overestimate of its cost).  Perhaps higher
> statistics targets for these two relations would give you a better
> estimate there.

Is it possible to go over 
default_statistics_target = 1000?
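
If not, I assume I can at least raise the per-column targets on the two
relations Tom mentioned, along these lines (the column choices are my guess
at the ones that matter here):

ALTER TABLE join_ids   ALTER COLUMN customer_id SET STATISTICS 1000;
ALTER TABLE historical ALTER COLUMN join_id     SET STATISTICS 1000;
ALTER TABLE historical ALTER COLUMN date        SET STATISTICS 1000;
ANALYZE join_ids;
ANALYZE historical;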


> since there's no additional join condition.  What PG version are you
> running exactly?

8.3.3

I have only been at this job for 3 months, and I can say that neither the
data nor the previous design I am trying to replace plays nice with
PostgreSQL. I can't get into specifics, but I can say that about 60% of the
data in our "historical" tables is not used by most queries. I think that
is partly what throws the planner off so much. My first clue was when I saw
the planner choosing sequential scans to retrieve less than 1% of the rows.
It didn't make sense.

I tried several schemes with partitioning and that was even worse.

I am going to convert the table structure names to the mapping names I
used in this thread. Perhaps that will be informative.


