"Ed L." <pgsql@xxxxxxxxxxxxx> writes: > I'm trying to understand what happened here, and I have a theory. The problem is the horrid misestimation of the selectivity of "nursestation_key = 40": -> Bitmap Index Scan on idx_visit_nursestation_key (cost=0.00..69.35 rows=4956 width=0) (actual time=0.158..0.158 rows=6 loops=1) Index Cond: (nursestation_key = 40) When you're off by a factor of 800+ on the number of matching rows, you're going to arrive at a less than optimal plan. Increasing the stats target on visit.nursestation_key would be the solution. > There are 389K rows total, and 262K rows with a null indexed > value. Their are 15164 non-null rows newer than those null > rows. When stats target is set to 50 or less, analyze scans > 15,000 rows or less. If it scans the newest rows/pages first, > then is it possible it never sees any hint of the 262K null > rows, and thus ends up with skewed stats that yield seq scans > when idx scan is in order? ANALYZE goes to some considerable trouble to make sure it gets an unbiased random sample. With those numbers it would see an expected 500+ of the nonnull rows; a sample containing none at all would be highly improbable. > Also, I see the most_common_vals array is not growing linearly > with the stats target as the docs seem to suggest. I have 34 > unique values, so with stats target >= 34, I'd expect > most_common_vals array to have 34 values, but it has 8. To get into most_common_vals, a value has to occur more than once in the sample. Given the situation you have, it's not surprising that not all the possible values got into the stats. regards, tom lane