Re: Why won't it index scan?

"Ed L." <pgsql@xxxxxxxxxxxxx> · Wed, 17 May 2006 12:50:14 -0600

On Wednesday May 17 2006 11:44 am, Ed L. wrote:
> On Wednesday May 17 2006 10:37 am, Ed L. wrote:
> > Can someone help me understand why the 8.1.2 query below is
> > using a seq scan instead of an index scan?  All relevant
> > columns appear to be indexed and all tables vacuum analyzed.
> >
> >
> > $ psql -c "explain analyze select * from visit inner join
> > patient on patient.key = visit.patient_key where
> > nursestation_key = '40';" QUERY PLAN
> > ------------------------------------------------------------
> >--
> > ------------------------------------------------------------
> >--- ---------------------- Merge Join 
> > (cost=27724.37..28457.01 rows=4956 width=421) (actual
> > time=1819.993..2004.802 rows=6 loops=1) Merge Cond:
> > ("outer".patient_key = "inner"."key") ->  Sort 
> > (cost=11859.31..11871.70 rows=4956 width=209) (actual
> > time=0.416..0.426 rows=6 loops=1) Sort Key:
> > visit.patient_key
> >          ->  Bitmap Heap Scan on visit 
> > (cost=69.35..11555.14 rows=4956 width=209) (actual
> > time=0.187..0.245 rows=6 loops=1) Recheck Cond:
> > (nursestation_key = 40)
> >                ->  Bitmap Index Scan on
> > idx_visit_nursestation_key  (cost=0.00..69.35 rows=4956
> > width=0) (actual time=0.158..0.158 rows=6 loops=1) Index
> > Cond: (nursestation_key = 40) ->  Sort 
> > (cost=15865.05..16194.21 rows=131661 width=212) (actual
> > time=1768.501..1856.334 rows=61954 loops=1) Sort Key:
> > patient."key"
> >          ->  Seq Scan on patient  (cost=0.00..4669.61
> > rows=131661 width=212) (actual time=0.010..355.299
> > rows=131661 loops=1) Total runtime: 2046.323 ms
> > (12 rows)
>
> Increasing statistics target yielded index scan.
>
> How can I best find optimal statistics target to ensure 100%
> index scan?

I'm trying to understand what happened here, and I have a theory.  
There are 389K rows total, and 262K rows with a null indexed 
value.  Their are 15164 non-null rows newer than those null 
rows.  When stats target is set to 50 or less, analyze scans 
15,000 rows or less.  If it scans the newest rows/pages first, 
then is it possible it never sees any hint of the 262K null 
rows, and thus ends up with skewed stats that yield seq scans 
when idx scan is in order?  If stat target is > 50, analyze 
begins to include non-null rows in stat sample, yielding idx 
scans.

Also, I see the most_common_vals array is not growing linearly 
with the stats target as the docs seem to suggest.  I have 34 
unique values, so with stats target >= 34, I'd expect 
most_common_vals array to have 34 values, but it has 8.

Ed