Re: Function scan/Index scan to nested loop

"Carlo Stonebanks" <stonec.register@xxxxxxxxxxxx> · Tue, 11 May 2010 14:00:43 -0400

Thanks Scott,

This is almost always due to caching.  First time the data aren't in the 
cache, second time they are.
<<

I had assumed that it was caching, but I don't know from where because of 
the inexplicable delay. Hardware? O/S (Linux)? DB? From the function, which 
is IMMUTABLE?

I am concerned that there is such a lag between all the index and function 
scans start/complete times and and the nested loops starting. I have 
reformatted the SLOW PLAN results below to make them easier to read. Can you 
tell me if this makes any sense to you?

I can understand that EXPLAIN might inject some waste, but the delay being 
shown here is equivalent to the delay in real query times - I don't think 
EXPLAIN components would inject 15 second waits... would they?

Your row estimates are WAY off.  A nested loop might now be the best 
choice.
<<

I tried to run this with set enable_nestloop to off and it built this truly 
impressively complex plan! However, the cache had already spun up. The thing 
that makes testing so difficult is that once the caches are loaded, you have 
to flail around trying to find query parameters that DON'T hit the cache, 
making debugging difficult.

The row estimates being off is a chronic problem with our DB. I don't think 
the 3000 row ANALYZE is getting a proper sample set and would love to change 
the strategy, even if at the expense of speed of execution of ANALYZE. I 
don't know what it is about our setup that makes our PG servers so hard to 
tune, but I think its time to call the cavalry (gotta find serious PG server 
tuning experts in NJ).

Carlo

SLOW PLAN
Sort  (cost=42869.40..42869.59 rows=77 width=18) (actual 
time=26316.495..26322.102 rows=9613 loops=1)
 Sort Key: p.provider_id, zips_in_mile_range.distance
 Sort Method:  quicksort  Memory: 1136kB
 ->  Nested Loop
    (cost=0.00..42866.98 rows=77 width=18)
    (actual time=126.354..26301.027 rows=9613 loops=1)
       ->  Nested Loop
           (cost=0.00..42150.37 rows=122 width=18)
           (actual time=117.369..15349.533 rows=13247 loops=1)
             ->  Function Scan on zips_in_mile_range
                 (cost=0.00..52.50 rows=67 width=40)
                 (actual time=104.196..104.417 rows=155 loops=1)
                 Filter: (zip > ''::text)
             ->  Index Scan using 
provider_practice_default_base_zip_country_idx on provider_practice pp
                 (cost=0.00..628.30 rows=2 width=19)
                 (actual time=1.205..98.231 rows=85 loops=155)
                 Index Cond: ((pp.default_country_code = 'US'::bpchar)
                    AND (substr((pp.default_postal_code)::text, 1, 5) = 
zips_in_mile_range.zip)
                    AND (pp.is_principal = 'Y'::bpchar))
                 Filter: (COALESCE(pp.record_status, 'A'::bpchar) = 
'A'::bpchar)
       ->  Index Scan using provider_provider_id_provider_status_code_idx 
on provider p
           (cost=0.00..5.86 rows=1 width=4)
           (actual time=0.823..0.824 rows=1 loops=13247)
           Index Cond: ((p.provider_id = pp.provider_id)
              AND (p.provider_status_code = 'A'::bpchar))
           Filter: (p.is_visible = 'Y'::bpchar)

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance