reindex vs 'analyze' (was: Re: cube operations slower than geo_distance() on production server)

Mark Stosberg <mark@xxxxxxxxxxxxxxx> · Wed, 14 Feb 2007 11:28:38 -0500

Merlin Moncure wrote:
On 2/14/07, Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
There are two things wrong here: first, that the estimated row
count is only 20% of actual; it should certainly not be that far
off for such a simple condition. I wonder if your vacuum/analyze
procedures are actually working. Second, you mentioned somewhere
along the line that 'available' pets are about 10% of all the
entries, which means that this indexscan is more than likely
entirely counterproductive: it would be cheaper to ignore this
index altogether.

Tom,

Thanks for the generosity of your time. We are using  8.1.3 currently. I 
have read there are some performance improvements in 8.2, but we have 
not started evaluating that yet.

Your suggestion about the pet_state index was right on. I tried 
"Analyze" on it, but still got the same bad estimate. However, I then 
used "reindex" on that index, and that fixed the estimate accuracy, 
which made the query run faster! The cube search now benchmarks faster 
than the old search in production, taking about 2/3s of the time of the 
old one.

Any ideas why the manual REINDEX did something that "analyze" didn't? It 
makes me wonder if there is other tuning like this to do.

Attached is the EA output from the most recent run, after the "re-index".

I think switching the index on pet_state to a composite on (pet_state,
species_id) might help too.

or even better:

create function is_pet_available(text) returns bool as
$$
 select $1='available';
$$ language sql immutable;

create index pets_available_species_idx on
pets(is_pet_available(pet_state), species_id);

Merlin,

Thanks for this suggestion. It is not an approach I had used before, and 
I was interested to try it. However, the new index didn't get chosen. 
(Perhaps I would need to drop the old one?) However, Tom's suggestions 
did help. I'll follow up on that in just a moment.

refactor your query something similar to:

SELECT * FROM
(
SELECT
earth_coords(q.earth_coords, s.earth_coords)/1609.344 as radius
FROM pets
JOIN shelters_active as shelters USING (shelter_id)
JOIN zipcodes s ON shelters.postal_code_for_joining = zipcodes.zipcode
JOIN zipcodes q ON q.zipcode = '90210'
WHERE
  is_pet_available(pet_state)
  AND species_id = 1
  AND earth_box(q.earth_coords, 10*1609.344) @ s.earth_coords
) p order by radius

merlin

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Sort  (cost=5276.93..5277.00 rows=28 width=64) (actual time=1981.830..1984.415 rows=1344 loops=1)
   Sort Key: (cube_distance(public.zipcodes.earth_coords, public.zipcodes.earth_coords) / 1609.344::double precision)
   ->  Nested Loop  (cost=291.32..5276.26 rows=28 width=64) (actual time=24.080..1976.479 rows=1344 loops=1)
         ->  Nested Loop  (cost=2.15..575.79 rows=11 width=68) (actual time=2.637..34.067 rows=131 loops=1)
               ->  Nested Loop  (cost=2.15..153.48 rows=42 width=73) (actual time=1.939..3.972 rows=240 loops=1)
                     ->  Index Scan using zipcodes_pkey on zipcodes  (cost=0.00..3.01 rows=1 width=32) (actual time=0.283..0.287 rows=1 loops=1)
                           Index Cond: ((zipcode)::text = '90210'::text)
                     ->  Bitmap Heap Scan on zipcodes  (cost=2.15..149.84 rows=42 width=41) (actual time=1.403..2.323 rows=240 loops=1)
                           Recheck Cond: (cube_enlarge(("outer".earth_coords)::cube, 16093.4357308298::double precision, 3) @ zipcodes.earth_coords)
                           ->  Bitmap Index Scan on zip_earth_coords_idx  (cost=0.00..2.15 rows=42 width=0) (actual time=1.377..1.377 rows=240 loops=1)
                                 Index Cond: (cube_enlarge(("outer".earth_coords)::cube, 16093.4357308298::double precision, 3) @ zipcodes.earth_coords)
               ->  Index Scan using shelters_postal_code_for_joining_idx on shelters  (cost=0.00..10.03 rows=2 width=12) (actual time=0.064..0.118 rows=1 loops=240)
                     Index Cond: ((shelters.postal_code_for_joining)::text = ("outer".zipcode)::text)
                     Filter: ((shelter_state)::text = 'active'::text)
         ->  Bitmap Heap Scan on pets  (cost=289.17..426.86 rows=35 width=4) (actual time=14.362..14.746 rows=10 loops=131)
               Recheck Cond: ((pets.shelter_id = "outer".shelter_id) AND ((pets.pet_state)::text = 'available'::text))
               Filter: (species_id = 1)
               ->  BitmapAnd  (cost=289.17..289.17 rows=35 width=0) (actual time=14.219..14.219 rows=0 loops=131)
                     ->  Bitmap Index Scan on pets_shelter_id_idx  (cost=0.00..3.89 rows=256 width=0) (actual time=0.188..0.188 rows=168 loops=131)
                           Index Cond: (pets.shelter_id = "outer".shelter_id)
                     ->  Bitmap Index Scan on pets_pet_state_idx  (cost=0.00..285.02 rows=41149 width=0) (actual time=22.043..22.043 rows=40397 loops=82)
                           Index Cond: ((pet_state)::text = 'available'::text)
 Total runtime: 1988.962 ms