Re: Performance Optimization for Dummies 2 - the SQL

"Carlo Stonebanks" <stonec.register@xxxxxxxxxxxx> · Wed, 4 Oct 2006 14:22:48 -0400

> can you do explain analyze on the two select queries on either side of
> the union separatly?  the subquery is correctly written and unlikely
> to be a problem (in fact, good style imo).  so lets have a look at
> both sides of facil query and see where the problem is.

Sorry for the delay, the server was down yesterday and couldn't get 
anything.

I have modified the sub-queries a little, trying to get the index scans to 
fire - all the tables involved here are large enough to benefit from index 
scans over sequential scans. I am mystified as to why PART 1 is giving me:

 "Seq Scan on facility_address fa  (cost=0.00..3014.68 rows=128268 width=12) 
(actual time=0.007..99.033 rows=128268 loops=1)"

which I assume is for the:

"join mdx_core.facility_address as fa on fa.facility_id = f.facility_id"

Then again, I am not sure how to read the EXPLAIN ANALYSE performance 
numbers.

The other part of the UNION (PART 2) I have also modified, I think it's 
working nicely. Let me know if I'm mistaken on thinking that!

The one remaining problem is that the UNION of these two sub-queries has a 
column which is a call to a custom TCL function that does a lexical analysis 
on the results, ranking the result names by their proximity to the imported 
name. his definitely eats up the performance and I hope that my decision to 
call this function on the results of the union (assuming union deletes 
redundent rows) is the correct one.

Thanks!

Carlo

/* PART 1.
  The redundant expression "facility_address_id is NULL" was removed because
  only an OUTER join would have made this meaningful. We use only INNER 
joins in this sub-query
  Both facility_address and address have seq scans, even though there is an 
index for
  facility_address(facility_id( and an index for address( country_code, 
postal_code, address).
  The "like" operator appears to be making things expensive. This is used 
because we have to take
   into account that perhaps the import row is using the 5-number US ZIP, 
not the 9-number USZIP+4
  standard (although this is not the case in this sample).
/*
explain analyse select
      f.facility_id,
      fa.facility_address_id,
      a.address_id,
      f.facility_type_code,
      f.name,
      a.address,
      a.city,
      a.state_code,
      a.postal_code,
      a.country_code
   from
      mdx_core.facility as f
   join mdx_core.facility_address as fa
      on fa.facility_id = f.facility_id
   join mdx_core.address as a
      on a.address_id = fa.address_id
   where
      a.country_code = 'US'
      and a.state_code = 'IL'
      and a.postal_code like '60640-5759'||'%'
   order by facility_id

"Sort  (cost=6392.50..6392.50 rows=1 width=103) (actual 
time=189.133..189.139 rows=12 loops=1)"
"  Sort Key: f.facility_id"
"  ->  Nested Loop  (cost=2732.88..6392.49 rows=1 width=103) (actual 
time=14.006..188.967 rows=12 loops=1)"
"        ->  Hash Join  (cost=2732.88..6388.91 rows=1 width=72) (actual 
time=13.979..188.748 rows=12 loops=1)"
"              Hash Cond: ("outer".address_id = "inner".address_id)"
"              ->  Seq Scan on facility_address fa  (cost=0.00..3014.68 
rows=128268 width=12) (actual time=0.004..98.867 rows=128268 loops=1)"
"              ->  Hash  (cost=2732.88..2732.88 rows=1 width=64) (actual 
time=6.430..6.430 rows=3 loops=1)"
"                    ->  Bitmap Heap Scan on address a  (cost=62.07..2732.88 
rows=1 width=64) (actual time=2.459..6.417 rows=3 loops=1)"
"                          Recheck Cond: ((country_code = 'US'::bpchar) AND 
((state_code)::text = 'IL'::text))"
"                          Filter: ((postal_code)::text ~~ 
'60640-5759%'::text)"
"                          ->  Bitmap Index Scan on 
address_country_state_postal_code_address_idx  (cost=0.00..62.07 rows=3846 
width=0) (actual time=1.813..1.813 rows=3554 loops=1)"
"                                Index Cond: ((country_code = 'US'::bpchar) 
AND ((state_code)::text = 'IL'::text))"
"        ->  Index Scan using facility_pkey on facility f  (cost=0.00..3.56 
rows=1 width=35) (actual time=0.012..0.013 rows=1 loops=12)"
"              Index Cond: ("outer".facility_id = f.facility_id)"
"Total runtime: 189.362 ms"

/* PART 2 - can you see anything that could work faster? */

explain analyse select
      f.facility_id,
      null as facility_address_id,
      null as address_id,
      f.facility_type_code,
      f.name,
      null as address,
      f.default_city as city,
      f.default_state_code as state_code,
      f.default_postal_code as postal_code,
      f.default_country_code as country_code
   from
      mdx_core.facility as f
   left outer join mdx_core.facility_address as fa
      on fa.facility_id = f.facility_id
   where
      fa.facility_address_id is null
      and f.default_country_code = 'US'
      and f.default_state_code = 'IL'
      and '60640-5759' like f.default_postal_code||'%'

"Nested Loop Left Join  (cost=0.00..6042.41 rows=32 width=73) (actual 
time=14.923..14.923 rows=0 loops=1)"
"  Filter: ("inner".facility_address_id IS NULL)"
"  ->  Index Scan using facility_country_state_postal_code_idx on facility f 
(cost=0.00..5914.69 rows=32 width=73) (actual time=10.118..14.773 rows=10 
loops=1)"
"        Index Cond: ((default_country_code = 'US'::bpchar) AND 
(default_state_code = 'IL'::bpchar))"
"        Filter: ('60640-5759'::text ~~ ((default_postal_code)::text || 
'%'::text))"
"  ->  Index Scan using facility_address_facility_idx on facility_address fa 
(cost=0.00..3.97 rows=2 width=8) (actual time=0.009..0.011 rows=1 loops=10)"
"        Index Cond: (fa.facility_id = "outer".facility_id)"
"Total runtime: 15.034 ms"