Re: Can this query go faster???

Ron <rjpeace@xxxxxxxxxxxxx> · Tue, 06 Dec 2005 08:41:11 -0500

At 04:43 AM 12/6/2005, Joost Kraaijeveld wrote:
Hi,

Is it possible to get this query run faster than it does now, by adding
indexes, changing the query?

SELECT customers.objectid FROM prototype.customers, prototype.addresses
WHERE
customers.contactaddress = addresses.objectid
ORDER BY zipCode asc, housenumber asc
LIMIT 1 OFFSET 283745

Explain:

Limit  (cost=90956.71..90956.71 rows=1 width=55)
  ->  Sort  (cost=90247.34..91169.63 rows=368915 width=55)
        Sort Key: addresses.zipcode, addresses.housenumber
        ->  Hash Join  (cost=14598.44..56135.75 rows=368915 width=55)
              Hash Cond: ("outer".contactaddress = "inner".objectid)
              ->  Seq Scan on customers  (cost=0.00..31392.15
rows=368915 width=80)
              ->  Hash  (cost=13675.15..13675.15 rows=369315 width=55)
                    ->  Seq Scan on addresses  (cost=0.00..13675.15
rows=369315 width=55)

The customers table has an index on contactaddress and objectid.
The addresses table has an index on zipcode+housenumber and objectid.

TIA
customer names, customers.objectid, addresses, and addresses.objectid 
should all be static (addresses do not change, just the customers 
associated with them; and once a customer has been assigned an id 
that better never change...).

To me, this sounds like the addresses and customers tables should be 
duplicated and then physically laid out in sorted order by 
<tablename>.objectid in one set and by the "human friendly" 
associated string in the other set.
Then a finding a specific <tablename>.objectid or it's associated 
string can be done in at worse O(lgn) time assuming binary search 
instead of O(n) time for a sequential scan.  If pg is clever enough, 
it might be able to do better than that.

IOW, I'd try duplicating the addresses and customers tables and using 
the appropriate CLUSTERed Index on each.

I know this breaks Normal Form.  OTOH, this kind of thing is common 
practice for data mining problems on static or almost static data.

Hope this is helpful,
Ron