Re: Simple Join

On Wednesday 14 December 2005 17:30, Mark Kirkwood wrote:
> You scan 600000 rows from to_ship to get about 25000 - so some way to
> cut this down would help.

Yup.  I'm open to anything, as this is the only part of the system where performance really matters.  Maintaining a denormalized copy column, or whatever else, would be fine; we do far more reads than writes.

> Try out an explicit INNER JOIN which includes the filter info for paid
> and suspended_sub in the join condition (you may need indexes on each of
> id, paid and suspended_sub, so that the 8.1 optimizer can use a bitmap
> scan):

I only had two explicit indexes.  One was on to_ship.ordered_product_id and 
the other was on ordered_products.paid.  ordered_products.id is a primary 
key.  Here is your query with an index added on suspended_sub:

dli=# explain analyze SELECT ordered_products.product_id
dli-# FROM to_ship INNER JOIN ordered_products
dli-# ON (to_ship.ordered_product_id = ordered_products.id
dli(#      AND ordered_products.paid = TRUE      AND 
dli(# ordered_products.suspended_sub = FALSE);
                                                               QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=5126.19..31528.40 rows=20591 width=8) (actual time=4554.190..23519.618 rows=14367 loops=1)
   Hash Cond: ("outer".ordered_product_id = "inner".id)
   ->  Seq Scan on to_ship  (cost=0.00..11529.12 rows=611612 width=8) (actual time=11.254..15192.042 rows=611612 loops=1)
   ->  Hash  (cost=4954.79..4954.79 rows=21759 width=16) (actual time=4494.900..4494.900 rows=18042 loops=1)
         ->  Index Scan using paid_index on ordered_products  (cost=0.00..4954.79 rows=21759 width=16) (actual time=72.431..4414.697 rows=18042 loops=1)
               Index Cond: (paid = true)
               Filter: (paid AND (NOT suspended_sub))
 Total runtime: 23532.785 ms
(8 rows)
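One other idea I've been wondering about (an untested sketch -- the index name is made up, and the column names are taken from the plan above): since the query always filters on paid = TRUE AND suspended_sub = FALSE, a partial index on ordered_products might let the planner skip the Filter step entirely and only touch the qualifying rows:

```sql
-- Untested sketch: a partial index covering only the rows this query wants.
-- The planner can use it whenever the query's WHERE clause implies
-- paid = TRUE AND suspended_sub = FALSE.
CREATE INDEX ordered_products_shippable_idx
    ON ordered_products (id)
    WHERE paid = TRUE AND suspended_sub = FALSE;
```

That wouldn't help the seq scan on to_ship, of course, which is where most of the time is going.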

So what's the best way to wring this info out of the db efficiently?  This 
query only produces about 30 distinct values, so I was figuring I could add an 
insert trigger on to_ship that places the value into another table if it isn't 
already there.  I'd rather the writes be slow than the reads.
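Something along these lines is what I had in mind (an untested sketch; the summary table, function, and trigger names are all made up):

```sql
-- Untested sketch: keep a small table of distinct shippable product_ids,
-- maintained by an insert trigger on to_ship.
CREATE TABLE shippable_products (product_id integer PRIMARY KEY);

CREATE OR REPLACE FUNCTION record_shippable_product() RETURNS trigger AS $$
BEGIN
    -- Look up the product for the newly shipped row and record it
    -- if it qualifies and isn't already in the summary table.
    INSERT INTO shippable_products (product_id)
    SELECT op.product_id
    FROM ordered_products op
    WHERE op.id = NEW.ordered_product_id
      AND op.paid AND NOT op.suspended_sub
      AND NOT EXISTS (SELECT 1 FROM shippable_products s
                      WHERE s.product_id = op.product_id);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER to_ship_record_product
    AFTER INSERT ON to_ship
    FOR EACH ROW EXECUTE PROCEDURE record_shippable_product();
```

(Concurrent inserts could race between the EXISTS check and the INSERT; catching unique_violation inside the function would cover that.)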

