Hi,
I am using PostgreSQL 8.1.3 for this. If this has been addressed in a later release, please disregard. This is not a complaint or a feature request; I am just curious, so that I know how best to construct my queries.
There is a unique index mapping domains to domain_ids.
views_ts is a partitioned table, and views_ts_2007_04_01 is the only partition matching the range given in the query.
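For reference, the setup is roughly the following. The column types, the check constraint, and the use of 8.1-style inheritance partitioning are my reconstruction; only the table names and the two index names actually appear in the plans below:

create table domains (
    domain_id  integer primary key,
    domain     text not null
);
-- unique index used to look up a domain_id from a domain name
create unique index domains_domain on domains (domain);

-- parent table; daily partitions are inheritance children
create table views_ts (
    domain_id  integer not null,
    ts         timestamptz not null
);

-- one partition per day, with a check constraint so that
-- constraint_exclusion can prune partitions outside the queried range
create table views_ts_2007_04_01 (
    check (ts >= '2007-04-01' and ts < '2007-04-02')
) inherits (views_ts);
create index views_ts_2007_04_01_domain_id on views_ts_2007_04_01 (domain_id);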
My goal is to produce a summary of row counts for each day within a given range (which can span days, months, or years).
The issue: the second query gets a much lower cost estimate, and I am wondering why the planner did not choose an equivalent plan for the first query.
Thanks!
Brian
live=> explain select ts::date,count(*) from views_ts join domains using (domain_id) where domain = '1234.com' and ts >= '2007-04-01' and ts < '2007-04-02' group by ts::date;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=9040.97..9041.00 rows=2 width=8)
-> Hash Join (cost=6.01..9040.96 rows=2 width=8)
Hash Cond: ("outer".domain_id = "inner".domain_id)
-> Append (cost=0.00..7738.01 rows=259383 width=16)
-> Seq Scan on views_ts (cost=0.00..1138.50 rows=1 width=16)
Filter: ((ts >= '2007-04-01 00:00:00+10'::timestamp with time zone) AND (ts < '2007-04-02 00:00:00+10'::timestamp with time zone))
-> Seq Scan on views_ts_2007_04_01 views_ts (cost=0.00..6599.51 rows=259382 width=16)
Filter: ((ts >= '2007-04-01 00:00:00+10'::timestamp with time zone) AND (ts < '2007-04-02 00:00:00+10'::timestamp with time zone))
-> Hash (cost=6.01..6.01 rows=1 width=8)
-> Index Scan using domains_domain on domains (cost=0.00..6.01 rows=1 width=8)
Index Cond: ("domain" = '1234.com'::text)
(11 rows)
live=> explain select ts::date,count(*) from views_ts where domain_id = (select domain_id from domains where domain = '1234.com') and ts >= '2007-04-01' and ts < '2007-04-02' group by ts::date;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=1993.93..1995.99 rows=137 width=8)
InitPlan
-> Index Scan using domains_domain on domains (cost=0.00..6.01 rows=1 width=8)
Index Cond: ("domain" = '1234.com'::text)
-> Result (cost=0.00..1986.69 rows=247 width=8)
-> Append (cost=0.00..1986.07 rows=247 width=8)
-> Seq Scan on views_ts (cost=0.00..1245.75 rows=1 width=8)
Filter: ((domain_id = $0) AND (ts >= '2007-04-01 00:00:00+10'::timestamp with time zone) AND (ts < '2007-04-02 00:00:00+10'::timestamp with time zone))
-> Bitmap Heap Scan on views_ts_2007_04_01 views_ts (cost=2.86..740.32 rows=246 width=8)
Recheck Cond: (domain_id = $0)
Filter: ((ts >= '2007-04-01 00:00:00+10'::timestamp with time zone) AND (ts < '2007-04-02 00:00:00+10'::timestamp with time zone))
-> Bitmap Index Scan on views_ts_2007_04_01_domain_id (cost=0.00..2.86 rows=246 width=0)
Index Cond: (domain_id = $0)
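For what it's worth, the same forms extend to longer ranges just by widening the ts bounds, in which case more than one partition shows up under the Append. For example, a whole month with the subselect form (the month here is only an illustration):

select ts::date, count(*)
from views_ts
where domain_id = (select domain_id from domains where domain = '1234.com')
  and ts >= '2007-04-01' and ts < '2007-05-01'
group by ts::date;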