Index usage without enable_seqscan

Shane <shane-keyword-pgsql.a1e0d9@xxxxx> · Mon, 30 May 2005 09:04:45 -0700

Hello list,

I have a query which is running faster with set
enable_seqscan=off and am wondering why this would be.  The
data is a web traffic database with the following relevant
tables.

                                   Table "public.urlbase"
  Column  |          Type          |                        Modifiers           
----------+------------------------+---------------------------------------------------------
 id       | integer                | not null default nextval('public.urlbase_id_seq'::text)
 protocol | character varying(32)  | not null default 'http://'::character varying
 host     | character varying(256) | not null
Indexes:
    "urlbase_pkey" primary key, btree (id)
    "urlbase_protocol_key" unique, btree (protocol, host)

                             Table "public.urls"
   Column   |  Type   |                      Modifiers
------------+---------+------------------------------------------------------
 id         | integer | not null default nextval('public.urls_id_seq'::text)
 urlbase_id | integer | not null
 path       | text    | not null
Indexes:
    "urls_pkey" primary key, btree (id)
    "urls_base_id_key" unique, btree (urlbase_id, "path")
Foreign-key constraints:
    "$1" FOREIGN KEY (urlbase_id) REFERENCES urlbase(id) ON UPDATE CASCADE ON DELETE CASCADE

                                     Table "public.hits"
   Column   |          Type          |                       Modifiers          
------------+------------------------+--------------------------------------------------------
 ip         | inet                   | not null
 url_id     | integer                | not null
 javascript | boolean                | not null default false
 browser_id | integer                |
 hit_date   | date                   | not null default ('now'::text)::date
 hit_time   | time(0) with time zone | not null default ('now'::text)::time(6) with time zone
Indexes:
    "hits_hit_date" btree (hit_date)
    "url_id" btree (url_id)
Foreign-key constraints:
    "$3" FOREIGN KEY (browser_id) REFERENCES browsers(id) ON UPDATE CASCADE ON DELETE SET NULL
    "$1" FOREIGN KEY (url_id) REFERENCES urls(id) ON UPDATE CASCADE ON DELETE CASCADE

The query is a basic stats ququestion, how many visits did
a particular site receive grouped by the date.  With
enable_seqscan=yes, the query takes quite a bit longer than
with enable_seqscan=off.  From 84 sec to 50 sec on larger
sites and from 15 sec to 5 sec on smaller sites.

set enable_seqscan=yes
explain select hit_date,count(*) as total
from hits inner join urls on url_id=urls.id
inner join urlbase on urlbase_id=urlbase.id
where protocol='http://' and host='www.cm.nu'
and hit_date >= '2005-01-01'
group by hit_date
order by hit_date

                                                           QUERY PLAN           
---------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=204398.73..204399.45 rows=287 width=4)
   Sort Key: hits.hit_date
   ->  HashAggregate  (cost=204386.29..204387.01 rows=287 width=4)
         ->  Hash Join  (cost=26976.68..204328.54 rows=11550 width=4)
               Hash Cond: ("outer".url_id = "inner".id)
               ->  Seq Scan on hits  (cost=0.00..165226.88 rows=2401897 width=8)
                     Filter: (hit_date >= '2005-01-01'::date)
               ->  Hash  (cost=26966.00..26966.00 rows=4272 width=4)
                     ->  Hash Join  (cost=4.68..26966.00 rows=4272 width=4)
                           Hash Cond: ("outer".urlbase_id = "inner".id)
                           ->  Seq Scan on urls  (cost=0.00..22476.40 rows=888440 width=8)
                           ->  Hash  (cost=4.68..4.68 rows=1 width=4)
                                 ->  Index Scan using urlbase_protocol_key on urlbase  (cost=0.00..4.68 rows=1 width=4)
                                       Index Cond: (((protocol)::text = 'http://'::text) AND ((host)::text = 'www.cm.nu'::text))
(14 rows)

set enable_seqscan=off;
                                                     QUERY PLAN                 
---------------------------------------------------------------------------------------------------------------------
 Sort  (cost=6578335.31..6578336.03 rows=287 width=4)
   Sort Key: hits.hit_date
   ->  HashAggregate  (cost=6578322.87..6578323.59 rows=287 width=4)
         ->  Nested Loop  (cost=0.00..6578265.12 rows=11550 width=4)
               ->  Nested Loop  (cost=0.00..183390.60 rows=4272 width=4)
                     ->  Index Scan using urlbase_protocol_key on urlbase  (cost=0.00..4.68 rows=1 width=4)
                           Index Cond: (((protocol)::text = 'http://'::text) AND ((host)::text = 'www.cm.nu'::text))
                     ->  Index Scan using urls_base_id_key on urls  (cost=0.00..182768.94 rows=49358 width=8)
                           Index Cond: (urls.urlbase_id = "outer".id)
               ->  Index Scan using url_id on hits  (cost=0.00..1495.23 rows=136 width=8)
                     Index Cond: (hits.url_id = "outer".id)
                     Filter: (hit_date >= '2005-01-01'::date)
(12 rows)

Any assistance would be appreicated.  It's not an important
query, that is it isn't executed very often but I am
curious why this would be happening and what I am doing
wrong.

Best regards,
Shane

-- 
Shane Wegner
http://www.cm.nu/~shane/

---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
      joining column's datatypes do not match