I have this report query that runs daily on a table with several
hundred million rows total using pg 8.1.3 on Debian Linux on hw with
dual opteron processors:
SELECT count(*) FROM webhits
WHERE path LIKE '/radio/tuner_%.swf' AND status = 200
AND date_recorded >= '3/10/2006'::TIMESTAMP
AND date_recorded < '3/11/2006'::TIMESTAMP;
Here is the explain output:
QUERY PLAN
------------------------------------------------------------------------
------------------------------------------------------------------------
----------------------------------
Aggregate (cost=794775.08..794775.09 rows=1 width=0)
-> Bitmap Heap Scan on webhits (cost=315820.45..794771.74
rows=1337 width=0)
Recheck Cond: ((date_recorded >= '2006-03-10
00:00:00'::timestamp without time zone) AND (date_recorded <
'2006-03-11 00:00:00'::timestamp without time zone))
Filter: (((path)::text ~~ '/radio/tuner_%.swf'::text) AND
(status = 200))
-> BitmapAnd (cost=315820.45..315820.45 rows=249152 width=0)
-> Bitmap Index Scan on webhits_date_idx1
(cost=0.00..140407.45 rows=15379741 width=0)
Index Cond: ((date_recorded >= '2006-03-10
00:00:00'::timestamp without time zone) AND (date_recorded <
'2006-03-11 00:00:00'::timestamp without time zone))
-> Bitmap Index Scan on webhits_path_idx2
(cost=0.00..175412.76 rows=15343959 width=0)
Index Cond: (((path)::text >=
'/radio/tuner'::character varying) AND ((path)::text <
'/radio/tunes'::character varying))
According to the planner it should take <15 minutes which is typical in
practice. About half the times it runs, however, it never terminates
(even after days) and just spins consuming 99+% of CPU with no disk
activity. This query was never a problem in postgres versions < 8.1.2,
however the data has grown substantially since that time. I notice it
uses the recent in-memory bitmap feature, so I wondered if it was
exposing a bug.
If I restart the postmaster, the query will complete in the expected
time.
-Casey