I have a performance problem using a dimensional model where the date is specified in a DATE dimension, specifically when using 'WHERE DATE >= 'Some Date'
This query runs very fast when using an equality _expression_, eg. 'WHERE DATE = '2014-01-01", and I'm wondering if there is a way to make it run fast when using the greater than _expression_.
The dimension table is about 5k rows, and the Fact table is ~60M.
Thanks in advance for any advice.
JT.
The query :
select sid, count(*) from fact fact_data fact left outer join dim_date dim on dim.date_id = fact.date_id where dim.date >= '2014-1-25' group by sid order by count desc limit 10;
Table "public.fact_data"
Column | Type | Modifiers
---------------+-----------------------------+-----------
date_id | integer |
date | timestamp without time zone |
agent_id | integer |
instance_id | integer |
sid | integer |
Indexes:
"fact_agent_id" btree (agent_id)
"fact_date_id" btree (date_id) CLUSTER
"fact_alarms_sid" btree (sid)
Table "public.dim_date"
Column | Type | Modifiers
--------------------+---------+------------------------------------------------------------
date_id | integer | not null default nextval('dim_date_date_id_seq'::regclass)
date | date |
year | integer |
month | integer |
month_name | text |
day | integer |
day_of_year | integer |
weekday_name | text |
calendar_week | integer |
quarter | text |
year_quarter | text |
year_month | text |
year_calendar_week | text |
weekend | text |
week_start_date | date |
week_end_date | date |
month_start_date | date |
month_end_date | date |
Indexes:
"dim_date_date" btree (date)
"dim_date_date_id" btree (date_id)
EXPLAIN Output:
explain (analyze, buffers) select dim.date_id, fact.sid, count(1) from fact_data fact left outer join dim_date dim on dim.date_id = fact.date_id where dim.date_id >= 5139 group by 1,2 order by 3 desc limit 10;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=9772000.55..9772000.58 rows=10 width=8) (actual time=91064.421..91064.440 rows=10 loops=1)
Buffers: shared hit=4042 read=1542501
-> Sort (cost=9772000.55..9787454.06 rows=6181404 width=8) (actual time=91064.408..91064.414 rows=10 loops=1)
Sort Key: (count(1))
Sort Method: top-N heapsort Memory: 25kB
Buffers: shared hit=4042 read=1542501
-> GroupAggregate (cost=9150031.23..9638422.63 rows=6181404 width=8) (actual time=90892.625..91063.905 rows=617 loops=1)
Buffers: shared hit=4042 read=1542501
-> Sort (cost=9150031.23..9256675.57 rows=42657736 width=8) (actual time=90877.129..90964.995 rows=124965 loops=1)
Sort Key: dim.date_id, fact.sid
Sort Method: quicksort Memory: 8930kB
Buffers: shared hit=4042 read=1542501
-> Hash Join (cost=682.34..3160739.50 rows=42657736 width=8) (actual time=45087.394..90761.624 rows=124965 loops=1)
Hash Cond: (fact.date_id = dim.date_id)
Buffers: shared hit=4042 read=1542501
-> Seq Scan on fact_data fact (cost=0.00..2139866.40 rows=59361340 width=8) (actual time=0.090..47001.500 rows=59360952 loops=1)
Buffers: shared hit=3752 read=1542501
-> Hash (cost=518.29..518.29 rows=13124 width=4) (actual time=21.083..21.083 rows=13125 loops=1)
Buckets: 2048 Batches: 1 Memory Usage: 462kB
Buffers: shared hit=290
-> Seq Scan on dim_date dim (cost=0.00..518.29 rows=13124 width=4) (actual time=0.494..10.918 rows=13125 loops=1)
Filter: (date_id >= 5139)
Rows Removed by Filter: 5138
Buffers: shared hit=290
Total runtime: 91064.496 ms
(25 rows)