This is a good one :)
Here is a "brief" description of our issue(Postgres 9.0):
Tables:
main fact table:
Table "public.parent_fact"
Column | Type |
----------------------+-----------------------------+-----------
etime | date | not null
pcamp_id | integer |
location_id | integer |
impressions | bigint |
clicks | int
this table partitioned by etime.
We are trying to build a report, which has last week numbers alongside with this week numbers. For example: if today is Wednesday, I want to compare daily numbers from last week 3 days (mon through wed) with this week 3 days(mon through wed).
To accomplish that, we've decided to build a transformation table, which has two columns:
Table "public.trans_last_week"
Column | Type | Modifiers
----------+-----------------------------+-----------
etime | date |
lw_etime | date |
So for each date(etime), we have lw_etime, which is essentially etime-7 days.
Here is the first query, which performs fine:
select a11.location_id AS location_id,
a11.pcamp_id AS pcamp_id,
sum(a11.clicks)
from parent_fact a11
where a11.etime between '2011-14-18' and '2011-04-20'
group by a11.location_id,
a11.pcamp_id
everything is good there - it calculates numbers from the current week and goes to only 3 partitions to aggregate numbers.
Here is the second query:
select a11.location_id AS location_id,
a11.pcamp_id AS pcamp_id,
sum(a11.clicks)
from parent_fact a11
join trans_last_week a12
on (a11.etime = a12.lw_etime)
where a12.etime between '2011-14-18' and '2011-04-20'
group by a11.location_id,
a11.pcamp_id
Here it scans through all partitions in the parent_fact table and runs 3-4 times slower.
What was noticed, that the only case when Postgres is actually going to execute the query against the right partitions is query #1.
Is that by design? Second query join, will also result in 3 days(3 partitions)
This query (#3) also scans all partitions:
select a11.location_id AS location_id,
a11.pcamp_id AS pcamp_id,
sum(a11.clicks)
from parent_fact a11
where a11.etime in (select a12.etime from trans_last_week a12 where a11.etime = a12.lw_etime)
group by a11.location_id,
a11.pcamp_id
Thank you!