Index usage on OR queries

Tore Halvorsen <tore.halvorsen@xxxxxxxxx> · Wed, 31 Aug 2011 16:35:30 +0200

Hi,

I'm trying to optimize a query where I have two tables that both have a timestamp column. I want the result where either of the timestamps is after a specified time. In a reduced form, like this:

CREATE TABLE a
(
  id serial NOT NULL PRIMARY KEY,
  time timestamp without time zone NOT NULL DEFAULT now()
);

CREATE INDEX a_time_idx ON a USING btree (time DESC NULLS LAST);

CREATE TABLE b

(
  id serial NOT NULL PRIMARY KEY,
  time timestamp without time zone NOT NULL DEFAULT now()
);

CREATE INDEX b_time_idx ON b USING btree (time DESC NULLS LAST);

--- generate some data
insert into a(time)

select now() - '10 year'::interval * random() from generate_series(1, 1000000, 1);

insert into b(time)
select now() - '10 year'::interval * random() from generate_series(1, 1000000, 1);

-- Using constraint works as expected, and uses the time index.

select * from a join b using(id)
where a.time >= '2011-08-15';

-- ... both ways... 
select * from a join b using(id)

where b.time >= '2011-08-15';

-- However, if I'm trying to do this for both times at once, the time index is not used at all
select * from a join b using(id)
where a.time >= '2011-08-15' OR b.time >= '2011-08-01'

-- This can be optimized by using CTEs
with am as (
  select * from a where time >= '2011-08-15'
)
, bm as (
  select * from b where time >= '2011-08-15'
)
select * from am join bm using(id)

-- end

I'm just wondering why the optimizer does things the way it does - and if the CTE version is the best way to go...

The actual case is slightly more complex and uses more tables - this is mostly a way to find updated data.

-- 
Eld på åren og sol på eng gjer mannen fegen og fjåg. [Jøtul]
<demo> 2011 Tore Halvorsen || +052 0553034554