OVERLAPS is slow

Chris Browne <cbbrowne@xxxxxxx> · Thu, 29 May 2008 12:46:39 -0400

I'm doing some analysis on temporal usages, and was hoping to make use
of OVERLAPS, but it does not appear that it makes use of indices.

Couching this in an example...  I created a table, t1, thus:

metadata=# \d t1
                                     Table "public.t1"
 Column |           Type           |                       Modifiers                       
--------+--------------------------+-------------------------------------------------------
 id     | integer                  | not null default nextval('t1_id_seq'::regclass)
 t1     | timestamp with time zone | not null default now()
 t2     | timestamp with time zone | not null default 'infinity'::timestamp with time zone
 data   | text                     | not null
Indexes:
    "t1_pkey" PRIMARY KEY, btree (id)
    "f2" btree (id) WHERE t2 = 'infinity'::timestamp with time zone
    "t1t1" btree (t1)
    "t1t2" btree (t2)

When entries go in, they default to having an effective date range
from now() until 'infinity'.

I then went off and seeded a bunch of data into the table, inserting
values:

for i in `cat /etc/dictionaries-common/words | head 2000`; do
  psql -d metadata -c "insert into t1 (data) values ('$i');"
done

Then, I started doing temporal updates, thus:

for i in `cat /etc/dictionaries-common/words`; do
psql -d metadata -c "insert into t1 (data) values ('$i');update t1 set t2 = now() where t2 = 'infinity' and id in (select id from t1 where t2 = 'infinity' order by random() limit 1);"
done

This terminates many of those entries, and creates a new one that is
effective "to infinity."

After running this for a while, I have a reasonably meaningful amount
of data in the table:

metadata=# select count(*) from t1; select count(*) from t1 where t2 = 'infinity';
 count  
--------
 125310
(1 row)

 count 
-------
  2177
(1 row)

Searching for the "active" items in the table, via a constructed 'overlap':

metadata=# explain analyze  select count(*)  from t1 where t1 <= now() and t2 >= now();
                                                      QUERY PLAN                                                      
----------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=98.13..98.14 rows=1 width=0) (actual time=8.104..8.105 rows=1 loops=1)
   ->  Index Scan using t1t2 on t1  (cost=0.00..93.95 rows=1671 width=0) (actual time=0.116..6.374 rows=2177 loops=1)
         Index Cond: (t2 >= now())
         Filter: (t1 <= now())
 Total runtime: 8.193 ms
(5 rows)

Note, that makes use of the index on column t2, and runs nice and
quick.  (And notice that the rows found, 2177, agrees with the earlier
count.)

Unfortunately, when I try using OVERLAPS, it reverts to a Seq Scan.

metadata=# explain analyze  select *  from t1 where (t1,t2) overlaps (now(), now());
                                                QUERY PLAN                                                 
-----------------------------------------------------------------------------------------------------------
 Seq Scan on t1  (cost=0.00..3156.59 rows=43135 width=24) (actual time=171.248..205.941 rows=2177 loops=1)
   Filter: "overlaps"(t1, t2, now(), now())
 Total runtime: 207.508 ms
(3 rows)

I would surely think that I have enough data in the table for the
stats to be good, and the first query certainly does harness the index
on t2 to determine if records are overlapping (now(),now()).

Is it possible that we need to have some improvement to the optimizer
so that OVERLAPS could make use of the indices?
-- 
select 'cbbrowne' || '@' || 'cbbrowne.com';
http://linuxfinances.info/info/lsf.html
"Very little is known about the War of 1812 because the Americans lost
it."  -- Eric Nicol