Re: Index on a range array

Heikki Linnakangas <hlinnakangas@xxxxxxxxxx> · Thu, 15 Aug 2013 10:30:55 +0300

On 13.08.2013 23:47, Daniel Cristian Cruz wrote:
Hello,

I'm trying to simplify a schema, where I had many ranges floating around.
My idea is to put them all in an array field and query like this:

SELECT
  event.*
FROM event
JOIN participant_details
  USING (participant_id)
WHERE
  tsrange(event.start, event.end)&&  ANY (participant_details.periods);

periods is tsrange[].

I've tryed and it worked, but without indexes. I've tried something, but
didn't found anything... Does someone know how to index this kind of field
(tsrange[])?

 From the docs I learn that there is some GIST magic, but I would need to
code in C. Is that true?

Yeah. It might be somewhat tricky to write an efficient GIST 
implementation for this anyway. What you'd really want to do is to index 
each value in the array separately, which is more like what GIN does. 
With the "partial match" infrastructure in GIN, it might be possible to 
write a GIN implementation that can speed up range overlap queries. 
However, that certainly requires C coding too.

A couple of alternatives come to mind:

You could create the index on just the min and max values of the 
periods, and in the query check for overlap with that. If there 
typically aren't big gaps between the periods of each participant, that 
might work well.

Or you could split the range of expected timestamps into discrete steps, 
for example at one-day granularity. Create a function to convert a range 
into an array of steps, e.g convert each range into an array of days 
that the range overlaps with. Create a GIN index on that array, and use 
it in the query. Something like this:

-- Returns an int representing the day the given timestamp falls into
create function epochday(timestamp) returns int4 as $$
  select extract (epoch from $1)::int4/(24*3600)
$$ language sql immutable;

-- Same for a range. Returns an array of ints representing all the
-- days that the given range overlaps with.
create function epochdays(tsrange) returns integer[]
as $$
  select array_agg(g) from generate_series(epochday(lower($1)), 
epochday(upper($1))) g
$$
language sql immutable;

-- Same for an array of ranges. Returns an array of ints representing -- 
all the days that overlap with any of the given timestamp ranges
create function epochdays(ranges tsrange[]) returns integer[]
as $$
declare
  r tsrange;
  result integer[];
begin
  foreach r in array ranges loop
    result = result || (select array_agg(g) from 
generate_series(epochday(lower(r)), epochday(upper(r))) g);
  end loop;
  return result;
end;
$$ language plpgsql immutable;

-- Create the index on that:
create index period_days on participant_details using gin 
(epochdays(periods));

-- Query like this:
SELECT event.* FROM event
JOIN participant_details USING (participant_id)
-- This WHERE-clause is for correctness:
WHERE tsrange(event.start, event.end) &&  ANY (participant_details.periods);
-- and this is to make use of the index:
AND epochdays(tsrange(event.start, event.end)) && 
epochdays((participant_details.periods));

- Heikki

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance