Re: Index on a range array

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 13.08.2013 23:47, Daniel Cristian Cruz wrote:
Hello,

I'm trying to simplify a schema, where I had many ranges floating around.
My idea is to put them all in an array field and query like this:

SELECT
  event.*
FROM event
JOIN participant_details
  USING (participant_id)
WHERE
  tsrange(event.start, event.end)&&  ANY (participant_details.periods);

periods is tsrange[].

I've tryed and it worked, but without indexes. I've tried something, but
didn't found anything... Does someone know how to index this kind of field
(tsrange[])?

 From the docs I learn that there is some GIST magic, but I would need to
code in C. Is that true?

Yeah. It might be somewhat tricky to write an efficient GIST implementation for this anyway. What you'd really want to do is to index each value in the array separately, which is more like what GIN does. With the "partial match" infrastructure in GIN, it might be possible to write a GIN implementation that can speed up range overlap queries. However, that certainly requires C coding too.

A couple of alternatives come to mind:

You could create the index on just the min and max values of the periods, and in the query check for overlap with that. If there typically aren't big gaps between the periods of each participant, that might work well.

Or you could split the range of expected timestamps into discrete steps, for example at one-day granularity. Create a function to convert a range into an array of steps, e.g convert each range into an array of days that the range overlaps with. Create a GIN index on that array, and use it in the query. Something like this:

-- Returns an int representing the day the given timestamp falls into
create function epochday(timestamp) returns int4 as $$
  select extract (epoch from $1)::int4/(24*3600)
$$ language sql immutable;

-- Same for a range. Returns an array of ints representing all the
-- days that the given range overlaps with.
create function epochdays(tsrange) returns integer[]
as $$
select array_agg(g) from generate_series(epochday(lower($1)), epochday(upper($1))) g
$$
language sql immutable;

-- Same for an array of ranges. Returns an array of ints representing -- all the days that overlap with any of the given timestamp ranges
create function epochdays(ranges tsrange[]) returns integer[]
as $$
declare
  r tsrange;
  result integer[];
begin
  foreach r in array ranges loop
result = result || (select array_agg(g) from generate_series(epochday(lower(r)), epochday(upper(r))) g);
  end loop;
  return result;
end;
$$ language plpgsql immutable;

-- Create the index on that:
create index period_days on participant_details using gin (epochdays(periods));

-- Query like this:
SELECT event.* FROM event
JOIN participant_details USING (participant_id)
-- This WHERE-clause is for correctness:
WHERE tsrange(event.start, event.end) &&  ANY (participant_details.periods);
-- and this is to make use of the index:
AND epochdays(tsrange(event.start, event.end)) && epochdays((participant_details.periods));

- Heikki


--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance




[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux