The application needs to determine which genomic features are present in a user-defined portion of a chromosome. My guess is that the features (boxes) overlap along a line (the chromosome), and that there is a need to represent them as stacked. Since I'm not certain of the exact use, I've emailed the application owner to ask why a geometric index structure is used, and why the boxes are tall and overlapping.

As a side note, the data model for our application is based on a popular open source bioinformatics project called chado.

Thanks,
Tom

-----Original Message-----
From: Tom Lane [mailto:tgl@xxxxxxxxxxxxx]
Sent: Friday, June 29, 2007 2:38 PM
To: Dolafi, Tom
Cc: pgsql-performance@xxxxxxxxxxxxxx; Oleg Bartunov; Teodor Sigaev
Subject: Re: [PERFORM] rtree/gist index taking enormous amount of space in 8.2.3

"Dolafi, Tom" <dolafit@xxxxxxxxxxxxxxxx> writes:
> In the mean time I've dropped the index which has resulted in overall
> performance gain on queries against the table, but we have not tested
> the part of the application which would utilize this index.

I noted that with the same (guessed-at) distribution of fmin/fmax, the
index size remains reasonable if you change the derived boxes to

    CREATE OR REPLACE FUNCTION boxrange(integer, integer) RETURNS box AS
      'SELECT box (point($1, $1), point($2, $2))'
    LANGUAGE 'sql' STRICT IMMUTABLE;

which makes sense from the point of view of geometric intuition: instead
of a bunch of very tall, mostly very narrow, mostly overlapping boxes,
you have a bunch of small square boxes spread out along a line. So it
stands to reason that a geometrically-motivated index structure would
work a lot better on the latter.

I don't know though whether your queries can be adapted to work with
this. What was the index being used for, exactly?

			regards, tom lane
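
For illustration only, here is a minimal sketch of how a range lookup might be written against the diagonal-box form of boxrange. The table name featureloc_demo and the index name are hypothetical stand-ins, not the application's real schema, and the actual queries may look quite different. The idea is that a box running from (fmin, fmin) to (fmax, fmax) overlaps another such box exactly when the two underlying [fmin, fmax] ranges overlap, so the box overlap operator && can stand in for a range-overlap test.

```sql
-- Hypothetical table standing in for the real feature-location table.
CREATE TABLE featureloc_demo (
    feature_id integer,
    fmin       integer,
    fmax       integer
);

-- The diagonal-box variant from the quoted message.
CREATE OR REPLACE FUNCTION boxrange(integer, integer) RETURNS box AS
  'SELECT box(point($1, $1), point($2, $2))'
LANGUAGE sql STRICT IMMUTABLE;

-- Expression index over the derived boxes (index name is an assumption).
CREATE INDEX featureloc_demo_box_idx
    ON featureloc_demo USING gist (boxrange(fmin, fmax));

-- Features whose [fmin, fmax] range overlaps positions 1000..2000.
-- The WHERE expression matches the indexed expression, so the planner
-- can consider the GiST index for this query.
SELECT feature_id, fmin, fmax
FROM featureloc_demo
WHERE boxrange(fmin, fmax) && boxrange(1000, 2000);
```

Note that && treats boxes that merely touch at an edge or corner as overlapping, so whether the boundary cases match the application's coordinate convention would need to be checked.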