Yeb Havinga wrote:
Matthew Wakeling wrote:
A second quite distinct issue is the general performance of GiST
indexes
which is also mentioned in the old thread linked from Open Items.
For
that, we have a test case at
http://archives.postgresql.org/pgsql-performance/2009-04/msg00276.php
for
btree_gist indexes. I have a similar example with the bioseg GiST
index. I
have completely reimplemented the same algorithms in Java for
algorithm
investigation and instrumentation purposes, and it runs about a
hundred
times faster than in Postgres. I think this is a problem, and I'm
willing
to do some investigation to try and solve it.
More gist performance..
Some background: I've been involved in developing several datatypes that
make use of gist indexes (we just published a paper on it, see
http://arxiv.org/abs/1003.3370), that's the main reason I'm very much
interested in possible improvements in gist index speed. One of the
datatypes was 'very badly' indexable: non leaf pages were getting very
general keys, so in the end we could see from the scanning times
compared to sequential scans that the whole index was being scanned. One
thing I remember was the idea that somehow it would be nice if the dept
of the gist tree could be fiddled with: in that case keys of non leaf
nodes would be more specific again. In the original Guttman R-tree paper
there was mention of a parameter that determined the size of entries in
nodes and thereby indirectly the depth of the tree. I missed that in the
PostgreSQL gist api.
One thing Gist scanning does very often is key comparisons. So another
approach is to try to limit those and again this might be possible by
increasing the depth / decrease number of entries per page. I just did a
test where in gistfitpage the gistpagesize was divided by 5 and
something similar in gistnospace.
Scantime before adjustment: about 70 seconds.
After adjustment: 45 seconds.
With gist_stat from the gevel package confirmed that the depth was now 4
(original 3). Drawback: bigger index size because pages are not filled
completely anymore.
The explain shows (actual time=0.030..0.032) for the inner loop times,
which is less then half of the original scan time.
Since the gistpagesize is derived from the database blocksize, it might
be wise to set the blocksize low for this case, I'm going to play with
this a bit more.
regards,
Yeb Havinga
--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance