Re: GiST index performance

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Yeb Havinga wrote:
Matthew Wakeling wrote:
A second quite distinct issue is the general performance of GiST indexes which is also mentioned in the old thread linked from Open Items. For
that, we have a test case at
http://archives.postgresql.org/pgsql-performance/2009-04/msg00276.php for btree_gist indexes. I have a similar example with the bioseg GiST index. I have completely reimplemented the same algorithms in Java for algorithm investigation and instrumentation purposes, and it runs about a hundred times faster than in Postgres. I think this is a problem, and I'm willing
to do some investigation to try and solve it.
More gist performance..

Some background: I've been involved in developing several datatypes that make use of gist indexes (we just published a paper on it, see http://arxiv.org/abs/1003.3370), that's the main reason I'm very much interested in possible improvements in gist index speed. One of the datatypes was 'very badly' indexable: non leaf pages were getting very general keys, so in the end we could see from the scanning times compared to sequential scans that the whole index was being scanned. One thing I remember was the idea that somehow it would be nice if the dept of the gist tree could be fiddled with: in that case keys of non leaf nodes would be more specific again. In the original Guttman R-tree paper there was mention of a parameter that determined the size of entries in nodes and thereby indirectly the depth of the tree. I missed that in the PostgreSQL gist api.

One thing Gist scanning does very often is key comparisons. So another approach is to try to limit those and again this might be possible by increasing the depth / decrease number of entries per page. I just did a test where in gistfitpage the gistpagesize was divided by 5 and something similar in gistnospace.

Scantime before adjustment: about 70 seconds.
After adjustment: 45 seconds.

With gist_stat from the gevel package confirmed that the depth was now 4 (original 3). Drawback: bigger index size because pages are not filled completely anymore.

The explain shows (actual time=0.030..0.032) for the inner loop times, which is less then half of the original scan time.

Since the gistpagesize is derived from the database blocksize, it might be wise to set the blocksize low for this case, I'm going to play with this a bit more.

regards,
Yeb Havinga



--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux