Re: GiST index performance

Yeb Havinga <yebhavinga@xxxxxxxxx> · Wed, 17 Mar 2010 10:26:14 +0100

Yeb Havinga wrote:
Matthew Wakeling wrote:
Matthew Wakeling wrote:
A second quite distinct issue is the general performance of GiST 
indexes
which is also mentioned in the old thread linked from Open Items. For
that, we have a test case at
http://archives.postgresql.org/pgsql-performance/2009-04/msg00276.php 
for
btree_gist indexes. I have a similar example with the bioseg GiST 
index. I
have completely reimplemented the same algorithms in Java for 
algorithm
investigation and instrumentation purposes, and it runs about a 
hundred
times faster than in Postgres. I think this is a problem, and I'm 
willing
to do some investigation to try and solve it.
I have not made any progress on this issue. I think Oleg and Teodor 
would be better placed working it out. All I can say is that I 
implemented the exact same indexing algorithm in Java, and it 
performed 100 times faster than Postgres. Now, Postgres has to do a 
lot of additional work, like mapping the index onto disc, locking 
pages, and abstracting to plugin user functions, so I would expect 
some difference - I'm not sure 100 times is reasonable though. I 
tried to do some profiling, but couldn't see any one section of code 
that was taking too much time. Not sure what I can further do.
Hello Mathew and list,

A lot of time spent in gistget.c code and a lot of functioncall5's to 
the gist's consistent function which is out of sight for gprof.
Something different but related since also gist: we noticed before 
that gist indexes that use a compressed form for index entries suffer 
from repeated compress calls on query operands (see 
http://archives.postgresql.org/pgsql-hackers/2009-05/msg00078.php).

The btree_gist int4 compress function calls the generic 
gbt_num_compress, which does a palloc. Maybe this palloc is allso hit 
al lot when scanning the index, because the constants that are queries 
with are repeatedly compressed and palloced.
Looked in the code a bit more - only the index nodes are compressed at 
index creation, the consistent functions does not compress queries, so 
not pallocs there. However when running Mathews example from 
http://archives.postgresql.org/pgsql-performance/2009-04/msg00276.php 
with the gist index, the coverage shows in gistget.c: 1000000 palloc0 's 
of gistsearchstack at line 152 and 2010982 palloc's also of the 
gistsearchstack on line 342. Two pfrees are also hit a lot: line 195: 
1010926 of a stackentry and line 293: 200056 times. My $0.02 cents is 
that the pain is here. My knowledge of gistget or the other sources in 
access/gist is zero, but couldn't it be possible to determine the 
maximum needed size of the stack and then allocate it at once and use a 
pop/push kind off api?

regards,
Yeb Havinga

regards,
Yeb Havinga

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance