Re: GiST index performance

Yeb Havinga <yebhavinga@xxxxxxxxx> · Fri, 19 Mar 2010 17:31:02 +0100

Yeb Havinga wrote:
Yeb Havinga wrote:
Matthew Wakeling wrote:
Matthew Wakeling wrote:
A second quite distinct issue is the general performance of GiST 
indexes
which is also mentioned in the old thread linked from Open Items. For
that, we have a test case at
http://archives.postgresql.org/pgsql-performance/2009-04/msg00276.php 
for
btree_gist indexes. I have a similar example with the bioseg GiST 
index. I
have completely reimplemented the same algorithms in Java for 
algorithm
investigation and instrumentation purposes, and it runs about a 
hundred
times faster than in Postgres. I think this is a problem, and I'm 
willing
to do some investigation to try and solve it.
I have not made any progress on this issue. I think Oleg and Teodor 
would be better placed working it out. All I can say is that I 
implemented the exact same indexing algorithm in Java, and it 
performed 100 times faster than Postgres. Now, Postgres has to do a 
lot of additional work, like mapping the index onto disc, locking 
pages, and abstracting to plugin user functions, so I would expect 
some difference - I'm not sure 100 times is reasonable though. I 
tried to do some profiling, but couldn't see any one section of code 
that was taking too much time. Not sure what I can further do.
Looked in the code a bit more - only the index nodes are compressed at 
index creation, the consistent functions does not compress queries, so 
not pallocs there. However when running Mathews example from 
http://archives.postgresql.org/pgsql-performance/2009-04/msg00276.php 
with the gist index, the coverage shows in gistget.c: 1000000 palloc0 
's of gistsearchstack at line 152 and 2010982 palloc's also of the 
gistsearchstack on line 342. Two pfrees are also hit a lot: line 195: 
1010926 of a stackentry and line 293: 200056 times. My $0.02 cents is 
that the pain is here. My knowledge of gistget or the other sources in 
access/gist is zero, but couldn't it be possible to determine the 
maximum needed size of the stack and then allocate it at once and use 
a pop/push kind off api?
Waisted some time today on a ghost chase... I though that removing the 
millions of pallocs would help, so I wrote an alternative of the 
gistsearchstack-stack to find out that it was not the index scanning 
itself that caused milltions of pallocs, but the scan being in the inner 
loop that was called 1000000 times. The actual scanning time was not 
changed significantly.
The actual scanning time in my vm is for btree (actual 
time=0.006..0.008) and gist (actual time=0.071..0.074). An error in my 
searchstack alternative caused pages to be scanned twice, returing twice 
the amount of rows (6 instead of 3 each time). This resulted in a 
likewise increase of ms (actual time=0.075..0.150). Somewhere I hit 
something that causes ~= 0.070 ms twice. For a single index scan, 
0.070ms startup time for gist vs 0.006 for btree doesn't seem like a big 
problem, but yeah when calling it a million times...

regards,
Yeb Havinga

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance