Yeb Havinga wrote:
Yeb Havinga wrote:
Matthew Wakeling wrote:
Matthew Wakeling wrote:
A second quite distinct issue is the general performance of GiST
indexes
which is also mentioned in the old thread linked from Open Items. For
that, we have a test case at
http://archives.postgresql.org/pgsql-performance/2009-04/msg00276.php
for
btree_gist indexes. I have a similar example with the bioseg GiST
index. I
have completely reimplemented the same algorithms in Java for
algorithm
investigation and instrumentation purposes, and it runs about a
hundred
times faster than in Postgres. I think this is a problem, and I'm
willing
to do some investigation to try and solve it.
I have not made any progress on this issue. I think Oleg and Teodor
would be better placed working it out. All I can say is that I
implemented the exact same indexing algorithm in Java, and it
performed 100 times faster than Postgres. Now, Postgres has to do a
lot of additional work, like mapping the index onto disc, locking
pages, and abstracting to plugin user functions, so I would expect
some difference - I'm not sure 100 times is reasonable though. I
tried to do some profiling, but couldn't see any one section of code
that was taking too much time. Not sure what I can further do.
Looked in the code a bit more - only the index nodes are compressed at
index creation, the consistent functions does not compress queries, so
not pallocs there. However when running Mathews example from
http://archives.postgresql.org/pgsql-performance/2009-04/msg00276.php
with the gist index, the coverage shows in gistget.c: 1000000 palloc0
's of gistsearchstack at line 152 and 2010982 palloc's also of the
gistsearchstack on line 342. Two pfrees are also hit a lot: line 195:
1010926 of a stackentry and line 293: 200056 times. My $0.02 cents is
that the pain is here. My knowledge of gistget or the other sources in
access/gist is zero, but couldn't it be possible to determine the
maximum needed size of the stack and then allocate it at once and use
a pop/push kind off api?
Waisted some time today on a ghost chase... I though that removing the
millions of pallocs would help, so I wrote an alternative of the
gistsearchstack-stack to find out that it was not the index scanning
itself that caused milltions of pallocs, but the scan being in the inner
loop that was called 1000000 times. The actual scanning time was not
changed significantly.
The actual scanning time in my vm is for btree (actual
time=0.006..0.008) and gist (actual time=0.071..0.074). An error in my
searchstack alternative caused pages to be scanned twice, returing twice
the amount of rows (6 instead of 3 each time). This resulted in a
likewise increase of ms (actual time=0.075..0.150). Somewhere I hit
something that causes ~= 0.070 ms twice. For a single index scan,
0.070ms startup time for gist vs 0.006 for btree doesn't seem like a big
problem, but yeah when calling it a million times...
regards,
Yeb Havinga
--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance