Re: GiST index performance

Yeb Havinga wrote:
Yeb Havinga wrote:
Matthew Wakeling wrote:
Matthew Wakeling wrote:
A second quite distinct issue is the general performance of GiST indexes
which is also mentioned in the old thread linked from Open Items. For
that, we have a test case at
http://archives.postgresql.org/pgsql-performance/2009-04/msg00276.php for btree_gist indexes. I have a similar example with the bioseg GiST index. I have completely reimplemented the same algorithms in Java for algorithm investigation and instrumentation purposes, and it runs about a hundred times faster than in Postgres. I think this is a problem, and I'm willing
to do some investigation to try and solve it.
I have not made any progress on this issue. I think Oleg and Teodor would be better placed to work it out. All I can say is that I implemented the exact same indexing algorithm in Java, and it performed 100 times faster than Postgres. Now, Postgres has to do a lot of additional work, like mapping the index onto disc, locking pages, and abstracting to plugin user functions, so I would expect some difference. I'm not sure 100 times is reasonable, though. I tried to do some profiling, but couldn't see any one section of code that was taking too much time. I'm not sure what more I can do.
Looked in the code a bit more: only the index nodes are compressed at index creation, and the consistent functions do not compress queries, so there are no pallocs there. However, when running Matthew's example from http://archives.postgresql.org/pgsql-performance/2009-04/msg00276.php with the gist index, the coverage shows in gistget.c: 1000000 palloc0 calls for gistsearchstack at line 152 and 2010982 pallocs, also of gistsearchstack, at line 342. Two pfrees are also hit a lot: line 195 (a stackentry), 1010926 times, and line 293, 200056 times. My $0.02 is that the pain is here. My knowledge of gistget and the other sources in access/gist is zero, but couldn't we determine the maximum needed size of the stack, allocate it once, and use a push/pop kind of API?
Wasted some time today on a ghost chase... I thought that removing the millions of pallocs would help, so I wrote an alternative to the gistsearchstack stack, only to find out that it was not the index scanning itself that caused millions of pallocs, but the scan being in the inner loop, which was called 1000000 times. The actual scanning time did not change significantly. In my VM the actual scanning time is (actual time=0.006..0.008) for btree and (actual time=0.071..0.074) for gist. A bug in my searchstack alternative caused pages to be scanned twice, returning twice the number of rows (6 instead of 3 each time). This caused a corresponding increase in time (actual time=0.075..0.150). So somewhere I hit something that costs about 0.070 ms, and with the bug it was hit twice. For a single index scan, a 0.070 ms startup time for gist vs 0.006 ms for btree doesn't seem like a big problem, but when it is called a million times it adds up...

regards,
Yeb Havinga


--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
