High-Concurrency GiST in PostgreSQL

"C. Mundi" <cmundi@xxxxxxxxx> · Mon, 5 Dec 2011 11:31:09 -0700

Hello.  This is my first post.  As such, feedback on style and choice of venue is especially welcome.

I am a regular but not especially expert user of a variety of databases, including PostgreSQL.
I have only modest experience with spatial databases.

I have a new project [1] in which GiST could be very useful, provided I can achieve high concurrency.  With some empirical evidence that R* would be a good place to start, and after reading "High-Concurrency Locking in R-Trees" [2], I went looking for an implementation of R-link trees extended to R*.  So I was very interested to read Hellerstein et al. [3], who wrote:

"High concurrency, recoverability, and degree-3 consistency are critical factors in a full-fledged database system. We are considering extending the results of Kornacker and Banks for R-trees [KB95] to our implementation of GiSTs."

Since this information may be somewhat dated, and GiST has obviously come a long way in PostgreSQL, I am looking for current information and advice on the state of concurrency in GiST in PostgreSQL.  If someone has already done an R*-link tree, that could really help me.  (I can wish, no?)

Thanks for reading and thanks for advice or pointers.

Carlos

[1] It's not a GIS project, but it has some similarities:
(a) I need to manage anywhere from as few as 1,000 to as many as 10 million three-dimensional "boxes."
(b) The distributions of sizes, aspect ratios, and locations in R^3 are all unknown a priori and may change during execution under insert/delete.
(c) Queries may arrive asynchronously and at a high rate from hundreds (or more?) of compute nodes.
(d) Successive queries from any node, viewed as a time-sequence, may have very low (or at best sporadic) spatial correlation -- lots of page jumps.
(e) R* will be advantageous over R, but Priority R is probably not especially useful since turnover may be greater than 20% during a "job."
(f) I would like to avoid the complications of distributed databases, again because of the high turnover.
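
To make the workload in (a)-(d) concrete, here is a minimal sketch in Python of the kind of box and overlap query involved.  It is only an illustration under assumptions: the names Box3D and overlap_query are hypothetical, nothing here is PostgreSQL or GiST code, and a brute-force scan stands in for whatever index would actually answer the query.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

Point3 = Tuple[float, float, float]


@dataclass(frozen=True)
class Box3D:
    """Axis-aligned box in R^3 (hypothetical type, for illustration only)."""
    lo: Point3  # minimum corner (x, y, z)
    hi: Point3  # maximum corner (x, y, z)

    def overlaps(self, other: "Box3D") -> bool:
        # Two axis-aligned boxes overlap iff their extents overlap on every axis.
        return all(
            self.lo[i] <= other.hi[i] and other.lo[i] <= self.hi[i]
            for i in range(3)
        )


def overlap_query(boxes: Iterable[Box3D], window: Box3D) -> List[Box3D]:
    # Brute-force scan: the O(n) work per query that a spatial index
    # such as an R*-tree is meant to avoid.
    return [b for b in boxes if b.overlaps(window)]


boxes = [
    Box3D((0, 0, 0), (1, 1, 1)),
    Box3D((2, 2, 2), (3, 3, 3)),
    Box3D((0.5, 0.5, 0.5), (2.5, 2.5, 2.5)),
]
hits = overlap_query(boxes, Box3D((0.9, 0.9, 0.9), (1.1, 1.1, 1.1)))
```

With millions of boxes and many nodes issuing such queries while other nodes insert and delete, the scan is hopeless, which is why the concurrency behavior of the index itself matters.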

[2] Marcel Kornacker and Douglas Banks. High-Concurrency Locking in R-Trees. (1995)

[3] Hellerstein, Naughton, and Pfeffer. Generalized Search Trees for Database Systems. (1995)