Hello. This is my first post here, so feedback on style and choice of venue is especially welcome.
I am a regular, though not especially expert, user of a variety of databases, including PostgreSQL.
I have only modest experience with spatial databases.
I have a new project [1] in which GiST could be very useful, provided I can achieve high concurrency. Given some empirical evidence that R* would be a good place to start, and after reading "High-Concurrency Locking in R-Trees" [2], I went looking for an implementation of R-link trees extended to R*. So I was very interested to read Hellerstein et al. [3], who wrote:
High concurrency, recoverability, and degree-3 consistency are critical factors in a full-fledged database system. We are considering extending the results of Kornacker and Banks for R-trees [KB95] to our implementation of GiSTs.
Since that paper dates from 1995, and GiST has obviously come a long way in PostgreSQL since then, I am looking for current information and advice on the state of concurrency support in PostgreSQL's GiST implementation. If someone has already implemented an R*-link tree, that could really help me. (I can wish, no?)
Thanks for reading, and thanks in advance for any advice or pointers.
Carlos
[1] It's not a GIS project, but it has some similarities:
(a) I need to manage anywhere from about 1,000 to 10 million three-dimensional "boxes".
(b) The distributions of sizes, aspect ratios, and locations in R^3 are all unknown a priori and may change during execution as boxes are inserted and deleted.
(c) Queries may arrive asynchronously and at a high rate from hundreds (or more?) of compute nodes.
(d) Successive queries from any one node, viewed as a time sequence, may have very low (or at best sporadic) spatial correlation, which means lots of page jumps.
(e) R*-trees will be advantageous over plain R-trees, but Priority R-trees are probably not especially useful, since turnover may exceed 20% during a "job."
(f) I would like to avoid the complications of distributed databases, again because of the high turnover.
[2] Marcel Kornacker and Douglas Banks. "High-Concurrency Locking in R-Trees." 1995.
[3] Joseph M. Hellerstein, Jeffrey F. Naughton, and Avi Pfeffer. "Generalized Search Trees for Database Systems." 1995.