Was this corrected? I don't see any commits to seg.c. --------------------------------------------------------------------------- Matthew Wakeling wrote: > On Thu, 7 May 2009, Oleg Bartunov wrote: > > Did you try Guttman quadratic split algorithm ? We also found linear > > split algorithm for Rtree. > > The existing (bugfixed) seg split algorithm is the Guttman quadratic split > algorithm. Guttman did all his work on two-dimensional and above data, > dismissing one-dimensional data as being handled adequately by B-trees, > which is not true for segment overlaps. It turns out that the algorithm > has a weakness with certain types of data, and one-dimensional data is > almost certain to exercise that weakness. The greater the number of > dimensions, the less the weakness is exercised. > > The problem is that the algorithm does not calculate a split pivot. > Instead it finds two suitable entries, and adds the remaining entries to > those two in turn. This can lead to the majority of the entries being > added to just one side. In fact, I saw lots of cases where 367 entries > were being split into two pages of 366 and one entry. > > Guttman's linear split algorithm has the same weakness. > > >> One thing I am seeing is a really big difference in performance between > >> Postgres/GiST and a Java implementation I have written, using the same > >> algorithms. Postgres takes three minutes to perform a set of index lookups > >> while java takes six seconds. The old version of bioseg took an hour. I > >> can't see anything in the GiST support code that could account for this. > > > > is the number of index lookups different, or just index lookup time is very > > big ? > > Same number of index lookups. Same algorithms. I have a set of 681879 > segments, and I load them all into the index. I then query the index for > overlaps for each one in turn. For some reason, GiST lookups seem to be > slow, even if they are using a good algorithm. I have seen that problem > with btree_gist on integers too. I can't see any reason for this is the > GiST code - it all seems pretty tight to me. We probably need to do some > profiling. > > Matthew > > -- > I suppose some of you have done a Continuous Maths course. Yes? Continuous > Maths? <menacing stares from audience> Whoah, it was like that, was it! > -- Computer Science Lecturer > > -- > Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance -- Bruce Momjian <bruce@xxxxxxxxxx> http://momjian.us EnterpriseDB http://enterprisedb.com PG East: http://www.enterprisedb.com/community/nav-pg-east-2010.do + If your life is a hard drive, Christ can be your backup. + -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance