At 07:23 PM 1/20/2006, Tom Lane wrote:
"Steinar H. Gunderson" <sgunderson@xxxxxxxxxxx> writes:
> On Fri, Jan 20, 2006 at 06:52:37PM -0500, Tom Lane wrote:
>> It's also worth considering that the entire approach is a heuristic,
>> really --- getting the furthest-apart pair of seeds doesn't guarantee
>> an optimal split as far as I can see. Maybe there's some totally
>> different way to do it.
> For those of us who don't know what tsearch2/gist is trying to accomplish
> here, could you provide some pointers? :-)
Well, we're trying to split an index page that's gotten full into
two index pages, preferably with approximately equal numbers of items in
each new page (this isn't a hard requirement though).
Maybe we are over thinking this. What happens if we do the obvious
and just make a new page and move the "last" n/2 items on the full
page to the new page?
Various forms of "move the last n/2 items" can be tested here:
0= just split the table in half. Sometimes KISS works. O(1).
1= the one's with the highest (or lowest) "x" value.
2= the one's with the highest sum of coordinates (x+y+...= values in
the top/bottom n/2 of entries).
3= split the table so that each table has entries whose size_waste
values add up to approximately the same value.
4= I'm sure there are others.
1-5 can be done in O(n) time w/o auxiliary data. They can be done in
O(1) if we've kept track of the appropriate metric as we've built the
current page.
I think the true figure of merit for a split is how often will
subsequent searches have to descend into *both* of the resulting
pages as opposed to just one
--- the less often that is true, the better. I'm not very clear on
what tsearch2 is doing with these bitmaps, but it looks like an
upper page's downlink has the union (bitwise OR) of the one-bits in
the values on the lower page, and you have to visit the lower page
if this union has a nonempty intersection with the set you are
looking for. If that's correct, what you really want is to divide
the values so that the unions of the two sets have minimal overlap
... which seems to me to have little to do with what the code does at present.
I'm not sure what "upper page" and "lower page" mean here?
Teodor, Oleg, can you clarify what's needed here?
Ditto. Guys what is the real motivation and purpose for this code?
Ron