Re: [GENERAL] Creation of tsearch2 index is very

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



At 01:27 PM 1/21/2006, Tom Lane wrote:
Ron <rjpeace@xxxxxxxxxxxxx> writes:
> At 07:23 PM 1/20/2006, Tom Lane wrote:
>> Well, we're trying to split an index page that's gotten full into
>> two index pages, preferably with approximately equal numbers of items in
>> each new page (this isn't a hard requirement though).

> Maybe we are over thinking this.  What happens if we do the obvious
> and just make a new page and move the "last" n/2 items on the full
> page to the new page?

Search performance will go to hell in a handbasket :-(.  We have to make
at least some effort to split the page in a way that will allow searches
to visit only one of the two child pages rather than both.

It's certainly true though that finding the furthest pair is not a
necessary component of that.  It's reasonable if you try to visualize
the problem in 2D or 3D, but I'm not sure that that geometric intuition
holds up in such a high-dimensional space as we have here.
After reading the various papers available on GiST and RD trees, I think I have a decent suggestion.

Since RD tree keys contain the keys of their descendents/components in them, they are basically a representation of a depth first search. This can be useful for intra-document searches.

OTOH, inter-document searches are going to be more akin to breadth first searches using RD trees.

Thus my suggestion is that we maintain =two= index structures for text data.

The first contains as many keys and all their descendents as possible. When we can no longer fit a specific complete "path" into a page, we start a new one; trying to keep complete top level to leaf key sets within a page. This will minimize paging during intra-document searches.

The second index keeps siblings within a page and avoids putting parents or children within a page unless the entire depth first search can be kept within the page in addition to the siblings present. This will minimize paging during inter-document searches.

Traditional B-tree ordering methods can be used to define the ordering/placement of pages within each index, which will minimize head seeks to find the correct page to scan.

Since the criteria for putting a key within a page or starting a new page is simple, performance for those tasks should be O(1).

The price is the extra space used for two indexes instead of one, but at first glance that seems well worth it.

Comments?
Ron






[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux