Search Postgresql Archives

Re: Creation of tsearch2 index is very slow

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Jan 20, 2006 at 10:35:21AM -0500, Tom Lane wrote:
> However, I'm not sure that anyone's tried to do any performance
> optimization on the GIST insert code ... there might be some low-hanging
> fruit there.  It'd be interesting to look at a gprof profile of what the
> backend is doing during the index build.  Do you have the ability to do
> that, or would you be willing to give your data to someone else to
> investigate with?  (The behavior is very possibly data-dependent, which
> is why I want to see a profile with your specific data and not just some
> random dataset or other.)

The cost on inserting would generally go to either penalty, or
picksplit. Certainly if you're inserting lots of values in a short
interval, I can imagine picksplit being nasty, since the algorithms for
a lot of datatypes are not really reknown for their speed.

I'm wondering if you could possibly improve the process by grouping
into larger blocks. For example, pull out enough tuples to cover 4
pages and then call picksplit three times to split it into the four
pages. This gives you 4 entries for the level above the leaves. Keep
reading tuples and splitting until you get enough for the next level
and call picksplit on those. etc etc.

The thing is, you never call penalty here so it's questionable whether
the tree will be as efficient as just inserting. For example, if have a
data type representing ranges (a,b), straight inserting can produce the
perfect tree order like a b-tree (assuming non-overlapping entries).
The above process will produce something close, but not quite...

Should probably get out a pen-and-paper to model this. After all, if
the speed of the picksplit increases superlinearly to the number of
elements, calling it will larger sets may prove to be a loss overall...

Perhaps the easiest would be to allow datatypes to provide a bulkinsert
function, like b-tree does? The question is, what should be its input
and output?

Have a nice day,
-- 
Martijn van Oosterhout   <kleptog@xxxxxxxxx>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Attachment: signature.asc
Description: Digital signature


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux