On Fri, Jan 20, 2006 at 10:35:21AM -0500, Tom Lane wrote: > However, I'm not sure that anyone's tried to do any performance > optimization on the GIST insert code ... there might be some low-hanging > fruit there. It'd be interesting to look at a gprof profile of what the > backend is doing during the index build. Do you have the ability to do > that, or would you be willing to give your data to someone else to > investigate with? (The behavior is very possibly data-dependent, which > is why I want to see a profile with your specific data and not just some > random dataset or other.) The cost on inserting would generally go to either penalty, or picksplit. Certainly if you're inserting lots of values in a short interval, I can imagine picksplit being nasty, since the algorithms for a lot of datatypes are not really reknown for their speed. I'm wondering if you could possibly improve the process by grouping into larger blocks. For example, pull out enough tuples to cover 4 pages and then call picksplit three times to split it into the four pages. This gives you 4 entries for the level above the leaves. Keep reading tuples and splitting until you get enough for the next level and call picksplit on those. etc etc. The thing is, you never call penalty here so it's questionable whether the tree will be as efficient as just inserting. For example, if have a data type representing ranges (a,b), straight inserting can produce the perfect tree order like a b-tree (assuming non-overlapping entries). The above process will produce something close, but not quite... Should probably get out a pen-and-paper to model this. After all, if the speed of the picksplit increases superlinearly to the number of elements, calling it will larger sets may prove to be a loss overall... Perhaps the easiest would be to allow datatypes to provide a bulkinsert function, like b-tree does? The question is, what should be its input and output? Have a nice day, -- Martijn van Oosterhout <kleptog@xxxxxxxxx> http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them.
Attachment:
signature.asc
Description: Digital signature