On Sun, Jun 19, 2011 at 11:36 PM, Julius Tuskenis <julius@xxxxxxxx> wrote:
Hello,
I'm sorry to write again, but as I received no answer I wonder if there is a better mailing list to address concerning this question? Or is there nothing to be done about the speed of xmlagg ?. Please let me as no answer is the worst answer to get....
I played around a little after your initial post, but I'm not a postgresql developer and I have no familiarity with the internals, so everything I'm about to write is pure conjecture.
Programming languages that treat strings as immutable often suffer from this kind of problem. WIth each string concatenation, both strings have to be copied to a new location in memory. I suspect that this is happening in this case. The only viable fix would be to use a buffer that is mutable and append into it rather than doing raw string concatenation. If this is truly the problem, I don't see that you have much choice but to re-implement xmlagg in one of the available languages such that it uses a buffer instead of immutable string concatenation. It is probably that the xmlelement function doesn't need to be rewritten, since it is only concatenating relatively short strings. It is less efficient than appending to a buffer, but shouldn't get catastrophically slow. But xmlagg is concatenating many rows. If each concatenation results in a full copy of the already concatenated rows, you can see where performance would drop catastrophically.
Here's the first google result for 'mutable string python' that I found, which addresses this problem in python. http://www.skymind.com/~ocrow/python_string/ You could rewrite the aggregate function in plpython using one of the techniques in that file. I just attempted to find the source to xm_agg in the postgresql source code and it is pretty well obfuscated, so I don't see that being much help. I wasn't even able to determine if the problem actually is immutable string concatenation. So we don't know if xmlagg is building a DOM tree and then serializing it once (which would imply that XMLELEMENT returns a single DOM node, or if it is all workign with strings. Barring answers from someone who actually knows, I can only suggest that you read through the documentation on writing an aggregate function and then do some experimentation to see what you get when you use your own aggregate function instead of xml_agg. Hopefully, such experimentation won't take long to determine if re-implementing xml_agg with a mutable buffer is a viable option.
--sam