The issue is how efficiently the languages can deal with arrays. In Perl, I have to parse a string into an array of data, then break it up into sub arrays inside associative arrays just to work with the input. I also have to splice the array to remove elements, which I don't think is very efficient. Any way I could come up with of removing elements involved rebuilding the entire array. The same thing goes for pl/pgsql. Dealing with arrays there is also not very efficient. I do a lot of constructing of arrays from sets of data using myvar = array(select blah);. While pl/pgsql was considerably faster than Perl, it cannot come close to what I did in C++ using a hash of a hash of a linked list. The two hash tables provide my groupings and the linked list gives me something that is highly efficient for removing elements as I pick them.
I've looked through the documentation on how to re-write this in C, but I cannot seem to find good documentation on working with the input array (which is an array of a complex type). I also don't see good documentation for working with the complex type. I found stuff that talks about constructing a complex type in C and returning it. However, I'm not sure how to take an input complex type and deconstruct it into something I can work with in C. Also, the memory context management stuff is not entirely clear. Specifically, how do I go about preserving the pointers to the data that I allocate in multi-call memory context so that they still point to the data on the next call to the function for the next result row? Am I supposed to set up some global variables to do that, or am I supposed to take a different approach? If I need to use global variables, then how do I deal with concurrency?
On Sat, Jul 3, 2010 at 2:08 PM, Merlin Moncure <mmoncure@xxxxxxxxx> wrote:
On Fri, Jul 2, 2010 at 11:17 PM, Eliot Gable
<egable+pgsql-performance@xxxxxxxxx> wrote:
> Well, I re-wrote the algorithm in Perl. However, it did not solve the speedok, I didn't take the time to read your implementation and completely
> issue. Running time now is a whopping 240+ ms instead of the 31.8ms I was
> getting before (15.2 of which is sorting). Here is the Perl code on the
> sorting. I won't post the pl/pgsql code, because this is far more clear (in
> my opinion) on what the algorithm does:
> DROP TYPE IF EXISTS glbtype CASCADE;
> CREATE TYPE glbtype AS (
> id INTEGER,
> "group" TEXT,
> priority INTEGER,
> weight INTEGER
> );
> DROP TYPE IF EXISTS glbtype_result CASCADE;
> CREATE TYPE glbtype_result AS (
> id INTEGER,
> priority INTEGER,
> weight INTEGER,
> "order" BIGINT
> );
> CREATE OR REPLACE FUNCTION GroupedRandomWeightedLB(glbtype[]) RETURNS SETOF
> glbtype_result AS
understand it, but it looks like you're looking at a N^2 sorting at
best.
You probably want to do something like this (it might not be quite
right, you need to explain what each of your input array fields is
supposed to represent):
CREATE OR REPLACE FUNCTION GroupedRandomWeightedLB(glbtype[]) RETURNS SETOF$$
glbtype_result AS
with g as (select unnest(glbtype) as t)
select array(select ((t).id, (t).priority) (t).weight), 0)::glbtype_result
from g order by (t).group, (t).priority, random() * (t).weight);
$$ language sql;
(not sure what "order" is, is that the rownum, can't that just be
inferred from the array position?)
merlin
--
Eliot Gable
"We do not inherit the Earth from our ancestors: we borrow it from our children." ~David Brower
"I decided the words were too conservative for me. We're not borrowing from our children, we're stealing from them--and it's not even considered to be a crime." ~David Brower
"Esse oportet ut vivas, non vivere ut edas." (Thou shouldst eat to live; not live to eat.) ~Marcus Tullius Cicero