Re: Highly Efficient Custom Sorting

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On Tue, Jul 6, 2010 at 3:01 PM, Robert Haas <robertmhaas@xxxxxxxxx> wrote:
On Sat, Jul 3, 2010 at 4:17 PM, Eliot Gable
> Read RFC 2782 on random weighted load balancing of SRV records inside DNS.

It may be asking a bit much to expect people here to read an RFC to
figure out how to help you solve this problem, but...


Yeah, I was not actually expecting them to read the whole RFC. The section on random weighted load balancing is only a few paragraphs and describes just the algorithm I am trying to implement as efficiently as possible:

   Priority
The priority of this target host. A client MUST attempt to
contact the target host with the lowest-numbered priority it can
reach; target hosts with the same priority SHOULD be tried in an
order defined by the weight field. The range is 0-65535. This
is a 16 bit unsigned integer in network byte order.

Weight
A server selection mechanism. The weight field specifies a
relative weight for entries with the same priority. Larger
weights SHOULD be given a proportionately higher probability of
being selected. The range of this number is 0-65535. This is a
16 bit unsigned integer in network byte order. Domain
administrators SHOULD use Weight 0 when there isn't any server
selection to do, to make the RR easier to read for humans (less
noisy). In the presence of records containing weights greater
than 0, records with weight 0 should have a very small chance of
being selected.

In the absence of a protocol whose specification calls for the
use of other weighting information, a client arranges the SRV
RRs of the same Priority in the order in which target hosts,
specified by the SRV RRs, will be contacted. The following
algorithm SHOULD be used to order the SRV RRs of the same
priority:

To select a target to be contacted next, arrange all SRV RRs
(that have not been ordered yet) in any order, except that all
those with weight 0 are placed at the beginning of the list.

Compute the sum of the weights of those RRs, and with each RR
associate the running sum in the selected order. Then choose a
uniform random number between 0 and the sum computed
(inclusive), and select the RR whose running sum value is the
first in the selected order which is greater than or equal to
the random number selected. The target host specified in the
selected SRV RR is the next one to be contacted by the client.
Remove this SRV RR from the set of the unordered SRV RRs and
apply the described algorithm to the unordered SRV RRs to select
the next target host. Continue the ordering process until there
are no unordered SRV RRs. This process is repeated for each
Priority.
The difference between this description and my implementation is that I have added a "group" field to the mix so that this algorithm is applied to each group independently of the others. Also, my input data has an "id" field which must be present on the same rows of the output and is used to map the output back to my original input.
 
...there's no question that writing things in C is a lot more work,
and takes some getting used to.  Still, it's fast, so maybe worth it,
especially since you already know C++, and will therefore mostly just
need to learn the PostgreSQL coding conventions.  The best thing to do
is probably to look at some of the existing examples within the
backend code.  Most of the datatype code is in src/backend/utils/adt.
You might want to look at arrayfuncs.c (perhaps array_ref() or
array_map()); and also rowtypes.c (perhaps record_cmp()).


I did actually find the arrayfuncs.c file and start looking through it for examples. I'm just not entirely clear on what is going on in some of those functions -- what is necessary to keep in order to extract my data and get it represented in C structures and what I can remove. I was hoping there was some sort of documentation on how to work with input arrays for extracting the data and getting it converted. In any event, I have spent several hours reverse engineering how that stuff works, and I think I am pretty close to being able to get my data into a C structure that I can work with.
 
> Specifically, how do I go about preserving the pointers to the data that I
> allocate in multi-call memory context so that they still point to the data
> on the next call to the function for the next result row? Am I supposed to
> set up some global variables to do that, or am I supposed to take a
> different approach? If I need to use global variables, then how do I deal
> with concurrency?

Global variables would be a bad idea, not so much because of
concurrency as because they won't get cleaned up properly.  Again, the
best thing to do is to look at existing examples, like array_unnest()
in src/backend/utils/adt/arrayfuncs.c; the short answer is that you
probably want to compute all your results on the first call and stash
them in the FuncCallContext (funcctx->user_fctx); and then on
subsequent calls just return one row per call.

Thanks for suggesting array_unnest(). I think that will actually prove more useful to me than the other example I'm using for extracting my data from an array. I was actually planning on computing the order on the first call and storing it in a linked list which gets returned one item at a time until all rows have been returned. Also, I found a code example using Google that showed someone storing data across function calls using that pointer. I used their example to produce this:

<snip>
    if(SRF_IS_FIRSTCALL()) {
        funcctx = SRF_FIRSTCALL_INIT();

        /* This is where we stick or sorted data for returning later */
        funcctx->user_fctx = MemoryContextAlloc(funcctx->multi_call_memory_ctx, sizeof(sort_data));
        oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
        data = "" funcctx->user_fctx;
</snip>

I have a structure set up that is typedef'd to "sort_data" which stores pointers to various things that I need to survive across the calls. Since this seems to be what you are suggesting, I assume this is the correct approach.


--
Eliot Gable


[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux