On Wed, Nov 4, 2009 at 9:41 AM, Brian Karlak <zenkat@xxxxxxxxxxx> wrote:
>
> I do a select for update in a stored proc:
>
> FOR queue_item IN
>
>     SELECT * FROM queue
>     WHERE status IS NULL AND id >= low_bound_id
>     ORDER BY id LIMIT batch_size
>     FOR UPDATE
>
> LOOP
>
>     UPDATE queue_proc set status = 'proc' where id = queue_item.id ;
>
> The daemons keep track of their last position in the queue with
> low_bound_id.  Also, as you probably notice, I also fetch a batch of (100)
> items at a time.  In practice, it's pretty fast.  The job I'm running now
> is showing an average fetch time of 30ms per 100 actions, which ain't bad.
>
> However, the writing of results back to the row takes ~5ms, which is
> slower than I'd like.

5 ms for each of the 100 actions?  With one commit per action?

>> It seems you have a select, an update, and another update.  Where in
>> this process do you commit?  Are you using fsync=off or
>> synchronous_commit=off?
>
> First commit occurs after the stored proc to select/update a batch of
> items is complete.

So one commit per 100 items?

> Second commit occurs on the writing of results back for each
> particular action.

So one commit per item?  If so, this completely explains the difference
in speed, I think.

> Two commits are required because the time it takes to complete the
> intervening action can vary wildly: anywhere between 20ms and 45min.

Is there any way of knowing/approximating ahead of time how long it will
take?  The 45 min monsters must be exceedingly rare, or else the average
could not be ~50ms.

>> Why would the index scan take 1 ms two of the times it is done but 5ms
>> the third time?  Isn't it the same index scan each time?  Or does the
>> change in queue.status change the plan?
>
> The final update is a different query -- just a plain old update by ID:
>
>     UPDATE queue_proc set status = 'proc' where id = %s ;

That looks very much like the other UPDATE you showed.  The difference, it
seems, is that you commit after every one, rather than after every 100.
Right?

> This update by ID takes ~2.5ms, which means it's where the framework is
> spending most of its overhead.

You said the computation task can take anywhere from 20ms to 45min, so it
seems that this update overhead is at most 1/8 of the irreducible time
(2.5 ms against the 20 ms minimum for the action itself).  That doesn't
seem like it is enough to worry about, to me.

Jeff
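
For illustration, a minimal sketch of two ways the per-item commit cost
discussed above could be reduced.  It reuses the table and column names
from the quoted queries (queue_proc, status, id); the literal ids are made
up, and whether either option fits depends on how durable each individual
result write-back needs to be.

    -- Option 1: keep one commit per action, but make that commit
    -- asynchronous.  synchronous_commit can be turned off for just this
    -- transaction; COMMIT then returns without waiting for the WAL flush.
    BEGIN;
    SET LOCAL synchronous_commit TO off;
    UPDATE queue_proc SET status = 'proc' WHERE id = 12345;  -- id illustrative
    COMMIT;

    -- Option 2: amortize the commit the same way the claim step does, by
    -- writing several finished actions back in a single transaction.
    BEGIN;
    UPDATE queue_proc SET status = 'proc' WHERE id IN (12345, 12346, 12347);
    COMMIT;

Unlike fsync=off, synchronous_commit=off cannot corrupt the database; a
crash can only lose the last few commits, so the worst case is redoing a
handful of recently finished actions.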