On 05/05/2015 09:21 PM, Jeff King wrote:
> On Sat, May 02, 2015 at 07:19:28AM +0200, Michael Haggerty wrote:
> 
>> 100 ms seems to be considered an acceptable delay between the time
>> that a user, say, clicks a button and the time that the button
>> reacts. What we are talking about is the time between the release of
>> a lock by one process and the resumption of another process that was
>> blocked waiting for the lock. The former is probably not under the
>> control of the user anyway, and perhaps not even observable by the
>> user. Thus I don't think that a perceivable delay between that event
>> and the resumption of the blocked process would be annoying. The more
>> salient delay is between the time that the user started the blocked
>> command and when that command completed. Let's look in more detail.
> 
> Yeah, you can't impact when the other process will drop the lock, but
> if we assume that it takes on the order of 100ms for the other process
> to do its whole operation, then on average we experience half that.
> And then tack on to that whatever time we waste in sleep() after the
> other guy drops the lock. And that's on average half of our backoff
> time.
> 
> So something like 100ms max backoff makes sense to me, in that it
> keeps us in the same order of magnitude as the expected time that the
> lock is held. [...]

I don't understand your argument. If another process blocks us for on
the order of 100 ms, the backoff time (reading from my table) is less
than half of that. It is only if another process blocks us for longer
that our backoff times grow larger than 100 ms. I don't see the point
of comparing those larger backoff numbers to hypothetical 100 ms
expected blocking times when the larger backoffs *can only happen* for
larger blocking times [1].

But even aside from bikeshedding about which backoff algorithm might be
a tiny bit better than another, let's remember that these locking
conflicts are INCREDIBLY RARE in real life. Current git doesn't have
any retry at all, but users don't seem to be noticeably upset.

In a moment I will submit a re-roll, changing the test case to add the
"wait" that Johannes suggested but leaving the maximum backoff time
unchanged. If anybody feels strongly about changing it, go ahead and do
so (or make it configurable). I like the current setting because I
think it makes more sense for servers, which is the only environment
where lock contention is likely to occur with any measurable frequency.

Michael

[1] For completeness, let's also consider a different scenario: Suppose
    the blocking is not being caused by a single long-lived process but
    rather by many short-lived processes running one after the other.
    In that case the time we spend blocked depends more on the duty
    cycle of the other blocking processes, so our backoff time could
    grow to be longer than the mean time that any single process holds
    the lock. But in this scenario we are throughput-limited rather
    than latency-limited, so acquiring the lock sooner would only
    deprive another process of it, without significantly improving the
    throughput of the system as a whole. (And given that the other
    processes are probably following the same rules as we are, the
    shorter backoff times are just as often helping them snatch the
    lock from us as us from them.)

-- 
Michael Haggerty
mhagger@xxxxxxxxxxxx
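
P.S. For anyone following the thread without the series in front of
them, here is a minimal, self-contained sketch of the kind of
capped-backoff retry loop being discussed. It is illustrative only:
try_acquire_lock() is a hypothetical stand-in for the real lock
routine, and the simple doubling schedule and 100 ms cap are
assumptions chosen to mirror the numbers in this discussion, not the
exact schedule from the actual patches.

    /*
     * Illustrative sketch only -- not the patch under discussion.
     * Retry acquiring a lock, sleeping between attempts with a
     * backoff that grows up to a fixed cap.
     */
    #include <stdio.h>
    #include <unistd.h>

    /* Hypothetical stand-in; pretend the lock is always busy here. */
    static int try_acquire_lock(const char *path)
    {
            (void)path;
            return 0;
    }

    static int acquire_lock_with_backoff(const char *path, long timeout_ms)
    {
            long waited_ms = 0;
            long backoff_ms = 1;
            const long max_backoff_ms = 100;  /* the cap being debated */

            while (!try_acquire_lock(path)) {
                    if (waited_ms >= timeout_ms)
                            return -1;  /* give up after the timeout */
                    usleep(backoff_ms * 1000);
                    waited_ms += backoff_ms;
                    backoff_ms *= 2;  /* grow the backoff... */
                    if (backoff_ms > max_backoff_ms)
                            backoff_ms = max_backoff_ms;  /* ...but never past the cap */
            }
            return 0;
    }

    int main(void)
    {
            if (acquire_lock_with_backoff("some.lock", 1000) < 0)
                    fprintf(stderr, "fatal: could not acquire lock\n");
            return 0;
    }

The point of the cap is visible in the loop: however long another
process holds the lock, the time we oversleep after it is released is
bounded by max_backoff_ms.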