Re: [PATCH v2 09/11] reflog expire: don't lock reflogs using previously seen OID

Jeff King <peff@xxxxxxxx> · Fri, 16 Jul 2021 22:23:45 -0400

On Fri, Jul 16, 2021 at 04:13:05PM +0200, Ævar Arnfjörð Bjarmason wrote:

> During reflog expiry, the cmd_reflog_expire() function first iterates
> over all reflogs in logs/*, and then one-by-one acquires the lock for
> each one and expires it. This behavior has been with us since this
> command was implemented in 4264dc15e1 ("git reflog expire",
> 2006-12-19).
> 
> Change this to stop calling lock_ref_oid_basic() with the OID we saw
> when we looped over the logs, instead have it pass the OID it managed
> to lock.
> 
> This mostly mitigates a race condition where e.g. "git gc" will fail
> in a concurrently updated repository because the branch moved since
> "git reflog expire --all" was started. I.e. with:
> 
>     error: cannot lock ref '<refname>': ref '<refname>' is at <OID-A> but expected <OID-B>
> 
> This behavior of passing in an "oid" was needed for an edge-case that
> I've untangled in this and preceding commits though, namely that we
> needed this OID because we'd:
> 
>  1. Lookup the reflog name/OID via dwim_log()
>  2. With that OID, lock the reflog
>  3. Later in builtin/reflog.c we use the we looked as input to
>     lookup_commit_reference_gently(), assured that it's equal to the
>     OID we got from dwim_log().

s/we use the we/we use the oid we/ in point 3, I think?

This explains the race and why it mitigates it, which is good. Are we
sure that there is no value in matching the oid we found during the log
lookup with the current value of the ref? I.e., why is this safe to do?

I suspect it is OK. The only problem would be if the reflog.c is somehow
relying on the ref's value not having changed from the earlier call
(e.g., if it did some reachability computation based on that value).

But I don't think so. Between the dwim_log() call and the
reflog_expire() call in reflog_expire(), we do almost nothing. In
reflog_delete(), between the two we do count up the entries to find a
record number. We do use that in deciding whether to expire an entry,
but:

  - I believe it is counting up from the beginning of time, so if
    somebody appends a new entry in the meantime, we are OK.

  - Further updates are not strictly prevented by comparing the oids
    anyway (we could generate new reflog entries without changing the
    ref; either a noop, or a switch-and-revert).

I do suspect there's a latent bug there if somebody else tries to expire
the reflogs (because now they're deleting old entries which are part of
our "recno"). But it is neither helped nor hindered by your change here,
so let's ignore it for now.

> [...]

The patch itself looks correct to me.

I think it would be good to have a brief "why is this safe" argument in
the commit message along the lines of what I wrote above.

-Peff