I'm still working on fixing a race condition I encountered in "gc" recently, but am not 100% sure of the fix. So I thought I'd send a braindump of what I have so far in case it jolts any memories. The problem is that when we "gc" we run "reflog expire --all". This iterates over the reflogs, and takes a *.lock for each reference. It'll fail intermittendly in two ways: 1. If something is concurrently committing to the repo it'll fail because we for a tiny amount of time hold a HEAD.lock file, so HEAD can't be updated. 2. On a repository that's just being "git fetch"'d by some concurrent process the "gc" will fail, because the ref's SHA1 has changed, which we inspect as we aquire the lock. To reproduce this have a repo getting a bunch of commits and "fetch" from it in a loop[A]. Then try to "gc" in it (or "reflog expire --all"). You'll get case #2 above (#1 can also be manually reproduce): $ git gc; echo $? error: cannot lock ref 'refs/remotes/origin/HEAD': ref 'refs/remotes/origin/HEAD' is at d13c708ae0435c88c8f7580442b70df9a40224d0 but expected 7fb5b7dd9988d040a9d29c0456874472bea2b439 error: cannot lock ref 'refs/remotes/origin/master': ref 'refs/remotes/origin/master' is at d13c708ae0435c88c8f7580442b70df9a40224d0 but expected 7fb5b7dd9988d040a9d29c0456874472bea2b439 fatal: failed to run reflog 128 This behavior is due to a combination of: - 62aad1849f ("gc --auto: do not lock refs in the background", 2014-05-25) - f3b661f766 ("expire_reflog(): use a lock_file for rewriting the reflog file", 2014-12-12) Discussion around those changes: https://public-inbox.org/git/1400978309-25235-1-git-send-email-pclouds@xxxxxxxxx/ https://public-inbox.org/git/CAGZ79kaPzMPcxMqsTeW5LjYNc7LbpjHE5eBPHWKvUkBU3NvGJw@xxxxxxxxxxxxxx/ I haven't yet found a reason for why we'd intrinsically need this lock, and indeed this doesn't produce corruption AFAICT: diff --git a/refs/files-backend.c b/refs/files-backend.c index ef053f716c..ffb9fbbf3a 100644 --- a/refs/files-backend.c +++ b/refs/files-backend.c @@ -3043,7 +3043,7 @@ static int files_reflog_expire(struct ref_store *ref_store, if (!lock) { error("cannot lock ref '%s': %s", refname, err.buf); strbuf_release(&err); - return -1; + return 0; } if (!refs_reflog_exists(ref_store, refname)) { unlock_ref(lock); But that just "solves" #2, not #1, and also no-oping the OID check in verify_lock() still has all tests pass, so we're likely missing the relevant race condition tests. A. ( cd /tmp && rm -rf git && git init git && cd git && while true do head -c 10 /dev/urandom | hexdump >out && git add out && git commit -m"out" done ) ( cd /tmp && rm -rf git-clone && git clone file:///tmp/git git-clone && cd git-clone && while true do git reset --hard && git pull done )