On Thu, Jul 27, 2017 at 12:40 PM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
> Jeff King <peff@xxxxxxxx> writes:
>
>> On Thu, Jul 27, 2017 at 10:19:48AM -0700, Junio C Hamano wrote:
>>
>>> Makes sense.  Makes me wonder why we need a separate .new file
>>> (instead of writing into the .lock instead), but that is a different
>>> issue.
>>
>> It comes from 42dfa7ece (commit_packed_refs(): use a staging file
>> separate from the lockfile, 2017-06-23). That commit explains that we
>> want to be able to put the new contents into service before we release
>> the lock. But it doesn't say why that's useful.
>
> By being able to hold the lock on packed-refs longer, I guess
> something like this becomes possible:
>
>  * hold the lock on packed-refs
>  * hold the lock on loose ref A, B, C, ...
>  * update packed-refs to include the freshest values of these refs
>  * start serving packed-refs without releasing the lock
>  * for X in A, B, C...: delete the loose ref X and unlock X
>  * unlock the packed-refs

Hmmm, I thought that I explained somewhere why this is useful, but I
can't find it now.

Even after this patch series, the code does the following:

Prepare:

1. lock loose refs

Finish:

2. perform creates and updates of loose refs, releasing their locks
   as we go
3. perform deletes of loose refs
4. lock packed-refs file
5. repack_without_refs() (delete packed versions of refs to be
   deleted)
6. commit new packed-refs file (which also unlocks it)
7. release locks on loose refs that were deleted

This is problematic because after a loose reference is deleted, it
briefly (between steps 3 and 5) reveals any packed version of the
reference, which might even point at a commit that has been GCed. It
is even worse if the attempt to acquire the packed-refs lock in step 4
fails, because then the repo could be left in that broken state.

Clearly we can avoid the second problem by acquiring the packed-refs
lock during the prepare phase. You might also attempt to fix the first
problem by deleting the references from the packed-refs file before we
delete the corresponding loose references:

Prepare:

1. lock loose refs
2. lock packed-refs file

Finish:

3. perform creates and updates of loose refs, releasing their locks
   as we go
4. repack_without_refs() (delete packed versions of refs to be
   deleted)
5. commit new packed-refs file (which also unlocks it)
6. perform deletes of loose refs
7. release locks on loose refs that were deleted

But now we have a different problem. pack-refs does the following:

A. lock packed-refs file
B. read loose refs
C. commit new packed-refs file (which also unlocks it)

Then, for each loose reference that should be pruned, it deletes the
reference in a transaction with REF_ISPRUNING, which means:

D. lock the loose reference
E. lock packed-refs file
F. check that the reference's value is the same as the value just
   written to packed-refs
G. delete the loose ref file if it still exists
H. unlock the loose reference

If a reference deletion is running at the same time as a pack-refs,
you could have a race where steps A and B of a pack-refs run between
steps 5 and 6 of a reference deletion. The pack-refs would see the
loose reference and would pack it into the packed-refs file, leaving
the reference at its old value even though the user was told that the
reference was deleted.

If a new packed-refs file can be activated while retaining the lock on
the file, then a reference update could keep the packed-refs file
locked during step 6 of the second algorithm, which would block step A
of pack-refs from beginning.
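To make that last point concrete, here is a minimal standalone sketch
(not Git's actual implementation) of the "activate the new file while
still holding the lock" idea.  The flat file paths, the O_CREAT|O_EXCL
locking, and the ref name are all simplified illustrations of the
protocol, not real Git internals:

/*
 * Hypothetical sketch of "commit the new packed-refs while still
 * holding packed-refs.lock", simplified from the scheme discussed
 * above.  Run from the top of a (throwaway) repository-like dir.
 */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

static int write_new_packed_refs(const char *path)
{
	/* Stage the new contents in a separate file, NOT the lockfile. */
	FILE *f = fopen(path, "w");
	if (!f)
		return -1;
	fprintf(f, "# pack-refs with: peeled fully-peeled\n");
	/* ... records for the surviving refs would be written here ... */
	return fclose(f);
}

int main(void)
{
	/*
	 * Step 2 of the second algorithm: acquire packed-refs.lock
	 * (O_CREAT|O_EXCL stands in for Git's lockfile protocol).
	 */
	int lock_fd = open("packed-refs.lock",
			   O_CREAT | O_EXCL | O_WRONLY, 0666);
	if (lock_fd < 0)
		return 1;

	/* Steps 4-5: build the new contents in a staging file... */
	if (write_new_packed_refs("packed-refs.new"))
		return 1;

	/*
	 * ...and activate them by renaming into place.  Readers now
	 * see the new packed-refs, but packed-refs.lock still exists,
	 * so no other writer can start modifying the file.
	 */
	if (rename("packed-refs.new", "packed-refs"))
		return 1;

	/*
	 * Step 6: only now delete the loose refs.  A concurrent
	 * pack-refs cannot repack the still-visible loose ref,
	 * because its own step A blocks on packed-refs.lock.
	 */
	unlink("refs/heads/deleted-branch");	/* hypothetical ref */

	/* Step 7 (for packed-refs): finally release the lock. */
	close(lock_fd);
	return unlink("packed-refs.lock") ? 1 : 0;
}

The point of the ordering is visible in the last two steps: a
pack-refs that starts between the rename and the unlink of the loose
ref still sees that loose ref, but it cannot resurrect it in
packed-refs, because its first action is to take packed-refs.lock,
which we have not yet released.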
I have the feeling that there might be other, more improbable races
lurking here, too. The eternal problems with races in the files
backend are a main reason that I'm enthusiastic about the reftable
proposal.

Michael