On Thu, Jul 27, 2017 at 12:40 PM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
> Jeff King <peff@xxxxxxxx> writes:
>
>> On Thu, Jul 27, 2017 at 10:19:48AM -0700, Junio C Hamano wrote:
>>
>>> Makes sense.  Makes me wonder why we need a separate .new file
>>> (instead of writing into the .lock instead), but that is a different
>>> issue.
>>
>> It comes from 42dfa7ece (commit_packed_refs(): use a staging file
>> separate from the lockfile, 2017-06-23). That commit explains that we
>> want to be able to put the new contents into service before we release
>> the lock. But it doesn't say why that's useful.
>
> By being able to hold the lock on packed-refs longer, I guess
> something like this becomes possible:
>
>  * hold the lock on packed-refs
>  * hold the lock on loose ref A, B, C, ...
>  * update packed-refs to include the freshest values of these refs
>  * start serving packed-refs without releasing the lock
>  * for X in A, B, C...: delete the loose ref X and unlock X
>  * unlock the packed-refs

Hmmm, I thought that I explained somewhere why this is useful, but I
can't find it now.

Even after this patch series, the code does the following:

Prepare:

1. lock loose refs

Finish:

2. perform creates and updates of loose refs, releasing their locks
   as we go
3. perform deletes of loose refs
4. lock packed-refs file
5. repack_without_refs() (delete packed versions of refs to be
   deleted)
6. commit new packed-refs file (which also unlocks it)
7. release locks on loose refs that were deleted

This is problematic because after a loose reference is deleted, it
briefly (between steps 3 and 5) reveals any packed version of the
reference, which might even point at a commit that has been GCed. It
is even worse if the attempt to acquire the packed-refs lock in step 4
fails, because then the repo could be left in that broken state.

Clearly we can avoid the second problem by acquiring the packed-refs
lock during the prepare phase. You might also attempt to fix the first
problem by deleting the references from the packed-refs file before we
delete the corresponding loose references:

Prepare:

1. lock loose refs
2. lock packed-refs file

Finish:

3. perform creates and updates of loose refs, releasing their locks
   as we go
4. repack_without_refs() (delete packed versions of refs to be
   deleted)
5. commit new packed-refs file (which also unlocks it)
6. perform deletes of loose refs
7. release locks on loose refs that were deleted

But now we have a different problem. pack-refs does the following:

A. lock packed-refs file
B. read loose refs
C. commit new packed-refs file (which also unlocks it)

Then, for each loose reference that should be pruned, it deletes the
reference in a transaction with REF_ISPRUNING, which means:

D. lock the loose reference
E. lock packed-refs file
F. check that the reference's value is the same as the value just
   written to packed-refs
G. delete the loose ref file if it still exists
H. unlock the loose reference

If a reference deletion is running at the same time as a pack-refs,
you could have a race where steps A and B of a pack-refs run between
steps 5 and 6 of a reference deletion. The pack-refs would see the
loose reference and would pack it into the packed-refs file, leaving
the reference at its old value even though the user was told that the
reference was deleted.

If a new packed-refs file can be activated while retaining the lock on
the file, then a reference update could keep the packed-refs file
locked during step 6 of the second algorithm, which would block step A
of pack-refs from beginning.
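To make that last point concrete, here is a minimal standalone sketch
(not Git's actual implementation) of the "activate the new file while
still holding the lock" idea.  The flat file paths, the O_CREAT|O_EXCL
locking, and the ref name are all simplified illustrations of the
protocol, not real Git internals:

/*
 * Hypothetical sketch of "commit the new packed-refs while still
 * holding packed-refs.lock", simplified from the scheme discussed
 * above.  Run from the top of a (throwaway) repository-like dir.
 */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

static int write_new_packed_refs(const char *path)
{
	/* Stage the new contents in a separate file, NOT the lockfile. */
	FILE *f = fopen(path, "w");
	if (!f)
		return -1;
	fprintf(f, "# pack-refs with: peeled fully-peeled\n");
	/* ... records for the surviving refs would be written here ... */
	return fclose(f);
}

int main(void)
{
	/*
	 * Step 2 of the second algorithm: acquire packed-refs.lock
	 * (O_CREAT|O_EXCL stands in for Git's lockfile protocol).
	 */
	int lock_fd = open("packed-refs.lock",
			   O_CREAT | O_EXCL | O_WRONLY, 0666);
	if (lock_fd < 0)
		return 1;

	/* Steps 4-5: build the new contents in a staging file... */
	if (write_new_packed_refs("packed-refs.new"))
		return 1;

	/*
	 * ...and activate them by renaming into place.  Readers now
	 * see the new packed-refs, but packed-refs.lock still exists,
	 * so no other writer can start modifying the file.
	 */
	if (rename("packed-refs.new", "packed-refs"))
		return 1;

	/*
	 * Step 6: only now delete the loose refs.  A concurrent
	 * pack-refs cannot repack the still-visible loose ref,
	 * because its own step A blocks on packed-refs.lock.
	 */
	unlink("refs/heads/deleted-branch");	/* hypothetical ref */

	/* Step 7 (for packed-refs): finally release the lock. */
	close(lock_fd);
	return unlink("packed-refs.lock") ? 1 : 0;
}

The point of the ordering is visible in the last two steps: a
pack-refs that starts between the rename and the unlink of the loose
ref still sees that loose ref, but it cannot resurrect it in
packed-refs, because its first action is to take packed-refs.lock,
which we have not yet released.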
I have the feeling that there might be other, more improbable races
lurking here, too. The eternal problems with races in the files
backend are a main reason that I'm enthusiastic about the reftable
proposal.

Michael