On Thu, Apr 16, 2020 at 01:24:49PM +0900, Mike Hommey wrote:

> I just noticed that the order of operations at the end of fast-import
> are:
> - end_packfile
> - dump_branches
> - dump_tags
>
> The first may create loose objects if the pack is small (per
> fastimport.unpackLimit, defaulting to 100). The latter two create refs.
>
> There seems to be a theoretical race condition here, if something else
> triggers a gc at the "wrong" time, the loose objects might be cleaned up
> and the branches/tags refs become dangling.
>
> I understand that the packfile does need to be finished before creating
> the refs, and that the unpacking replaces that when there aren't enough
> objects, but wouldn't it be more data-safe to actually finish the pack,
> create the refs, and then unpack?

That race is there even without the unpacking step. Another gc might
remove the pack, dropping its unreferenced objects. We do add a ".keep"
between writing the pack and updating the refs, but it doesn't look
like it's done atomically (i.e., we write the .idx file and _then_ add
the .keep). So there's a small race there.

But all of this is also true of any operation, like git-commit. It
creates new loose objects, and then tries to reference them. In
between, a simultaneous gc will think they're unreachable. Likewise,
receiving a push may write a pack (with a .keep, though in the correct
order) or even loose objects.

This is usually handled by the gc expiration time, which is compared
against the file mtime. The default is 2 weeks, but even something
short like 5 minutes would be plenty to avoid this race (even for a
long import, we should be updating the mtime every time we call
write()).

In fact, gc will use the same expiration for clearing out tempfiles.
So even before we write the final pack and its .keep, any temporary
files we're writing into would be at risk. But again, if we're
actively writing, their mtimes should save them.

-Peff
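As a sketch of the .keep ordering point above: if the keep marker is
created before the .idx makes the pack usable, there is no window in
which gc can see a finished-but-unprotected pack. This is an
illustrative standalone example, not fast-import's actual code; the
file names and the create_keep_file() helper are hypothetical.

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: create "<pack>.keep" with O_EXCL so the pack
 * is marked as protected from gc before anything can see it. Doing
 * this _before_ writing the .idx (rather than after, as described
 * above) closes the small race window. */
static int create_keep_file(const char *keep_path)
{
	int fd = open(keep_path, O_WRONLY | O_CREAT | O_EXCL, 0444);
	if (fd < 0)
		return -1;
	if (write(fd, "fast-import\n", 12) != 12) {
		close(fd);
		unlink(keep_path);
		return -1;
	}
	return close(fd);
}

int main(void)
{
	/* Illustrative ordering, not git's actual code paths:
	 *   1. finish writing the pack data
	 *   2. create the .keep marker
	 *   3. write the .idx, making the pack usable
	 *   4. update refs
	 *   5. remove the .keep once the refs pin the objects
	 */
	if (create_keep_file("pack-1234.keep") < 0)
		perror("create_keep_file");
	return 0;
}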
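And the mtime-based safety valve boils down to a comparison like the
following. Again a minimal sketch under stated assumptions: the
function name and hard-coded grace period are illustrative; git's real
logic lives in its prune/gc machinery and honors gc.pruneExpire
(default "2.weeks.ago").

#include <stdio.h>
#include <sys/stat.h>
#include <time.h>

#define GRACE_SECONDS (14 * 24 * 60 * 60) /* two weeks */

/* An object, pack, or tempfile is only prune-eligible once its mtime
 * is older than the grace period, which is why actively-written files
 * (whose mtime advances on every write()) survive a concurrent gc. */
static int is_prune_eligible(const char *path, time_t now)
{
	struct stat st;

	if (stat(path, &st) < 0)
		return 0; /* can't stat it; err on the safe side */

	return st.st_mtime < now - GRACE_SECONDS;
}

int main(int argc, char **argv)
{
	time_t now = time(NULL);
	int i;

	for (i = 1; i < argc; i++)
		printf("%s: %s\n", argv[i],
		       is_prune_eligible(argv[i], now) ? "expired" : "fresh");
	return 0;
}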