Re: Order of operations at the end of fast-import?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Apr 16, 2020 at 01:24:49PM +0900, Mike Hommey wrote:

> I just noticed that the order of operations at the end of fast-import
> are:
> - end_packfile
> - dump_branches
> - dump_tags
> 
> The first may create loose objects if the pack is small (per
> fastimport.unpackLimit, defaulting to 100). The latter two create refs.
> 
> There seems to be a theoretical race condition here, if something else
> triggers a gc at the "wrong" time, the loose objects might be cleaned up
> and the branches/tags refs become dangling.
>
> I understand that the packfile does need to be finished before creating
> the refs, and that the unpacking replaces that when there aren't enough
> objects, but wouldn't it be more data-safe to actually finish the pack,
> create the refs, and then unpack?

That race is there even without the unpacking step. Another gc might
remove the pack, dropping its unreferenced objects.

We do add a ".keep" between writing the pack and updating the refs, but
it doesn't look like it's done atomically (i.e., we write the .idx file
and _then_ add the .keep). So there's a small race there.

But all of this is also true of any operation, like git-commit. It's
creating new loose objects, and then will try to reference them. In
between, a simultaneous gc will think they're unreachable. Likewise,
receiving a push may write a pack (with a .keep, though in the correct
order) or even loose objects.

This is usually handled by the gc expiration time, which is compared
against the file mtime. The default is 2 weeks, but even something short
like 5 minutes would be plenty to avoid this race (even for a long
import, we should be updating the mtime every time we call write()).

In fact, gc will use the same expiration for clearing out tempfiles. So
even before we write the final pack and its .keep, any temporary files
we're writing into would be at risk. But again, if we're actively
writing, their mtimes should save them.

-Peff



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux