Re: [PATCH 0/2] pack-write,repack: prevent opening packs too early

On Wed, Sep 01, 2021 at 12:29:54AM -0400, Taylor Blau wrote:

> > We _might_ also want to re-order the calls to write_idx_file() and
> > write_rev_file() in its caller, given that simultaneous readers are
> > happy to read our tmp_pack_* files. I guess the same might apply to the
> > actual file write order in pack-objects, too? I'm not sure if that's even
> > possible, though; do we rely on side effects of generating the .idx when
> > generating the other meta files?
> 
> These are a little trickier. write_idx_file() is also responsible for
> rearranging the objects array into index (name) order on the way out,
> which write_rev_file() depends on in order to build up its mapping.
> 
> So you could sort the array at the call-site before calling either
> function, but it gets awkward since there are a handful of other callers
> of write_idx_file() besides the two we're discussing.

Yeah, I had in the back of my mind that there was some dependency there.
I definitely prefer the "readers should not pick up tmp-packs" approach
if that is workable.
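
To make the dependency quoted above concrete, here is a rough sketch in Python (not Git's actual C code). The tuple layout and both function names are hypothetical stand-ins: one models the "sort into index (name) order" side effect attributed to write_idx_file(), the other models a reverse index as the list of index positions in pack-offset order.

```python
# Hypothetical model: each object is (object name, offset in the packfile).
objects = [
    ("c3", 12),
    ("a1", 300),
    ("b2", 150),
]

def sort_into_index_order(objs):
    """Stand-in for the side effect described above: sort by object name."""
    return sorted(objs, key=lambda o: o[0])

def build_reverse_index(index_ordered):
    """Return index positions listed in ascending pack-offset order."""
    return sorted(range(len(index_ordered)), key=lambda i: index_ordered[i][1])

index_ordered = sort_into_index_order(objects)
rev = build_reverse_index(index_ordered)

# Built from the unsorted array, the positions no longer refer to index order.
rev_from_unsorted = build_reverse_index(objects)
```

With this toy data the two results differ, which is why the sort has to happen before the reverse mapping is built, whichever function ends up doing it.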

> > I think it might be more sensible if the reading side was taught to
> > ignore ".tmp-*" and "tmp_*" (and possibly even ".*", though it's
> > possible somebody is relying on that).
> 
> ...this seems like the much-better way to go. Git shouldn't have to
> worry about what order it writes the temporary files in, only what order
> those temporary files are made non-temporary.
> 
> But I need to do some more investigation to make sure there aren't any
> other implications. So I'm happy to wait on that, or send a new version
> of this series with a patch to fix the race in
> builtin/index-pack.c:final(), too.

I think if we kept it restricted to ".tmp-*" and "tmp_*", it should be
pretty safe. The absolute worst case is that somebody trying to recover
a corrupted repository might have to rename the files manually, I would
think.
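
A minimal sketch of what such a read-side filter could look like, written in Python for brevity rather than as a patch against Git. The function names and the sample filenames are made up for illustration; only the two prefixes come from the discussion above.

```python
# Prefixes the reading side would skip; ".*" in general is deliberately
# NOT blocked, matching the narrower proposal.
TEMP_PREFIXES = (".tmp-", "tmp_")

def is_temporary_pack_file(name):
    """True if the filename looks like an in-progress temporary file."""
    return name.startswith(TEMP_PREFIXES)

def visible_pack_files(names):
    """Keep only the names a reader should consider, preserving order."""
    return [n for n in names if not is_temporary_pack_file(n)]

listing = [
    "pack-1234.idx",
    "tmp_pack_abc",
    ".tmp-99-pack-1234.idx",
    ".hidden.idx",
]
visible = visible_pack_files(listing)
```

Note that ".hidden.idx" is still visible here; only the two named prefixes are filtered.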

Blocking ".*" is a harder sell. If we were starting from scratch, I'd
probably do that, but now we don't know what weird things people might
be doing. So unless there's a huge gain, it's hard to justify. (If we
were starting from scratch, I'd actually probably insist they be named
pack-$checksum.pack, etc, but it's much too late for that now).

So anyway. I think we definitely want the index-pack.c change. We
_could_ stop there and change the read side as a separate series, but I
think that until we do, the ordering changes on the write side aren't
really doing that much. They're shrinking the race a little, I guess,
but it's still very much there.
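
For illustration, the write-side idea of controlling the order in which temporary files become non-temporary might be sketched like this in Python. It assumes that the .idx should be renamed last, on the premise that readers discover a pack through its .idx; the function name, the extension list, and the file naming are hypothetical and not taken from Git's source.

```python
import os

# Assumed order: everything else first, .idx last.
FINAL_ORDER = (".pack", ".rev", ".idx")

def finalize_pack(pack_dir, tmp_base, final_base):
    """Rename tmp_base.* to final_base.* in FINAL_ORDER; return new names."""
    moved = []
    for ext in FINAL_ORDER:
        src = os.path.join(pack_dir, tmp_base + ext)
        if not os.path.exists(src):
            continue  # e.g. no .rev file was written
        dst = os.path.join(pack_dir, final_base + ext)
        os.rename(src, dst)
        moved.append(final_base + ext)
    return moved
```

Even with this ordering, a reader that is willing to open the temporary names can still see a half-written set of files, which is the remaining race described above.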

> (Unrelated to your suggestions above) another consideration for "stuff
> to do later" would be to flip the default of pack.writeReverseIndex. I
> had intentions to do that in the 2.32 cycle, but I forgot about it.

Oh, yeah. We should definitely do that (in its own series). The .rev
files have been a huge performance win, and I don't think there's any
reason we wouldn't want to always use them.

-Peff


