Re: [PATCH] builtin/repack.c: avoid making cruft packs preferred

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Oct 03, 2023 at 04:16:17PM -0400, Jeff King wrote:
> On Tue, Oct 03, 2023 at 12:27:51PM -0400, Taylor Blau wrote:
>
> > Note that this behavior is usually just a performance regression. But
> > it's possible it could be a correctness issue.
> >
> > Suppose an object was duplicated among the cruft and non-cruft pack. The
> > MIDX will pick the one from the pack with the lowest mtime, which will
> > always be the cruft one. But if the non-cruft pack happens to sort
> > earlier in lexical order, we'll treat that one as preferred, but not all
> > duplicates will be resolved in favor of that pack.
> >
> > So if we happened to have an object which appears in both packs
> > (e.g., due to a cruft object being freshened, causing it to appear
> > loose, and then repacking it via the `--geometric` repack) it's possible
> > the duplicate would be picked from the non-preferred pack.
>
> I'm not sure I understand how that is a correctness issue. The contents
> of the object are the same in either pack. Or do you mean that the
> pack-reuse code in pack-objects.c may get confused and try to use the
> wrong pack/offset when sending the object out? I would think it would
> always be coming from the preferred pack for that code (so the outcome
> is just that we fail to do the pack-reuse optimization very well, but we
> don't generate a wrong answer).

Admittedly I'm not 100% sure of my interpretation here either, since I
wrote most of this patch's log message nearly two years ago ;-).

I think the issue was as follows:

  - you have an object duplicated among some cruft and non-cruft pack
  - the cruft pack happens to have an older mtime than the non-cruft one
  - the repack reused no non-cruft packs, which (before this patch)
    would let the MIDX avoid picking a preferred pack
  - if the non-cruft pack happens to sort ahead of the cruft one in
    lexical order, it'll appear in the first few bits of the bitmap's
    ordering, causing us to treat it as if it were the preferred pack
  - ...but the MIDX will resolve duplicate objects in favor of the
    oldest pack when neither are preferred, causing us to pick a
    duplicate from the non-"preferred" pack.

(The last point is what causes the pack-bitmap reading code to get
confused, since it expects and makes assumption based on resolving ties
in favor of the preferred pack.)

That feels right to me, but admittedly my memory is pretty hazy on those
bitmap bugs. However, I don't think that this is an issue anymore, since
this patch was written before we had 177c0d6e63 (midx: infer preferred
pack when not given one, 2021-08-31) in GitHub's private fork, and the
patch message here incorrectly refers to statements about GitHub's fork,
not git.git.

But since we have 177c0d6e63 in git.git, this paragraph can and should
be dropped as finding ourselves in that situation is impossible, since
we would infer a preferred pack when not given one, and resolve
duplicates in favor of it.

That makes this purely a performance issue.

> Other than that, the explanation and patch make perfect sense to me, and
> the patch looks good.

Thanks!

> -Peff

Thanks,
Taylor



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux