On Tue, Oct 03, 2023 at 04:16:17PM -0400, Jeff King wrote: > On Tue, Oct 03, 2023 at 12:27:51PM -0400, Taylor Blau wrote: > > > Note that this behavior is usually just a performance regression. But > > it's possible it could be a correctness issue. > > > > Suppose an object was duplicated among the cruft and non-cruft pack. The > > MIDX will pick the one from the pack with the lowest mtime, which will > > always be the cruft one. But if the non-cruft pack happens to sort > > earlier in lexical order, we'll treat that one as preferred, but not all > > duplicates will be resolved in favor of that pack. > > > > So if we happened to have an object which appears in both packs > > (e.g., due to a cruft object being freshened, causing it to appear > > loose, and then repacking it via the `--geometric` repack) it's possible > > the duplicate would be picked from the non-preferred pack. > > I'm not sure I understand how that is a correctness issue. The contents > of the object are the same in either pack. Or do you mean that the > pack-reuse code in pack-objects.c may get confused and try to use the > wrong pack/offset when sending the object out? I would think it would > always be coming from the preferred pack for that code (so the outcome > is just that we fail to do the pack-reuse optimization very well, but we > don't generate a wrong answer). Admittedly I'm not 100% sure of my interpretation here either, since I wrote most of this patch's log message nearly two years ago ;-). I think the issue was as follows: - you have an object duplicated among some cruft and non-cruft pack - the cruft pack happens to have an older mtime than the non-cruft one - the repack reused no non-cruft packs, which (before this patch) would let the MIDX avoid picking a preferred pack - if the non-cruft pack happens to sort ahead of the cruft one in lexical order, it'll appear in the first few bits of the bitmap's ordering, causing us to treat it as if it were the preferred pack - ...but the MIDX will resolve duplicate objects in favor of the oldest pack when neither are preferred, causing us to pick a duplicate from the non-"preferred" pack. (The last point is what causes the pack-bitmap reading code to get confused, since it expects and makes assumption based on resolving ties in favor of the preferred pack.) That feels right to me, but admittedly my memory is pretty hazy on those bitmap bugs. However, I don't think that this is an issue anymore, since this patch was written before we had 177c0d6e63 (midx: infer preferred pack when not given one, 2021-08-31) in GitHub's private fork, and the patch message here incorrectly refers to statements about GitHub's fork, not git.git. But since we have 177c0d6e63 in git.git, this paragraph can and should be dropped as finding ourselves in that situation is impossible, since we would infer a preferred pack when not given one, and resolve duplicates in favor of it. That makes this purely a performance issue. > Other than that, the explanation and patch make perfect sense to me, and > the patch looks good. Thanks! > -Peff Thanks, Taylor