Re: [PATCH 2/2] pack-objects: only perform verbatim reuse on the preferred pack

Taylor Blau <me@xxxxxxxxxxxx> · Thu, 14 Nov 2024 08:40:13 -0500

On Wed, Nov 13, 2024 at 07:25:04PM -0500, Jeff King wrote:
> On Wed, Nov 13, 2024 at 12:32:58PM -0500, Taylor Blau wrote:
>
> > Instead, we can only safely perform the whole-word reuse optimization on
> > the preferred pack, where we know with certainty that no gaps exist in
> > that region of the bitmap. We can still reuse objects from non-preferred
> > packs, but we have to inspect them individually in write_reused_pack()
> > to ensure that any gaps that may exist are accounted for.
>
> Yep. With the disclaimer that I'm biased because I helped a little with
> debugging, this change is obviously correct. Forgetting the bug you saw
> in the real world, we know this function cannot work as-is because of
> the potential for those gaps.

Yep, and thanks again for your help ;-).

> > This allows us to simplify the implementation of
> > write_reused_pack_verbatim() back to almost its pre-multi-pack reuse
> > form, since we can now assume that the beginning of the pack appears at
> > the beginning of the bitmap, meaning that we don't have to account for
> > any bits up to the first word boundary (like we had to special case in
> > ca0fd69e37).
> >
> > The only significant changes from the pre-ca0fd69e37 implementation are:
> > [...]
>
> Thanks for this section. My first instinct was to go back and look at
> the diff to the pre-midx version of the function, and this nicely
> explains the hunks I see there.
>
> So this patch looks good to me. I was able to follow your explanation in
> the commit message, but that may not count for much since I'm probably
> the only other person with deep knowledge of the verbatim-reuse code in
> the first place. ;)

Heh.

> I do think the explanation in the message of the first commit would be a
> lot simpler if it were simply combined into this patch. With them split
> you effectively have to explain the problem twice. I don't feel that
> strongly about changing it, though.

I always seem to go back and forth on that. I feel somewhat strongly
that for complicated regression fixes that we should demonstrate the
existing failure mode in a separate commit with a test_expect_failure.
That forces the author to ensure they really understand the bug and can
produce a minimal (or close to it) reproduction.

It also makes it easier to demonstrate that the fix actually does what
it says, instead of assuming that the test fails without the fix applied
(and passes with it applied).

That does force the author to potentially explain the bug twice. In my
experience, I tend to keep the explanation in the first patch relatively
brief, hinting at details that will appear in the subsequent patch
instead of explaining them in full detail.

So I dunno. It's a tradeoff for sure, but I think having an explicit
point in the log that demonstrates the existing bug is valuable.

Thanks,
Taylor