Re: [PATCH] repack: don't remove .keep packs with `--pack-kept-objects`

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Oct 17, 2022 at 10:26:06PM -0400, Taylor Blau wrote:

> But when `--pack-kept-objects` is given, things can go awry. Namely,
> when a kept pack is included in the list of packs tracked by the
> `pack_geometry` struct *and* part of the pack roll-up, we will delete
> the `.keep` pack when we shouldn't.

Oops. Your explanation makes sense, and this is an easy case to miss.

After reading the message but before seeing the patch, I'd guess we just
need to add "if (!this_pack_is_kept)" to the removal loop.

And indeed, it's close to that:

> diff --git a/builtin/repack.c b/builtin/repack.c
> index a5bacc7797..f71909696d 100644
> --- a/builtin/repack.c
> +++ b/builtin/repack.c
> @@ -1089,6 +1089,11 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
>  				strbuf_addstr(&buf, pack_basename(p));
>  				strbuf_strip_suffix(&buf, ".pack");
> 
> +				if ((p->pack_keep) ||
> +				    (string_list_has_string(&existing_kept_packs,
> +							    buf.buf)))
> +					continue;
> +
>  				remove_redundant_pack(packdir, buf.buf);
>  			}
>  			strbuf_release(&buf);

but what is the difference between p->pack_keep, and being in the
existing_kept_packs list? From the similar logic in
init_pack_geometry():

         /*
          * Any pack that has its pack_keep bit set will appear
          * in existing_kept_packs below, but this saves us from
          * doing a more expensive check.
          */
         if (p->pack_keep)
                 continue;

         /*
          * The pack may be kept via the --keep-pack option;
          * check 'existing_kept_packs' to determine whether to
          * ignore it.
          */
         strbuf_reset(&buf);
         strbuf_addstr(&buf, pack_basename(p));
         strbuf_strip_suffix(&buf, ".pack");

         if (string_list_has_string(existing_kept_packs, buf.buf))
                  continue;

it looks like one is a superset of the other, but checking p->pack_keep
is just slightly cheaper, so we do it first. And checking just
p->pack_keep isn't sufficient; the list has extra packs from the
--keep-pack command-line. So your new code checking both is correct.

But one thing to note. In that existing logic, the two checks are split
apart, since in the fast path (i.e., when p->pack_keep is set) we can
skip the extra work to form the pack string. We could do that here, but
I doubt it matters much:

  1. It's really not that expensive. It's a little extra string
     manipulation and a binary-search lookup for each pack.

  2. Normally most packs aren't kept, so we'd end up in the "slow" path
     most of the time anyway.

So I suspect that existing code is premature optimization, and you
really could just look in existing_kept_packs (both there and in this
patch).

But I don't mind this patch copying the existing logic in the meantime.
It's exactly this kind of "while we are in the area" cleanup where you
end up surprised that the p->pack_keep check really was covering some
subtle corner case. ;)

So the patch looks good to me as-is (and sorry for the verbose review;
we've just had enough tricky corner cases in this repack code that I
wanted to make sure I understood all the implications).

> This had mostly been a mystery that didn't occur at high enough volume
> to justify looking into. But it went away entirely after merging in
> v2.36.x, which contains e4d0c11c04.
> 
> Some investigating with Victoria today led to the patch above, which is
> still relevant since e4d0c11c04 papers over an existing bug.

Thank you both for digging into this. Because of the scale, GitHub has a
unique opportunity to trigger these kind of weird corner cases and races
that would otherwise go unreported and just cause occasional
head-scratching among people running sane-sized Git servers.

-Peff



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux