Re: [PATCH] builtin/repack.c: invalidate MIDX only when necessary

On Mon, Aug 24, 2020 at 10:01:04PM -0400, Taylor Blau wrote:

> In 525e18c04b (midx: clear midx on repack, 2018-07-12), 'git repack'
> learned to remove a multi-pack-index file if it added or removed a pack
> from the object store.
> 
> This mechanism is a little over-eager, since it is only necessary to
> drop a MIDX if 'git repack' removes a pack that the MIDX references.
> Adding a pack outside of the MIDX does not require invalidating the
> MIDX, and likewise for removing a pack the MIDX does not know about.

Does "git repack" ever remove just one pack? Obviously "git repack -ad"
or "git repack -Ad" is going to pack everything and delete the old
packs. So I think we'd want to remove a midx there.

And "git repack -d" I think of as deleting only loose objects that we
just packed. But I guess it could also remove a pack that has now been
made redundant? That seems like a rare case in practice, but I suppose
it is possible.

Not exactly related to your fix, but kind of the flip side of it: would
we ever need to retain a midx that mentions some packs that still exist?

E.g., imagine we have a midx that points to packs A and B, and
git-repack deletes B. By your logic above, we need to remove the midx
because now it points to objects in B which aren't accessible. But by
deleting it, could we be deleting the only thing that mentions the
objects in A?

I _think_ the answer is "no", because we never went all-in on midx and
allowed deleting the matching .idx files for contained packs. So we'd
still have that A.idx, and we could just use the pack as normal. But
it's an interesting corner case if we ever do go in that direction.

If you'll let me muse a bit more on midx-lifetime issues (which I've
never really thought about before just now):

I'm also a little curious how bad it is to have a midx whose pack has
gone away. I guess we'd answer queries for "yes, we have this object"
even if we don't, which is bad. Though in practice we'd only delete
those packs if we have their objects elsewhere. And the pack code is
pretty good about retrying other copies of objects that can't be
accessed. Alternatively, I wonder if the midx-loading code ought to
check that all of the constituent packs are available.
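
To make that last idea concrete, here is a rough sketch of what such a
load-time check could look like. It is not git code: toy_midx_is_usable(),
pack_exists_fn and in_null_terminated_list() are all names I made up for
illustration, and the "midx" is just an array of pack names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical predicate type: returns nonzero if the pack is present. */
typedef int (*pack_exists_fn)(const char *pack_name, void *data);

/*
 * Toy stand-in for a load-time sanity check: returns 1 only if every
 * pack named by the (toy) midx is still present, 0 otherwise.
 */
static int toy_midx_is_usable(const char **pack_names, size_t nr,
			      pack_exists_fn exists, void *data)
{
	size_t i;

	assert(exists != NULL);
	for (i = 0; i < nr; i++)
		if (!exists(pack_names[i], data))
			return 0;
	return 1;
}

/* Example predicate: "data" is a NULL-terminated list of packs on disk. */
static int in_null_terminated_list(const char *pack_name, void *data)
{
	const char **p = (const char **)data;

	for (; *p; p++)
		if (!strcmp(*p, pack_name))
			return 1;
	return 0;
}
```

A real version would presumably stat() or open the packs rather than
consult a list, and would have to decide whether a missing pack makes the
whole midx unusable or only the entries that point into that pack.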

In that line of thinking, do we even need to delete midx files if one of
their packs goes away? The reading side probably ought to be able to
handle that gracefully.

And the more interesting case is when you repack everything with "-ad"
or similar, at which point you shouldn't even need to look up what's in
the midx to see if you deleted its packs. The point of your operation is
to put it all-into-one, so you know the old midx should be discarded.

> Teach 'git repack' to check for this by loading the MIDX, and checking
> whether the to-be-removed pack is known to the MIDX. This requires a
> slightly odd alteration to a test in t5319, which is explained with a
> comment.

My above musings aside, this seems like an obvious improvement.

> diff --git a/builtin/repack.c b/builtin/repack.c
> index 04c5ceaf7e..98fac03946 100644
> --- a/builtin/repack.c
> +++ b/builtin/repack.c
> @@ -133,7 +133,11 @@ static void get_non_kept_pack_filenames(struct string_list *fname_list,
>  static void remove_redundant_pack(const char *dir_name, const char *base_name)
>  {
>  	struct strbuf buf = STRBUF_INIT;
> -	strbuf_addf(&buf, "%s/%s.pack", dir_name, base_name);
> +	struct multi_pack_index *m = get_multi_pack_index(the_repository);
> +	strbuf_addf(&buf, "%s.pack", base_name);
> +	if (m && midx_contains_pack(m, buf.buf))
> +		clear_midx_file(the_repository);
> +	strbuf_insertf(&buf, 0, "%s/", dir_name);

Makes sense. midx_contains_pack() is a binary search, so we'll spend
O(n log n) effort deleting the packs (I wondered if this might be
accidentally quadratic over the number of packs).
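
For reference, the shape of lookup I have in mind is below. This is only
a sketch of a binary search over a sorted array of pack names, not the
actual midx_contains_pack() implementation; sorted_names_contain() is a
made-up name. Each call is O(log n), so calling it once per deleted pack
gives the O(n log n) total.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for a midx pack lookup: binary search over an
 * array of pack names sorted with strcmp(). Returns 1 if "want" is
 * present, 0 otherwise.
 */
static int sorted_names_contain(const char **names, size_t nr,
				const char *want)
{
	size_t lo = 0, hi = nr;

	assert(names != NULL || nr == 0);
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		int cmp = strcmp(names[mid], want);

		if (!cmp)
			return 1;
		if (cmp < 0)
			lo = mid + 1;	/* want sorts after names[mid] */
		else
			hi = mid;	/* want sorts before names[mid] */
	}
	return 0;
}
```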

And after we clear, "m" will be NULL on subsequent calls, so we'll do it
at most once. Which is why you can get rid of the manual "midx_cleared"
flag from the preimage.
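
A toy model of that control flow, in case it helps anyone following
along. Everything here (struct toy_repo, toy_get_midx(), and so on) is
invented for illustration and only mimics the get/contains/clear pattern
from the hunk above; it assumes, as I believe is the case, that clearing
drops the repository's in-memory midx so that later lookups see NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy midx: just a list of pack names (lookup is linear for brevity). */
struct toy_midx {
	const char **names;
	size_t nr;
};

/* Toy repository: holds the loaded midx and counts how often we clear. */
struct toy_repo {
	struct toy_midx *midx;
	int clears;
};

static struct toy_midx *toy_get_midx(struct toy_repo *r)
{
	return r->midx;
}

static void toy_clear_midx(struct toy_repo *r)
{
	r->midx = NULL;	/* later toy_get_midx() calls now return NULL */
	r->clears++;
}

static int toy_midx_contains(const struct toy_midx *m, const char *name)
{
	size_t i;

	for (i = 0; i < m->nr; i++)
		if (!strcmp(m->names[i], name))
			return 1;
	return 0;
}

/* Mirrors the patched logic: clear only if the midx knows this pack. */
static void toy_remove_pack(struct toy_repo *r, const char *name)
{
	struct toy_midx *m;

	assert(r != NULL);
	m = toy_get_midx(r);
	if (m && toy_midx_contains(m, name))
		toy_clear_midx(r);
	/* the real code would go on to unlink the pack files here */
}
```

Removing several packs that the toy midx knows about bumps "clears"
exactly once, and removing only unknown packs leaves the midx in place.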

So the patch looks good to me.

-Peff
