Re: [PATCH 09/24] repack: implement `--extend-disjoint` mode

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Dec 11, 2023 at 09:18:32AM +0100, Patrick Steinhardt wrote:
> > If you have the bulk of your repositories in the alternate, then I think
> > you might want to consider how we combine the two.
>
> Yes, in the general case the bulk of objects is indeed contained in the
> alternate.

I thought about this for a while this morning, and think that while it
is possible, I'm not sure I can think of a compelling use-case where
you'd want to reuse objects from packs across an alternate boundary.

On the "I think it's possible front":

The challenge is making sure that the set of disjoint packs among each
object store is globally disjoint in one direction along the alternate
path. I think the rule would require you to honor the disjointed-ness of
any packs in alternate(s) you might have when constructing new disjoint
packs.

So if repository fork.git is an alternate of network.git (and both had
long-lived MIDXs), network.git is free to make any set of disjoint packs
it chooses, and fork.git can only create disjoint packs which are
disjoint with respect to (a) the other disjoint packs in fork.git (if
any), and (b) the disjoint packs in network.git (and recursively for any
repositories that network.git is an alternate of in the general case).

Now on the "I can't think of a compelling use-case front":

I think the only reason you'd want to be able to reuse objects from
MIDXs across the alternates boundary is if you have MIDX bitmaps in both
the repository and its alternate. Indeed, the only time that we kick in
pack-reuse in general is when we have a bitmap, so in order to reuse
objects from both the repo and its alternate, you'd have to have a
bitmap in both repositories.

But having a MIDX bitmap means that any packs in the MIDX for which
you're generating a bitmap have to be closed over object reachability.
So unless the repository and its alternate have totally distinct lines
of history (in which case, I'm not sure you would want to share objects
between the two in the first place), any pack you bitmap in the child
repository fundamentally couldn't be disjoint with respect to its
parent.

This is because if it were to be disjoint, it would have to be repacked
with '-l' (or some equivalent '-l'-like flag that only ignores non-local
packs which are marked as disjoint). But if you exclude those objects
and any one (or more) of them is reachable from some object(s) you
didn't exclude, you wouldn't be able to generate a bitmap in the first
place.

It's very possible that there's something about your setup that I'm not
fully grokking, but I don't think in general this is something that we'd
want to do (even if it is theoretically possible).

> > Whether or not this is a feature that you/others need, I definitely
> > think we should leave it out of this series, since I am (a) fairly
> > certain that this is possible to do, and (b) already feel like this
> > series on its own is complicated enough.
>
> I'm perfectly fine if we say that the benefits of your patch series
> cannot yet be applied to repositories with alternates. But from my point
> of view it's a requirement that this patch series does not silently
> break this usecase due to Git starting to generate disjointed packs
> where it cannot ensure that the disjointedness property holds.

I think one thing you could reasonably do is use *only* the non-local
MIDX bitmaps when doing pack reuse.

Currently we'll use the first MIDX we find, which is guaranteed to be
the local one, if it exists. This was the case before this series, and
this series does not change that behavior. Unless you had a pack bitmap
in the alternated repository (which I think is unlikely, since it would
require a full reachability closure, thus eliminating any de-duplication
benefits you'd otherwise get when using alternates), you'd be find
before and after this series.

> As I haven't yet read through this whole patch series, the question is
> boils down to whether the end result is opt-in or opt-out. If it was
> opt-out then I could see the above usecase breaking silently. If it was
> opt-in then things should be fine and we can address this ommission in a
> follow up patch series. We at GitLab would definitely be interested in
> helping out with this given that it directly affects us and that the
> demonstrated savings seem very promising.

The end result is opt-in, you have to change the `pack.allowPackReuse`
configuration from its default value of "true" (or the alternate
spelling taught in this series, "single") to "multi" in order to enable
the new behavior.

Thanks,
Taylor




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux