Re: [PATCH 09/24] repack: implement `--extend-disjoint` mode

On Fri, Dec 08, 2023 at 09:19:25AM +0100, Patrick Steinhardt wrote:
> > Writing this out, I think that you could make an argument that
> > `--exclude-disjoint` is a better name for the last option. So I'm
> > definitely open to suggestions here, but I don't want to get too bogged
> > down on command-line option naming (so long as we're all reasonably
> > happy with the result).
>
> Yeah, as said, I don't mind it too much. It's a complex area and the
> flags all do different things, so it's expected that you may have to do
> some research on what exactly they do. That being said, I do like your
> proposed `--exclude-disjoint` a lot more than `--ignore-disjoint`.

I think that's fair. I've renamed the option from "--ignore-disjoint" to
"--exclude-disjoint" for any subsequent round(s) of this series.

> > > One thing I wondered: do we need to consider the `-l` flag? When using
> > > an alternate object directory it is totally feasible that the alternate
> > > may be creating new disjoint packages without us knowing, and thus we
> > > may not be able to guarantee the disjoint property anymore.
> >
> > I don't think so. We'd only care about one direction of this (that
> > alternates do not create disjoint packs which overlap with ours, instead
> > of the other way around), but since we don't put non-local packs in the
> > MIDX, I think we're OK.
> >
> > I suppose that you might run into trouble if you use the chained MIDX
> > thing (via its `->next` pointer). I haven't used that feature myself, so
> > I'd have to play around with it.
>
> We do use this feature at GitLab for forks, where forks connect to a
> common alternate object directory to deduplicate objects. As both the
> fork repository and the alternate object directory use an MIDX I think
> they would be set up exactly like that.

Yep, that's right. I wasn't sure whether this feature had been used
extensively in production (we don't use it at GitHub, since objects only
live in their fork repositories for a short while before moving to the
fork network repository).

> I guess the only really viable solution here is to ignore disjoint packs
> in the main repo that connects to the alternate in the case where the
> alternate has any disjoint packs itself.

I think the behavior you'd get here is that we'd only look for disjoint
packs in the first MIDX in the chain (which is always the local one),
and we'd only recognize packs from that MIDX as being potentially
disjoint.
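
To make that concrete, here is a rough sketch in C of what "only the
first MIDX in the chain" means. The structs and the function name below
are made up for illustration; they are simplified stand-ins, not the
actual MIDX code, and the only thing borrowed from the real thing is the
idea of a `->next` pointer linking the local MIDX to the alternate's.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not Git's real structures. */
struct pack {
	const char *name;
	int disjoint;		/* non-zero if marked disjoint in its MIDX */
};

struct midx {
	const struct pack *packs;
	size_t nr;
	struct midx *next;	/* the alternate's MIDX, if any */
};

/*
 * Count the packs considered potentially disjoint. Only the MIDX passed
 * in (the local one, first in the chain) is inspected; ->next is
 * deliberately never followed, so packs in an alternate are ignored.
 */
size_t count_local_disjoint(const struct midx *m)
{
	size_t i, n = 0;

	if (!m)
		return 0;
	for (i = 0; i < m->nr; i++)
		if (m->packs[i].disjoint)
			n++;
	return n;
}
```

In this sketch, a fork whose local MIDX has one disjoint pack would
report one, no matter how many disjoint packs the alternate's MIDX has.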

If you have the bulk of your repositories in the alternate, then I think
you might want to consider how we combine the two. My sense is that
you'd want to be disjoint with respect to anything downstream of you.

Whether or not this is a feature that you/others need, I definitely
think we should leave it out of this series, since I am (a) fairly
certain that this is possible to do, and (b) already feel like this
series on its own is complicated enough.

Thanks,
Taylor



