On Mon, Dec 11, 2023 at 09:18:32AM +0100, Patrick Steinhardt wrote: > > If you have the bulk of your repositories in the alternate, then I think > > you might want to consider how we combine the two. > > Yes, in the general case the bulk of objects is indeed contained in the > alternate. I thought about this for a while this morning, and think that while it is possible, I'm not sure I can think of a compelling use-case where you'd want to reuse objects from packs across an alternate boundary. On the "I think it's possible front": The challenge is making sure that the set of disjoint packs among each object store is globally disjoint in one direction along the alternate path. I think the rule would require you to honor the disjointed-ness of any packs in alternate(s) you might have when constructing new disjoint packs. So if repository fork.git is an alternate of network.git (and both had long-lived MIDXs), network.git is free to make any set of disjoint packs it chooses, and fork.git can only create disjoint packs which are disjoint with respect to (a) the other disjoint packs in fork.git (if any), and (b) the disjoint packs in network.git (and recursively for any repositories that network.git is an alternate of in the general case). Now on the "I can't think of a compelling use-case front": I think the only reason you'd want to be able to reuse objects from MIDXs across the alternates boundary is if you have MIDX bitmaps in both the repository and its alternate. Indeed, the only time that we kick in pack-reuse in general is when we have a bitmap, so in order to reuse objects from both the repo and its alternate, you'd have to have a bitmap in both repositories. But having a MIDX bitmap means that any packs in the MIDX for which you're generating a bitmap have to be closed over object reachability. So unless the repository and its alternate have totally distinct lines of history (in which case, I'm not sure you would want to share objects between the two in the first place), any pack you bitmap in the child repository fundamentally couldn't be disjoint with respect to its parent. This is because if it were to be disjoint, it would have to be repacked with '-l' (or some equivalent '-l'-like flag that only ignores non-local packs which are marked as disjoint). But if you exclude those objects and any one (or more) of them is reachable from some object(s) you didn't exclude, you wouldn't be able to generate a bitmap in the first place. It's very possible that there's something about your setup that I'm not fully grokking, but I don't think in general this is something that we'd want to do (even if it is theoretically possible). > > Whether or not this is a feature that you/others need, I definitely > > think we should leave it out of this series, since I am (a) fairly > > certain that this is possible to do, and (b) already feel like this > > series on its own is complicated enough. > > I'm perfectly fine if we say that the benefits of your patch series > cannot yet be applied to repositories with alternates. But from my point > of view it's a requirement that this patch series does not silently > break this usecase due to Git starting to generate disjointed packs > where it cannot ensure that the disjointedness property holds. I think one thing you could reasonably do is use *only* the non-local MIDX bitmaps when doing pack reuse. Currently we'll use the first MIDX we find, which is guaranteed to be the local one, if it exists. This was the case before this series, and this series does not change that behavior. Unless you had a pack bitmap in the alternated repository (which I think is unlikely, since it would require a full reachability closure, thus eliminating any de-duplication benefits you'd otherwise get when using alternates), you'd be find before and after this series. > As I haven't yet read through this whole patch series, the question is > boils down to whether the end result is opt-in or opt-out. If it was > opt-out then I could see the above usecase breaking silently. If it was > opt-in then things should be fine and we can address this ommission in a > follow up patch series. We at GitLab would definitely be interested in > helping out with this given that it directly affects us and that the > demonstrated savings seem very promising. The end result is opt-in, you have to change the `pack.allowPackReuse` configuration from its default value of "true" (or the alternate spelling taught in this series, "single") to "multi" in order to enable the new behavior. Thanks, Taylor