On Tue, Jul 24, 2018 at 10:20:05AM -0700, Stefan Beller wrote:

> So in my understanding we have a "common base pack" and specific
> packs on top for each "island".

Sort of. This is another hacky part. The islands themselves are
generally just about forbidding deltas, and not any particular kind of
layering. But there's some magic layering only for the "core" island,
which gets to go first (and makes a sort of pseudo-pack at the front of
the one pack). And then everything else is written willy nilly.

This is a hack to try to make the "blit the pack bytes out" code path
for cloning fast. And that has to pick _one_ winner, so ideally you'd
point it at the thing that gets cloned the most, and everybody else gets
to be a loser.

Again, this was designed for the current pack-reuse code we have
upstream, which we (GitHub) found to be pretty crappy (which I feel
justified in saying as one of the authors). I need to clean up and share
the alternative strategy we ended up with.

> Do you envision to have "groups of islands" (an atoll) for say all
> open source clones of linux.git, such that you can layer the packs?
> You would not just have the base pack + island pack, but have one
> pack that is common to most islands?

So no, we don't really layer in any sane way. If pack-objects were fed
the topological relationships between the forks, in theory we could
create a layered packfile that respects that.

But even that is not quite enough. At the time of forking, you might
imagine that torvalds/linux has the base pack, and then somebody forks
from them and contains all of those objects plus more, and somebody
forks from them, and so on.

But that's just a snapshot. Later torvalds/linux will get a bunch of new
objects pushed to it. And some of its forks will merge those objects,
too. But some of them will just rot, abandoned, as nobody ever touches
them again.

So I don't think there's much to be gained by paying attention to the
external forking relationships.
We have to discover afresh the relationships between objects, and which
refs (and thus which islands) point to them.

One thing I don't think we ever tried was doubling down on the
islandCore concept and making the "root" fork as tightly packed as it
could be (with the assumption that _most_ people grab that). And then
just respect the islands for all the other objects (remember this is an
optimization, so the worst case is somebody asks for an object during a
fetch and we have to throw away its on-disk delta).

That would solve the problem that fetching torvalds/linux from GitHub
yields a bigger pack than fetching it from kernel.org. But again, it's
making that root fork weirdly magical. People who fetch solely from
other forks won't get any benefit (and may even see worse packs).

-Peff