Re: [RFC PATCH 3/5] pack-objects: add delta-islands support

On Tue, Jul 24, 2018 at 10:20:05AM -0700, Stefan Beller wrote:

> So in my understanding we have a "common base pack" and specific
> packs on top for each "island".

Sort of. This is another hacky part. The islands themselves are
generally just about forbidding deltas, and not any particular kind of
layering.
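
A minimal sketch of what "forbidding deltas" can mean, assuming each object is tagged with the set of islands whose refs reach it. The helper `can_delta` and the island names are hypothetical, chosen for illustration. The rule shown is that an object may be stored as a delta only against a base that is present in every island containing the object, so a client fetching any of those islands also receives the base.

```python
def can_delta(target_islands: set[str], base_islands: set[str]) -> bool:
    """True if `target` may be stored as a delta against `base`.

    Every island that needs the target must also contain the base; that is
    "the target's islands are a subset of the base's islands".
    """
    return target_islands <= base_islands


# Hypothetical island tags for three objects.
islands = {
    "shared": {"torvalds", "forkA", "forkB"},
    "a_only": {"forkA"},
    "b_only": {"forkB"},
}
print(can_delta(islands["a_only"], islands["shared"]))  # True
print(can_delta(islands["a_only"], islands["b_only"]))  # False
print(can_delta(islands["shared"], islands["a_only"]))  # False
```

The check is not symmetric: a fork-private object may delta against a shared one, but never the other way around.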

But there's some magic layering only for the "core" island, which gets
to go first (and makes a sort of pseudo-pack at the front of the one
pack). And then everything else is written willy-nilly. This is a hack
to try to make the "blit the pack bytes out" code path for cloning fast.
And that has to pick _one_ winner, so ideally you'd point it at the
thing that gets cloned the most, and everybody else gets to be a loser.
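
The "core goes first" layering can be sketched as a two-layer ordering. This is an illustration under stated assumptions, not the patch's code: `Obj`, `layer_of`, `write_order` and `layers_allow_delta` are hypothetical names. Objects reachable from the core island form layer 0 and are written first. A core object is also assumed to be barred from deltas against a non-core base, so that the leading section stays self-contained and can be copied out as-is for a clone. That check would apply on top of the per-island restriction.

```python
from dataclasses import dataclass, field


@dataclass
class Obj:
    name: str
    islands: set[str] = field(default_factory=set)


def layer_of(obj: Obj, core: str) -> int:
    # Layer 0: reachable from the core island; layer 1: everything else.
    return 0 if core in obj.islands else 1


def write_order(objs: list[Obj], core: str) -> list[str]:
    # sorted() is stable, so objects keep their relative order within a layer.
    return [o.name for o in sorted(objs, key=lambda o: layer_of(o, core))]


def layers_allow_delta(target: Obj, base: Obj, core: str) -> bool:
    # A core object must not delta against a non-core base, or the
    # leading core section of the pack would not be self-contained.
    return layer_of(target, core) >= layer_of(base, core)


objs = [
    Obj("a", {"forkA"}),
    Obj("b", {"torvalds", "forkA"}),
    Obj("c", {"forkB"}),
    Obj("d", {"torvalds"}),
]
print(write_order(objs, core="torvalds"))  # ['b', 'd', 'a', 'c']
print(layers_allow_delta(objs[0], objs[1], "torvalds"))  # True
print(layers_allow_delta(objs[1], objs[0], "torvalds"))  # False
```

Only one island can be "core" here, which reflects the "one winner" trade-off: clones of every other fork get no special layout.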

Again, this was designed for the current pack-reuse code we have
upstream, which we (GitHub) found to be pretty crappy (which I feel
justified in saying as one of the authors). I need to clean up and share
the alternative strategy we ended up with.

> Do you envision having "groups of islands" (an atoll) for, say, all
> open source clones of linux.git, such that you can layer the packs?
> You would not just have the base pack + island pack, but have one
> pack that is common to most islands?

So no, we don't really layer in any sane way. If pack-objects were fed
the topological relationships between the forks, in theory we could
create a layered packfile that respects that.

But even that is not quite enough. At the time of forking, you might
imagine that torvalds/linux has the base pack, and then somebody forks
from them, and that fork contains all of those objects plus more, and
somebody else forks from that fork, and so on. But that's just a snapshot. Later
torvalds/linux will get a bunch of new objects pushed to it. And some of
its forks will merge those objects, too. But some of them will just rot,
abandoned, as nobody ever touches them again.

So I don't think there's much to be gained by paying attention to the
external forking relationships. We have to discover afresh the
relationships between objects, and which refs (and thus which islands)
point to them.
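
Discovering these relationships afresh amounts to a reachability walk from the current refs. Here is a minimal sketch that ignores fork history entirely. It assumes the object graph is a plain dict of object to pointed-to objects, and that the mapping from ref to island is already known; git itself derives that mapping from ref names via the `pack.island` setting. The ref names and the `mark_islands` helper are hypothetical.

```python
from collections import deque


def mark_islands(graph: dict[str, list[str]],
                 ref_tips: dict[str, str],
                 ref_island: dict[str, str]) -> dict[str, set[str]]:
    """Map each reachable object to the islands whose refs reach it.

    graph:      object -> objects it points to (parents, trees, blobs)
    ref_tips:   ref name -> object the ref currently points at
    ref_island: ref name -> island name
    """
    islands: dict[str, set[str]] = {}
    for ref, tip in ref_tips.items():
        island = ref_island[ref]
        queue = deque([tip])
        while queue:
            oid = queue.popleft()
            marks = islands.setdefault(oid, set())
            if island in marks:
                continue  # this island already walked through here
            marks.add(island)
            queue.extend(graph.get(oid, []))
    return islands


# c1 <- c2 (torvalds) <- c3 (forkA); c4 (forkB) was forked from c1.
graph = {"c1": [], "c2": ["c1"], "c3": ["c2"], "c4": ["c1"]}
tips = {"refs/torvalds/master": "c2", "refs/forkA/master": "c3",
        "refs/forkB/master": "c4"}
owner = {"refs/torvalds/master": "torvalds", "refs/forkA/master": "forkA",
         "refs/forkB/master": "forkB"}
result = mark_islands(graph, tips, owner)
print(sorted(result["c1"]))  # ['forkA', 'forkB', 'torvalds']
print(sorted(result["c3"]))  # ['forkA']
```

A fork that has rotted simply keeps its old tips, so its objects get tagged with whatever it still reaches, regardless of where it was originally forked from.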

One thing I don't think we ever tried was doubling down on the
islandCore concept and making the "root" fork as tightly packed as it
could be (with the assumption that _most_ people grab that). And then
just respect the islands for all the other objects (remember this is an
optimization, so the worst case is that somebody asks for an object during a
fetch and we have to throw away its on-disk delta).
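
The "worst case" above can be sketched as a per-object decision. This is a simplification under stated assumptions: `plan_object` and its return strings are hypothetical, and git's real reuse logic handles more cases. A stored delta can be copied out only if the client will end up with its base, either because the base is in this pack or, for a thin pack, because the client already has it. Otherwise the on-disk delta is thrown away and the object has to be re-encoded.

```python
from typing import Optional


def plan_object(base: Optional[str], sending: set[str],
                client_has: set[str]) -> str:
    """Decide what to do with one object's on-disk representation.

    base:       the delta base stored on disk, or None if stored whole
    sending:    objects going into this pack
    client_has: objects the client already has (usable by a thin pack)
    """
    if base is None:
        return "send whole"
    if base in sending or base in client_has:
        return "reuse delta"  # copy the stored delta bytes out as-is
    return "discard delta"    # client would lack the base; re-encode


print(plan_object("b1", {"x", "b1"}, set()))  # reuse delta
print(plan_object("b1", {"x"}, set()))        # discard delta
print(plan_object(None, {"x"}, set()))        # send whole
```

Nothing is ever sent incorrectly; a bad island layout only costs extra CPU at fetch time.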

That would solve the problem that fetching torvalds/linux from GitHub
yields a bigger pack than fetching it from kernel.org. But again, it's
making that root fork weirdly magical. People who fetch solely from
other forks won't get any benefit (and may even see worse packs).

-Peff
