On Wed, Jan 24, 2018 at 2:03 PM, Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> wrote:
> If you have a bunch of git repositories cloned of the same project on
> the same filesystem, it would be nice if the packs that are produced
> would be friendly to block-level deduplication.
>
> This would save space, and the blocks would be more likely to be in
> cache when you access them, likely speeding up git operations even if
> the packing itself is less efficient.
>
> Here's a hacky one-liner that clones git/git and peff/git (almost the
> same content) and md5sums each 4k packed block, and sort | uniq -c's
> them to see how many are the same:
<snip>
>
> Has anyone here barked up this tree before? Suggestions? Tips on where
> to start hacking the repack code to accomplish this would be most
> welcome.

Does this overlap with the desire to have resumable clones?

I'm curious what would happen if you did the same experiment with two
separate clones of git/git, cloned one right after the other so that
hopefully the upstream git/git didn't receive any updates between your
two separate clones. (In other words, how much do packfiles differ in
practice for different packings of the same data?)
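
In case it helps with repeating the experiment, here is a rough Python
sketch of the kind of measurement described above: hash every 4k block
of a set of files and count how many blocks are distinct. This is not
the snipped one-liner. The function names are made up for illustration,
and it assumes fixed 4096-byte blocks aligned to the start of each file,
which may not match how a particular filesystem deduplicates.

```python
import hashlib
import sys
from collections import Counter

# Assumption: 4096-byte blocks, matching the "4k" blocks mentioned above.
BLOCK_SIZE = 4096


def block_hashes(path, block_size=BLOCK_SIZE):
    """Yield the MD5 hex digest of each fixed-size block of the file at path."""
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            yield hashlib.md5(block).hexdigest()


def shared_block_stats(paths, block_size=BLOCK_SIZE):
    """Return (total_blocks, distinct_blocks) across all the given files."""
    counts = Counter()
    for path in paths:
        counts.update(block_hashes(path, block_size))
    return sum(counts.values()), len(counts)


if __name__ == "__main__":
    # Pass the packfiles of the two clones as command-line arguments.
    total, distinct = shared_block_stats(sys.argv[1:])
    print(f"{total} blocks, {distinct} distinct, {total - distinct} duplicated")
```

Running it on the packfiles from two back-to-back clones of git/git
would give a rough answer to the question above: if almost no blocks
are duplicated, the two packings differ at the block level even though
the underlying data is the same.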