Re: Git packs friendly to block-level deduplication

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Jan 24, 2018 at 2:03 PM, Ævar Arnfjörð Bjarmason
<avarab@xxxxxxxxx> wrote:
> If you have a bunch of git repositories cloned of the same project on
> the same filesystem, it would be nice of the packs that are produced
> would be friendly to block-level deduplication.
>
> This would save space, and the blocks would be more likely to be in
> cache when you access them, likely speeding up git operations even if
> the packing itself is less efficient.
>
> Here's a hacky one-liner that clones git/git and peff/git (almost the
> same content) and md5sums each 4k packed block, and sort | uniq -c's
> them to see how many are the same:

<snip>

>
> Has anyone here barked up this tree before? Suggestions? Tips on where
> to start hacking the repack code to accomplish this would be most
> welcome.

Does this overlap with the desire to have resumable clones?  I'm
curious what would happen if you did the same experiment with two
separate clones of git/git, cloned one right after the other so that
hopefully the upstream git/git didn't receive any updates between your
two separate clones.  (In other words, how much do packfiles differ in
practice for different packings of the same data?)




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux