Re: Git packs friendly to block-level deduplication

On Wed, Jan 24 2018, Junio C. Hamano jotted:

> Mike Hommey <mh@xxxxxxxxxxxx> writes:
>
>> FWIW, I sidestep the problem entirely by using alternatives.
>
> That's a funny way to use the word "side-step", I would say, as the
> alternate object store support is there exactly for this use case.

Things you can't do with alternates that block-level de-duplication
gives you:

 1. Your filesystem may be mounted from some NFS host that does
    block-level deduplication internally against other content you don't
    have permission to access. Think of the /home of a bunch of dev VMs
    that you know will have the same repos cloned (along with most of the
    same FS content, e.g. the OS).

    In this case the storage can de-duplicate blocks purely as an
    implementation detail, without git knowing about it, as long as git
    (or any other program using the FS) can be coerced into writing the
    same blocks that other gits on other machines write, at least most
    of the time.

 2. The same as #1 but without NFS, e.g. chroot'd /home directories on a
    local non-NFS filesystem.

 3. Even if the repos are all on the same host, they may just be ad-hoc
    clones in /home made by different users. It's easy to write something
    in /etc/gitconfig to give them all the same repack settings. It's
    less easy to maintain some git-clone wrapper that implicitly adds
    --reference to all clones (users won't know about the option, or
    will forget it), or that goes hunting around for checkouts and adds
    alternates after the fact.

 4. With alternates you always need to maintain some blessed "clone from
    this" repo that can't go away, lest everything cloned from it become
    corrupt and need manual repair. If you're just aiming to save
    storage, block-level deduplication may be a better trade-off.
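
To make the risk in #4 concrete, here is a minimal sketch (in Python,
not part of git) that reports alternates which have gone away. It
relies on the fact that a repository's alternates are listed one path
per line in objects/info/alternates, with relative paths resolved
against the objects directory. The function name missing_alternates is
made up for illustration.

```python
import os


def missing_alternates(git_dir):
    """Return alternate object directories listed for git_dir that no longer exist.

    Hypothetical helper for illustration; git itself has no such command.
    """
    objects_dir = os.path.join(git_dir, "objects")
    alternates_file = os.path.join(objects_dir, "info", "alternates")
    if not os.path.exists(alternates_file):
        return []  # this repository does not borrow objects from anywhere

    missing = []
    with open(alternates_file) as fh:
        for line in fh:
            path = line.strip()
            if not path or path.startswith("#"):
                continue  # skip blank lines and comment lines
            # Relative entries are resolved against the objects directory.
            if not os.path.isabs(path):
                path = os.path.normpath(os.path.join(objects_dir, path))
            if not os.path.isdir(path):
                missing.append(path)
    return missing
```

Running something like this over every clone on a host would show which
ones depend on a "blessed" repo that has been moved or deleted.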

Also, once you clone with --reference, doesn't the local clone only add
new objects as you "git fetch", never pruning them if the same objects
appear in the alternate later on? Or am I misremembering things?

I mainly have use-cases #1 & #3. Both could be made to use alternates
with some hassle (e.g. for #1, by exposing a separate read-only copy of
"these are alternates" to each VM), but it seemed worthwhile to see if
repack could be made more block-level deduplication friendly, as
deploying that is easier.
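
As a rough illustration of why "writing the same blocks" matters, here
is a small Python sketch of fixed-size block deduplication. It is a toy
model under stated assumptions, not how any particular NFS appliance
works: the 4096-byte block size, the use of SHA-256, and the names
block_hashes and shared_blocks are all made up for the example.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size; real storage systems vary


def block_hashes(data, block_size=BLOCK_SIZE):
    """Hash each fixed-size block of data, as a toy dedup engine might."""
    return [
        hashlib.sha256(data[i:i + block_size]).digest()
        for i in range(0, len(data), block_size)
    ]


def shared_blocks(a, b, block_size=BLOCK_SIZE):
    """Count the blocks of b whose content already exists as a block of a."""
    seen = set(block_hashes(a, block_size))
    return sum(1 for h in block_hashes(b, block_size) if h in seen)


# Stand-in for pack data: 8 blocks of 4096 bytes, each with distinct content.
payload = b"".join(
    hashlib.sha256(str(i).encode()).digest() * 128 for i in range(8)
)

# Two machines writing byte-identical files share every block.
identical = shared_blocks(payload, payload)

# The same content shifted by a single byte no longer lines up on block
# boundaries, so the fixed-size scheme finds nothing to share.
shifted = shared_blocks(payload, b"x" + payload)
```

In this model the identical copy shares all 8 blocks and the shifted
copy shares none, which is why it matters that repack lays out the same
bytes at the same offsets on every machine.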


