Re: How to speedup git clone for big binary files (disable delta compression)

On Thu, Jul 19, 2018 at 12:05:00AM +0200, René Scheibe wrote:

> Code:
> ---------------------------------------------------------------------
> #!/bin/bash
> 
> # setup repository
> git init --quiet repo
> cd repo
> 
> echo '*.bin binary -delta' > .gitattributes
> git add .gitattributes
> git commit --quiet -m 'attributes'
> 
> for i in $(seq 10); do
>     dd if=/dev/urandom of=data.bin bs=1MB count=10 status=none
>     git add data.bin
>     git commit --quiet -m "data $i"
> done
> cd ..
> 
> # create clone repository
> time git clone --no-local repo clone

This clone won't respect those attributes, because we don't dig into
in-repo attributes. There's actually some inconsistency in how Git
handles attribute locations. Usually they're just read from the top of
the working tree, but in some instances we read them from the tree
itself (e.g., git-archive respects some attributes from the tree it's
archiving).

If you do:

  echo "*.bin binary -delta" >repo/.git/info/attributes

then that does work (we always respect repo-level attributes like that).
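
To confirm that the repo-level file is being picked up, `git check-attr` can report how Git resolves the `delta` attribute for a given path. A minimal sketch, assuming a scratch repository in a temporary directory (the `tmp` and `result` names are only for illustration):

```shell
#!/bin/sh
# Scratch repository in a temporary directory (illustrative setup only).
tmp=$(mktemp -d)
git init --quiet "$tmp/repo"

# Write the repo-level attributes file; mkdir -p guards against a
# template that did not create the info/ directory.
mkdir -p "$tmp/repo/.git/info"
echo '*.bin binary -delta' >"$tmp/repo/.git/info/attributes"

# Ask Git how it resolves the "delta" attribute for a .bin path.
# The path does not have to exist for check-attr to answer.
result=$(git -C "$tmp/repo" check-attr delta -- data.bin)
echo "$result"   # prints: data.bin: delta: unset
```

An `unset` value corresponds to the `-delta` entry in the attributes file.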

> # repack original repository
> cd repo
> time git repack -a -d

In this case we're reading the attributes from the working tree, and it
does work. In theory the clone case could do so, too, but git-upload-pack,
the server side of the clone, avoids looking at the working tree at all.
That's something we _could_ address, but it doesn't really fix the
general case, since most clones will be from a bare repository anyway.
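
For a bare repository there is no `.git` subdirectory, so the repo-level file would live directly under the repository's `info/` directory. A rough sketch, assuming a hypothetical bare repository named `repo.git` and assuming the repo-level file is read the same way there; it only checks that the attribute resolves and does not measure clone time:

```shell
#!/bin/sh
# Hypothetical bare repository in a temporary directory.
tmp=$(mktemp -d)
git init --quiet --bare "$tmp/repo.git"

# In a bare repository the git directory is the repository itself,
# so the file is repo.git/info/attributes rather than .git/info/attributes.
mkdir -p "$tmp/repo.git/info"
echo '*.bin binary -delta' >"$tmp/repo.git/info/attributes"

# Check how the "delta" attribute resolves from inside the bare repository.
bare_result=$(git -C "$tmp/repo.git" check-attr delta -- data.bin)
echo "$bare_result"
```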

So in summary:

  1. Depending on what you're trying to do, the .git/info/attributes
     trick might be enough for you.

  2. I do think it would be nice for more places to respect attributes
     stored in trees. There's a question of which tree, but I think in
     general reading them from HEAD in a bare repository would do what
     people want (it's a little funny if you're fetching branch "foo",
     but HEAD points to "bar", but it's at least consistent with the
     non-bare case). There's some prior art in the way we treat mailmaps
     (in a bare repo, we read HEAD:.mailmap).

     I suspect the patch may not be trivial, as I don't know how ready
     the attributes code is to handle in-tree lookups (remember that it
     is not just HEAD:.gitattributes we must care about, but other files
     sprinkled through the repository, like "HEAD:subdir/.gitattributes").
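
To give a sense of what an in-tree lookup would have to cover, the attribute files recorded in a commit can be listed without touching the working tree. This is only a sketch using existing plumbing, not how the attributes code does (or would do) the lookup; the repository layout, the `subdir` name and the `*.dat` pattern are made up for illustration:

```shell
#!/bin/sh
# Scratch repository with attribute files at two levels (illustrative).
tmp=$(mktemp -d)
git init --quiet "$tmp/repo"
cd "$tmp/repo" || exit 1

mkdir subdir
echo '*.bin binary -delta' >.gitattributes
echo '*.dat -delta' >subdir/.gitattributes
git add .gitattributes subdir/.gitattributes
git -c user.name=Example -c user.email=example@example.com \
    commit --quiet -m 'attributes'

# List every .gitattributes file recorded in HEAD's tree.
attr_files=$(git ls-tree -r --name-only HEAD | grep -E '(^|/)\.gitattributes$')
echo "$attr_files"

# Read one of them straight from the tree, as an in-tree lookup would.
sub_attrs=$(git show HEAD:subdir/.gitattributes)
echo "$sub_attrs"
```

The same two commands also work in a bare repository, since neither needs a working tree.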

-Peff


