Re: git repack command on larger pack file

On Tue, Oct 27, 2015 at 02:04:23AM +0000, Sivakumar Selvam wrote:

>    When I finished git repacking, I found 12 pack files of 4 GB each, with
> a total size of 48 GB. Then I ran the same git repack command again,
> removing only the --max-pack-size= parameter, and the size of the single
> pack file is 66 GB.
> 
> git repack -A -b -d -q --depth=50 --window=10 abc.git
> 
> Now I see that the total size of the single abc.git has become 66 GB.
> Initially it was 34 GB; after using --max-pack-size=4g it became 48 GB.
> When we removed the --max-pack-size=4g parameter and tried to create a
> single pack file, it became 66 GB.
>    
> It looks like once we do a git repack into multiple pack files, we can't
> get back to the original size.

Git tries to take some shortcuts when repacking: if two objects are in
the same pack but are not deltas against each other, it will not
consider making deltas out of them. The logic is that we would already
have tried that while making the original pack. But of course when you
are doing weird things with the packing parameters, that is not always
a good assumption.

When doing experiments like this, add "-f" to your repack command-line
to avoid reusing deltas. The result should be much smaller (at the
expense of more CPU time to do the repack).
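
For example, starting from the invocation you quoted above (run from
inside the repository; which of the other flags to keep is up to you,
this is just a sketch):

  git repack -A -b -d -q -f --depth=50 --window=10

The "-f" makes pack-objects recompute deltas from scratch instead of
carrying the old ones forward, which is what lets the pack shrink back
down.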

I'd also recommend increasing "--window" if you can afford the extra CPU
during the repack. It can often produce smaller packs. And it has less
cost than you might think (e.g., window=20 is not twice as expensive as
window=10, because the work to access the objects is cached).  You can
also increase --depth, but I have never found it to be particularly
helpful for decreasing size[1].
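
For example (the window value here is only an illustration; larger
values cost more CPU and memory during the repack):

  git repack -A -d -f --depth=50 --window=250

or set it persistently with "git config pack.window 250" so that future
repacks pick it up as well.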

-Peff

[1] This is all theory, and I don't know how well git actually finds
    such deltas, but it is probably better to have a dense tree of
    deltas rather than long chains.  If you have a chain of N objects
    and want to add object N+1 to it, you are probably not much worse
    off basing it on object N-1, creating a "fork" at N. The resulting
    objects should be less expensive to access for subsequent operations
    (any time you want the Nth object, you have to resolve all parts
    of the chain, so shorter chains are better, and the delta cache
    is more likely to get a hit on that N-1 object).
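
    If you want to check how the chains actually came out, something
    like this (the pack path is just an example; point it at the .idx
    files under your repository's objects/pack directory):

        git verify-pack -v objects/pack/pack-*.idx

    prints a depth for each object and ends with a "chain length = N"
    histogram, so you can see how --depth and --window affected the
    result.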