Re: git repack: --depth=100000 causing larger, not smaller, pack file?

On Tue, 17 Mar 2009, Kjetil Barvik wrote:

>   aloha!
> 
>   Yesterday I ran the following command on the updated GIT repository:
> 
>     git repack -adf --window=250000 --depth=100000
> 
>   After 280 minutes or so it finished, but the strange thing was that
>   the resulting pack-file was larger than before.  I had expected that
>   it should be smaller, or at least the same size as before.
> 
>   kjetil git (my_next)$ ls -l .git/objects/pack/*
> -r-------- 1 kjetil kjetil  2757280 2009-03-16 15:18 .git/objects/pack/pack-c5f15d5c48d6b3902a49046d7e8a8d717e167051.idx
> -r-------- 1 kjetil kjetil 19961120 2009-03-16 15:18 .git/objects/pack/pack-c5f15d5c48d6b3902a49046d7e8a8d717e167051.pack
> 
>   Before I started the pack file was around 19 250 000 bytes, and was
>   the result of the following commands:
> 
>   1) git repack -adf --window=250000 --depth=20000
>           - not completely sure about the --window number here
>           - the resulting pack file was a little less than 19 100 000
> 
>   2) 'git fetch' to get the latest GIT patches
> 
>   3) since 'git fetch' always makes an extra new "small" pack file, I ran
>      the command 'git repack -ad --window=40000 --depth=10000' to be
>      able to get one single pack file of 19 250 000 bytes or so.
> 
>   I can think of one thing which is special with the "--depth=100000"
>   number, and that is that it is now larger than the total number of
>   objects in the pack, which is around 96000 to 97000, or so.

No, the depth should have zero negative influence on the pack size.  
For tight compression, the larger the better.  What this will impact 
though is runtime access to the pack data afterward.  The deeper a 
given object is, the slower its access will be.  But since the object 
recency order tends to put newer objects at the top of a delta chain, 
this should impact older objects more than recent ones.
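
To make the access cost concrete, here is a toy model in Python. It is 
not git's pack format: the store layout, the resolve() helper and the 
"keep a prefix, append a suffix" delta encoding are all invented for the 
illustration. It only shows that an object sitting deeper in a chain 
needs more deltas applied before its content is available.

```python
# Toy object store (hypothetical layout, not git's on-disk format).
# An entry is either ("base", content) or
# ("delta", base_name, keep, suffix), meaning base_content[:keep] + suffix.
store = {
    "v3": ("base", "line1\nline2\nline3\n"),  # newest revision, stored whole
    "v2": ("delta", "v3", 12, ""),            # older revision, delta against v3
    "v1": ("delta", "v2", 6, ""),             # oldest revision, delta against v2
}


def resolve(store, name):
    """Return (content, number_of_deltas_applied) for the named object."""
    pending = []
    entry = store[name]
    # Walk toward the base, remembering every delta met on the way.
    while entry[0] == "delta":
        _, base_name, keep, suffix = entry
        pending.append((keep, suffix))
        entry = store[base_name]
    content = entry[1]
    # Apply the deltas in reverse, from the base back to the requested object.
    for keep, suffix in reversed(pending):
        content = content[:keep] + suffix
    return content, len(pending)


for name in ("v3", "v2", "v1"):
    content, depth = resolve(store, name)
    print(name, "needed", depth, "delta(s)")
```

In this sketch the newest revision is the base and costs nothing extra 
to read, while each older revision costs one more delta application.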

>   I have run 'git fsck --strict --full' on the pack with no resulting
>   error/debug output or change in the file size.

There shouldn't be any.

>   Any help on how to debug this?

I doubt there is anything to debug.  In this case the window size is 
used to evaluate a threshold slope for matching objects in the delta 
search.  What we want is a broader delta tree more than a deep one in 
order to have more deltas with a lower depth limit.  Therefore a size 
threshold is applied, based on the object distance in the delta search 
window (see commit c83f032e and the other ones referenced therein).
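
The idea of a threshold slope can be pictured with a small Python 
sketch. This is not the code from pack-objects: the function name, the 
linear formula and the numbers are assumptions chosen only to show how a 
size limit that shrinks with distance behaves, and how the shrinking 
nearly disappears when the bound it is scaled against is very large.

```python
def size_threshold(max_size, distance, window):
    """Largest delta accepted for a candidate `distance` slots away.

    Hypothetical linear slope: the full max_size at distance 0,
    shrinking toward 0 as the distance approaches the window size.
    """
    return max_size * (window - distance) // window


for window in (10, 250000):
    limits = [size_threshold(1000, d, window) for d in (0, 5, 9)]
    print("window", window, "limits at distance 0, 5, 9:", limits)
```

With the small window the limit drops quickly as candidates get farther 
away; with the huge one it stays almost at the maximum for every 
candidate that is close by.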

By providing a big window value, the threshold slope becomes rather flat 
and ineffective, and this changes the delta match outcome.  While delta 
selection is based on the uncompressed delta result, the compressed size 
of different deltas with the same size may vary.  I suspect you might 
have been unlucky in that regard and this could explain the negative 
effect on the pack size.
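
The last point is easy to check outside of git. The Python sketch below 
uses the standard zlib module on two made-up buffers of identical 
length; they are stand-ins for two deltas of the same uncompressed size, 
not real delta data.

```python
import random
import zlib

SIZE = 1000

# Two hypothetical "deltas" with the same uncompressed size.
repetitive = b"abcd" * (SIZE // 4)
rng = random.Random(0)  # fixed seed so the run is repeatable
varied = bytes(rng.getrandbits(8) for _ in range(SIZE))

packed_repetitive = len(zlib.compress(repetitive))
packed_varied = len(zlib.compress(varied))

print("uncompressed:", len(repetitive), len(varied))
print("compressed:  ", packed_repetitive, packed_varied)
```

A selection made on the uncompressed size alone would treat both buffers 
as equal, yet the repetitive one takes far less space once compressed.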


Nicolas