Re: Mozilla .git tree

On Tue, 29 Aug 2006, Shawn Pearce wrote:
> Jon Smirl <jonsmirl@xxxxxxxxx> wrote:
> > The git tools can be modified to set the compression level to 0 before
> > compressing tree deltas. There is no need to change the decoding code.
> > Even with compression level 0 they still get slightly larger because
> > zlib tacks on a header.
> 
> See my followup email to myself; I think we're talking a zlib
> overhead of 9.2 bytes on average per tree delta.  That's with a
> compression level of -1 (default, which is 6).

In fact, the bulk of a tree delta is most likely the literal sha1 of
one or more directory entries that changed, and this is hardly
compressible.  There is nothing to gain by forcing the zlib level to 0
for tree deltas: in the tests I've performed in the past, doing so never
made the deflated stream smaller.  It seems that zlib is smart enough
not to attempt any compression when there is no gain.  That leaves the
zlib header as the only overhead.
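
For anyone who wants to repeat that kind of test, here is a rough sketch
using Python's zlib module.  It is not git code: the payload is only a
stand-in (a single 20-byte sha1, where a real tree delta would also carry
a few delta opcodes), and zlib_overhead() is a name made up for this
example.

```python
import hashlib
import zlib


def zlib_overhead(payload: bytes, level: int) -> int:
    """Return how many bytes zlib adds on top of the raw payload."""
    return len(zlib.compress(payload, level)) - len(payload)


# Hypothetical stand-in for the incompressible part of a tree delta:
# the sha1 of one changed directory entry.
payload = hashlib.sha1(b"changed directory entry").digest()

for level in (0, 6):
    print("level", level, "overhead:", zlib_overhead(payload, level), "bytes")
```

At level 0 the overhead for a short input like this is 11 bytes: a
2-byte zlib header, a 5-byte stored-block header and a 4-byte checksum
trailer.  The default level should come out no larger, since deflate
picks the cheapest block type available.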

And the zlib wrapper carries a checksum (an Adler-32, stored after the
deflated data) which we're about to use for validating the data when
doing delta data reuse, in order to prevent propagation of pack
corruption like the case recently posted on the list.  Without that, a
pack corruption (from a bad disk sector, for example) is likely to go
unnoticed when doing a repack.  The data could be validated by expanding
deltas and verifying the sha1 of the end result, but that is a really
expensive operation if performed on all deltas and is best left to
git-fsck-objects --full.  So I think the small overhead relative to
total pack size might be worth it for better data integrity.
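
As an illustration of the cheap check, the sketch below inflates a
deflated buffer and relies on zlib to compare the result against the
Adler-32 stored at the end of the stream.  It does not apply the delta
or compute any sha1.  This is Python rather than git's C code, and
deflated_data_is_intact() is a hypothetical helper, not an existing git
function.

```python
import zlib


def deflated_data_is_intact(stream: bytes) -> bool:
    """Inflate a zlib stream and report whether zlib accepted it.

    zlib.decompress() raises zlib.error when the stream is malformed,
    truncated, or when the inflated data does not match the checksum
    stored at the end of the stream.
    """
    try:
        zlib.decompress(stream)
    except zlib.error:
        return False
    return True


good = zlib.compress(b"pretend this is delta data " * 4)
# Simulate a bad disk sector by flipping the bits of the last byte,
# which is part of the stored checksum.
bad = good[:-1] + bytes([good[-1] ^ 0xFF])

print(deflated_data_is_intact(good))
print(deflated_data_is_intact(bad))
```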

Using an offset instead of a sha1 to reference a delta base object is
certainly a good idea though.  But I'd use the same variable-length
encoding as the object size to avoid the 32-bit limit issue.  When
generating a thin pack, the real sha1 of the base object could be
substituted for the offset quite easily if the base object is not sent
as part of the same pack.
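
To make the variable-length idea concrete, here is a minimal sketch in
Python.  It assumes a plain scheme of 7 data bits per byte, least
significant group first, with the high bit set on every byte except the
last.  That is in the spirit of the object size encoding but is not a
proposal for the exact on-disk format, and encode_offset() and
decode_offset() are names invented for this example.

```python
def encode_offset(offset: int) -> bytes:
    """Encode a non-negative offset using 7 data bits per byte."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    out = bytearray()
    while True:
        byte = offset & 0x7F
        offset >>= 7
        if offset:
            # More groups follow: set the continuation bit.
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_offset(buf: bytes, pos: int = 0):
    """Decode an offset starting at buf[pos]; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return value, pos


# An offset past the 32-bit limit round-trips without special casing.
big = 5 * 2**32 + 1234
print(len(encode_offset(big)), decode_offset(encode_offset(big))[0] == big)
```

Small offsets, which are the common case when the base sits close to
the delta, cost a single byte, while nothing in the format caps the
offset at 32 bits.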


Nicolas