Re: [PATCH] diff-delta: produce optimal pack data

On Fri, 24 Feb 2006, Carl Baldwin wrote:
> 
> I meant that there is no benefit in disk space usage.  Packing may
> actually increase my disk space usage in this case.  Refer to what I
> said about experimentally running gzip and xdelta on the files
> independently of git.

Yes. The deltas tend to compress a lot less well than "normal" files.

> I see what you're saying about this data reuse helping to speed up
> subsequent cloning operations.  However, if packing takes this long and
> doesn't give me any disk space savings then I don't want to pay the very
> heavy price of packing these files even the first time nor do I want to
> pay the price incrementally.

I would look at tuning the heuristics in "try_delta()" (pack-objects.c) a 
bit. That's the place that decides whether to even bother trying to make a 
delta, and how big a delta is acceptable. For example, looking at them, I 
already see one bug:

	..
	sizediff = oldsize > size ? oldsize - size : size - oldsize;
	if (sizediff > size / 8)
		return -1;
	..

we really should compare sizediff to the _smaller_ of the two sizes, and 
skip the delta if the difference in sizes is bound to be bigger than that.

However, the "size / 8" thing isn't a very strict limit anyway, so this 
probably doesn't matter (and I think Nico already removed it as part of 
his patches: the heuristic can make us avoid some deltas that would be 
ok).
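
As a rough sketch of the "compare against the smaller size" idea mentioned above: the helper name size_diff_too_big() is made up for illustration and is not in pack-objects.c, and the 1/8 ratio is simply carried over from the existing check.

```c
/*
 * Hypothetical helper, not the real try_delta() code.
 * Returns 1 if the size difference alone rules out a delta, 0 otherwise.
 */
static int size_diff_too_big(unsigned long oldsize, unsigned long size)
{
	unsigned long sizediff = oldsize > size ? oldsize - size : size - oldsize;
	unsigned long smaller = oldsize < size ? oldsize : size;

	/* Compare against the smaller of the two sizes (same 1/8 ratio). */
	return sizediff > smaller / 8;
}
```

With oldsize = 100 and size = 113 the existing check lets the pair through (13 is not larger than 113 / 8 = 14), while this version rejects it (13 is larger than 100 / 8 = 12). The two only disagree when the new object is the larger one.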

The other thing to look at is "max_size": right now it initializes that to 
"size / 2 - 20", which just says that we don't ever want a delta that is 
larger than about half the result (plus the 20 byte overhead for pointing 
to the thing we delta against). Again, if you feel that normal compression 
compresses better than half, you could try changing that to

	..
	max_size = size / 4 - 20;
	..

or something like that instead (but then you need to check that it's still 
positive - otherwise the comparisons with unsigned later on are screwed 
up. Right now that value is guaranteed to be positive if only because we 
already checked

	..
	if (size < 50)
		return -1;
	..

earlier).
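
One way to keep a tighter limit positive might look like the sketch below. The helper delta_size_cap() and its divisor parameter are assumptions for illustration only; the 20-byte overhead is the one described above.

```c
/*
 * Hypothetical helper, not the real try_delta() code.
 * Returns the largest acceptable delta size for an object of "size"
 * bytes, or -1 if the limit would not be positive.
 */
static long delta_size_cap(unsigned long size, unsigned long divisor)
{
	const unsigned long overhead = 20; /* pointing at the delta base */

	/* Check before subtracting, so the unsigned math never wraps. */
	if (divisor == 0 || size / divisor <= overhead)
		return -1;
	return (long)(size / divisor - overhead);
}
```

With a divisor of 2 the "size < 50" check is enough (50 / 2 - 20 = 5), but with a divisor of 4 anything up to 83 bytes would give a limit of zero or less, so a caller would have to skip the delta when this returns -1.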

NOTE! Every SINGLE one of those heuristics is just totally made up by 
yours truly, and have no testing behind them. They're more of the type 
"that sounds about right" than "this is how it must be". As mentioned, 
Nico has already been playing with the heuristics - but he wanted better 
packs, not better CPU usage, so he went the other way from what you would 
want to try..

		Linus
