On Fri, 24 Feb 2006, Carl Baldwin wrote:
>
> I meant that there is no benefit in disk space usage.  Packing may
> actually increase my disk space usage in this case.  Refer to what I
> said about experimentally running gzip and xdelta on the files
> independently of git.

Yes. The deltas tend to compress a lot less well than "normal" files.

> I see what you're saying about this data reuse helping to speed up
> subsequent cloning operations.  However, if packing takes this long and
> doesn't give me any disk space savings then I don't want to pay the very
> heavy price of packing these files even the first time nor do I want to
> pay the price incrementally.

I would look at tuning the heuristics in "try_delta()" (pack-objects.c) a
bit. That's the place that decides whether to even bother trying to make a
delta, and how big a delta is acceptable.

For example, looking at them, I already see one bug:

	..
	sizediff = oldsize > size ? oldsize - size : size - oldsize;
	if (sizediff > size / 8)
		return -1;
	..

we really should compare sizediff to the _smaller_ of the two sizes, and
skip the delta if the difference in sizes is bound to be bigger than that.

However, the "size / 8" thing isn't a very strict limit anyway, so this
probably doesn't matter (and I think Nico already removed it as part of
his patches: the heuristic can make us avoid some deltas that would be
ok).

The other thing to look at is "max_size": right now it initializes that to
"size / 2 - 20", which just says that we don't ever want a delta that is
larger than about half the result (plus the 20 byte overhead for pointing
to the thing we delta against). Again, if you feel that normal compression
compresses better than half, you could try changing that to

	..
	max_size = size / 4 - 20;
	..

or something like that instead (but then you need to check that it's still
positive - otherwise the comparisons with unsigned later on are screwed
up.
Right now that value is guaranteed to be positive if only because we
already checked

	..
	if (size < 50)
		return -1;
	..

earlier).

NOTE! Every SINGLE one of those heuristics is just totally made up by
yours truly, and has no testing behind it. They're more of the type "that
sounds about right" than "this is how it must be".

As mentioned, Nico has already been playing with the heuristics - but he
wanted better packs, not better CPU usage, so he went the other way from
what you would want to try..

			Linus
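
For anyone who wants to experiment, here is a rough standalone sketch of
the two tweaks discussed above. It is NOT the actual try_delta() code:
the function name "delta_size_limit", the "divisor" parameter and the two
macros are made up for illustration, and reading "the smaller of the two
sizes" as "smaller / 8" is an assumption. It returns the largest delta
worth accepting, or -1 if the delta should be skipped.

```c
#include <assert.h>

#define MIN_OBJECT_SIZE 50	/* mirrors the "if (size < 50)" check */
#define DELTA_OVERHEAD  20	/* bytes needed to name the delta base */

/*
 * Hypothetical helper: decide whether a delta is worth trying for a
 * target of "size" bytes against a base of "oldsize" bytes.
 * "divisor" is 2 for the current "size / 2 - 20" limit, 4 for the
 * stricter "size / 4 - 20" variant.
 */
long delta_size_limit(unsigned long size, unsigned long oldsize,
		      unsigned long divisor)
{
	unsigned long sizediff, smaller, limit;

	assert(divisor > 0);

	if (size < MIN_OBJECT_SIZE)
		return -1;

	/* Compare the difference against the smaller of the two sizes. */
	sizediff = oldsize > size ? oldsize - size : size - oldsize;
	smaller = oldsize < size ? oldsize : size;
	if (sizediff > smaller / 8)
		return -1;

	/*
	 * Check before subtracting: "size / divisor - 20" on unsigned
	 * values would wrap around to a huge number instead of going
	 * negative.
	 */
	limit = size / divisor;
	if (limit <= DELTA_OVERHEAD)
		return -1;
	return (long)(limit - DELTA_OVERHEAD);
}
```

With divisor 4, a 60-byte object gets rejected instead of ending up with
a wrapped-around limit, and a 1000-byte object against an 880-byte base
is skipped here even though the "size / 8" test would have let it
through.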