Nicolas Pitre <nico@xxxxxxx> writes:

> On Tue, 17 Mar 2009, Kjetil Barvik wrote:
>
>> aloha!
>>
>> Yesterday I ran the following command on the updated GIT repository:
>>
>>   git repack -adf --window=250000 --depth=100000
>>
>> After 280 minutes or so it finished, but the strange thing was that
>> the resulting pack-file was larger than before.  I had expected that
>> it should be smaller, or at least the same size as before.
  [snip]
>> I can think of one thing which is special with the "--depth=100000"
>> number, and that is that it is now larger than the total number of
>> objects in the pack, which is around 96000 to 97000, or so.
>
> No, the depth should have zero negative influence on the pack size.
> For tight compression, the larger the better.  What this will impact
> though is runtime access to the pack data afterward.  The deeper a
> given object is, the slower its access will be.  But since the object
> recency order tends to put newer objects at the top of a delta chain,
> this should impact older objects more than recent ones.

I have done some more tests, and have copied the whole git/ directory
to a new directory (so that I do not accidentally add or delete any
objects/commits), and have made the following table:

All pack file sizes, F (in bytes), below were computed with the
following git command:

  git repack -adf --window=250000 --depth=D

     D   |     F      | (F - F_prev) / (D - D_prev)
  -------|------------|-----------------------------
    5000 |  19129934  |
   10000 |  19128956  |    -978 /  5000 =  -0.1956
   15000 |  19126077  |   -2879 /  5000 =  -0.5758
   20000 |  19126077  |       0 /  5000 =   0
   25000 |  19126077  |       0 /  5000 =   0
   30000 |  19197575  |   71498 /  5000 =  14.2996
   45000 |  19312240  |  114665 / 15000 =   7.6443
   60000 |  19560083  |  247843 / 15000 =  16.5229
   75000 |  19803043  |  242960 / 15000 =  16.1973
   90000 |  19669923  | -133120 / 15000 =  -8.8747
   95000 |  20463780  |  793857 /  5000 = 158.7714

From the table it seems that you get the smallest pack file (for this
particular repository) when the --depth value is somewhere between
15000 and 25000.  And, when the --depth value was 95000, the resulting
pack file was (- 20463780 19126077) = 1 337 703 bytes, 1.28 MiB, or 7%
larger than this.

> I doubt there is anything to debug.  In this case the window size is
> used to evaluate a threshold slope for matching objects in the delta
> search.  What we want is a broader delta tree more than a deep one in
> order to have more deltas with a lower depth limit.  Therefore a size
> threshold is applied, based on the object distance in the delta search
> window (see commit c83f032e and the other ones referenced therein).
>
> By providing a big window value, the threshold slope becomes rather flat
> and ineffective, and this changes the delta match outcome.  While delta
> selection is based on the uncompressed delta result, the compressed size
> of different deltas with the same size may vary.  I suspect you might
> have been unlucky in that regard and this could explain the negative
> effect on the pack size.

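For anyone who wants to re-check the third column of the table above, a small script along these lines could recompute it from the (D, F) pairs. This is only a sketch: the names `RESULTS`, `slopes` and `smallest` are made up for illustration, and the numbers are simply copied from the D and F columns.

```python
# (D, F) pairs copied from the table: the --depth value and the
# resulting pack file size in bytes.
RESULTS = [
    (5000, 19129934),
    (10000, 19128956),
    (15000, 19126077),
    (20000, 19126077),
    (25000, 19126077),
    (30000, 19197575),
    (45000, 19312240),
    (60000, 19560083),
    (75000, 19803043),
    (90000, 19669923),
    (95000, 20463780),
]


def slopes(results):
    """Return (D, F - F_prev, D - D_prev, ratio) for each consecutive pair."""
    rows = []
    for (d_prev, f_prev), (d, f) in zip(results, results[1:]):
        delta_f = f - f_prev
        delta_d = d - d_prev
        rows.append((d, delta_f, delta_d, delta_f / delta_d))
    return rows


def smallest(results):
    """Return the first (D, F) pair with the smallest pack size."""
    return min(results, key=lambda pair: pair[1])


if __name__ == "__main__":
    for d, delta_f, delta_d, ratio in slopes(RESULTS):
        print(f"{d:6d} | {delta_f:8d} / {delta_d:5d} = {ratio:9.4f}")
    print("smallest pack:", smallest(RESULTS))
```

Running it prints one line per row of the table (except the first, which has no predecessor), followed by the first depth value that gave the smallest pack.
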
From the table above it seems that I have been unlucky with _all_
--depth values above 25000 or so.

Question: is there some low-level GIT command I can run to compare 2
pack files, to maybe be able to see the reason behind the above table?
Maybe to see some details about how many deltas there are, how big
each one is, total sizes, etc.

  -- kjetil

PS! I have the following in my $HOME/.gitconfig file:

  [repack]
        UseDeltaBaseOffset = true
  [gc]
        auto = 25
        autopacklimit = 1
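
One possible starting point for the comparison asked about above is `git verify-pack -v`, which, according to its documentation, prints one line per object (SHA-1, type, size, size in pack file, offset in pack file, and for deltified objects also the depth and the base SHA-1). The sketch below assumes that column layout and sums up a few totals, so that two packs could be compared side by side. The function names and the dictionary keys are invented for illustration, and the script has not been run against the packs discussed in this thread.

```python
import subprocess
from collections import Counter


def parse_verify_pack(text):
    """Summarise the per-object lines of `git verify-pack -v` output.

    Assumed layout: 5 columns for non-delta objects and 7 columns
    (with depth and base SHA-1 appended) for deltified objects.
    """
    stats = {
        "objects": 0,
        "deltas": 0,
        "packed_bytes": 0,        # sum of the size-in-pack column
        "delta_packed_bytes": 0,  # same, but for deltified objects only
        "max_depth": 0,
        "depths": Counter(),      # depth -> number of objects
    }
    for line in text.splitlines():
        parts = line.split()
        # Skip trailer lines such as "chain length = 1: ..." or "...: ok".
        if len(parts) not in (5, 7):
            continue
        try:
            int(parts[0], 16)     # the first column must be a hex object name
            size_in_pack = int(parts[3])
        except ValueError:
            continue
        stats["objects"] += 1
        stats["packed_bytes"] += size_in_pack
        if len(parts) == 7:
            depth = int(parts[5])
            stats["deltas"] += 1
            stats["delta_packed_bytes"] += size_in_pack
            stats["depths"][depth] += 1
            stats["max_depth"] = max(stats["max_depth"], depth)
    return stats


def summarise_pack(idx_path):
    """Run `git verify-pack -v` on one .idx file and summarise its output."""
    result = subprocess.run(
        ["git", "verify-pack", "-v", idx_path],
        capture_output=True, text=True, check=True,
    )
    return parse_verify_pack(result.stdout)
```

Calling `summarise_pack()` on the .idx file of each of the two packs and printing the two dictionaries next to each other would show the number of deltas, their total size in the pack, and the distribution of delta depths for each pack.
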