Nicolas Pitre <nico@xxxxxxx> writes: > Note, this is a RFC particularly to Junio since the resulting pack is > larger than without the patch with git-repack -a -f. However using a > subsequent git-repack -a brings the pack size down to expected size. So > I'm not sure I've got everything right. I haven't tested it seriously yet, but there is nothing that looks obviously wrong that might cause the inflation problem, from the cursory look after applying the patch on top of your last round. > + if (nr_objects == nr_result && trg_entry->delta_limit >= max_depth) > + return 0; The older code was loosening this check only for a delta chain that is already in pack (which is limited to its previous max_depth). The end result is almost the same -- a thin pack recipient would have deeper delta than it asked. The difference is that the earlier code had implicit 2*max_depth limit, but this one makes the chain length unbounded, which I do not think it is necessarily a bad change. In any case it does not explain why you are getting larger resulting pack, though. > + /* Now some size filtering euristics. */ > + size = trg_entry->size; > if (size < 50) > - return -1; > - if (old_entry->depth >= max_depth) > return 0; This is necessary because you are scanning from smaller to larger, and I think it is a good change. > - /* > - * NOTE! > - * > - * We always delta from the bigger to the smaller, since that's > - * more space-efficient (deletes don't have to say _what_ they > - * delete). > - */ This comment by Linus still applies, even though the scan order is now reversed; no need to remove it. > + > + if (trg_entry->delta) { > + /* > + * The target object already has a delta base but we just > + * found a better one. Remove it from its former base > + * childhood and redetermine the base delta_limit (if used). > + */ And you are making the delta chain unbound for thin case, you can probably omit this with the same if() here; the recomputation seems rather expensive. > + die("object %s inconsistent object length (%lu vs %lu)", > + sha1_to_hex(entry->sha1), size, entry->size); > + if (!size) > + continue; > + delta_index = create_delta_index(n->data, size); > + if (!delta_index) > + die("out of memory"); It might be worth saying "if (size < 50)" here as well; no point wasting the delta window for small sources. > -#if 0 > - /* if we made n a delta, and if n is already at max > - * depth, leaving it in the window is pointless. we > - * should evict it first. > - * ... in theory only; somehow this makes things worse. > - */ > - if (entry->delta && depth <= entry->depth) > - continue; > -#endif I was almost tempted to suggest that the degradation you are seeing might be related to this mystery I did not get around to solve. By allowing to give chance to try delta against less optimum candidates, it appeared that we ended up making the final pack size bigger than otherwise, which suggests that our choice between plain undeltified and a delta half its size might be favoring delta too much. But it does not appear to be related to the inflation you are seeing. With object list taken between v1.2.3..v1.3.0 in git.git repository and without delta reuse, 3054 objects are packed (delta 1734) with this code. The "next" makes 1818 delta (only 5% more), which makes me suspect that it is making a bad choice of delta base, because the final pack size is 1.5M vs 1.9M. The chain length distribution is a bit different (run "git-verify-pack -v" and look at the end of its output). The "next" version: chain length = 1: 257 objects chain length = 2: 189 objects chain length = 3: 156 objects chain length = 4: 149 objects chain length = 5: 113 objects chain length = 6: 105 objects chain length = 7: 105 objects chain length = 8: 102 objects chain length = 9: 103 objects chain length = 10: 539 objects this version: chain length = 1: 415 objects chain length = 2: 333 objects chain length = 3: 259 objects chain length = 4: 197 objects chain length = 5: 155 objects chain length = 6: 134 objects chain length = 7: 106 objects chain length = 8: 69 objects chain length = 9: 47 objects chain length = 10: 19 objects The resulting pack would be faster to access (it has much shorter median chain length). BTW, have you tried it without --no-reuse-pack on an object list that is not thin? It appears you are busting the depth limit. Using the same "git rev-list --objects v1.2.3..v1.3.0" as input, git-pack-objects without --no-reuse-pack gives this distribution: chain length = 1: 364 objects chain length = 2: 269 objects chain length = 3: 198 objects chain length = 4: 164 objects chain length = 5: 148 objects chain length = 6: 123 objects chain length = 7: 122 objects chain length = 8: 103 objects chain length = 9: 92 objects chain length = 10: 234 objects chain length = 11: 12 objects chain length = 12: 1 object chain length = 13: 2 objects So it _might_ be that the depth limiting code is subtly broken which is causing you throw away a perfectly good delta base which in turn results in a bad pack. The distribution from the "next" version looks like this: chain length = 1: 358 objects chain length = 2: 250 objects chain length = 3: 214 objects chain length = 4: 169 objects chain length = 5: 150 objects chain length = 6: 122 objects chain length = 7: 126 objects chain length = 8: 100 objects chain length = 9: 101 objects chain length = 10: 232 objects -- >8 -- Summary of the experiment. # test dataset git rev-list --objects v1.2.3..v1.3.0 >RL-1.2.3--1.3.0 # baseline: "next" version is what is on my $PATH git-pack-objects --no-reuse-delta test-next-pack-nr <RL-1.2.3--1.3.0 git-verify-pack -v test-next-pack-nr-*.pack | tail -n 20 git-pack-objects test-next-pack <RL-1.2.3--1.3.0 git-verify-pack -v test-next-pack-*.pack | tail -n 20 # freshly compiled version with the patch in question ./git-pack-objects --no-reuse-delta test-nico-pack-nr <RL-1.2.3--1.3.0 git-verify-pack -v test-nico-pack-nr-*.pack | tail -n 20 ./git-pack-objects test-nico-pack <RL-1.2.3--1.3.0 git-verify-pack -v test-nico-pack-*.pack | tail -n 20 - : send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html