Jeff King <peff@xxxxxxxx> writes:

> Since 898b14c (pack-objects: rework check_delta_limit usage,
> 2007-04-16), we check the delta depth limit only when
> figuring out whether we should make a new delta. We don't
> consider it at all when reusing deltas, which means that
> packing once with --depth=250, and then again with
> --depth=50, the second pack my still contain chains larger
> than 50.

"may still contain", methinks.

> ...
>
> This patch bounds the length of delta chains in the output
> pack based on --depth, regardless of whether they are caused
> by cross-pack deltas or existed in the input packs. This
> fixes the problem, but does have two possible downsides:
>
>   1. High-depth aggressive repacks followed by "normal"
>      repacks will throw away the high-depth chains.

I actually think it is a feature that the normal one that runs
later is allowed to fix an over-deep delta chain.

>   2. If you really do want to store high-depth deltas on
>      disk, they may be discarded and new delta computed when
>      serving a fetch, unless you set pack.depth to match
>      your high-depth size.

Likewise.

> ... But we may
> visit an object through multiple delta paths, ...

Yes, good thinking.

> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index 8841f8b36..2b08c8121 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -1539,6 +1539,8 @@ static int pack_offset_sort(const void *_a, const void *_b)
>   * 2. Updating our size/type to the non-delta representation. These were
>   *    either not recorded initially (size) or overwritten with the delta type
>   *    (type) when check_object() decided to reuse the delta.
> + *
> + * 3. Resetting our delta depth, as we are now a base object.
>   */
>  static void drop_reused_delta(struct object_entry *entry)
>  {
> @@ -1552,6 +1554,7 @@ static void drop_reused_delta(struct object_entry *entry)
>  		p = &(*p)->delta_sibling;
>  	}
>  	entry->delta = NULL;
> +	entry->depth = 0;
>
>  	oi.sizep = &entry->size;
>  	oi.typep = &entry->type;

Makes sense.

>  static void break_delta_chains(struct object_entry *entry)
>  {
> @@ -1587,6 +1593,18 @@ static void break_delta_chains(struct object_entry *entry)
>  	 */
>  	entry->dfs_state = DFS_ACTIVE;
>  	break_delta_chains(entry->delta);
> +
> +	/*
> +	 * Once we've recursed, our base (if we still have one) knows
> +	 * its depth, so we can compute ours (and check it against
> +	 * the limit).
> +	 */
> +	if (entry->delta) {
> +		entry->depth = entry->delta->depth + 1;
> +		if (entry->depth > depth)
> +			drop_reused_delta(entry);
> +	}

;-) Surprisingly simple and effective.
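The effect should also be easy to observe from the command line.  A
rough illustration (the depths and the pack path here are made up
for the example, not taken from the patch):

	# create deep chains on disk, then repack with a smaller limit
	git repack -adf --depth=250 --window=250
	git repack -ad --depth=50

	# inspect the delta chain histogram of the surviving pack
	git index-pack --verify-stat .git/objects/pack/pack-*.pack

Without the depth check on reused deltas, the second repack can
still leave "chain length" entries far beyond 50 in the histogram;
with this patch they should be capped at 50.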
> diff --git a/t/t5316-pack-delta-depth.sh b/t/t5316-pack-delta-depth.sh
> new file mode 100755
> index 000000000..236d60fe6
> --- /dev/null
> +++ b/t/t5316-pack-delta-depth.sh
> @@ -0,0 +1,93 @@
> +#!/bin/sh
> +
> +test_description='pack-objects breaks long cross-pack delta chains'
> +. ./test-lib.sh
> +
> +# This mirrors a repeated push setup:
> +#
> +# 1. A client repeatedly modifies some files, makes a
> +#    commit, and pushes the result. It does this N times
> +#    before we get around to repacking.
> +#
> +# 2. Each push generates a thin pack with the new version of
> +#    various objects. Let's consider some file in the root tree
> +#    which is updated in each commit.
> +#
> +#    When generating push number X, we feed commit X-1 (and
> +#    thus blob X-1) as a preferred base. The resulting pack has
> +#    blob X as a thin delta against blob X-1.
> +#
> +#    On the receiving end, "index-pack --fix-thin" will
> +#    complete the pack with a base copy of tree X-1.

blob? tree?

I think the argument would work the same way for either type of
object, but the previous paragraph uses blob as the example, so
s/tree/blob/ here?

> +# 3. In older versions of git, if we used the delta from
> +#    pack X, then we'd always find blob X-1 as a base in the
> +#    same pack (and generate a fresh delta).
> +#
> +#    But with the pack mru, we jump from delta to delta
> +#    following the traversal order:
> +#    ...
> +#    Now we have blob X as a delta against X-1, which is a delta
> +#    against X-2, and so forth.

> +# Note that this whole setup is pretty reliant on the current
> +# packing heuristics. We double-check that our test case
> +# actually produces a long chain. If it doesn't, it should be
> +# adjusted (or scrapped if the heuristics have become too unreliable)

IOW, we want something that says "we first check X and if X still
holds, then we expect Y to also hold; if X no longer holds, do not
bother to test that Y holds".

Nice food for thought.  Perhaps we want a way to express that in
our test framework, or is X in the above merely another way to say
"prerequisite"?

> +test_expect_success 'packing produces a long delta' '
> +	# Use --window=0 to make sure we are seeing reused deltas,
> +	# not computing a new long chain.
> +	pack=$(git pack-objects --all --window=0 </dev/null pack) &&
> +	test 9 = "$(max_chain pack-$pack.pack)"
> +'
> +
> +test_expect_success '--depth limits depth' '
> +	pack=$(git pack-objects --all --depth=5 </dev/null pack) &&
> +	test 5 = "$(max_chain pack-$pack.pack)"
> +'
> +
> +test_done
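A side note for anybody trying to reproduce the setup described in
the comment by hand: each "push" in the description amounts to
something like this ($new and $old are placeholders for the commit
being pushed and the preferred base, respectively):

	{
		echo "$new" &&
		echo "^$old"
	} |
	git pack-objects --revs --thin --stdout >thin.pack &&
	git index-pack --fix-thin --stdin <thin.pack

which stores blob X on the receiving end as a delta against blob
X-1, exactly the ingredient the pack mru traversal later chains
together.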
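Also, the max_chain helper these tests use is not in the quoted
hunks; my guess is that it boils down to something like the
following (illustrative, not necessarily the exact helper in the
patch):

	max_chain() {
		git index-pack --verify-stat-only "$1" >stats &&
		perl -lne '
			/chain length = (\d+)/ and $len = $1;
			END { print $len + 0 }
		' stats
	}

i.e. read the "chain length = N: M objects" histogram from the
verify statistics and report the largest N seen (the lines come out
in increasing order, so the last match wins).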
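Coming back to the X-then-Y thought above, the closest existing
mechanism we have is probably test_lazy_prereq.  A sketch
(LONG_CHAIN is a name I just made up):

	test_lazy_prereq LONG_CHAIN '
		pack=$(git pack-objects --all --window=0 </dev/null pack) &&
		test 9 = "$(max_chain pack-$pack.pack)"
	'

	test_expect_success LONG_CHAIN '--depth limits depth' '
		pack=$(git pack-objects --all --depth=5 </dev/null pack) &&
		test 5 = "$(max_chain pack-$pack.pack)"
	'

That would skip the --depth test instead of failing it when the
packing heuristics drift, though arguably a loud failure of the
first test is more useful for noticing the drift in the first
place.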