On Fri, Dec 14 2018, Derrick Stolee via GitGitGadget wrote: > Despite these potential drawbacks, the benefits of the algorithm > are clear. By adding a counter to 'add_children_by_path' and > 'mark_tree_contents_uninteresting', I measured the number of > parsed trees for the two algorithms in a variety of repos. Ah, so we take one or the other. I tested this with: diff --git a/revision.c b/revision.c index 63bf6230dc..e9c67aa550 100644 --- a/revision.c +++ b/revision.c @@ -61,6 +61,8 @@ static void mark_tree_contents_uninteresting(struct repository *r, struct tree_desc desc; struct name_entry entry; + fprintf(stderr, "MTCU\n"); + if (parse_tree_gently(tree, 1) < 0) return; @@ -166,6 +168,8 @@ static void add_children_by_path(struct repository *r, struct tree_desc desc; struct name_entry entry; + fprintf(stderr, "ACBP\n"); + if (!tree) return; And tried testing my pushing this branch to my git.git: $ for v in true false; do git push --delete avar push/sparse; ./git -c pack.usesparse=$v --exec-path=$PWD push avar HEAD 2>&1 | grep -e MTCU -e ACBP | uniq -c; done To github.com:avar/git.git - [deleted] push/sparse 22 ACBP To github.com:avar/git.git - [deleted] push/sparse 184 MTCU But snipping around a bit from the commit message: >When enumerating objects to place in a pack-file during 'git >pack-objects --revs', we discover the "frontier" of commits >that we care about and the boundary with commit we find >uninteresting. From that point, we walk trees to discover which >trees and blobs are uninteresting. Finally, we walk trees from the >interesting commits to find the interesting objects that are >placed in the pack. >[...] >These improvements will have even larger benefits in the super- >large Windows repository. In our experiments, we see the >"Enumerate objects" phase of pack-objects taking 60-80% of the >end-to-end time of non-trivial pushes, taking longer than the >network time to send the pack and the server time to verify the >pack. If I change my monkeypatch to: diff --git a/revision.c b/revision.c index 63bf6230dc..9171ca27c5 100644 --- a/revision.c +++ b/revision.c @@ -63,0 +64,3 @@ static void mark_tree_contents_uninteresting(struct repository *r, + fprintf(stderr, "MTCU\n"); + sleep(1); + @@ -168,0 +172,3 @@ static void add_children_by_path(struct repository *r, + fprintf(stderr, "ACBP\n"); + sleep(1); + We spend a long time printing those out before we ever get to "Enumerating objects". Which was where I was trying to test this, i.e. is this a lot of work we perform before we print out the progress bar, and regardless of this optimization should have other progress output there, so we can see this time we're spending on this?