On Tue, Jan 19, 2016 at 04:59:28PM -0500, Jeff King wrote: > We could get away "git diff --exit-code $1 $2" to do a single process > invocation (rather than two rev-parses), but I don't know if it is worth > the complexity. And here's that patch. I'm actually a little iffy on it because it switches to "diff-tree" from a raw-sha1 comparison. For a well-formed repo, that shouldn't matter. But what if you had a commit that was replacing a malformed tree object, but not otherwise changing the diff? We might drop it as "empty", even though you'd prefer to keep it. I guess some alternative ways of doing the same thing are: 1. Use "git rev-parse $1 $3^{tree}", parse the output and compare. Surprisingly tedious to do in a shell without invoking any extra sub-processes. 2. When we do the initial rev-list, output the tree, too. Then we get the old tree sha1 for "free" (no extra process, but at the cost of slightly more I/O by the shell). I suspect we could also cache the tree of the parent and avoid the rev-parse in git_commit_non_empty_tree(), too. I dunno. I'm inclined to say that none of this is worth it to try to drop one or two processes. Writing filter-branch in a better language (or just using BFG) would probably be a more productive use of time. 20% looks like a lot, but that's because it's pretty fast in the first place. The timings I showed earlier (and below) are for git.git, which is not that huge. But the savings from 348d4f2 are really about avoiding looking at the trees entirely; the bigger your tree, the more you save. Running it on linux.git should show that we're still reclaiming most of the original optimization. So I'm inclined to go with the "conservative" fix I sent initially, and drop this micro-optimization. -- >8 -- Subject: [PATCH] filter-branch: optimize out rev-parse for unchanged tree The prior commit noticed that we started feeding "$commit^{tree}" to git_commit_non_empty_tree instead of its resolved 40-hex sha1. It made the conservative fix of resolving and passing the 40-hex sha1, even though it does add some overhead. This patch takes the less conservative choice, under the assumption that we are unlikely to break anybody's commit-filter by showing them the unresolved name "$commit^{tree}" (which is probably the case, because most people would end their filter with "git commit-tree" or "git_commit_non_empty_tree"). We can make the comparison in the latter function more clever by using "diff-tree --quiet", which will compare the two trees with only a single process invocation. Here are the resulting numbers from p7000: Test 348d4f2^ 348d4f2 HEAD^ HEAD ---------------------------------------------------------------------------------------------------------------- 7000.2: noop filter 9.48(4.29+0.41) 3.73(0.22+0.26) -60.7% 4.56(0.27+0.26) -51.9% 3.74(0.25+0.25) -60.5% Signed-off-by: Jeff King <peff@xxxxxxxx> --- git-filter-branch.sh | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/git-filter-branch.sh b/git-filter-branch.sh index 5e094ce..d1879e3 100755 --- a/git-filter-branch.sh +++ b/git-filter-branch.sh @@ -44,7 +44,7 @@ skip_commit() # it will skip commits that leave the tree untouched, commit the other. git_commit_non_empty_tree() { - if test $# = 3 && test "$1" = $(git rev-parse "$3^{tree}"); then + if test $# = 3 && git diff-tree --quiet "$1" "$3"; then map "$3" else git commit-tree "$@" @@ -404,7 +404,7 @@ while read commit parents; do then tree=$(git write-tree) else - tree=$(git rev-parse "$commit^{tree}") + tree="$commit^{tree}" fi workdir=$workdir @SHELL_PATH@ -c "$filter_commit" "git commit-tree" \ "$tree" $parentstr < ../message > ../map/$commit || -- 2.7.0.248.g5eafd77 -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html