On Tue, Apr 28, 2015 at 10:39:44PM -0700, Junio C Hamano wrote: > Jeff King <peff@xxxxxxxx> writes: > > > I'm not sure of a solution short of replacing the use of sed here with > > something else. perl would be a simple choice, but filter-branch does > > not otherwise depend on it. We could use a shell "read" loop, but those > > are quite slow (and filter-branch is slow enough as it is!). > > You need to only skip the header part, right? > I would imagine that > > ( > while read x && test -n "$x" > do > :; > done > cat > ) <../commit | eval "$filter_msg" > > would not spin too much in shell loop, perhaps? Yeah, that is not too bad. Probably we want "read -r", just in case of weirdness in the header lines (and that's in POSIX, and we use it in other scripts, so it should be portable enough). And we can save a subshell if we don't mind the potential variable-name conflict. Here's what I came up with. -- >8 -- Subject: filter-branch: avoid passing commit message through sed On some systems (like OS X), if sed encounters input without a trailing newline, it will silently add it. As a result, "git filter-branch" on such systems may silently rewrite commit messages that omit a trailing newline. Even though this is not something we generate ourselves with "git commit", it's better for filter-branch to preserve the original data as closely as possible. We're using sed here only to strip the header fields from the commit object. We can accomplish the same thing with a shell loop. Since shell "read" calls are slow (usually one syscall per byte), we use "cat" once we've skipped past the header. Depending on the size of your commit messages, this is probably faster (you pay the cost to fork, but then read the data in saner-sized chunks). This idea is shamelessly stolen from Junio. Signed-off-by: Jeff King <peff@xxxxxxxx> --- I confirmed the test fixes things on the OS X box I have access to. The "probably faster" above is of course hand-waving. On my system starting "cat" takes only about 40 syscalls, so that would naively imply it's a win in for all but the shortest messages. But of course "fork()" is a much more expensive syscall than "read()". git-filter-branch.sh | 10 +++++++++- t/t7003-filter-branch.sh | 10 ++++++++++ 2 files changed, 19 insertions(+), 1 deletion(-) diff --git a/git-filter-branch.sh b/git-filter-branch.sh index e6e99f5..5b3f63d 100755 --- a/git-filter-branch.sh +++ b/git-filter-branch.sh @@ -346,7 +346,15 @@ while read commit parents; do die "parent filter failed: $filter_parent" fi - sed -e '1,/^$/d' <../commit | \ + { + while read -r header_line && test -n "$header_line" + do + # skip header lines... + :; + done + # and output the actual commit message + cat + } <../commit | eval "$filter_msg" > ../message || die "msg filter failed: $filter_msg" workdir=$workdir @SHELL_PATH@ -c "$filter_commit" "git commit-tree" \ diff --git a/t/t7003-filter-branch.sh b/t/t7003-filter-branch.sh index 66643e4..855afda 100755 --- a/t/t7003-filter-branch.sh +++ b/t/t7003-filter-branch.sh @@ -394,4 +394,14 @@ test_expect_success 'replace submodule revision' ' test $orig_head != `git show-ref --hash --head HEAD` ' +test_expect_success 'filter commit message without trailing newline' ' + git reset --hard original && + commit=$(printf "no newline" | git commit-tree HEAD^{tree}) && + git update-ref refs/heads/no-newline $commit && + git filter-branch -f refs/heads/no-newline && + echo $commit >expect && + git rev-parse refs/heads/no-newline >actual && + test_cmp expect actual +' + test_done -- 2.4.0.rc3.477.gc25258d -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html