Hi Richard,

Richard MICHAEL wrote:
>>Richard MICHAEL wrote:

>>> I am filtering our repo with git-filter-branch, but as the sed
>>> script runs with LANG=C LC_ALL=C (7 bit US ASCII), it dies on
>>> commits authored by our team members with accented names.
[...]
> What about special casing the bad sed (or whitelisting good sed)?
> Surely a hack, but would those of us with GNU or BSD would be happy.
> Which was the troublesome sed?

Sorry for the slow response.  The problematic sed is GNU sed from
MacPorts (I think).  Even with LC_ALL=C, .* no longer matches
arbitrary sequences of bytes with such a sed: you can check yours
with

	$ echo 'étale' | LC_ALL=C sed 's/.*//'

Unfortunately I have not been able to reproduce it on Linux.  Debian
sed 4.2.1-7 and GNU sed v4.2.1-21-gc6d32f0 both produce the expected
result (an empty line):

	$ echo 'étale' | LC_ALL=C sed 's/.*//'

	$

> Unfortunately, it
> doesn't "die" well either; the 'export' shell var fails but it keeps
> processing commits.

Hmm, that sounds like a bug indeed.

Here is what the start of a fix might look like, but I stopped early
because there's quite a lot of sed usage in git that expects to be
able to process arbitrary data with short, newline-terminated lines
(regardless of encoding).

diff --git a/git-filter-branch.sh b/git-filter-branch.sh
index 962a93b..34a5fa3 100755
--- a/git-filter-branch.sh
+++ b/git-filter-branch.sh
@@ -68,8 +68,8 @@ eval "$functions"
 # "author" or "committer
 
 set_ident () {
-	lid="$(echo "$1" | tr "[A-Z]" "[a-z]")"
-	uid="$(echo "$1" | tr "[a-z]" "[A-Z]")"
+	lid="$(echo "$1" | tr "[A-Z]" "[a-z]")" &&
+	uid="$(echo "$1" | tr "[a-z]" "[A-Z]")" &&
 	pick_id_script='
 		/^'$lid' /{
 			s/'\''/'\''\\'\'\''/g
@@ -90,9 +90,9 @@ set_ident () {
 
 			q
 		}
-	'
+	' &&
 
-	LANG=C LC_ALL=C sed -ne "$pick_id_script"
+	LANG=C LC_ALL=C sed -ne "$pick_id_script" &&
 	# Ensure non-empty id name.
 	echo "case \"\$GIT_${uid}_NAME\" in \"\") GIT_${uid}_NAME=\"\${GIT_${uid}_EMAIL%%@*}\" && export GIT_${uid}_NAME;; esac"
 }
@@ -322,9 +322,11 @@ while read commit parents; do
 	git cat-file commit "$commit" >../commit ||
 		die "Cannot read commit $commit"
 
-	eval "$(set_ident AUTHOR <../commit)" ||
+	set_author=$(set_ident AUTHOR <../commit) &&
+	eval "$set_author" ||
 		die "setting author failed for commit $commit"
-	eval "$(set_ident COMMITTER <../commit)" ||
+	set_committer=$(set_ident COMMITTER <../commit) &&
+	eval "$set_committer" ||
 		die "setting committer failed for commit $commit"
 	eval "$filter_env" < /dev/null ||
 		die "env filter failed: $filter_env"
--
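
To see why the last hunk splits each eval in two, here is a minimal
sketch in plain POSIX sh.  The helper emit_script is hypothetical and
only stands in for set_ident: it prints a script fragment and then
fails, the way set_ident would if sed choked partway through.  With
eval "$(...)" the exit status of the command substitution is thrown
away, so the failure goes unnoticed; assigning to a variable first
keeps that status, so the && chain can stop before eval runs.

```shell
#!/bin/sh
# Hypothetical stand-in for set_ident: emits part of a script, then fails.
emit_script () {
	echo 'ident=partial'
	return 1
}

# Old pattern: eval only reports the status of the code it evaluated,
# so the failure inside the command substitution is lost.
if eval "$(emit_script)"
then
	old_pattern=undetected
else
	old_pattern=detected
fi

# New pattern: a bare assignment from a command substitution returns
# the substitution's exit status, so && short-circuits before eval.
if script=$(emit_script) && eval "$script"
then
	new_pattern=undetected
else
	new_pattern=detected
fi

echo "old: $old_pattern, new: $new_pattern"
# prints "old: undetected, new: detected"
```

This only illustrates the exit-status behaviour; it does nothing about
the underlying sed locale problem.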