On Thu, Apr 5, 2018 at 8:53 AM, Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> wrote: > > On Thu, Apr 05 2018, Stephon Harris wrote: > >> Fixes issue with seeing `sed: RE error: illegal byte sequence` when running git-completion.bash Please: - prefix the subject line with "completion:". - nit: replace "running" with "sourcing". - wrap the body of the commit message around 70 characters. - sign off your patch; see Documentation/SubmittingPatches and 'git commit -s'. >> --- >> contrib/completion/git-completion.bash | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/contrib/completion/git-completion.bash b/contrib/completion/git-completion.bash >> index b09c8a23626b4..52a4ab5e2165a 100644 >> --- a/contrib/completion/git-completion.bash >> +++ b/contrib/completion/git-completion.bash >> @@ -282,7 +282,7 @@ __gitcomp () >> >> # Clear the variables caching builtins' options when (re-)sourcing >> # the completion script. >> -unset $(set |sed -ne 's/^\(__gitcomp_builtin_[a-zA-Z0-9_][a-zA-Z0-9_]*\)=.*/\1/p') 2>/dev/null >> +unset $(set |LANG=C sed -ne 's/^\(__gitcomp_builtin_[a-zA-Z0-9_][a-zA-Z0-9_]*\)=.*/\1/p') 2>/dev/null I was wondering how could you see an error message from 'sed', when the stderr of the whole thing is redirected to /dev/null. It turns out that such a redirection doesn't affect the stderr of the commands executed inside the command substitution: $ echo $(echo foo ; echo >&2 error) 2>/dev/null error foo Interesting, I didn't know that. > This is getting closer to the issue than your previous patch, but > there's still some open questions: > > 1) What platform OS / version / sed version is this on? > > 2) What's the output from "set" that's causing this error? Do we have an > isolated test case for that? A quick internet search suggests that this error message tends to appear on Macs, when its 'sed' encounters an invalid UTF-8 character in its input. > 3) There's other invocations of "sed" in the file, aren't those affected > as well? The two 'sed' invocations in _git_stash() are suspect, as they process the output of 'git stash list', which includes the stashes' descriptions, which can contain whatever the users wished to store there (though they would probably get a "Warning: commit message did not conform to UTF-8" message from 9e83266525 (commit-tree: encourage UTF-8 commit messages., 2006-12-22) when doing so). The 'sed' invocation in __git_complete_revlist_file() is suspect as well, as it processes the output of 'git ls-tree'. Pathnames can contain anything, and while 'git ls-tree' quotes pathnames with "unusual" characters, I don't know whether it quotes all characters that can possibly upset Stephon's 'sed'. What about 'awk' on Stephon's platform? We already have one 'awk' invocation in __git_match_ctag(), which we use for 'git grep <TAB>' and 'git log -L:<TAB>'. That 'awk' script uses a regexp to match the symbol name in a ctags file; does any programming language allow such unusual characters in symbol names? What about 'sed' and 'awk' scripts that don't use regexps at all? (I intend to add such an 'awk' script in a day or two...) > 4) Any reason we wouldn't just set LC_AlL=C for the whole file? I see we > already do it for our invocation to "git merge". That would perhaps be the safest thing to do... But how can we set it for a whole file, when the file is not executed as a script but sourced into the user's shell environment?