Torsten Bögershausen <tboegi@xxxxxx> writes: > On 09.03.16 21:26, Junio C Hamano wrote: >> Anders Kaseorg <andersk@xxxxxxx> writes: > [] >> sane_grep () { >> - GREP_OPTIONS= LC_ALL=C grep "$@" >> + GREP_OPTIONS= LC_ALL=C grep @@SANE_TEXT_GREP@@ "$@" >> } >> >> sane_egrep () { >> - GREP_OPTIONS= LC_ALL=C egrep "$@" >> + GREP_OPTIONS= LC_ALL=C egrep @@SANE_TEXT_GREP@@ "$@" >> } >> > > I always wondered why we do LC_ALL=C. It is because, at least back when we started scripted Porcelains of Git, that is the only thing we could count on available everywhere, and more importantly, that is how you disable all the i18n gotchas and ask tools to use the traditional "a file stores a stream of bytes" semantics. > Isn't that begging for trouble, when we feed UTF-8, ISO-8895-1 > or other stuff into a program and say LC_ALL=C at the same time ? Yes and no. It is true that in "C" ("POSIX"), behaviour on anything outside the portable and control character sets is undefined, so in theory, it is bad that we relied on that undefined behaviour to be to treat the input as just a stream of bytes. But at the same time, it is no worse than letting the user's locale take effect or use hardcoded en_US.UTF-8 and passing unknown end user data that could be in latin1 and other stuff. And sensible implementors of "C" locale seemed to choose the "stream of bytes" back when fa9d3485 (git-am: force egrep to use correct characters set, 2009-09-25) was written, which is where LC_ALL=C comes from. I wasn't around when that patch was accepted, so I cannot quite tell if it was done to fix a reported bug (i.e. it would reintroduce the bug if we lost LC_ALL=C) or it was done "just because". Back when e1622bfc (Protect scripted Porcelains from GREP_OPTIONS insanity, 2009-11-23) introduced sane_grep, the only caller of it that wanted to use LC_ALL=C was this callsite in "git am", and we no longer have that callsite since 783d7e86 (builtin-am: remove redirection to git-am.sh, 2015-08-04), so I think whatever fa9d3485 wanted to do will not be broken if we removed LC_ALL=C from sane_grep. It however is possible that any callsites of sane_grep added after e1622bfc implicitly depended on having LC_ALL=C in the implementation. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html