Matthieu Moy <Matthieu.Moy@xxxxxxxxxxxxxxx> writes: > Johannes Schindelin <johannes.schindelin@xxxxxx> writes: > >> This is the correct thing to do, really: we already specify LF as >> field separator. > > I'm almost convinced that this is the right thing to do in the long run > ("almost" because I'm not sure, not because I have arguments against). I > agree with Junio that the commit message should be more convincing, but > indeed, accepting LF and not CR is strange. If there were a single character that denotes CRLF, I'd say that including such a character in IFS would make sense on a system with CRLF EOL convention. But that is not the case. On a platform with LF EOL convention, having LF in IFS makes sense, due to two reasons. * read is not the only user of IFS. Expressing "list of things" (pre bashism "shell array" days) by concatenating elements into a single string variable, separated with LF, and later iterating over them is a very common use case, e.g. LF=' ' list="$thing1" list="$list$LF$thing2" list="$list$LF$thing3" ... IFS=$LF for thing in $list do ... And including LF by default in IFS, especially when "things" can contain SP/HT, is handy. * It does not hurt on a platform with LF EOL convention to include LF in IFS, because you cannot have a "lone LF" in the middle of a line. Presence of a single LF alone will terminate the current line and the bytes that follow it will be a next line. No similarity argument can be made for a lone CR on a platform that uses CRLF EOL convention. A "lone CR" can appear in the middle of a line without terminating the current line, as only a CR that is immediately followed by a LF is the end of line, so you cannot make the "It does not hurt" argument for including CR to IFS. If you have a variable in which "A^MB" is there, "set $variable" would split them into two and assign B to $2, which is not what the scripts would expect. > However, is this the right thing to do in the maintainance branch? It > does fix the issue, but does so in a rather intrusive way, so I'd need > more arguments to be convinced that this is safe to merge in maint. If this were the "right" thing in general for shell scripts on systems with CRLF EOL convention, the implementation of the shell language on such a platform would be doing that upon startup (or upon "unset IFS"), and we wouldn't be having this discussion. Don't you think it is strange that individual applications like Git have to set IFS to include CR? I see it as a strong sign that this is not a correct thing to do. Intrusive does not begin to describe it. I think the "right" thing for a shell implementation on a CRLF platform to do is twofold. * Read http://pubs.opengroup.org/onlinepubs/9699919799/utilities/read.html and notice that the operation is defined as a two-step process. It is supposed to read an input line and then "The terminating <newline> (if any) shall be removed from the input". And then "the result shall be split into fields" (at IFS boundaries). As POSIX does not say <newline> has to be a single octet with value 10 (the same way you are allowed to show CRLF when you call printf("\n")), a shell implementation would read CRLF terminated lines, and remove the terminating CRLF before doing the field splitting using IFS. This is why I said in my message to Dscho that IFS does not get into the picture. * Implement the "field splitting" (at IFS boundaries) with a small twist. When the script/user included '\n' in IFS, split at LF but remove CR if there is one that immediately precedes the LF. The latter would help if you did the previous "read is not the only user of IFS" example like this instead: LF=$(printf '\n.') LF=${LF%.} list="$thing1" list="$list$LF$thing2" list="$list$LF$thing3" ... IFS=$LF for thing in $list do ... and the platform 'printf' gave CRLF for "\n" (newline), resulting the list to be concatenated with CRLF, not LF. And this is why I asked Dscho if the shell is done correctly _for_ the platform. > Sorry for being negative, and especially sorry since I'm partly guilty > for the breakage. I just want to be sure that we don't break anything > while repairing it (we already introduced this breakage while repairing > another one...). Something along the line of the following would be tolerable, even though I think in the longer term, not just in Git land but in the larger ecosystem to use POSIXy tools on Windows, the best solution is to fix the shell so that it matches the expectation of the users of its platform. I say "something along the line of" here because I do not know how the problematic shell behaves on the assignment command that stuffs a lone CR into a variable. It _might_ need a similar protection against the shell feature "the last EOL is removed from the result of command expansion", as I did in the above example, depending on how incoherent the implementation is. The implementation seems to accept CRLF and LF in shell scripts proper just fine, but apparently its implementation of "read" does not honor such platform EOL convention, which caused this issue, and I don't know what it does in the codepath that implements command expansion. git-rebase--interactive.sh | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/git-rebase--interactive.sh b/git-rebase--interactive.sh index c42ba34..8f13a35 100644 --- a/git-rebase--interactive.sh +++ b/git-rebase--interactive.sh @@ -881,6 +881,7 @@ check_commit_sha () { check_bad_cmd_and_sha () { retval=0 lineno=0 + cr=$(printf "%c" 13) while read -r command rest do lineno=$(( $lineno + 1 )) @@ -888,6 +889,10 @@ check_bad_cmd_and_sha () { "$comment_char"*|''|noop|x|exec) # Doesn't expect a SHA-1 ;; + "$cr") + # Windows port of shell not stripping the newline + # at the end of an empty line correctly. + ;; pick|p|drop|d|reword|r|edit|e|squash|s|fixup|f) if ! check_commit_sha "${rest%%[ ]*}" "$lineno" "$1" then -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html