Re: [PATCH v2 2/2] sh-setup: explicitly mark CR as a field separator

Junio C Hamano <gitster@xxxxxxxxx> · Tue, 27 Oct 2015 10:46:38 -0700

Johannes Schindelin <Johannes.Schindelin@xxxxxx> writes:

> This assumes that the platform is either CR/LF or LF. That is incorrect.
> Windows does *not* dictate the line endings to be CR/LF. Certain
> applications do.
>
> The shell is an MSys2 shell, and MSys2 convention is to use LF. Hence the
> shell is done correctly for the platform.
>
> Yet on Windows (and in cross-platform projects also on Linux and MacOSX),
> you have to deal with the fact that text files produced outside the cozy
> little shell can have different line endings.

I know that.  That is why I said this is platform specific.  When
somebody on Ubuntu comes and says "XYZ does not work when I saved my
text file with CRLF", I can say "don't do that then".  I've been
assuming that you cannot say the same to your users, in order to be
nicer to Windows.  You say MSys2 convention is to use LF, but what
do you exactly mean by that?

I am hoping that you do not mean "we treat a file created by a DOS
editor as a text file with CR appended at the end of each line".
I'd instead expect that a file the user wrote ABC on the first line
will give you ABC in $word if the user did this:

	#!/bin/sh
        read word <file

and not "ABC^M".  That is, "we may produce only LF files, we consume
LF files just fine, but we also treat CRLF files as text as their
users intended."

> As I said, I am uninterested in arguing for arguing's sake.

I already explained why this particular thing does not have anything
to do with IFS but comes from the way your "read" behaves, which is
not in line with the way users on your platform expects.  Is that so
hard to understand?

An argument you make to support adding stuff to IFS _is_ an argument
for arguing's sake.

> What I really needed you to do indicate the way forward.

Yes, and I already did, didn't I?

I already said that mucking with IFS is a wrong thing to do, and
indicated that that is not the way forward.

I also already said that fixing your read is the only sensible thing
to do in the longer term iff you claim to support those who use
"certain applications" that leave CRLF line terminator.

 * http://pubs.opengroup.org/onlinepubs/9699919799/utilities/read.html
   defines the operation as a two-step process.  It is supposed to read
   an input line and then "The terminating <newline> (if any) shall
   be removed from the input".  And then "the result shall be split into
   fields" (at IFS boundaries).

   As POSIX does not say <newline> has to be a single octet with
   value 10 (the same way you are allowed to show CRLF when you call
   printf("\n"), and you do not see CR left at the end when you use
   gets(3)).

   So on a platform that (optionally) accepts CRLF as a line
   terminator, "read" should read up to LF for a line, and if there
   is a CR immediately before that LF, remove that CR, too.  Now you
   got the result of "terminating <newline> (if any) shall be
   removed".  Then look at "the result" and split into fields using
   IFS.  That way, you still process LF terminated files correctly,
   and you would never show a single ^M for an empty line, which
   triggered this thread.

 * Optionally, you may want to implement the "field splitting" (at
   IFS boundaries) with a small twist.  When the script/user
   included LF in IFS, split at LF but remove CR if there is one
   that immediately precedes the LF.

And also I gave you a more localized (i.e. less likely to negatively
affect unrelated parts of the system) workaround that we could use,
until your "read" is fixed.

Mind you, I am not happy with that "$cr may appear at the end of the
last token on the line" workaround.  Other places in the code (both
inside and outside Git scripted porcelains) that calls "read" to
read from text files can suffer from the same issue, until it is
fixed.  Covering up the bug in "read" by throwing extra characters
to IFS, especially when read is not the only user of IFS, is not a
fix.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html