Re: What's cooking in git.git (Jan 2017, #02; Sun, 15)

Jeff King <peff@xxxxxxxx> · Mon, 16 Jan 2017 11:04:56 -0500

On Mon, Jan 16, 2017 at 01:52:39PM +0100, Johannes Schindelin wrote:

> > >  An error message with an ASCII control character like '\r' in it
> > >  can alter the message to hide its early part, which is problematic
> > >  when a remote side gives such an error message that the local side
> > >  will relay with a "remote: " prefix.
> > >
> > >  Will merge to 'next'.
> > 
> > Please be not too hasty with advancing this topic to master. I could imagine
> > that there is some unwanted fallout on systems where the end-of-line marker
> > CRLF is common. Though, I haven't tested the topic myself, yet, nor do I
> > expect regressions in *my* typical workflow.
> 
> Thank you for being so attentive.
> 
> This topic branch would indeed have caused problems. Worse: it would have
> caused problems that are not covered by our test suite, as Git for
> Windows' own utilities do not generate CR/LF line endings. So this
> regression would have bit our users. Nasty.

Hmm.  I am not sure to what degree CRLFs are actually a problem here.
Keep in mind these are error messages generated via error(), and so not
processing arbitrary data. I can imagine that CRs might come from:

  1. A parameter like a filename or revision name. If I ask for a
     rev-parse of "foo\r\n", then it's probably useful to mention the
     "\r" in the error message, rather than ignoring it (or converting
     it to a line-feed).

     And I think that would apply to any input parameter we show via
     error(), etc, if it is connected to a newline (ideally we would
     show newlines as "?", too, but we cannot tell the difference
     between ones from parameters, and ones that are part of the error
     message).

  2. The printf-fmt strings themselves. But these come from C code,
     which just uses "\n". My impression is that it is fprintf() which
     is responsible for converting that to "\r\n". And we are doing our
     sanitizing here between an snprintf(), and an fprintf() of the
     result. So it should see only the raw "\n" fields.

I am certainly open to loosening the sanitizing for CR to make things
work seamlessly on Windows. But I am having trouble imagining a case
that is actually negatively impacted.

> -- snipsnap --
> Subject: [PATCH] fixup! vreport: sanitize ASCII control chars

Given the subtlety here, I'd much rather have a patch on top.

> The original patch is incorrect, as it turns carriage returns into
> question marks.
> 
> However, carriage returns should be left alone when preceding a line feed,
> and simply turned into line feeds otherwise.

The question of whether to leave CRLFs alone is addressed above. But I
do not understand why you'd want a lone CR to be converted to a line
feed. Running:

  git rev-parse --abbrev-ref "$(printf "foo\r")"

with my patch gets you:

  fatal: ambiguous argument 'foo?': unknown revision or path not in the working tree.

But with yours:

  fatal: ambiguous argument 'foo
  ': unknown revision or path not in the working tree.

Obviously the "?" thing is a lossy transformation, but I do not see how
a newline is any less confusing (but see below for some thoughts).

> To make the end result more readable, the logic is inverted so that the
> common case (no substitution) is handled first.
> 
> While at it, let's lose the unnecessary curly braces.

Please don't. Obviously C treats the "if/else" as a single unit, but
IMHO it's less error-prone to include the braces any time there are
multiple visual lines. E.g., something like:

  while (foo)
	if (bar)
		one();
	else
		two();
	three();

is much easier to spot as wrong when you would require braces either
way (and not relevant here, but I'd say that even an inner block with a
comment deserves braces for the same reason).

> We also add a regression test that verifies that carriage returns are
> handled properly. And as we expect CR/LF to be handled properly also on
> platforms other than Windows, this test case is not guarded by the MINGW
> prerequisite.

I am not sure what "properly" means here. In your test:

> +# This test verifies that the error reporting functions handle CR correctly.
> +# --abbrev-ref is simply used out of convenience, as it reports an error including
> +# a user-specified argument.
> +test_expect_success 'error messages treat CR/LF correctly' '
> +	test_must_fail git rev-parse --abbrev-ref "$(printf "CR/LF\\r\\n")" 2>err &&
> +	grep "^fatal: .*CR/LF$" err
> +'

The "\n" is eaten by the shell, and git sees only "CR/LF\r". So we are
not testing the CRLF case in vreportf() at all.

We do end up with "CR/LF\n" in vreportf(), which is presumably converted
by fprintf() to "CR/LF\r\n" on Windows. And so perhaps that is why you
are doing the "convert \r to \n" thing above.

But I still think it's not doing the right thing. Git _didn't_ see CRLF,
it saw CR.

-Peff