Re: [PATCH] mailinfo.c::convert_to_utf8: reuse strlen info

Martin Ågren <martin.agren@xxxxxxxxx> · Sun, 19 Apr 2020 06:34:10 +0200

On Sun, 19 Apr 2020 at 04:48, Danh Doan <congdanhqx@xxxxxxxxx> wrote:
>
> On 2020-04-18 16:12:06-0700, Junio C Hamano <gitster@xxxxxxxxx> wrote:
> > Martin Ågren <martin.agren@xxxxxxxxx> writes:
> >
> > > This is equivalent as long as `line->len` is equal to
> > > `strlen(line->buf)`, which it will be (should be) because it's a
> > > strbuf. Ok.
> >
> > For the guarantee to hold true, line->buf[0..line->len] should not
> > have any '\0' byte in it.

Yes. I sort of assumed it shouldn't ("because strbuf"), but it's a good
question. Especially considering this is about various encodings..

This assumption that "strlen is length so the conversion is a no-op"
could potentially be broken both for input (line->len) and output
(out_len).

> > This helper has two callers, but in either case, it needs to be
> > prepared to work on output of decode_[bq]_segment().  Is there code
> > anywhere that guarntees that the decoding won't stuff '\0' in the
> > line?
>
> the current code allows NUL character in utf-8 [bq]-encoded string
> in this function (early return) and its caller,
> and report an error later:
>
>         error: a NUL byte in commit log message not allowed.
>
> meanwhile, if the email was sent in other encoding, the current code
> discards everything after NUL in that line,
> thus silently accept broken commit message.

My knee-jerk reaction to Junio's question was along the same line:
surely if we could have a NUL in there, the current `strlen()` would use
it as an excuse to silently truncate the string, either before
processing or afterwards. Thanks for looking into that more.

> Attached is the faulty patch in ISO-8859-1, which was used to
> demonstrate my words.
> The current code will make a commit with only "Áb" in the body,
> while the new code rightly claims we have a faulty email.

Good find.

Martin