On Sun, 19 Apr 2020 at 04:48, Danh Doan <congdanhqx@xxxxxxxxx> wrote: > > On 2020-04-18 16:12:06-0700, Junio C Hamano <gitster@xxxxxxxxx> wrote: > > Martin Ågren <martin.agren@xxxxxxxxx> writes: > > > > > This is equivalent as long as `line->len` is equal to > > > `strlen(line->buf)`, which it will be (should be) because it's a > > > strbuf. Ok. > > > > For the guarantee to hold true, line->buf[0..line->len] should not > > have any '\0' byte in it. Yes. I sort of assumed it shouldn't ("because strbuf"), but it's a good question. Especially considering this is about various encodings.. This assumption that "strlen is length so the conversion is a no-op" could potentially be broken both for input (line->len) and output (out_len). > > This helper has two callers, but in either case, it needs to be > > prepared to work on output of decode_[bq]_segment(). Is there code > > anywhere that guarntees that the decoding won't stuff '\0' in the > > line? > > the current code allows NUL character in utf-8 [bq]-encoded string > in this function (early return) and its caller, > and report an error later: > > error: a NUL byte in commit log message not allowed. > > meanwhile, if the email was sent in other encoding, the current code > discards everything after NUL in that line, > thus silently accept broken commit message. My knee-jerk reaction to Junio's question was along the same line: surely if we could have a NUL in there, the current `strlen()` would use it as an excuse to silently truncate the string, either before processing or afterwards. Thanks for looking into that more. > Attached is the faulty patch in ISO-8859-1, which was used to > demonstrate my words. > The current code will make a commit with only "Áb" in the body, > while the new code rightly claims we have a faulty email. Good find. Martin