On 2020-04-18 16:12:06-0700, Junio C Hamano <gitster@xxxxxxxxx> wrote: > Martin Ågren <martin.agren@xxxxxxxxx> writes: > > > This is equivalent as long as `line->len` is equal to > > `strlen(line->buf)`, which it will be (should be) because it's a > > strbuf. Ok. > > For the guarantee to hold true, line->buf[0..line->len] should not > have any '\0' byte in it. > > This helper has two callers, but in either case, it needs to be > prepared to work on output of decode_[bq]_segment(). Is there code > anywhere that guarntees that the decoding won't stuff '\0' in the > line? - First caller: `decode_header`, we don't have such guarantee, The old code will discard everything after first NUL characters. I'm _not_ really familiar with email standard, does email allow UTF-16 (albeit in [bq]-encoded) in header? If yes, the current code is likely disallow it. The new code accidentally, accept [bq]-encoded-utf-16 header and reencode to utf-8 for commit. Yes, it's very likely only UTF-8, ISO-8859-1, some variant of ISO-2022 is used nowaday in email. *If* we would like to not exclude UTF-16, the new question is should we trust the length of newly converted utf-8 string. - Second caller: `handle_commit_msg`: everything get interesting from here. Get back to the question of trusting the length of newly converted utf-8 string. I _think_ we should, because I _think_ there shouldn't be any discrimination against any encoding (be it utf-8, ISO-8859-1, etc...), the current code allows NUL character in utf-8 [bq]-encoded string in this function (early return) and its caller, and report an error later: error: a NUL byte in commit log message not allowed. meanwhile, if the email was sent in other encoding, the current code discards everything after NUL in that line, thus silently accept broken commit message. Attached is the faulty patch in ISO-8859-1, which was used to demonstrate my words. The current code will make a commit with only "Áb" in the body, while the new code rightly claims we have a faulty email. -- Danh
From e27fb59f5a4854efc0badd902673fd636bcab52e Mon Sep 17 00:00:00 2001 From: =?ISO-8859-1?q?=C4=90o=C3=A0n=20Tr=E1=BA=A7n=20C=C3=B4ng=20Danh?= <congdanhqx@xxxxxxxxx> Date: Sun, 19 Apr 2020 09:31:22 +0700 Subject: [PATCH] =?ISO-8859-1?q?=C4=CB=D1=CF=D6?= MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8bit �