Re: [PATCH] mailinfo.c::convert_to_utf8: reuse strlen info

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2020-04-18 16:12:06-0700, Junio C Hamano <gitster@xxxxxxxxx> wrote:
> Martin Ågren <martin.agren@xxxxxxxxx> writes:
> 
> > This is equivalent as long as `line->len` is equal to
> > `strlen(line->buf)`, which it will be (should be) because it's a
> > strbuf. Ok.
> 
> For the guarantee to hold true, line->buf[0..line->len] should not
> have any '\0' byte in it.  
> 
> This helper has two callers, but in either case, it needs to be
> prepared to work on output of decode_[bq]_segment().  Is there code
> anywhere that guarntees that the decoding won't stuff '\0' in the
> line?

- First caller: `decode_header`, we don't have such guarantee,
  The old code will discard everything after first NUL characters.
  I'm _not_ really familiar with email standard, does email allow
  UTF-16 (albeit in [bq]-encoded) in header? If yes, the current code
  is likely disallow it. The new code accidentally, accept
  [bq]-encoded-utf-16 header and reencode to utf-8 for commit.
  Yes, it's very likely only UTF-8, ISO-8859-1, some variant of
  ISO-2022 is used nowaday in email.
  *If* we would like to not exclude UTF-16, the new question is should
  we trust the length of newly converted utf-8 string.

- Second caller: `handle_commit_msg`: everything get interesting from
  here.

Get back to the question of trusting the length of newly converted
utf-8 string.

I _think_ we should,
because I _think_ there shouldn't be any discrimination against any
encoding (be it utf-8, ISO-8859-1, etc...),
the current code allows NUL character in utf-8 [bq]-encoded string
in this function (early return) and its caller,
and report an error later:

	error: a NUL byte in commit log message not allowed.

meanwhile, if the email was sent in other encoding, the current code 
discards everything after NUL in that line,
thus silently accept broken commit message.

Attached is the faulty patch in ISO-8859-1, which was used to
demonstrate my words.
The current code will make a commit with only "Áb" in the body,
while the new code rightly claims we have a faulty email.

-- 
Danh
From e27fb59f5a4854efc0badd902673fd636bcab52e Mon Sep 17 00:00:00 2001
From: =?ISO-8859-1?q?=C4=90o=C3=A0n=20Tr=E1=BA=A7n=20C=C3=B4ng=20Danh?=
 <congdanhqx@xxxxxxxxx>
Date: Sun, 19 Apr 2020 09:31:22 +0700
Subject: [PATCH] =?ISO-8859-1?q?=C4=CB=D1=CF=D6?=
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit

��fg

---
 afile | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 afile

diff --git a/afile b/afile
new file mode 100644
index 0000000000..e69de29bb2
-- 
2.26.1.301.g55bc3eb7cb


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux