Re: [PATCH v2 2/3] mailinfo.c::convert_to_utf8: reuse strlen info

Martin Ågren <martin.agren@xxxxxxxxx> · Sun, 19 Apr 2020 14:29:06 +0200

On Sun, 19 Apr 2020 at 13:03, Đoàn Trần Công Danh <congdanhqx@xxxxxxxxx> wrote:
>
> We're passing buffer from strbuf to reencode_string,
> which will call strlen(3) on that buffer,
> and discard the length of newly created buffer.
> Then, we compute the length of the return buffer to attach to strbuf.
>
> During this process, we introduce a discrimination between mail
> originally written in utf-8 and other encoding.
>
> * if the email was written in utf-8, we leave it as is. If there is
>   a NUL character in that line, we complains loudly:
>
>         error: a NUL byte in commit log message not allowed.
>
> * if the email was written in other encoding, we truncate the data as
>   the NUL character in that line, then we used the truncated line for
>   the metadata.
>
> We can do better by reusing all the available information,
> and call the underlying lower level function that will be called
> indirectly by reencode_string. By doing this, we will also postpone
> the NUL character processing to the commit step, which will
> complains about the faulty metadata.
>
> Signed-off-by: Đoàn Trần Công Danh <congdanhqx@xxxxxxxxx>

This makes sense to me.

I think the subject could be adapted though? Now it's not about "reusing
info" anymore, it's about using *other*, *better* info. Maybe

  mailinfo.c: avoid strlen on strings that might contain NUL

? Could probably be improved further..

> --- a/mailinfo.c
> +++ b/mailinfo.c
> @@ -447,19 +447,21 @@ static int convert_to_utf8(struct mailinfo *mi,
>                            struct strbuf *line, const char *charset)
>  {
>         char *out;
> +       size_t out_len;
>
>         if (!mi->metainfo_charset || !charset || !*charset)
>                 return 0;
>
>         if (same_encoding(mi->metainfo_charset, charset))
>                 return 0;
> -       out = reencode_string(line->buf, mi->metainfo_charset, charset);
> +       out = reencode_string_len(line->buf, line->len,
> +                                 mi->metainfo_charset, charset, &out_len);
>         if (!out) {
>                 mi->input_error = -1;
>                 return error("cannot convert from %s to %s",
>                              charset, mi->metainfo_charset);
>         }
> -       strbuf_attach(line, out, strlen(out), strlen(out));
> +       strbuf_attach(line, out, out_len, out_len);
>         return 0;
>  }

Same diff as before, ok.

> +write_nul_patch() {
> +       space=' '
> +       qNUL=
> +       case "$1" in
> +               subject) qNUL='=00' ;;
> +       esac

Here it's case/esac...

> +       cat <<-EOF
> +       From ec7364544f690c560304f5a5de9428ea3b978b26 Mon Sep 17 00:00:00 2001
> +       From: A U Thor <author@xxxxxxxxxxx>
> +       Date: Sun, 19 Apr 2020 13:42:07 +0700
> +       Subject: [PATCH] =?ISO-8859-1?q?=C4=CB${qNUL}=D1=CF=D6?=
> +       MIME-Version: 1.0
> +       Content-Type: text/plain; charset=ISO-8859-1
> +       Content-Transfer-Encoding: 8bit
> +
> +       EOF
> +       if test "$1" = body
> +       then
> +               printf "%s\0%s\n" abc def
> +       fi

Here it's if/fi. Looks a bit inconsistent.

I suppose you tried to have a case for "body" above but couldn't get it
to work? Somehow, it would seem more consistent to have a qSubject and
qBody and handle them the same way, but maybe that's not possible?
Maybe using q_to_nul to create qBody containing a NUL?

> +       cat <<-\EOF
> +       ---
> +       diff --git a/afile b/afile
> +       new file mode 100644
> +       index 0000000000..e69de29bb2
> +       --$space
> +       2.26.1
> +       EOF
> +}

I think this helper function should use &&-chaining.

> +
>  test_expect_success setup '
>         # Note the missing "+++" line:
>         cat >bad-patch.diff <<-\EOF &&
> @@ -32,4 +62,18 @@ test_expect_success 'try to apply corrupted patch' '
>         test_i18ncmp expected actual
>  '
>
> +test_expect_success "NUL in commit message's body" '
> +       test_when_finished "git am --abort" &&
> +       write_nul_patch body >body.patch &&
> +       test_must_fail git am body.patch 2>err &&
> +       grep "a NUL byte in commit log message not allowed" err
> +'

Makes sense.

> +
> +test_expect_failure "NUL in commit message's header" '
> +       test_when_finished "git am --abort" &&
> +       write_nul_patch subject >subject.patch &&
> +       test_must_fail git am subject.patch 2>err &&
> +       grep "a NUL byte in Subject is not allowed" err
> +'

Interesting. :-)

Martin