Re: [PATCH] CMIT_FMT_EMAIL: Q-encode Subject: and display-name part of From: fields.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

* Junio C Hamano [06-05-16 03:18:24 -0700] wrote:

[...]

Thoughts?

If we decide to do the header formatting here, there are two
further enhancements that need to be done:

(1) The charset must be configurable for projects that use
    encoding different from UTF-8, perhaps with the .git/config
    [i18n] commitEncoding.  It is only a convention, not a hard
    rule, to use UTF-8 for the metainformation.

To write an encoder really fully conforming to RfC2047 is a mess. Not so
much because the algorithms are difficult but because there're many
things to take care of if you want to do it right.

For example, encoded words are required to be at most something below 80
characters long. For names this maybe is not an issue, but for subjects.
I didn't really check whether your patch produces only the minimum
encoding (i.e. only those words that need it and not just all words with
'_' or '=20' in between them) but if not, 80 isn't that much after all.
And you may need to think about header folding (and unfolding for
reading it back in).

Also, supporting any character set (via iconv()) blows up the
implementation. There're character sets for which other RfCs define the
encoding method so only using quoted-printable is not fully correct in
all possible cases.

And, with the first point, several character sets really can become a
mess as you need to produce several encoded words because the input
would exceed RfC limits otherwise. Because for multi-byte character sets
you musn't break within a multi-byte character sequence but only at
their boundaries. So you need a generic way to detect the byte-size of
such a character in any supported character set.

With just the UTF-8 encoding all of this is pretty simple though.

I would rather try to find a way to implement this in a scripting
language that already has standard modules for this or makes it easy to
write one. In C this gets quite lengthy...

  bye, Rocco
--
:wq!
-
: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]