Re: [PATCH 2/2] send-email: rfc2047-quote subject lines with non-ascii characters

Junio C Hamano <gitster@xxxxxxxxx> · Sun, 30 Mar 2008 16:47:16 -0700

Robin Rosenberg <robin.rosenberg.lists@xxxxxxxxxx> writes:

> On Saturday 29 March 2008 08.22.03, Jeff King wrote:
>> On Sat, Mar 29, 2008 at 08:19:07AM +0100, Robin Rosenberg wrote:
>> > On Friday 28 March 2008 22.29.01, Jeff King wrote:
>> > > We always use 'utf-8' as the encoding, since we currently
>> > > have no way of getting the information from the user.
>> >
>> > Don't set encoding to UTF-8 unless it actually looks like UTF-8.
>>
>> OK. Do you have an example function that guesses with high probability
>> whether a string is utf-8? If there are non-ascii characters but we
>> _don't_ guess utf-8, what should we do?
>
> Any test for valid UTF-8 will do that with a very high probability. The
> Perl UTF-8 "API" is a mess, though: I couldn't find such a routine!? Calling
> decode/encode and checking whether you get the original string back works,
> but that is too clumsy, IMHO.
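For reference, here is a minimal Perl sketch of what "rfc2047-quote" means for a Subject: line, assuming the charset label is supplied by the caller (the patch in question always uses utf-8). The helper names quote_subject_rfc2047 and maybe_quote_subject are hypothetical, not send-email's code, and the sketch does not split subjects that exceed the 75-character encoded-word limit from RFC 2047.

```perl
use strict;
use warnings;

# Hypothetical helper (not send-email's actual code): wrap a byte string
# in one RFC 2047 "Q"-encoded word. Letters, digits and a few safe
# punctuation marks pass through; every other byte, including space,
# becomes =XX. RFC 2047 allows encoding more characters than necessary.
# Expects already-encoded bytes, not decoded (wide) characters.
sub quote_subject_rfc2047 {
    my ($bytes, $charset) = @_;
    $charset = 'UTF-8' unless defined $charset;
    (my $encoded = $bytes) =~ s/([^A-Za-z0-9!*+\-\/])/sprintf('=%02X', ord $1)/ge;
    return "=?$charset?q?$encoded?=";
}

# Only quote when the subject actually contains non-ASCII bytes.
sub maybe_quote_subject {
    my ($subject, $charset) = @_;
    return $subject unless $subject =~ /[^\x00-\x7F]/;
    return quote_subject_rfc2047($subject, $charset);
}
```

For example, a subject whose bytes are "caf\xc3\xa9 time" comes out as =?UTF-8?q?caf=C3=A9=20time?=, while a plain ASCII subject is left untouched.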

Decoding followed by re-encoding tests both whether you have a valid
sequence and whether it is canonically encoded, which is testing too much.
You only want to check that it is valid; you do not care about
normalization.

I see this in perluniintro.pod:

    =item *

    How Do I Detect Data That's Not Valid In a Particular Encoding?

    Use the C<Encode> package to try converting it.
    For example,

        use Encode 'decode_utf8';
        if (decode_utf8($string_of_bytes_that_I_think_is_utf8)) {
            # valid
        } else {
            # invalid
        }
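One caveat about the quoted example: decode_utf8() called without a CHECK argument does not fail on malformed input; it substitutes U+FFFD, so that if() takes the "valid" branch for any input except "" and "0". A stricter, validity-only check that never re-encodes might look like the following sketch; is_valid_utf8 is a hypothetical name.

```perl
use strict;
use warnings;
use Encode ();

# Return 1 if $bytes is well-formed UTF-8, 0 otherwise.
# Without a CHECK argument, decode_utf8() replaces malformed sequences
# with U+FFFD instead of failing, and its result can be a false value
# ("" or "0") even for valid input, so it is not a reliable test.
# FB_CROAK makes decode() die on the first malformed sequence instead.
sub is_valid_utf8 {
    my ($bytes) = @_;
    my $copy = $bytes;    # decode() with a CHECK flag may consume its input
    my $ok = eval {
        Encode::decode('UTF-8', $copy, Encode::FB_CROAK());
        1;
    };
    return $ok ? 1 : 0;
}
```

Only validity is checked here; the decoded characters are discarded, so nothing is compared against a re-encoded copy.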

For commit log messages we have traditionally used a similar idea: guess by
checking whether the text looks like a UTF-8 encoded string, and otherwise
assume Latin-1 (I think we still do, if the user does not tell us otherwise).
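The guess described above can be sketched in Perl as follows, assuming the goal is to hand UTF-8 bytes to the mailer either way. to_utf8_guessing is a hypothetical helper, not git's implementation; note that text in any 8-bit charset other than Latin-1 (KOI8-R, for instance) is silently misinterpreted by this fallback.

```perl
use strict;
use warnings;
use Encode ();

# Sketch of the "UTF-8 if it looks like it, otherwise Latin-1" guess.
# Returns the text re-encoded as UTF-8 bytes, ready to be labeled utf-8.
sub to_utf8_guessing {
    my ($bytes) = @_;
    my $copy  = $bytes;    # FB_CROAK may consume its input
    my $chars = eval { Encode::decode('UTF-8', $copy, Encode::FB_CROAK()) };
    # Not valid UTF-8: every byte is a valid ISO-8859-1 character.
    $chars = Encode::decode('ISO-8859-1', $bytes) unless defined $chars;
    return Encode::encode('UTF-8', $chars);
}
```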

If this issue only concerns the --compose part of send-email, perhaps you
could ask the user interactively instead of falling back to "otherwise
assume Latin-1"?
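One way the interactive variant could look, as a sketch only: the helper name choose_charset, the prompt text and the ISO-8859-1 default are all assumptions, and Encode::find_encoding is used just to reject charset names Perl cannot handle.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper for --compose: decide which charset to declare.
# Valid UTF-8 is used as-is; otherwise the user is asked, with
# ISO-8859-1 offered as the default answer. $in/$out are filehandles
# so the prompt can be driven from a terminal or from a test.
sub choose_charset {
    my ($bytes, $in, $out) = @_;
    my $copy = $bytes;    # FB_CROAK may consume its input
    return 'UTF-8'
        if eval { Encode::decode('UTF-8', $copy, Encode::FB_CROAK()); 1 };
    print {$out} "Message is not valid UTF-8. Which charset is it in? [ISO-8859-1] ";
    my $answer = <$in>;
    $answer = '' unless defined $answer;    # treat EOF as "use the default"
    $answer =~ s/^\s+|\s+$//g;              # strip newline and stray blanks
    return 'ISO-8859-1' if $answer eq '';
    # Unknown names fall back to the default rather than producing a
    # header that names a charset nothing can decode.
    return Encode::find_encoding($answer) ? $answer : 'ISO-8859-1';
}
```

Passing filehandles rather than reading STDIN directly keeps the prompt testable; an interactive caller would presumably pass \*STDIN and \*STDOUT.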

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html