Re: [PATCH 2/2] send-email: rfc2047-quote subject lines with non-ascii characters

On Sat, Mar 29, 2008 at 01:54:10PM +0100, Robin Rosenberg wrote:

> I think you really should try the UTF-8 guess, since a file may well be UTF-8 
> even if the user locale is something else. Especially for XML files, UTF-8
> is common, but there are many more cases. Look into git-gui/po for more 
> examples. The probability of a UTF-8 test being wrong is just so unimaginable 
> low.

Thinking about this more, I think it is only half the solution. If
something is not valid utf-8, then we know it must be something else.
But if something is valid utf-8, is it necessarily utf-8? I think we are
going to have a much higher probability of guessing wrong there.

For example, consider the bytes { 0xc3, 0xb6 }. In utf-8, they are 'ö'.
But in iso8859-1, they also have meaning (Ã followed by the paragraph
symbol, ¶). Now that is an unlikely combination to come up. And maybe for
Latin-1, having two non-ascii characters next to each other is unlikely.
But across all commonly used encodings, what is the probability that an
average text in a given encoding also happens to be valid UTF-8?
For example, I have no idea what patterns can be found in EUC-JP.
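
To make the ambiguity concrete, here is a rough sketch using the core
Encode module. The helper name is_valid_utf8 is made up for illustration
and is not anything in send-email; it only shows that the same two bytes
decode cleanly under both encodings, while a lone Latin-1 byte does not
pass as UTF-8.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: true if the byte string decodes cleanly as UTF-8.
sub is_valid_utf8 {
    my ($bytes) = @_;
    # Work on a copy: with a CHECK value, decode() may modify its argument.
    my $copy = $bytes;
    my $ok = eval { decode('UTF-8', $copy, Encode::FB_CROAK()); 1 };
    return $ok ? 1 : 0;
}

my $bytes     = "\xc3\xb6";
my $as_utf8   = decode('UTF-8', $bytes);        # one character, U+00F6
my $as_latin1 = decode('iso-8859-1', $bytes);   # two characters, U+00C3 U+00B6

printf "valid utf-8: %d\n", is_valid_utf8($bytes);
printf "utf-8 length: %d, latin-1 length: %d\n",
    length($as_utf8), length($as_latin1);

# A lone 0xf6 (Latin-1 'ö') is rejected, so the "not valid utf-8" half
# of the test is the reliable one.
printf "lone 0xf6 valid utf-8: %d\n", is_valid_utf8("\xf6");
```

So a passing validity check only tells us the bytes could be UTF-8, not
that the author meant them that way.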

> > PS Your 'require' is more simply written as 'use I18N::Langinfo
> > qw(langinfo CODESET)', or perhaps even simpler:
> 
> See the man page, from which I stole it. It suggests you wrap it all inside 
> eval {}, just in case your perl does not have langinfo.

Yes, that does make sense for a script (I just couldn't see it because
the entire toy example would be inside the eval).
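
For reference, the eval-wrapped form looks roughly like this. It follows
the pattern from the I18N::Langinfo man page; the locale_codeset wrapper
is a hypothetical name used here only to keep the sketch self-contained.

```perl
use strict;
use warnings;

# Hypothetical wrapper: return the locale's codeset name, or undef if
# I18N::Langinfo cannot be loaded on this perl.
sub locale_codeset {
    my $codeset;
    eval {
        require I18N::Langinfo;
        I18N::Langinfo->import(qw(langinfo CODESET));
        $codeset = langinfo(CODESET());   # note the (), needed at runtime
    };
    return $codeset;
}

my $codeset = locale_codeset();
print defined $codeset
    ? "codeset: $codeset\n"
    : "langinfo not available\n";
```

The value reflects whatever locale is in effect for the process, so a
script that wants the user's locale would presumably have to call
setlocale first.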

> As for the is_utf8() i'm not sure what it does, but I can't make it work.

There is some magic with how Perl marks strings as "binary" versus
"utf-8" that I don't quite understand. And I think is_utf8 is really
about asking "is the utf-8 flag set", not "are these bytes valid utf-8".
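
A small sketch of what I mean by the flag, as far as I understand it.
The flag_state helper is hypothetical; utf8::is_utf8 and Encode::decode
are the real calls. Raw bytes read from a file would look like $bytes
here, which may be why the check never seemed to "work".

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: report Perl's internal utf-8 flag as 0 or 1.
sub flag_state {
    my ($string) = @_;
    return utf8::is_utf8($string) ? 1 : 0;
}

my $bytes = "\xc3\xb6";                # raw octets, flag not set
my $chars = decode('UTF-8', $bytes);   # decoded character string

printf "bytes: flag=%d length=%d\n", flag_state($bytes), length $bytes;
printf "chars: flag=%d length=%d\n", flag_state($chars), length $chars;
```

The bytes are perfectly valid UTF-8 in both cases; only the decoded
string carries the flag.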

I think this discussion would benefit greatly from somebody who has more
of a clue how perl i18n stuff works. Why don't you work up a patch that
makes sense for you, and then hopefully that will get some attention?

-Peff