Re: [PATCH 2/2] send-email: rfc2047-quote subject lines with non-ascii characters

On Sat, Mar 29, 2008 at 01:54:47PM +0100, Robin Rosenberg wrote:

> > There were several given in the "OS X normalize your UTF-8 filenames"
> > thread a while back. They generally boil down to "a<UMLAUT MODIFIER>"
> > versus "<A WITH UMLAUT>" both of which are valid UTF-8.
> 
> That is what /OS X/ does with file names. It changes one unicode code point
> to a sequence of other "equivalent" code points. I'm pretty sure perl does
> not do that.

My point is that we don't _know_ what is happening in between the decode
and encode. Does that intermediate form have the information required to
convert back to the exact same bytes as the original form? I don't think
you've provided any evidence that it does or does not.

But here is some evidence that it does work:

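$ cat test.pl
use strict;
use warnings;

# Return "yes" if decoding the octets as UTF-8 and re-encoding them
# reproduces exactly the original bytes, "no" otherwise.
sub is_valid {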
  my $orig = shift;
  my $test = $orig;
  utf8::decode($test);
  utf8::encode($test);
  return $orig eq $test ? "yes" : "no";
}
print "utf-8: ", is_valid("\xc3\xb6"), "\n";
print "latin-1: ", is_valid("\xc3"), "\n";
print "utf-8 w/ combining: ", is_valid("o\xcc\x88"), "\n";

$ perl test.pl
utf-8: yes
latin-1: no
utf-8 w/ combining: yes
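For comparison, here is a rough Python 3 analogue of the same round-trip test. It is not part of the original script. Byte strings stand in for Perl's octet strings. Decoding malformed input with errors="replace" is this sketch's own choice, not a claim about what utf8::decode does internally; it simply guarantees that malformed input cannot survive the round trip. The last two lines also show why the combining case passes: decoding does not normalize, so "o" + U+0308 stays two code points instead of collapsing into the single precomposed U+00F6.

```python
import unicodedata


def is_valid(orig: bytes) -> str:
    """Round-trip check in the spirit of is_valid() from test.pl.

    Malformed input is decoded with errors="replace" (an assumption of
    this sketch), so it turns into U+FFFD and cannot re-encode to the
    original bytes.
    """
    test = orig.decode("utf-8", errors="replace").encode("utf-8")
    return "yes" if test == orig else "no"


precomposed = b"\xc3\xb6"  # U+00F6 LATIN SMALL LETTER O WITH DIAERESIS
combining = b"o\xcc\x88"   # "o" followed by U+0308 COMBINING DIAERESIS

print("utf-8:", is_valid(precomposed))
print("latin-1:", is_valid(b"\xc3"))
print("utf-8 w/ combining:", is_valid(combining))

# Decoding does not normalize: the two forms stay distinct code point
# sequences (1 vs. 2 code points) ...
print(len(precomposed.decode("utf-8")), len(combining.decode("utf-8")))
# ... and only compare equal after explicit NFC normalization.
print(unicodedata.normalize("NFC", combining.decode("utf-8"))
      == precomposed.decode("utf-8"))
```

Running it prints yes / no / yes, matching the Perl run, followed by "1 2" and "True".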

But it still feels a little wrong to test by converting. There must be
some way to ask "is this valid utf-8" (there are several candidate
functions, but I don't think either of us quite knows the right way to
invoke them).
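A direct check, sketched in Python 3 rather than Perl, only asks whether a strict decoder accepts the bytes and never converts them back. Python's built-in "utf-8" codec is strict by default. It rejects truncated sequences, overlong forms and encoded surrogates. On the Perl side, the Encode module's decode("UTF-8", $octets, Encode::FB_CROAK) is presumably the closest counterpart, but that is an assumption and has not been tried in send-email. The helper name is_valid_utf8 is hypothetical.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data is well-formed UTF-8, without converting back.

    Python's built-in "utf-8" codec is strict by default: truncated
    sequences, overlong forms and encoded surrogates all raise
    UnicodeDecodeError.
    """
    try:
        data.decode("utf-8")  # errors="strict" is the default
    except UnicodeDecodeError:
        return False
    return True


for label, data in [("utf-8", b"\xc3\xb6"),
                    ("latin-1", b"\xc3"),
                    ("utf-8 w/ combining", b"o\xcc\x88"),
                    ("overlong '/'", b"\xc0\xaf")]:
    print(label + ":", "yes" if is_valid_utf8(data) else "no")
```

This gives the same answers as the round trip for the three original cases. It also rejects the overlong encoding of "/", which a lax decoder could silently accept.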

-Peff