Re: [PATCH 2/2] send-email: rfc2047-quote subject lines with non-ascii characters

Robin Rosenberg <robin.rosenberg.lists@xxxxxxxxxx> · Sat, 29 Mar 2008 10:39:43 +0100

On Saturday 29 March 2008 10.11.45, Jeff King wrote:
> On Sat, Mar 29, 2008 at 10:02:43AM +0100, Robin Rosenberg wrote:
> > My proof is entirely empirical. What happens is that attempting to decode
> > a non-UTF-8 string will put a unicode surrogate pair into the (now
> > Unicode) string and encoding will just encode the surrogate pair into
> > UTF-8 and not the original. As a result, the encode(decode($x)) eq $x
> > *only* if $x is a valid UTF-8 octet sequence. Why would you not get the
> > original back if you start with valid UTF-8?
>
> Because some UTF-8 sequences have multiple representations, and that

Care to give an example?
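
For concreteness, here is a minimal sketch of the round-trip test under discussion. It is a Python analogue of encode(decode($x)) eq $x, not the Perl from the patch, and the helper name round_trips is hypothetical. It assumes the decoder substitutes U+FFFD for malformed input rather than raising an error.

```python
def round_trips(octets: bytes) -> bool:
    """Return True if decoding as UTF-8 and re-encoding reproduces the input."""
    # errors="replace" substitutes U+FFFD for each malformed sequence, so the
    # re-encoded bytes differ from the input whenever the input was invalid.
    text = octets.decode("utf-8", errors="replace")
    return text.encode("utf-8") == octets


valid = "Björn".encode("utf-8")        # well-formed UTF-8
latin1 = "Björn".encode("latin-1")     # same name as ISO-8859-1 octets
overlong = b"\xc0\x80"                 # overlong form of NUL, not valid UTF-8
decomposed = "e\u0301".encode("utf-8")  # 'e' + combining acute accent

for sample in (valid, latin1, overlong, decomposed):
    print(sample, round_trips(sample))
```

In this sketch the decomposed form still round-trips, because it is a different code point sequence rather than a second encoding of the same one, while the overlong form is rejected by the decoder and so does not.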

-- robin