On Saturday 29 March 2008 10.11.45, Jeff King wrote:
> On Sat, Mar 29, 2008 at 10:02:43AM +0100, Robin Rosenberg wrote:
> > My proof is entirely empirical. What happens is that attempting to decode
> > a non-UTF-8 string will put a unicode surrogate pair into the (now
> > Unicode) string and encoding will just encode the surrogate pair into
> > UTF-8 and not the original. As a result, the encode(decode($x)) eq $x
> > *only* if $x is a valid UTF-8 octet sequence. Why would you not get the
> > original back if you start with valid UTF-8?
> > Because some UTF-8 sequences have multiple representations, and that

Care to give an example?

-- robin
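
For reference, the round-trip property quoted above (encode(decode($x)) eq $x) can be checked empirically with a few lines of code. What follows is only a sketch, written in Python rather than the Perl under discussion, so the helper name round_trips and the sample strings are assumptions made for illustration. Python's decoder, when asked to replace errors, substitutes U+FFFD for malformed input; other decoders may behave differently.

```python
def round_trips(octets: bytes) -> bool:
    """Return True if decoding as UTF-8 and re-encoding yields the same octets."""
    # errors="replace" turns each malformed sequence into U+FFFD,
    # so the original octets cannot be recovered for invalid input.
    text = octets.decode("utf-8", errors="replace")
    return text.encode("utf-8") == octets


# Hypothetical sample: the same three letters in two different encodings.
sample = "\u00e5\u00e4\u00f6"
valid_utf8 = sample.encode("utf-8")      # well-formed UTF-8 octets
not_utf8 = sample.encode("latin-1")      # octets E5 E4 F6, not valid UTF-8

print(round_trips(valid_utf8))   # the well-formed input survives the round trip
print(round_trips(not_utf8))     # the Latin-1 input does not
```

This only exercises the "invalid input does not round-trip" half of the argument; it says nothing either way about the multiple-representations claim.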