Re: mb_convert_encoding converting to ASCII instead of UTF-8

Robert William Vesterman <bob@xxxxxxxxxxxxxxx> · Wed, 23 Apr 2008 13:32:25 -0400

I wasn't saying I was /telling/ it to go from UTF-8 to ASCII.  I was 
saying it /was/ going from UTF-8 to ASCII, despite the fact that I was 
telling it to go from UTF-8 to UTF-8.

And as noted previously in this thread, it turned out to be because 
mb_detect_encoding was /mistakenly/ detecting it as UTF-8 in the first 
place.  It was actually ISO-8859-1, not UTF-8.  So when I told it to 
convert from UTF-8 (which mb_detect_encoding said it was), 
mb_convert_encoding ran into a byte that is not valid UTF-8 (the ñ, 
which is the single byte 0xF1 in ISO-8859-1), and so threw it away.  
The generated output was therefore all plain ASCII characters, which 
mb_detect_encoding then reported as ASCII.
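
For anyone who hits the same thing, here is a minimal sketch of one way 
to avoid trusting a wrong guess: validate the bytes with 
mb_check_encoding before converting.  The helper name to_utf8 and the 
ISO-8859-1 fallback are my own assumptions for illustration, not 
anything mbstring provides or this thread settled on.

```php
<?php
// "Miñoso" as ISO-8859-1 bytes: ñ is the single byte 0xF1.
$latin1 = "Mi\xF1oso";

// Hypothetical helper: only treat input as UTF-8 if it validates as UTF-8.
function to_utf8(string $bytes, string $fallback = 'ISO-8859-1'): string
{
    if (mb_check_encoding($bytes, 'UTF-8')) {
        return $bytes; // already valid UTF-8, nothing to do
    }
    // Assumption: anything that is not valid UTF-8 is in $fallback.
    return mb_convert_encoding($bytes, 'UTF-8', $fallback);
}

// 0xF1 followed by "o" is not a valid UTF-8 sequence.
var_dump(mb_check_encoding($latin1, 'UTF-8')); // bool(false)

// Converting from the real source encoding keeps the ñ (UTF-8 bytes C3 B1).
$utf8 = to_utf8($latin1);
var_dump(bin2hex($utf8)); // string(14) "4d69c3b16f736f"
```

The fallback only helps if you actually know what the non-UTF-8 input 
is likely to be; every byte string is valid ISO-8859-1, so that check 
can never fail on its own.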

tedd wrote:
At 11:28 AM -0400 4/23/08, Robert William Vesterman wrote:
A little additional info: The "ASCII to ASCII" case for 
"Minnie=Mouse" is merely because the UTF-8 encoding for "Mouse" is 
the same as the ASCII encoding for "Mouse", and mb_detect_encoding is 
matching on ASCII before UTF-8.  So that's not an issue.

But, the "UTF-8 to ASCII" case for "Minnie=Miñoso" is still 
(seemingly) screwy.

Going from UTF-8 to ASCII is not going to work. ASCII to UTF-8 
works because ASCII is contained within UTF-8. But the reverse is not 
true: not all of UTF-8 is contained within ASCII.

For example, the character (code point) ñ does not exist in ASCII, so 
it cannot be converted.
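
The subset relationship can be checked directly.  This is only a small 
sketch using the two sample strings from this thread; the variable 
names are made up for the example.

```php
<?php
$ascii = "Minnie=Mouse";          // every byte is below 0x80
$utf8  = "Minnie=Mi\xC3\xB1oso";  // ñ encoded as the two UTF-8 bytes C3 B1

// ASCII text is already valid UTF-8, so ASCII -> UTF-8 changes nothing.
var_dump(mb_check_encoding($ascii, 'UTF-8'));                       // bool(true)
var_dump(mb_convert_encoding($ascii, 'UTF-8', 'ASCII') === $ascii); // bool(true)

// The reverse does not hold: the ñ bytes are outside the ASCII range.
var_dump(mb_check_encoding($utf8, 'ASCII'));                        // bool(false)
```

What mb_convert_encoding does with the ñ when asked for ASCII output 
depends on the mb_substitute_character setting, so no output is shown 
for that case.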

Cheers,

tedd

--
PHP General Mailing List (http://www.php.net/)