And the culprit turned out to be the detection order: mb_detect_order()
was set to "ASCII, UTF-8", so ISO-8859-1 was never one of the candidates
mb_detect_encoding() considered. Changing it to "ASCII, UTF-8,
ISO-8859-1" makes everything work as expected.
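A minimal sketch of that fix, assuming a current PHP (8.x) with the mbstring extension rather than the 4.3.11 discussed in this thread. The helper name detectEncoding is hypothetical. ISO-8859-1 goes last because any byte string is valid ISO-8859-1, which makes it the natural catch-all; strict mode is used so that invalid byte sequences rule a candidate out.

```php
<?php
// Put ISO-8859-1 into the global detection order (array form), last,
// because it accepts every possible byte and so works as a fallback.
mb_detect_order(['ASCII', 'UTF-8', 'ISO-8859-1']);

/**
 * Hypothetical helper: detect using the global detection order (null),
 * in strict mode so invalid sequences disqualify a candidate encoding.
 */
function detectEncoding(string $raw): string|false
{
    return mb_detect_encoding($raw, null, true);
}

// "Mi%F1oso" decodes to the single Latin-1 byte 0xF1 for the n-tilde.
$raw = urldecode('Mi%F1oso');
$encoding = detectEncoding($raw);
$utf8 = mb_convert_encoding($raw, 'UTF-8', $encoding);
echo $encoding, ': ', $utf8, PHP_EOL;
```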
Robert William Vesterman wrote:
OK, now the problem seems to be not that mb_convert_encoding is
encoding incorrectly, but that mb_detect_encoding is detecting
incorrectly. It claims that the raw string, as received from the
browser, is UTF-8, when in reality it seems to be ISO-8859-1. Sample
code:
<html><head><title>Minnie</title></head><body><p>
<?php
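// Print a value together with the encoding mbstring detects for it.
function output ( $label, $x ) {
echo $label . ': ' . $x . ' ... ' . mb_detect_encoding ( $x ) .
'<br/>';
}
$x = $_REQUEST['Minnie'];
// 1. The value exactly as received from the browser.
output ( "Raw", $x );
// 2. Converted to UTF-8 from whatever encoding was detected.
output ( "Convert from detected",
mb_convert_encoding ( $x, "UTF-8", mb_detect_encoding ( $x ) ) );
// 3. Converted to UTF-8, explicitly treating the input as ISO-8859-1.
output ( "Convert from ISO",
mb_convert_encoding ( $x, "UTF-8", "ISO-8859-1" ) );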
?>
</p></body></html>
Output for "Minnie=Mi%F1oso":
Raw: Mi?oso ... UTF-8
Convert from detected: Mioso ... ASCII
Convert from ISO: Miñoso ... UTF-8
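Looking at the bytes shows why the raw value cannot really be UTF-8. This sketch is not part of the original post and assumes PHP 7+ with mbstring; it decodes the query value and compares it with the UTF-8 form of the same word.

```php
<?php
// Decode the query-string value exactly as the browser sent it.
$raw = urldecode('Mi%F1oso');

// %F1 is one byte, 0xF1, which is the n-tilde in ISO-8859-1 ...
echo bin2hex($raw), PHP_EOL;                 // 4d69f16f736f

// ... but on its own that byte is not a valid UTF-8 sequence.
var_dump(mb_check_encoding($raw, 'UTF-8'));  // bool(false)

// Converting from ISO-8859-1 yields the two-byte UTF-8 form C3 B1.
$utf8 = mb_convert_encoding($raw, 'UTF-8', 'ISO-8859-1');
echo bin2hex($utf8), PHP_EOL;                // 4d69c3b16f736f
```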
Robert William Vesterman wrote:
A little additional info: the "ASCII to ASCII" case for
"Minnie=Mouse" is simply because the UTF-8 encoding of "Mouse" is
identical to its ASCII encoding, and mb_detect_encoding tries ASCII
before UTF-8. So that's not an issue.
But the "UTF-8 to ASCII" case for "Minnie=Miñoso" is still
(seemingly) screwy.
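The overlap described above can be made visible by asking which candidate encodings a string is valid in. A sketch assuming PHP 7.4+ with mbstring; validEncodings is a hypothetical helper, and the candidate list is the corrected detection order from this thread.

```php
<?php
/**
 * Hypothetical helper: list every candidate encoding in which $s is a
 * valid byte sequence, keeping the order of $candidates.
 */
function validEncodings(string $s, array $candidates): array
{
    return array_values(array_filter(
        $candidates,
        fn (string $enc): bool => mb_check_encoding($s, $enc)
    ));
}

$candidates = ['ASCII', 'UTF-8', 'ISO-8859-1'];

// Plain ASCII is valid in all three, so checking in order reports ASCII.
print_r(validEncodings('Mouse', $candidates));

// The Latin-1 byte 0xF1 rules out both ASCII and UTF-8.
print_r(validEncodings("Mi\xF1oso", $candidates));
```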
Robert William Vesterman wrote:
I've run into a problem where mb_convert_encoding seems to be
converting to ASCII, even though I'm telling it to convert to
UTF-8. This is with PHP version 4.3.11.
I had been asking it to convert from "auto" to UTF-8, so at first I
thought maybe "auto" was not the right choice. So I called
"mb_detect_encoding" to see the format of what I was trying to
convert; it said it was already UTF-8 (before I did the conversion).
So then I thought maybe I had gotten the "from" and "to" parameters
backwards (although I was confident I was following the
documentation, which gives the order as string, to-encoding,
from-encoding), so I changed mb_convert_encoding to use "UTF-8" as
/both/ the from and to.
It still converts to ASCII.
I understand that, given that it's already UTF-8, I don't need to
convert it to UTF-8. But other things that I receive might /not/ be
UTF-8, so I am still concerned with this.
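One common way to handle input that may or may not be UTF-8 is to validate first and convert only when validation fails. This is a sketch, not the approach taken in the thread: it assumes PHP 7+ with mbstring and that anything not valid UTF-8 is ISO-8859-1, which matches this case but not every site. The function name toUtf8 is hypothetical.

```php
<?php
/**
 * Hypothetical helper: return $s as UTF-8.
 * Assumption: any input that is not already valid UTF-8 is ISO-8859-1.
 */
function toUtf8(string $s): string
{
    if (mb_check_encoding($s, 'UTF-8')) {
        // Already valid UTF-8 (this includes plain ASCII): leave it alone.
        return $s;
    }
    // Argument order: string, target encoding, source encoding.
    return mb_convert_encoding($s, 'UTF-8', 'ISO-8859-1');
}

echo toUtf8("Mi\xF1oso"), PHP_EOL;      // Latin-1 input is converted
echo toUtf8("Mi\xC3\xB1oso"), PHP_EOL;  // UTF-8 input passes through unchanged
```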
Sample code:
<html><head><title>Minnie</title></head><body><p>
<?php
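$x = $_REQUEST['Minnie'];
// Show the raw value and its detected encoding.
echo $x . ' ... ' . mb_detect_encoding ( $x ) . '<br/>';
// "Convert" from UTF-8 to UTF-8, then show the result the same way.
$x = mb_convert_encoding ( $x, "UTF-8", "UTF-8" );
echo $x . ' ... ' . mb_detect_encoding ( $x ) . '<br/>';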
?>
</p></body></html>
Output, when called with URL parameter "Minnie=Miñoso":
Miñoso ... UTF-8
Mioso ... ASCII
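One plausible reading of this output (an inference consistent with the later replies, not something stated here): the browser actually sent the Latin-1 byte 0xF1, so a "UTF-8 to UTF-8" conversion treats it as an invalid sequence and drops or replaces it. Whether it is dropped or substituted depends on the mbstring version and the mb_substitute_character() setting, so this sketch (assuming PHP 7+ with mbstring) only demonstrates properties that hold either way.

```php
<?php
$raw = "Mi\xF1oso"; // what "Mi%F1oso" decodes to: Latin-1, not UTF-8

// "From UTF-8 to UTF-8" cannot repair the data; it only sanitizes it.
$same = mb_convert_encoding($raw, 'UTF-8', 'UTF-8');

// Declaring the real source encoding produces the intended text.
$fixed = mb_convert_encoding($raw, 'UTF-8', 'ISO-8859-1');

var_dump(mb_check_encoding($same, 'UTF-8')); // the result is valid UTF-8 ...
var_dump($same === $fixed);                  // ... but the n-tilde was lost
echo $fixed, PHP_EOL;
```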
Then I changed the "from" so that I could try converting from
something other than UTF-8:
$x = mb_convert_encoding ( $x, "UTF-8", mb_detect_encoding ( $x ) );
And now, output when called with "Minnie=Mouse":
Mouse ... ASCII
Mouse ... ASCII
Does anyone have any idea what's going on here? Am I doing something
wrong?
Thanks in advance for any help.
--
PHP General Mailing List (http://www.php.net/)