Re: mb_convert_encoding converting to ASCII instead of UTF-8

And the culprit turned out to be mb_detect_order(): it was set to "ASCII, UTF-8", so ISO-8859-1 was never a candidate for detection. Changing it to "ASCII, UTF-8, ISO-8859-1" makes everything work as expected.
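
For anyone who finds this later, here is a minimal sketch of that fix. It assumes the mbstring extension is loaded; the variable names are only illustrative, and the strict flag on mb_detect_encoding is my own addition rather than something tested earlier in this thread.

```php
<?php
// "Mi\xF1oso" is "Miñoso" as ISO-8859-1 bytes, i.e. what Minnie=Mi%F1oso decodes to.
$raw = "Mi\xF1oso";

// Add ISO-8859-1 to the detection order (it was only ASCII and UTF-8 before).
mb_detect_order(array('ASCII', 'UTF-8', 'ISO-8859-1'));

// Strict mode (an assumption on my part) rejects candidates that hit invalid bytes.
$detected = mb_detect_encoding($raw, mb_detect_order(), true);

// Convert from whatever was detected to UTF-8.
$utf8 = mb_convert_encoding($raw, 'UTF-8', $detected);

echo $detected, "\n";       // ISO-8859-1
echo bin2hex($utf8), "\n";  // 4d69c3b16f736f
```

Bear in mind that almost any byte string is valid ISO-8859-1, so it probably belongs last in the list as a catch-all.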

Robert William Vesterman wrote:
OK, now the problem seems to be not that mb_convert_encoding is converting incorrectly, but that mb_detect_encoding is detecting incorrectly. It claims that the raw string as received from the browser is UTF-8, when in reality it seems to be ISO-8859-1. Sample code:

  <html><head><title>Minnie</title></head><body><p>
  <?php
  function output ( $label, $x ) {
    echo $label . ': ' . $x . ' ... ' . mb_detect_encoding ( $x ) . '<br/>';
  }

  $x = $_REQUEST['Minnie'];
  output ( "Raw", $x );
  output ( "Convert from detected",
     mb_convert_encoding ( $x, "UTF-8", mb_detect_encoding ( $x ) ) );
  output ( "Convert from ISO",
     mb_convert_encoding ( $x, "UTF-8", "ISO-8859-1" ) );
  ?>
  </p></body></html>

Output for "Minnie=Mi%F1oso":

  Raw: Mi?oso ... UTF-8
  Convert from detected: Mioso ... ASCII
  Convert from ISO: Miñoso ... UTF-8
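
One way to see that the raw bytes really are not UTF-8 is to dump them as hex and validate them directly. This is only a sketch: it assumes mb_check_encoding is available, which I believe requires a PHP newer than the 4.3.11 mentioned in this thread, and the variable names are illustrative.

```php
<?php
// Decode the query-string value the same way PHP does for $_REQUEST.
$raw = urldecode('Mi%F1oso');

// The single byte 0xF1 is "ñ" in ISO-8859-1.
echo bin2hex($raw), "\n";                    // 4d69f16f736f

// In UTF-8, 0xF1 would have to start a multi-byte sequence, and "o" cannot continue it.
$isUtf8 = mb_check_encoding($raw, 'UTF-8');
var_dump($isUtf8);                           // bool(false)

// Converting from the real source encoding yields the two-byte UTF-8 form c3 b1.
$utf8 = mb_convert_encoding($raw, 'UTF-8', 'ISO-8859-1');
echo bin2hex($utf8), "\n";                   // 4d69c3b16f736f
```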

Robert William Vesterman wrote:
A little additional info: The "ASCII to ASCII" case for "Minnie=Mouse" is merely because the UTF-8 encoding for "Mouse" is the same as the ASCII encoding for "Mouse", and mb_detect_encoding is matching on ASCII before UTF-8. So that's not an issue.

But the "UTF-8 to ASCII" case for "Minnie=Miñoso" still seems screwy.
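
The "Mouse" point can be checked with a small sketch. It assumes mbstring with mb_check_encoding available, and the variable names are just for illustration.

```php
<?php
$s = 'Mouse';

// Every byte is below 0x80, so the string is valid ASCII and valid UTF-8 at once.
$isAscii = mb_check_encoding($s, 'ASCII');
$isUtf8  = mb_check_encoding($s, 'UTF-8');

// Converting ASCII to UTF-8 therefore leaves the bytes untouched.
$converted = mb_convert_encoding($s, 'UTF-8', 'ASCII');

var_dump($isAscii, $isUtf8, $converted === $s);  // bool(true) three times
```

Since both encodings match, whichever comes first in the detection order is the one reported, hence ASCII.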

Robert William Vesterman wrote:
I've run into a problem where mb_convert_encoding seems to be converting to ASCII, even though I'm telling it to convert to UTF-8. This is with PHP version 4.3.11.

I had been asking it to convert from "auto" to UTF-8, so at first I thought maybe "auto" was not the right choice. So I called "mb_detect_encoding" to see the format of what I was trying to convert; it said it was already UTF-8 (before I did the conversion). So then I thought maybe I got the "from" and "to" parameters backwards (although I was confident I was following the documentation), so I changed mb_convert_encoding to use "UTF-8" as /both/ the from and to.

It still converts to ASCII.

I understand that, given that it's already UTF-8, I don't need to convert it to UTF-8. But other things that I receive might /not/ be UTF-8, so I am still concerned with this.

Sample code:

  <html><head><title>Minnie</title></head><body><p>
  <?php
  $x = $_REQUEST['Minnie'];
  echo $x . ' ... ' . mb_detect_encoding ( $x ) . '<br/>';
  $x = mb_convert_encoding ( $x, "UTF-8", "UTF-8" );
  echo $x . ' ... ' . mb_detect_encoding ( $x ) . '<br/>';
  ?>
  </p></body></html>

Output, when called with URL parameter "Minnie=Miñoso":

  Miñoso ... UTF-8
  Mioso ... ASCII

Then I changed the "from" so that I could try converting from something other than UTF-8:

  $x = mb_convert_encoding ( $x, "UTF-8", mb_detect_encoding ( $x ) );

And now, output when called with "Minnie=Mouse":

  Mouse ... ASCII
  Mouse ... ASCII

Does anyone have any idea what's going on here? Am I doing something wrong?

Thanks in advance for any help.








--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

