Fun with MySQL collation, HTML charset and PHP utf8_encode

How do I avoid having so much fun using utf8_encode throughout my document?

I was thinking of using output buffering and then making 1 call to
utf8_encode, but I think a better question is, how do I stop using
utf8_encode completely?

The docs say that utf8_encode "Encodes an ISO-8859-1 string to UTF-8".
So if I start with a UTF-8 string, why should I need utf8_encode at
all?  Am I really starting with a UTF-8 string if I set the MySQL
collation for that field to utf8_unicode_ci, send the content type with
header('Content-type: text/html; charset=utf-8'); and declare the HTML
charset with <meta http-equiv="content-type" content="text/html;
charset=utf-8"> ?
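
To make the quoted doc sentence concrete, here is a minimal sketch of what
an ISO-8859-1 to UTF-8 conversion does at the byte level. The function
latin1_to_utf8 is a hypothetical stand-in written for illustration (it is
not a PHP built-in); it follows the documented behaviour of utf8_encode,
and the sample bytes are assumptions about what the "é" looks like in each
encoding rather than values read from my database.

```php
<?php
// Hypothetical helper: re-implements, byte by byte, the ISO-8859-1 -> UTF-8
// conversion that the docs describe for utf8_encode.
function latin1_to_utf8($s)
{
    $out = '';
    for ($i = 0; $i < strlen($s); $i++) {
        $b = ord($s[$i]);
        if ($b < 0x80) {
            $out .= chr($b);                 // ASCII passes through unchanged
        } else {
            $out .= chr(0xC0 | ($b >> 6))    // lead byte: 110000xx
                  . chr(0x80 | ($b & 0x3F)); // continuation byte: 10xxxxxx
        }
    }
    return $out;
}

$latin1 = "\xE9";                    // "é" as a single ISO-8859-1 byte
$utf8   = latin1_to_utf8($latin1);   // converted once
$double = latin1_to_utf8($utf8);     // converted again by mistake

echo bin2hex($latin1), "\n";  // e9
echo bin2hex($utf8), "\n";    // c3a9
echo bin2hex($double), "\n";  // c383c2a9
```

If the string coming out of MySQL really were UTF-8 already, a second
conversion would produce the four-byte form on the last line and the page
would show "Ã©" instead of "é". Since utf8_encode fixes my output rather
than mangling it, this suggests the bytes reaching PHP are single-byte
ISO-8859-1, whatever the field collation says.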

In MySQL 5.0.22, I had a field of Type text, Collation latin1_swedish_ci
(default settings, I believe), into which I pasted the character "é"
from the French Keyboard Viewer on a Mac running Leopard, using
phpMyAdmin 2.11.1.2.  This is an e with an acute accent (in case it is
not rendered properly in your email client).
Hmm, pulling up the phpMyAdmin version also reveals:
MySQL charset:  UTF-8 Unicode (utf8)
MySQL connection collation: utf8_unicode_ci

I retrieve the field using mysql_fetch_assoc and display it in an HTML
page rendered by PHP, both with and without
header('Content-type: text/html; charset=utf-8');
and
<meta http-equiv="content-type" content="text/html; charset=utf-8">

The document was originally saved in Dreamweaver 8 with Unicode
Normalization Form C (Canonical Decomposition, followed by Canonical
Composition) and without "Include Unicode Signature (BOM)" -- great, more
encoding to worry about in my editor.

The rendered view I see in Firefox 2.0.0.12 is a question mark "?"
where the French character should have appeared.  If I use
utf8_encode, the character appears as it should.

I have since changed the MySQL Collation of the field to utf8_general_ci
and then to utf8_unicode_ci, and I still have to use utf8_encode to see
the character appear properly.

Luckily I'm on PHP 4.3.10, so I can't see what mb_check_encoding would
report -- if that would even help normally.
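
As a possible stand-in for mb_check_encoding, here is a minimal sketch of
a validity check that relies only on the PCRE functions. The name
is_valid_utf8 is hypothetical, and the sketch assumes the PCRE library in
this PHP build was compiled with UTF-8 support, which I have not verified
on 4.3.10.

```php
<?php
// Hypothetical helper, not a built-in: reports whether $s is well-formed UTF-8.
// With the /u modifier PCRE refuses to match a subject that is not valid
// UTF-8, so any return value other than 1 is treated as "not valid".
function is_valid_utf8($s)
{
    return preg_match('//u', $s) === 1;
}

var_dump(is_valid_utf8("\xC3\xA9"));    // "é" encoded as UTF-8
var_dump(is_valid_utf8("\xE9"));        // "é" as a lone ISO-8859-1 byte
var_dump(is_valid_utf8("plain ASCII")); // ASCII is valid UTF-8 as well
```

The first and third calls should report true and the second false.
Running the value returned by mysql_fetch_assoc through a check like this
would at least show whether the field reaches PHP as UTF-8 or not.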

Don't you just love Monday fun?

-- 
PHP Database Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php


