utf8_decode() and mixed character sets

Hey everyone.  For a while I'd been troubled by the fact that inserting
cut-and-pasted special characters such as ä caused truncation when passed to
MySQL.  I then discovered that it was because I was pasting UTF-8 encoded
Unicode characters into strings that are otherwise Latin-1.

Since Latin-1 has equivalents for these characters, I was hoping that filtering
my mixed UTF-8/Latin-1 string through utf8_decode() would solve the problem.
Instead, where the Unicode character used to be, I now get a '?', and a few
characters go missing from the middle of the string.  I'm guessing that this is
because utf8_decode() assumes the whole string is UTF-8, so it treats the
Latin-1 bytes as parts of multi-byte sequences, drops them, and corrupts the
string.  At least, that's my guess.  I could be very wrong (I have pretty much
no experience with different character sets...)

My question is, what's a good way to translate the UTF-8 encoded characters in
an otherwise Latin-1 string to their Latin-1 equivalents?  I need to be able to
do this in order to sanitize a fairly common form of input.
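
To make the question concrete, here is a minimal sketch of the kind of
conversion being asked about.  It is not a tested solution.  The function name
mixed_to_latin1() is made up for illustration, and the sketch assumes that
every byte 0xC2 or 0xC3 followed by a byte in 0x80-0xBF is a UTF-8 encoded
character in the range U+0080..U+00FF.  That assumption can be wrong: a genuine
Latin-1 "Ã" followed by certain other high characters looks exactly the same,
so the ambiguity cannot be fully resolved.

```php
<?php
// Hypothetical helper, not part of PHP: rewrites only the two-byte UTF-8
// sequences that encode U+0080..U+00FF as single Latin-1 bytes and leaves
// every other byte (ASCII, existing Latin-1) untouched.
function mixed_to_latin1(string $s): string
{
    // No /u modifier, so PCRE matches raw bytes rather than UTF-8 characters.
    $result = preg_replace_callback(
        '/([\xC2\xC3])([\x80-\xBF])/',
        function (array $m): string {
            // Lead byte 0xC2 maps to 0x80..0xBF, lead byte 0xC3 to 0xC0..0xFF.
            $offset = ($m[1] === "\xC3") ? 0x40 : 0x00;
            return chr(ord($m[2]) + $offset);
        },
        $s
    );

    // preg_replace_callback() returns null on error; fall back to the input.
    return $result ?? $s;
}

// "ä" pasted as UTF-8 (C3 A4) next to "é" already stored as Latin-1 (E9).
$mixed = "M\xC3\xA4dchen and caf\xE9";
$clean = mixed_to_latin1($mixed);
echo bin2hex($clean), "\n";
```

Characters that have no Latin-1 equivalent (three-byte UTF-8 sequences such as
curly quotes) are not touched by this sketch and would still need separate
handling.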

Thanks!

James

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

