the article at http://devlog.info/2008/08/24/php-and-unicode-utf-8, among other web pages, suggests checking for valid utf-8 string encoding using (strlen($str) && !preg_match('/^.{1}/us', $str)) -- the expression is true when the string is non-empty but not valid utf-8, since preg_match() with the /u modifier fails on malformed input. however, another article, http://www.phpwact.org/php/i18n/charsets, says this cannot be trusted.

i work exclusively with mbstring environments, so i could use mb_check_encoding() instead. that leads to the question of what to do when mb_check_encoding() indicates bad input. i don't want to throw the form back to the user, because most of my users will not be able to rectify the input. errors in the data are undesirable, of course, but in my application they're not disastrous. so i'm inclined toward the approach mentioned here: http://blog.liip.ch/archive/2005/01/24/how-to-get-rid-of-invalid-utf-8-characters.html, i.e. iconv("UTF-8", "UTF-8//IGNORE", $t), which will quietly eliminate badly formed characters and move on (iconv will throw a notice on bad utf-8).

so i'm considering using a function like this:

// recursively strip invalid utf-8 from a string or an array of strings
function clean_input(&$a)
{
    if ( is_array($a) && !empty($a) )
        foreach ($a as $k => &$v)
            clean_input($v);
    elseif ( is_string($a) && !mb_check_encoding($a, 'UTF-8') )
        $a = iconv('UTF-8', 'UTF-8//IGNORE', $a);
}

and calling it on $_POST or $_GET as appropriate at the top of any script that uses those superglobals. it seems a bit lazy to me, but that's my nature, and i think this might be good enough. any thoughts?
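for what it's worth, here is a minimal sketch of how i'd wire it up at the top of a form handler, assuming the mbstring extension is loaded; the 'comment' field name is just for illustration:

<?php
// assumes mbstring is available; the 'comment' field below is hypothetical
function clean_input(&$a)
{
    if ( is_array($a) && !empty($a) )
        foreach ($a as $k => &$v)
            clean_input($v);
    elseif ( is_string($a) && !mb_check_encoding($a, 'UTF-8') )
        $a = iconv('UTF-8', 'UTF-8//IGNORE', $a);
}

// sanitize the superglobals before anything else reads them
clean_input($_POST);
clean_input($_GET);

// everything below can now treat the input as valid utf-8,
// with any malformed bytes silently dropped
$comment = isset($_POST['comment']) ? $_POST['comment'] : '';
echo htmlspecialchars($comment, ENT_QUOTES, 'UTF-8');

the appeal of doing it once at the top is that no downstream code has to re-check encodings; it just has to accept that some malformed characters may have been silently discarded.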