I have a table like this:

artist_id | artistname  | artistname_alpha
1         | The Doors   |
2         | The The     |
3         | 100 Monkeys |
4         | 3âÂ�¢16      |

That last artistname is not in ASCII/English... Dunno what your email client is showing you, but it's:

the digit 3
capital A with umlauts
US cents sign
capital A with caret
question mark
capital A with caret
US cents sign
the digit 1
the digit 6

THAT ought to get through any email client/MTA okay. :-)

Now, my goal is to fill in artistname_alpha with things such as:

Doors, The
The, The
one hundred monkeys
3âÂ�¢16 (???)

I've written a nifty function for this:

    function alpha ($string){
        //$string = utf8_decode($string);
        // Spell out dollar amounts such as $5.50
        $string = preg_replace_callback('/(\\$[0-9\\.]+)/', create_function('$s', 'return Numbers_Words::toCurrency(str_replace("$", "", $s[1]));'), $string);
        // Spell out any remaining runs of digits
        $string = preg_replace_callback('/([0-9]+)/', create_function('$s', 'return Numbers_Words::toWords($s[1]);'), $string);
        // Move a leading article to the end: "The Doors" becomes "Doors, The"
        if (stristr(substr($string, 0, 4), 'The ')) return (substr($string, 4) . ', ' . substr($string, 0, 4));
        elseif (stristr(substr($string, 0, 3), 'An ')) return (substr($string, 3) . ', ' . substr($string, 0, 3));
        elseif (stristr(substr($string, 0, 2), 'A ')) return (substr($string, 2) . ', ' . substr($string, 0, 2));
        else return $string;
    }

Now, the tricky part is that I don't really know what '3âÂ�¢16' is. It looks like it might be UTF-8, but utf8_decode() had no effect on it, which is why I've commented that out in the function. So my function currently converts it to:

'threeâÂ�¢sixteen'

That ain't right.

So, does anybody who understands this i18n stuff want to point me in the right direction?...

Things you should know:
I'm not trying to provide support for anything but English here, unless it's trivial to do so.
The table has 150,000 rows.
I have no real control over fancy MySQL settings, as it's a $20 shared host deal.
Every day, at 6 am, I get a new file of this data, and run through it with a script that does an UPDATE or INSERT.
REPLACE is not suitable due to the primary key field size of the source data.

Anyway, I haven't even checked whether the function as-is will be too slow, but whatever I do to fix the i18n issue can't have too much overhead, as it will be called 150,000 times every morning at 6 am.

If it helps, here is what my data source dumps out when it encounters this band name:
http://cdbaby.com/cd/316live

Here is the band's web site:
http://316live.com/

And here, possibly, is the HTML source for what somebody copied and pasted into the FORM to fill in the band name:
3&middot;16

So, possibly, this is not i18n at all, and just somebody really really really silly copying and pasting an HTML entity 'middot' from their website into a form input and expecting it to render...

Would '·' output by a browser turn into 'âÂ�¢' ??? If so, what can I do about it?

--
Like Music?
http://l-i-e.com/artists.htm
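
One guess that fits the character-by-character description above: the bytes look like the UTF-8 encoding of a bullet (U+2022, bytes E2 80 A2) that was read as Latin-1 and then encoded as UTF-8 a second time. A middot (U+00B7) treated the same way would come out as different bytes, so the bullet seems the likelier source. The sketch below tests that guess. It assumes the stored bytes are exactly C3 A2 C2 80 C2 A2 (reconstructed from the description, not read from the real table), that the mbstring extension is available, and PHP 7 or later. The helper name fix_double_utf8() is made up for illustration.

```php
<?php
// Hypothetical helper (not from the original post): undo one layer of
// "UTF-8 bytes read as Latin-1, then encoded as UTF-8 again".
function fix_double_utf8(string $s): string
{
    // Only strings made entirely of code points U+0000..U+00FF can be the
    // product of that mistake; anything else (or invalid UTF-8) is left alone.
    if (preg_match('/^[\x{00}-\x{FF}]*$/u', $s) !== 1) {
        return $s;
    }
    // Map each code point back to a single byte.
    $bytes = mb_convert_encoding($s, 'ISO-8859-1', 'UTF-8');
    // Keep the result only if those bytes are themselves valid UTF-8.
    return mb_check_encoding($bytes, 'UTF-8') ? $bytes : $s;
}

// Assumed stored bytes, reconstructed from the description in the post.
$stored = "3\xC3\xA2\xC2\x80\xC2\xA216";
$fixed  = fix_double_utf8($stored);

echo bin2hex($fixed), "\n";          // 33e280a23136
var_dump($fixed === "3\u{2022}16");  // bool(true): "3", BULLET, "16"
```

If the guess holds, the helper could be called at the top of alpha() before the number replacement. Plain ASCII names pass through unchanged, and the cost per call is one regex match plus one conversion, which should be tolerable for 150,000 rows. This is a heuristic: a name that legitimately contains a sequence such as 'Ã©' would be rewritten too. What the sort key should do with the bullet once it is recovered (drop it, or turn it into a space) is a separate decision.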