On Monday 20 November 2006 18:19, Richard Lynch wrote:
> You are correct. They are not "real" UTF-8 nor UTF-16 characters.
> Catch the data somewhere in PHP and use the functions from the User
> Contributed code on http://php.net/str_replace to replace the MS Word
> chars with their ASCII or HTML equivalents -- Both versions are in the
> User Contributed notes, plus variations on this theme.
>
> You may have trouble with REAL UTF-8 and UTF-16 charsets, however, as
> I suspect that MS Word smart quotes may "collide" with those charsets
> (codepages?) in a way that makes one indistinguishable from the other.

Actually, I couldn't get any string-replacement technique to work. None of them seemed to catch the characters involved, either in this PHP app or in a Perl app I happened to be working on personally at the same time.

However, I discovered that at least part of the problem is at the HTTP level: the data seems to have been corrupted before it ever reached the server. Although we already set the charset to UTF-8 in the HTTP Content-Type header, the browsers (IE, Firefox, and Konqueror) were still defaulting to Latin-1/Western, and I believe they were then *sending* the data in that encoding as well.

When we also set the content type and charset in a <meta> tag, however, all of the browsers switched to UTF-8, submitted the data, and then displayed the smart quotes correctly (that is, without funky accented characters). It only seemed to work if the browser was set to UTF-8 both when submitting the data and when reading it back. The existing pages remained borked.

For the time being the meta tag seems to be working, but I'm quite curious why the browser would honor that and NOT the HTTP header. It also doesn't explain why the string-replace method still fails, even when everything is set to UTF-8. If anyone has an idea in that regard, please share.
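Declaring the charset in both places, as described above, looks roughly like this. It is a minimal sketch using Python's WSGI interface rather than PHP; FORM_PAGE, app, and read_comment are hypothetical names. It also decodes the submitted form body explicitly as UTF-8 instead of relying on a server default.

```python
from urllib.parse import parse_qs

# Hypothetical form page; the <meta> tag repeats the charset from the header.
FORM_PAGE = """<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Comment</title>
</head>
<body>
<form method="post" action="/submit">
<textarea name="comment"></textarea>
<input type="submit" value="Send">
</form>
</body>
</html>
"""


def app(environ, start_response):
    """Minimal WSGI app that serves FORM_PAGE encoded as UTF-8."""
    body = FORM_PAGE.encode("utf-8")
    start_response("200 OK", [
        # Charset declared in the HTTP header as well as in the <meta> tag.
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def read_comment(raw_body: bytes) -> str:
    """Decode an application/x-www-form-urlencoded POST body as UTF-8."""
    # The body itself is ASCII; the percent-escapes are decoded as UTF-8.
    fields = parse_qs(raw_body.decode("ascii"), encoding="utf-8")
    return fields.get("comment", [""])[0]
```

To try it locally, the app can be passed to wsgiref.simple_server.make_server("localhost", 8000, app) and served with serve_forever(). The explicit decode in read_comment matters because the form data is only correct if the browser really submitted UTF-8; it does not repair data that arrived as Windows-1252.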
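On the string-replace question, one plausible explanation (an assumption, not something confirmed in this thread): replacement code that matches single Windows-1252 bytes, such as chr(147) and chr(148) for the curly double quotes, cannot match once the data arrives as UTF-8, because the same characters are then three-byte sequences (U+201C is E2 80 9C). The sketch below, in Python with hypothetical names, contrasts a byte-level map with one keyed on Unicode code points; only the latter works on decoded UTF-8 text.

```python
# Hypothetical byte-level map in the style of the str_replace snippets:
# single Windows-1252 bytes for Word "smart" punctuation -> ASCII.
CP1252_BYTE_MAP = {
    b"\x91": b"'",  # left single quote
    b"\x92": b"'",  # right single quote
    b"\x93": b'"',  # left double quote
    b"\x94": b'"',  # right double quote
    b"\x96": b"-",  # en dash
    b"\x97": b"-",  # em dash
}

# The same characters keyed by Unicode code point, for decoded text.
UNICODE_MAP = str.maketrans({
    "\u2018": "'",
    "\u2019": "'",
    "\u201c": '"',
    "\u201d": '"',
    "\u2013": "-",
    "\u2014": "-",
})


def replace_cp1252_bytes(data: bytes) -> bytes:
    """Only correct if `data` really is Windows-1252."""
    for old, new in CP1252_BYTE_MAP.items():
        data = data.replace(old, new)
    return data


def replace_unicode(text: str) -> str:
    """Works on text that has already been decoded (e.g. from UTF-8)."""
    return text.translate(UNICODE_MAP)


if __name__ == "__main__":
    word = "\u201cquoted\u201d"
    print(replace_cp1252_bytes(word.encode("cp1252")))  # b'"quoted"'
    print(replace_cp1252_bytes(word.encode("utf-8")))   # UTF-8 bytes, unchanged
    print(replace_unicode(word))                         # "quoted"
```

Running the byte map over UTF-8 input is not always a harmless no-op: U+2013 (en dash) encodes as E2 80 93, and its last byte gets rewritten, leaving an invalid sequence. That may be the kind of "collision" suspected in the quoted message.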
:-)

--
Larry Garfield                  AIM: LOLG42
larry@xxxxxxxxxxxxxxxx          ICQ: 6817012

"If nature has made any one thing less susceptible than all others of exclusive property, it is the action of the thinking power called an idea, which an individual may exclusively possess as long as he keeps it to himself; but the moment it is divulged, it forces itself into the possession of every one, and the receiver cannot dispossess himself of it." -- Thomas Jefferson

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php