mike wrote:
> On Tue, Feb 17, 2009 at 4:26 PM, mike <mike503@xxxxxxxxx> wrote:
>> I tried that kind of stuff - it did not seem to work.
>>
>> I will try again... if anyone has any ideas, e.g. "use iconv to convert
>> to A, then use the DOM functions, then use iconv to move it back to
>> UTF-8...", I am all ears.
>
> Nope. For example, this is the input text (apologies if your reader
> isn't UTF-8), in Simplified Chinese:
>
> 足以概括英特尔为此所付出的努力。谈及移动设备,英特尔公司自诩在该领域的创新犹如其户友好性设计及能效等一样出类拔萃。同时,英特尔也一直表示要帮助构建能够
>
> The output is this:
>
> 一句“英特尔热衷于移åŠ&u
>
> The funny thing is that I don't care about altering the actual content,
> only the content of the "href" and "src" attributes, which are all
> standard Latin-based URLs, too.
>
> Here's the simplest code that reproduces the behavior:
>
> $q = db_query("SELECT id,old FROM testing", "redirects");
> while (list($id, $doc) = db_rows($q)) {
>     $new = fix_document($doc);
>     $new = db_escape($new);
>     db_query("UPDATE testing SET new='$new' WHERE id=$id",
>         "redirects");
> }
> db_free($q);
>
> function fix_document($string) {
>     $dom = new DomDocument('1.0', 'UTF-8');
>     @$dom->loadHTML($string);
>     $dom->preserveWhiteSpace = false;
>     return $dom->saveHTML();
> }
>
> (Note: it is not the db functions; if I do this:
>
> function fix_document($string) {
>     return $string;
> }
>
> the content is unaltered.)
>
> Anyone have any ideas? Any options to feed to the DOM functions? It's
> converting the text to HTML entities, which I don't want either.
>

As I understand it, all non-ASCII characters will be converted to HTML
entities. Try this:

function fix_document($string) {
    $dom = new DOMDocument('1.0', 'UTF-8');
    // preserveWhiteSpace only affects parsing, so set it before loadHTML()
    $dom->preserveWhiteSpace = false;
    // '@' suppresses the warnings loadHTML() emits for malformed markup
    @$dom->loadHTML($string);
    // Turn the entities saveHTML() produced back into UTF-8 characters
    return html_entity_decode($dom->saveHTML(), ENT_QUOTES, "UTF-8");
}

header("Content-Type: text/html; charset=UTF-8");
echo fix_document('data here');

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
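A small PHP sketch of the string-level effects involved, offered under stated assumptions rather than as a tested fix. It assumes, as the reply does, that saveHTML() hands non-ASCII characters back as entities. It also assumes (the thread does not confirm this) that the garbled output comes from libxml's HTML parser reading UTF-8 input with no charset declaration as ISO-8859-1. `decode_entities_utf8()` and `misread_as_latin1()` are hypothetical helper names, and the second one needs the mbstring extension.

```php
<?php
// Sketch only: illustrates two string-level effects; it does not call DOMDocument.

// Hypothetical helper: what html_entity_decode() does to entity-encoded output.
// ENT_QUOTES also decodes &quot; and &#039;. The explicit 'UTF-8' charset
// matters on PHP < 5.4, where the default charset was ISO-8859-1.
function decode_entities_utf8(string $html): string
{
    return html_entity_decode($html, ENT_QUOTES, 'UTF-8');
}

// Hypothetical helper: reproduces the mojibake you get when UTF-8 bytes are
// reinterpreted as ISO-8859-1 (each byte becomes one Latin-1 character).
// Assumption: this is the kind of misreading behind the garbled output above.
function misread_as_latin1(string $utf8): string
{
    // mb_convert_encoding(string, to_encoding, from_encoding)
    return mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');
}

// &#36275; is U+8DB3 (足) and &#20197; is U+4EE5 (以).
echo decode_entities_utf8('<p>&#36275;&#20197;</p>'), "\n"; // <p>足以</p>

// Caveat: escaped markup inside the text is decoded as well.
echo decode_entities_utf8('<p>a &lt;b&gt; c</p>'), "\n";     // <p>a <b> c</p>

// 足 is the three bytes E8 B6 B3; read as Latin-1 they become è, ¶ and ³.
echo misread_as_latin1('足'), "\n";                           // è¶³
```

Two things follow if those assumptions hold. First, decoding the whole serialized document also turns escaped text such as &lt;b&gt; back into live tags. Second, if the bytes were already misread as Latin-1 during parsing, decoding the entities afterwards gives back the mojibake characters rather than the original Chinese. A frequently suggested remedy (not tested here) is to declare the charset in the input, for example with a <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> tag, before calling loadHTML().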