Pardon the messy code, but I had this working like a charm until I tried it on some Russian content, at which point it broke. The input was UTF-8-encoded Russian text; the output was something else entirely, unintelligible. I found a PHP bug report from years ago that sounded related, but that user had a workaround.

Note that none of my own functions appear to break the encoding; it is ->saveHTML() that mangles it (I tried saveXML() as well, with no luck). I am happy to swap out PHP's DOM for another library. Basically I just want to traverse the DOM and pick out every <a href> and <img src>, plus any other external references in the documents, so I can run them through some link examination and such. I figured I might have to fall back to a regexp, but PHP's DOM handled even partial and malformed HTML so well that I was excited at how easy this was:

    $dom = new DOMDocument();
    $dom->preserveWhiteSpace = false;   // must be set before loading to have any effect
    @$dom->loadHTML($string);           // @ suppresses warnings from malformed HTML

    $links = $dom->getElementsByTagName('a');
    foreach ($links as $tag) {
        $before = $tag->getAttribute('href');
        $after  = strip_chars($before);  // my own helpers
        $after  = map_url($after);
        $after  = fix_link($after);      // returns false on failure
        if ($after !== false) {
            echo "\tBEFORE: $before\n";
            echo "\tAFTER : $after\n\n";
            $tag->setAttribute('href', $after);  // overwrites the old value in place
        }
    }
    return $dom->saveHTML();

I tried things like:

    $dom = new DOMDocument('1.0', 'UTF-8');

as well as setting encoding options on $dom, such as:

    $dom->encoding = 'utf-8';

(I tried so many variations I cannot remember them all anymore.)

Anyone have any ideas? As long as it can read in the string (which is, and should always be, UTF-8) and spit out UTF-8, I can make sure the functions of mine that handle the data are UTF-8 safe...

Thanks
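
P.S. For concreteness, the workaround I keep running into is to tell the parser about the encoding before the markup itself, since loadHTML() reportedly falls back to ISO-8859-1 whenever the document carries no charset declaration of its own. An untested sketch of what I mean (I have not confirmed either variant fixes my case):

    // Sketch only: declare the encoding up front so libxml does not
    // assume ISO-8859-1 while parsing.
    $dom = new DOMDocument('1.0', 'UTF-8');
    $dom->preserveWhiteSpace = false;
    @$dom->loadHTML('<?xml encoding="utf-8" ?>' . $string);

    // Alternative I have seen suggested: convert multibyte characters
    // to entities before parsing, sidestepping the encoding question.
    @$dom->loadHTML(mb_convert_encoding($string, 'HTML-ENTITIES', 'UTF-8'));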
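
P.P.S. Since I mentioned wanting <img src> and other external references too, here is the shape of the traversal I have in mind, using DOMXPath; the element/attribute list is just my guess at what counts as "external", not anything definitive:

    // Sketch: collect candidate external references in one pass.
    $xpath = new DOMXPath($dom);
    $nodes = $xpath->query('//a[@href] | //img[@src] | //script[@src] | //link[@href]');
    foreach ($nodes as $node) {
        $attr = $node->hasAttribute('href') ? 'href' : 'src';
        echo $node->nodeName . ': ' . $node->getAttribute($attr) . "\n";
    }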