bouncing back to the list so that others may benefit from our work...

On Fri, Sep 5, 2008 at 3:09 PM, Tim Gustafson <tjg@xxxxxxxxxxxx> wrote:

> Nathan,
>
> Thanks for the suggestion, but it's still not working for me. Here's my
> code:
>
> ===========
> $HTML = new DOMDocument();
> @$HTML->loadHTML($text);
> $Elements = $HTML->getElementsByTagName("*");
>
> for ($X = 0; $X < $Elements->length; $X++) {
>   $Element = $Elements->item($X);
>
>   if ($Element->tagName == "a") {
>     # SNIP - Do something with A tags here
>   } else if ($Element instanceof DOMText) {
>     echo $Element->nodeValue; exit;
>   }
> }
> ===========
>
> This loop never executes the instanceof part of the code. If I add:
>
>   } else if ($Element instanceof DOMNode) {
>     echo "foo!"; exit;
>   }
>
> Then it echos "foo!" as expected. It just seems that none of the nodes in
> the tree are DOMText nodes. In fact, get_class($Element) returns
> "DOMElement" for every node in the tree.

Tim,

i got your code working with minimal effort by pulling in two of the methods
i posted and making some revisions. scope it out (this will produce the same
output as my last post, the part after OUT:)

<?php
$text = '<html><body>Test<br><h2>quickshiftin@xxxxxxxxx<a name="bar">stuff inside the link</a>Foo</h2><p>care</p><p>yoyser</p></body></html>';

$HTML = new DOMDocument();
$HTML->loadHTML($text);
$Elements = $HTML->getElementsByTagName("*");

for ($X = 0; $X < $Elements->length; $X++) {
  $Element = $Elements->item($X);

  if ($Element->hasChildNodes()) {
    foreach ($Element->childNodes as $curChild) {
      if ($curChild->nodeName == "a") {
        # SNIP - Do something with A tags here
      } else if ($curChild instanceof DOMText) {
        convertToLinkIfNecc($Element, $curChild);
      }
    }
  }
}

echo $HTML->saveXML() .
PHP_EOL;

# turn the text node into a link only if its container is not already
# an <a> and the whole text is a valid email address
function convertToLinkIfNecc(DOMElement $textContainer, DOMText $textNode) {
  if ( (strtolower($textContainer->nodeName) != 'a') &&
       (filter_var($textNode->nodeValue, FILTER_VALIDATE_EMAIL) !== false) ) {
    convertMailtoToAnchor($textContainer, $textNode);
  }
}

# swap the text node for <a href="mailto:...">...</a>
function convertMailtoToAnchor(DOMElement $textContainer, DOMText $textNode) {
  $newNode = new DOMElement('a', $textNode->nodeValue);
  # attach first: an element built with "new DOMElement" is documented as
  # read only until it belongs to a document
  $textContainer->replaceChild($newNode, $textNode);
  $newNode->setAttribute('href', "mailto:{$textNode->nodeValue}");
}
?>

so, the problem is that getElementsByTagName("*") only hands back element
nodes. text nodes are never in that list, which is why get_class() reports
DOMElement for every item. to reach the text you need to call hasChildNodes()
on each element and, if that is true, iterate over its childNodes (a
property, not a method).

the code i have shown works for the html i posted. if you walk the tree
yourself from the root instead of going through getElementsByTagName("*"),
the same check has to be repeated at every level: call hasChildNodes() on
each child and iterate over its childNodes in turn. i'm sure you can cook up
something that will recurse down to the leaves :)

anyway, i'm going to try and hook up a RecursiveDOMDocumentIterator that
implements RecursiveIterator so that it has the convenient foreach support.
also, i'll probably try to hook up a Filter variant of this class so that
situations like this are trivial. stay tuned :D

-nathan
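
for reference, the "recurse down to the leaves" idea mentioned above could look roughly like the sketch below. this is only an illustration, not code from the thread: the function names collectTextNodes() and linkifyEmails(), the sample html and the example.com address are all made up. it also builds the link with DOMDocument::createElement() instead of "new DOMElement", and it collects the text nodes first and modifies the tree afterwards, so nothing changes underneath the walk.

```php
<?php
// Hypothetical helper (not from the thread): gather every DOMText node
// beneath $node, at any depth, by recursing through childNodes.
function collectTextNodes(DOMNode $node, array &$found = []): array
{
    foreach ($node->childNodes as $child) {
        if ($child instanceof DOMText) {
            $found[] = $child;
        } elseif ($child->hasChildNodes()) {
            collectTextNodes($child, $found);
        }
    }
    return $found;
}

// Hypothetical helper: wrap each text node that is a bare email address
// in <a href="mailto:...">, skipping text that already sits inside an <a>.
// Returns the number of text nodes that were converted.
function linkifyEmails(DOMDocument $doc): int
{
    $converted = 0;
    foreach (collectTextNodes($doc) as $textNode) {
        $parent = $textNode->parentNode;
        $value  = $textNode->nodeValue;
        if (strtolower($parent->nodeName) === 'a'
            || filter_var($value, FILTER_VALIDATE_EMAIL) === false) {
            continue;
        }
        // createElement() ties the new node to $doc from the start.
        $anchor = $doc->createElement('a');
        $anchor->appendChild($doc->createTextNode($value));
        $anchor->setAttribute('href', 'mailto:' . $value);
        $parent->replaceChild($anchor, $textNode);
        $converted++;
    }
    return $converted;
}

// Made-up sample input with the address nested several levels deep.
$doc = new DOMDocument();
$doc->loadHTML('<html><body><div><p><span>someone@example.com</span></p><p>plain text</p></div></body></html>');
$count = linkifyEmails($doc);
echo $doc->saveHTML();
```

like the original, this only matches when the whole text node is a bare address, because filter_var() validates the complete string. pulling addresses out of longer text would need something extra (splitting the text node, for instance), which is not shown here.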