Problems working with HTML using PHP's XML tools (placing mixed text/html into xpath-specified nodes...)

Is there a straightforward way (or, heck, any way) of placing mixed
html/text content into xpath-specified nodes using any of PHP's XML
tools?

So far, I've tried SimpleXML and the DOM and things aren't coming out well.

SimpleXML:

    /* $filename contains the path to a valid XML file, $xpathxpr contains
       a valid XPath expression matching at least one document node, and
       $fillval contains a mixed well-formed text/XHTML string to be
       prepended within each matching node */

    $sx = simplexml_load_file($filename);
    $nodes = $sx->xpath($xpathxpr);
    foreach($nodes as $node) {
        $children = $node->children();
        $children[0] = $fillval . $children[0];
    }

This only sort of works. I get $fillval prepended to the original
contents of each matching document node... but if I've put any markup
in, it's all there as literal text (i.e., <a
href="http://php.net">php.net</a> wouldn't show up as a link; you'd
see the actual markup when the document is rendered).

A variation on this that I tried is creating a new SimpleXMLElement
object, with the mixed text/markup string as an argument passed to the
constructor, since the docs seem to indicate this is blessed. Weirdly,
when I do this, it seems to actually be stripping out the markup and
just giving the text. For example:

    $s = new SimpleXMLElement('<a href="#">Boo</a>');
    echo $s;

yields "Boo" (and echo $s->a yields nothing). This would be such a
huge bug I have a hard time believing it, so I have to suspect there's
a dance I'm not doing to make this work correctly.
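
A quick sanity check on that last example, offered as a sketch rather than a diagnosis: as far as I can tell, the object returned by the constructor is the root <a> element itself, so echo prints only its text content, and $s->a looks for a child <a> that isn't there. Asking for asXML() shows the markup is still present:

```php
<?php
// Same input string as in the example above.
$s = new SimpleXMLElement('<a href="#">Boo</a>');

echo $s, "\n";            // string cast: text content of the root element
echo $s->getName(), "\n"; // name of the element the object represents
echo $s['href'], "\n";    // attributes are reachable with array syntax
echo $s->asXML(), "\n";   // serialized markup, tags included
```

That doesn't solve the insertion problem, but it suggests the markup is not being stripped at parse time.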

DOM XML:

    /* again, $filename contains the path to a valid XML file, $xpathxpr
       contains a valid XPath expression matching at least one document
       node, and $fillval contains a mixed well-formed text/XHTML string
       to be prepended within each matching node */

    $domDoc = new DOMDocument();
    $domDoc->loadHTML(file_get_contents($filename));
    $search = new DOMXPath($domDoc);
    $nodes = $search->query($xpathxpr);
    foreach($nodes as $emt) {
        $f = $domDoc->createDocumentFragment();
        $f->appendXML($fillval . $emt->nodeValue);
        $emt->nodeValue = '';
        $emt->appendChild($f);
    }

This also gets mixed results. It gets cranky and issues warnings about
any HTML entities (even though the call to loadHTML should make it
clear this is an HTML document), and while some markup makes it
through, other markup doesn't. I haven't quite figured out the
difference.
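
One possible factor, sketched here under assumptions and not verified against my real documents: nodeValue returns only the text of a node, so rebuilding the contents from it would drop any markup the node already held. The variant below leaves the existing children alone and inserts the fragment before the first one. The sample $html, $xpathxpr and $fillval values are made up for illustration. Note that appendXML still parses its argument as XML, so named HTML entities such as &nbsp; in $fillval would presumably still need to be written numerically.

```php
<?php
// Hypothetical sample inputs, not taken from my actual files.
$html = '<html><body><div id="x">Old <b>bold</b> text</div></body></html>';
$xpathxpr = '//div[@id="x"]';
$fillval = '<a href="http://php.net">php.net</a> says: ';

$domDoc = new DOMDocument();
$domDoc->loadHTML($html);
$search = new DOMXPath($domDoc);

foreach ($search->query($xpathxpr) as $emt) {
    // Parse only the new content into a fragment...
    $f = $domDoc->createDocumentFragment();
    $f->appendXML($fillval);
    // ...and put it ahead of the existing children, which stay untouched.
    $emt->insertBefore($f, $emt->firstChild);
}

$out = $domDoc->saveHTML();
echo $out;
```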

I can come up with some runnable tests if it will help, but I'm hoping
someone's already familiar with the general issues with using PHP's
XML tools to work with HTML and can offer some good commentary on
the matter.

Thanks,

Weston
