Re: loadHTML()

M5 <m5@xxxxxxxxxxxxxxxx> · Mon, 24 Dec 2007 19:52:37 -0700

OK, I already knew that making it valid doesn't change the result.  
But the question remains, how to parse the HTML as it arrives (which  
I have no control over anyway), besides doing a str_replace on <br>  
and inserting a token, which I later replace (which I shouldn't have  
to, right?)

...Rene

On 24-Dec-07, at 7:19 PM, Casey wrote:

Actually, never mind. It does not have to be valid to work.

On Dec 24, 2007, at 6:15 PM, Casey <heavyccasey@xxxxxxxxx> wrote:

That's because it's not proper XHTML: "<br>" should be "<br />".

On Dec 24, 2007, at 6:03 PM, M5 <m5@xxxxxxxxxxxxxxxx> wrote:

Just getting into DOMDocument()... I'm loading an HTML page and  
trying to extract certain bits of text. Just one problem: loadHTML()  
seems to ignore orphan tags like '<br>'. For example, in the  
following HTML:

<div class="text">Some text is here. <br> New line. <br> Another  
new line. </div>
<div class="text">Some text is here. <br> New line. <br> Another  
new line. </div>
<div class="text">Some text is here. <br> New line. <br> Another  
new line. </div>

If I run the above HTML through:

$nodes = $table->getElementsByTagName("*");

I only get three nodes that I can iterate through (<div>). What I  
want to do is split/explode the three lines within each div, but  
when I look at the nodeValue of each node, it only shows  
something like "Some text is here.  New line.  Another new line."

Any ideas?

...Rene
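
One possible approach, sketched here rather than taken from the thread: nodeValue on a <div> concatenates the text of all its descendants, so the <br> boundaries disappear from that string even if the parser kept the elements. Walking the div's childNodes directly lets the text be cut at each <br>. The helper name splitOnBr() is hypothetical, the sample markup is one of the divs from the question, and the syntax assumes PHP 7 or later.

```php
<?php
// Hypothetical helper (not part of the DOM extension): collect the text of
// $node's direct children, starting a new segment at every <br> element.
function splitOnBr(DOMNode $node): array
{
    $segments = [];
    $current = '';
    foreach ($node->childNodes as $child) {
        $isBr = $child->nodeType === XML_ELEMENT_NODE
            && strtolower($child->nodeName) === 'br';
        if ($isBr) {
            // A <br> closes the current segment.
            $segments[] = trim($current);
            $current = '';
        } else {
            // Text nodes and any other inline elements contribute their text.
            $current .= $child->textContent;
        }
    }
    $segments[] = trim($current);
    return $segments;
}

// Sample input: one of the divs from the question above.
$html = '<div class="text">Some text is here. <br> New line. <br> Another new line. </div>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // keep warnings about sloppy markup quiet
$doc->loadHTML($html);
libxml_clear_errors();

$lines = [];
foreach ($doc->getElementsByTagName('div') as $div) {
    $lines[] = splitOnBr($div);
}
print_r($lines);
```

With this, each div yields an array of its lines, so no str_replace token is needed before parsing. Nested block elements inside the div would need extra handling that this sketch leaves out.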

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
