Re: loadHTML()

M5 wrote:
> OK, I already knew that making it valid doesn't change the result. But
> the question remains, how to parse the HTML as it arrives (which I have
> no control over anyway), besides doing a str_replace on <br> and
> inserting a token, which I later replace (which I shouldn't have to,
> right?)

To create valid XHTML you can run the input through tidy (http://php.net/tidy).

Does echo $table->saveHTML() show the BR tags?

Also try:

<?php

foreach($table->getElementsByTagName("div") as $div) {
	var_dump($div->hasAttributes(), $div->hasChildNodes());
	echo htmlentities($div->C14N()); // C14N() is listed in the manual but not documented there; no idea if it works
}
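
If the BR tags do show up as child nodes, one way to get at the separate lines is to walk each div's childNodes and start a new line at every br element, instead of reading nodeValue on the div (which concatenates all descendant text). This is only a sketch: $html, $doc, $lines and $result are names I made up, the markup is copied from your example, and it assumes the br elements are direct children of the div.

```php
<?php
// Sample markup taken from the original question.
$html = '<div class="text">Some text is here. <br> New line. <br> Another new line. </div>';

$doc = new DOMDocument();
// Suppress the warnings libxml may emit for real-world, non-valid HTML.
@$doc->loadHTML($html);

$result = array();
foreach ($doc->getElementsByTagName('div') as $div) {
    $lines = array('');
    foreach ($div->childNodes as $child) {
        if ($child->nodeType === XML_ELEMENT_NODE && strtolower($child->nodeName) === 'br') {
            // A <br> element ends the current line and starts a new one.
            $lines[] = '';
        } else {
            // Text nodes (and any other inline nodes) add their text to the current line.
            $lines[count($lines) - 1] .= $child->textContent;
        }
    }
    // Strip the whitespace that surrounded each <br>.
    $result[] = array_map('trim', $lines);
}

print_r($result);
```

Each entry in $result is then the list of lines for one div, so no str_replace token should be needed.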

> 
> ...Rene
> 
> 
> On 24-Dec-07, at 7:19 PM, Casey wrote:
> 
>> Actually, never mind. It does not have to be valid to work.
>>
>>
>>
>> On Dec 24, 2007, at 6:15 PM, Casey <heavyccasey@xxxxxxxxx> wrote:
>>
>>> That's because it's not proper XHTML: "<br>" should be "<br />".
>>>
>>>
>>>
>>> On Dec 24, 2007, at 6:03 PM, M5 <m5@xxxxxxxxxxxxxxxx> wrote:
>>>
>>>> Just getting into DOMDocument()... I'm loading an HTML page and
>>>> trying to extract certain bits of text. Just one problem: loadHTML()
>>>> seems to ignore orphan tags like '<br>'. For example, in the
>>>> following HTML:
>>>>
>>>> <div class="text">Some text is here. <br> New line. <br> Another new
>>>> line. </div>
>>>> <div class="text">Some text is here. <br> New line. <br> Another new
>>>> line. </div>
>>>> <div class="text">Some text is here. <br> New line. <br> Another new
>>>> line. </div>
>>>>
>>>> If I run the above HTML through:
>>>>
>>>> $nodes = $table->getElementsByTagName("*");
>>>>
>>>> I only get three nodes that I can iterate through (<div>). What I
>>>> want to do is split/explode the three lines within each div, but
>>>> when I look at the nodeValue of each node, it only shows something
>>>> like "Some text is here.  New line.  Another new line."
>>>>
>>>> Any ideas?
>>>>
>>>> ...Rene
>>>>
>>>> -- 
>>>> PHP General Mailing List (http://www.php.net/)
>>>> To unsubscribe, visit: http://www.php.net/unsub.php
>>>>
>>
> 


