Re: DOMDocument::loadXML() failed when parsing comments inside a script tag

Hi Adam,

Thanks for the update, but I think it would be much easier if the DOM
parser could just ignore the contents of <script> tags when parsing HTML
content. That way we would not have to rip out the JavaScript or force users
to move their JavaScript into a separate file.
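
For reference, here is a rough sketch of what stripping the script blocks
before parsing involves. stripScripts() is a hypothetical helper, not part of
the DOM extension, and the regex is deliberately naive: it assumes the markup
never contains "</script>" inside a string or an attribute value.

```php
<?php
// Hypothetical helper: removes <script>...</script> blocks with a naive regex
// so the remaining markup can be passed to DOMDocument::loadXML().
function stripScripts($markup)
{
    // 'i' = case-insensitive, 's' = let '.' match newlines
    $result = preg_replace('#<script\b[^>]*>.*?</script>#is', '', $markup);

    // preg_replace() returns null on error; fall back to the input unchanged
    return $result === null ? $markup : $result;
}

$html = '<html>
   <body>
       <script type="text/javascript">
           <!--
           var i = 0;
           i--; // "--" inside an XML comment is what loadXML() rejects
           -->
       </script>
       <p>Hello</p>
   </body>
</html>';

$dom = new DOMDocument();
$ok = $dom->loadXML(stripScripts($html));
echo $dom->saveXML();
```

The removed JavaScript is simply dropped here; keeping it would mean capturing
the matches first (for example with preg_match_all()) and handling them
separately.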

What do you think?

__
Raymond Irving

On Sun, Jun 6, 2010 at 11:22 PM, Adam Richardson <simpleshot@xxxxxxxxx> wrote:

> On Sun, Jun 6, 2010 at 10:39 PM, Raymond Irving <xwisdom@xxxxxxxxx> wrote:
>
>> Hello,
>>
>> I'm experiencing another issue when attempting to use
>> DOMDocument::loadXML() to load the following HTML code:
>>
>> <?php
>> $html = '
>> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
>> "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
>> <html>
>>    <body>
>>        <script type="text/javascript">
>>            <!--
>>            var i = 0, html = "<strong>Bold Text</strong>,Normal Text";
>>            document.write(html);
>>            i--; // this line causes the parser to fail
>>            alert(html);
>>            -->
>>        </script>
>>    </body>
>> </html>';
>> $dom = new DOMDocument();
>> $dom->loadXML($html);
>> echo $dom->saveHTML();
>> ?>
>>
>> The parser throws the following error when it encounters "i--" inside the
>> <script> tag:
>>
>> Warning: DOMDocument::loadXML() [domdocument.loadxml]: Comment not
>> terminated <!-- var i = 0, html = "<strong>Bold Text< in Entity
>>
>> If I remove the line "i--" it will load the HTML code just fine.
>>
>> Any ideas as to why this throws an error?
>>
>> __
>> Raymond
>>
>
>
> A comment declaration starts with "<!", and ends with ">", with any number
> of comments following the form --comment-- in between:
> http://htmlhelp.com/reference/wilbur/misc/comment.html
>
> You'll see at the bottom of the article that they advocate a simple rule in
> comments:
> An HTML comment begins with "<!--", ends with "-->" and does not contain
> "--" or ">" anywhere in the comment.
>
> The occurrence of "i--" breaks that rule.
>
> In your case, if you're maintaining the pages, you can place the JavaScript
> in a separate file or place the JavaScript in a CDATA section. If you're
> parsing pages you don't maintain, you can rip out the JavaScript before
> performing DOM tasks and parse it separately as needed to avoid potential
> issues.
>
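
A minimal sketch of the CDATA option, assuming the page source can be edited.
The markup is a trimmed version of the example above (the DOCTYPE is left out
for brevity), and the "//" in front of the CDATA markers is the usual guard so
that browsers treating the page as plain HTML see them as JavaScript comments.

```php
<?php
$html = '<html>
   <body>
       <script type="text/javascript">
           //<![CDATA[
           var i = 0, html = "<strong>Bold Text</strong>,Normal Text";
           document.write(html);
           i--; // fine here: CDATA content is not parsed as markup
           alert(html);
           //]]>
       </script>
   </body>
</html>';

$dom = new DOMDocument();
$ok = $dom->loadXML($html);

// The JavaScript is still available as the text content of the script element.
$script = $dom->getElementsByTagName('script')->item(0);
echo $script->textContent;
```

With the old "<!-- ... -->" wrapper replaced by a CDATA section, the "--" in
"i--" no longer sits inside an XML comment, so the comment rule quoted above
does not apply to it.
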
> Adam
>
> --
> Nephtali:  PHP web framework that functions beautifully
> http://nephtaliproject.com
>
