Re: Generate XHTML (HTML compatible) Code using DOMDocument

I will say, though, that this negates the reason I chose to use DOMDocument to begin with. I am feeding it snippets of HTML that usually do not validate, and I am not sure I want to run them through tidy first to convert them from HTML to XHTML, process them with DOMDocument, and then convert them back. I am essentially using this to traverse the DOM and process all a href and img src attributes for a link-remapping job. (I'm also realizing the power of PHP's DOM for other things; for HTML scraping I used to run tidy and then use SimpleXML.) PHP's DOM lets me give it absolutely crappy HTML and it still works.

However, if someone has a nice regular expression or chunk of code that scans a document for a href attributes and replaces them in the proper context (not just globally), that would work too. I can't just blindly find URLs and replace them (although the reason for this escapes me right now).
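Since the DOM already parses the sloppy markup, the attribute-only rewrite asked for above can be done with DOMXPath instead of a regular expression. This is a minimal sketch, not code from this thread: remapUrl() and the old.example.com/new.example.com hosts are hypothetical placeholders for whatever remapping rule applies, and the prepended meta tag is an assumed workaround for the loadHTML() encoding problem discussed further down.

```php
<?php
// Hypothetical remapping rule: point links at a new host. Replace with the real mapping.
function remapUrl($url)
{
    return str_replace('http://old.example.com/', 'http://new.example.com/', $url);
}

// Rewrite only <a href> and <img src> attribute values; text that merely
// contains a URL is left alone.
function remapLinks($html)
{
    $doc = new DOMDocument();
    // loadHTML() emits warnings for malformed markup; collect them quietly instead.
    $previous = libxml_use_internal_errors(true);
    // Assumed charset hint so non-ASCII UTF-8 text is not read as ISO-8859-1.
    $doc->loadHTML('<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//a[@href] | //img[@src]') as $element) {
        $attribute = ($element->nodeName === 'a') ? 'href' : 'src';
        $element->setAttribute($attribute, remapUrl($element->getAttribute($attribute)));
    }

    // saveHTML() returns a whole document, including the <html>, <head> and
    // <body> wrappers libxml2 adds around a fragment.
    return $doc->saveHTML();
}

echo remapLinks('<p>See http://old.example.com/x <a href="http://old.example.com/x">x</a></p>');
```

Because the rewrite works on attribute nodes, the bare URL in the paragraph text survives unchanged, which is the "proper context" distinction a global search-and-replace cannot make.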

On Apr 13, 2009, at 8:01 AM, Raymond Irving <xwisdom@xxxxxxxxx> wrote:



Michael,

You are absolutely right! It's loadHTML() that's causing the problems.


Best regards,
__
Raymond Irving


--- On Mon, 4/13/09, Michael A. Peters <mpeters@xxxxxxx> wrote:

From: Michael A. Peters <mpeters@xxxxxxx>
Subject: Re: Generate XHTML (HTML compatible) Code using DOMDocument
To: "Michael Shadle" <mike503@xxxxxxxxx>
Cc: "Raymond Irving" <xwisdom@xxxxxxxxx>, "php-general@xxxxxxxxxxxxx" <php-general@xxxxxxxxxxxxx>
Date: Monday, April 13, 2009, 5:36 AM
Michael A. Peters wrote:


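function makeHTML($document) {
    // Serialize with the HTML serializer; non-ASCII text may come out as entities.
    $buffer = $document->saveHTML();
    // Decode those entities back to UTF-8 characters. Caveat: this also decodes
    // &lt; &gt; &amp; and &quot;, so escaped markup inside text gets unescaped.
    $output = html_entity_decode($buffer, ENT_QUOTES, "UTF-8");
    return $output;
}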

I'll try it and see what it does.


Huh. I haven't tried the above yet, but with

$test = $myxhtml->createElement('p','שלום');
$xmlBody->appendChild($test);

both saveXML() and saveHTML() do the right thing.

However, if I have the string

<p>שלום</p>

and load it into a DOM:

With loadHTML(), the UTF-8 is lost regardless of whether I
use saveXML() or saveHTML().

With loadXML(), the UTF-8 is preserved regardless of whether
I use saveXML() or saveHTML().

php 5.2.9
libxml2 2.6.26-2.1.2.7 (CentOS 5.3)
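A self-contained version of the experiment above might look like the following sketch. It reads the text back out of the DOM rather than comparing serialized output, so it isolates the parsing step from saveXML()/saveHTML(). With libxml2 releases like the one listed, the loadHTML() result is expected to differ from the original string, since libxml2's HTML parser assumes ISO-8859-1 when the markup declares no charset; newer libxml2 versions may behave differently, so treat that half as version-dependent.

```php
<?php
// Load the same UTF-8 snippet with each parser and read the text back out.
$snippet = '<p>שלום</p>';

// XML parser: UTF-8 is the default encoding when no declaration is present.
$xmlDoc = new DOMDocument();
$xmlDoc->loadXML($snippet);
$fromXml = $xmlDoc->documentElement->textContent;

// HTML parser: the snippet declares no charset.
$htmlDoc = new DOMDocument();
$previous = libxml_use_internal_errors(true);
$htmlDoc->loadHTML($snippet);
libxml_clear_errors();
libxml_use_internal_errors($previous);
$fromHtml = $htmlDoc->getElementsByTagName('p')->item(0)->textContent;

var_dump($fromXml === $snippet ? true : $fromXml === 'שלום');
var_dump($fromHtml === 'שלום');
```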

I wonder if the real UTF-8 problem people experience is
actually with loadHTML() rather than with saveHTML()?
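If loadHTML() is the culprit, one commonly used workaround is to declare the charset inside the markup before parsing, so libxml2 does not fall back to ISO-8859-1. A minimal sketch, assuming the input is UTF-8 and that an extra meta element in the parsed document is acceptable; loadUtf8Html() is a hypothetical helper name, not part of PHP.

```php
<?php
// Hypothetical helper: parse a UTF-8 HTML fragment with loadHTML() after
// prepending a charset declaration the HTML parser will honour.
function loadUtf8Html($html)
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML('<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $doc;
}

$doc = loadUtf8Html('<p>שלום</p>');
echo $doc->getElementsByTagName('p')->item(0)->textContent, "\n";
echo $doc->saveHTML();
```

The added meta element ends up in the generated <head>, so code that only wants the original fragment back should serialize the children of <body> instead of the whole document.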


--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
