I will say, though, that this negates the reason I chose to use
DOMDocument to begin with. I am feeding it snippets of HTML that
usually do not validate, and I am not sure I want to run them through
tidy first to convert from HTML to XHTML, run DOMDocument, and then
convert back. I am essentially using it to traverse the DOM and
process all a href and img src attributes for a link remapping job.
(I am also realizing the power of PHP's DOM for other things: I used
to run tidy and then use SimpleXML when doing HTML scraping, but
PHP's DOM lets me give it absolutely crappy HTML and it still works.)
However, if someone has a nice regular expression or chunk of code
that scans a document for a href values and replaces them in the
proper context (not just globally), that would work too. I can't just
blindly find URLs and replace them (although the reason for this
escapes me right now).
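
For reference, the remapping job described above could look roughly like
the sketch below. It is only an illustration under assumptions:
remapLinks() and the $map lookup array are hypothetical names that do not
appear in this thread, the snippet is assumed to be UTF-8, and passing a
node to saveHTML() needs PHP 5.3.6 or later. Because only the a href and
img src attributes are rewritten, a URL that merely appears in the text is
left alone, which is the "proper context" behaviour a global
search-and-replace cannot give.

```php
<?php
// Sketch only: remapLinks() and $map are hypothetical names, not from the thread.
function remapLinks(string $html, array $map): string
{
    $doc = new DOMDocument();

    // Collect parser warnings instead of printing them; invalid markup is expected here.
    $previous = libxml_use_internal_errors(true);

    // Assumption: the snippet is UTF-8. The meta tag declares that to loadHTML().
    $doc->loadHTML(
        '<html><head>'
        . '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
        . '</head><body>' . $html . '</body></html>'
    );

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Only these tag/attribute pairs are touched, so URLs in plain text are left alone.
    $targets = array('a' => 'href', 'img' => 'src');
    foreach ($targets as $tag => $attr) {
        foreach ($doc->getElementsByTagName($tag) as $node) {
            $old = $node->getAttribute($attr);
            if (isset($map[$old])) {
                $node->setAttribute($attr, $map[$old]);
            }
        }
    }

    // Serialize only the children of <body>, dropping the wrapper added above.
    $out = '';
    $body = $doc->getElementsByTagName('body')->item(0);
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

// Usage: the URL in the text stays as it is, the href is remapped.
$snippet = '<p>See http://example.com/old <a href="http://example.com/old">here</a></p>';
echo remapLinks($snippet, array('http://example.com/old' => 'http://example.com/new'));
```

The exact-match lookup in $map is a simplification; a real job would
probably need prefix or pattern matching on the old URLs.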
On Apr 13, 2009, at 8:01 AM, Raymond Irving <xwisdom@xxxxxxxxx> wrote:
Michael,
You are absolutely right! It's loadHTML() that's causing the problems.
Best regards,
__
Raymond Irving
--- On Mon, 4/13/09, Michael A. Peters <mpeters@xxxxxxx> wrote:
From: Michael A. Peters <mpeters@xxxxxxx>
Subject: Re: Generate XHTML (HTML compatible) Code using DOMDocument
To: "Michael Shadle" <mike503@xxxxxxxxx>
Cc: "Raymond Irving" <xwisdom@xxxxxxxxx>, "php-general@xxxxxxxxxxxxx" <php-general@xxxxxxxxxxxxx>
Date: Monday, April 13, 2009, 5:36 AM
Michael A. Peters wrote:
function makeHTML($document) {
    $buffer = $document->saveHTML();
    $output = html_entity_decode($buffer, ENT_QUOTES, "UTF-8");
    return $output;
}
I'll try it and see what it does.
Huh, I have not tried the above yet, but with
$test = $myxhtml->createElement('p','שלום');
$xmlBody->appendChild($test);
both saveXML() and saveHTML() do the right thing.
However, if I have the string
<p>שלום</p>
and load it into a DOM:
With loadHTML() the UTF-8 is lost regardless of whether I
use saveXML() or saveHTML().
With loadXML() the UTF-8 is preserved regardless of whether
I use saveXML() or saveHTML().
php 5.2.9
libxml2 2.6.26-2.1.2.7 (CentOS 5.3)
I wonder if the real UTF-8 problem people experience is
really with loadHTML() and not with saveHTML()?
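
One way to probe the loadHTML() suspicion above is sketched below. It
rests on an assumption that the thread does not confirm: that libxml2's
HTML parser does not treat input as UTF-8 unless the markup declares a
charset, so a bare snippet gets decoded with a single-byte default. The
helper name loadHtmlSnippetAsUtf8() is hypothetical. It wraps the snippet
in a minimal document whose meta tag declares UTF-8 before handing it to
loadHTML().

```php
<?php
// Sketch only: loadHtmlSnippetAsUtf8() is a hypothetical helper, not from the thread.
function loadHtmlSnippetAsUtf8(string $snippet): DOMDocument
{
    $doc = new DOMDocument();

    // Collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);

    // Declare the charset in the markup so the HTML parser decodes the bytes as UTF-8.
    $doc->loadHTML(
        '<html><head>'
        . '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
        . '</head><body>' . $snippet . '</body></html>'
    );

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $doc;
}

// Read the text back from the DOM; it should match the original string.
$doc  = loadHtmlSnippetAsUtf8('<p>שלום</p>');
$text = $doc->getElementsByTagName('p')->item(0)->textContent;
var_dump($text === 'שלום');
```

If the comparison holds with the declaration and fails without it, that
would support the idea that the loss happens at load time rather than in
saveHTML().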
--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php