Is there a meta tag that specifies the encoding? When loading HTML, that
tag is also used to determine the encoding. I think I need to clarify
the encoding issue:
I'll bet that when the document is loaded, the encoding is being
properly detected. When working with the elements, however, you are
getting hung up on the UTF-8 factor.
You probably do something like the following:
$myelement = $doc->getElementById('someid');
print $myelement->textContent;
That will output the textual content in UTF-8 (hence the garbled
characters). It does not take into consideration the encoding used in
the original document. This is simply how the XML functions work:
strings always come back as UTF-8.
You really need to do something like:
$text = $myelement->textContent;
print iconv("UTF-8", <output encoding>, $text);
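To make that concrete, here is a minimal sketch. It assumes the source page is ISO-8859-1 and says so in a meta tag, and that ISO-8859-1 is also the encoding you want to output. The $html string is made up for illustration; only the element id 'someid' comes from the snippet above.

```php
<?php
// Hypothetical input: an ISO-8859-1 page that declares its charset in a meta tag.
// "\xFC" and "\xDF" are the Latin-1 bytes for u-umlaut and sharp s.
$html = '<html><head>'
      . '<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"/>'
      . '</head><body><p id="someid">Gr' . "\xFC\xDF" . 'e</p></body></html>';

libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);

$myelement = $doc->getElementById('someid');
$text = $myelement->textContent;    // UTF-8, regardless of the source encoding

// Convert back to the encoding of the page you are writing (assumed ISO-8859-1 here).
$latin1 = iconv('UTF-8', 'ISO-8859-1', $text);
print $latin1;
```

If your output page is served as UTF-8 instead, the iconv() call is unnecessary and you can print $text directly.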
If the encoding is declared in a meta tag, it typically looks like this:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
If you add content to a DOM node, you do not change the encoding, since
the functions all work on UTF-8. The document to which the content is
being added, however, must be set to use the desired encoding. I am
assuming you are doing what I previously explained, though.
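A rough sketch of the "adding content" case, assuming you build the target document by hand and want ISO-8859-1 output. The element name 'greeting' and the input string are hypothetical:

```php
<?php
// Hypothetical Latin-1 input, e.g. text taken from a legacy page.
$latin1Input = "Gr\xFC\xDFe";

// The document is built manually, so the encoding given to the constructor applies.
$doc = new DOMDocument('1.0', 'ISO-8859-1');
$root = $doc->createElement('greeting');
$doc->appendChild($root);

// DOM methods expect UTF-8 strings, so convert before adding the text.
$utf8 = iconv('ISO-8859-1', 'UTF-8', $latin1Input);
$root->appendChild($doc->createTextNode($utf8));

// Serialization uses the document's encoding; reading the node back still gives UTF-8.
$xml = $doc->saveXML();
print $xml;
```

The point of the sketch is the split: strings going in and coming out of nodes are UTF-8, while the serialized document follows the encoding set on the document itself.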
Rob
Leonidas Safran wrote:
Hello Rob,
Thanks for answering (so fast)... :-)
Remember that most of the functionality, other than the saveXML() and
saveHTML() functions, outputs UTF-8
(which you would need to convert to whatever encoding you need).
Well, I did try this before the loadHTML() call:
$doc = new DomDocument('1.0', 'iso-8859-1');
This does nothing. loadHTML() creates a new underlying document that
replaces the one you created with the new DOMDocument() call. The
constructor's encoding is only pertinent when you are manually building
a document.
Maybe it's a problem that the source webpage I'm loading has no charset declaration. It solely uses:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html lang="de" xmlns="http://www.w3.org/1999/xhtml">
Don't know if that has an influence...
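If a page carries no charset declaration but you know its real encoding from somewhere else, one commonly used workaround is to prepend a meta tag before loading. This is only a sketch under that assumption: the sample markup is made up, and it assumes the page is really UTF-8. If the page is in fact ISO-8859-1, the parser's default handling may already be correct and this is not needed.

```php
<?php
// Hypothetical page: UTF-8 bytes, but no charset declaration anywhere.
$html = '<html lang="de"><body><p id="someid">Gr'
      . "\xC3\xBC\xC3\x9F" . 'e</p></body></html>';

// Assumption: the encoding is known to be UTF-8 from another source
// (an HTTP header, for instance). The prepended tag tells the parser so.
$meta = '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>';

libxml_use_internal_errors(true);   // the misplaced tag may trigger parser warnings
$doc = new DOMDocument();
$doc->loadHTML($meta . $html);

$text = $doc->getElementById('someid')->textContent;   // UTF-8 string
print $text;
```

Keep in mind that the prepended tag becomes part of the loaded document, so it may show up if you later serialize the whole thing.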
How are you getting that output?
About the output I make, I don't use the saveHTML function because I just cut some parts of the source (grabbed with getElementById() and other related functions) and only need them, so I just "echo" them into a new document.
LS
--
PHP General Mailing List (http://www.php.net/)