Re: loadHTML/loadHTMLFile - DOM functions

Is there a meta tag that specifies the encoding? When HTML is loaded, that tag is also used to determine the encoding. Let me clarify the encoding issue: I'll bet the encoding is being detected properly when the document loads. When working with the elements, however, you are getting hung up on the UTF-8 factor.

You probably do something like the following:

$myelement = $doc->getElementById('someid');
print $myelement->textContent;

That right there will output the textual content in UTF-8 (hence the garbled characters). It does not take into account the encoding used in the original document. This is just how the XML functions work. Now...

You really need to do something like:

$text = $myelement->textContent;
print iconv("UTF-8", <output encoding>, $text);

The encoding, when it is declared in a meta tag, is typically encountered as:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
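
Putting the pieces above together, a minimal sketch might look like the following. The sample markup, the id 'someid', and ISO-8859-1 as the output encoding are assumptions for illustration only; substitute whatever your page and output actually use.

```php
<?php
// Hypothetical input: a UTF-8 page that declares its charset in a meta tag.
// "\xC3\xBC" and "\xC3\x9F" are the UTF-8 bytes for u-umlaut and sharp s.
$html = '<html><head>'
      . '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>'
      . '</head><body><p id="someid">Gr' . "\xC3\xBC\xC3\x9F" . 'e</p></body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);

$myelement = $doc->getElementById('someid');

// textContent is handed back as UTF-8, regardless of the source encoding.
$text = $myelement->textContent;

// Convert to the encoding the output page uses (assumed here: ISO-8859-1).
$latin1 = iconv('UTF-8', 'ISO-8859-1', $text);
```

Printing $latin1 instead of $text should then match a page that is served as ISO-8859-1.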

If you add content to a DOM node, you do not change the encoding, since the functions all work on UTF-8. The document to which the content is being added, however, must be set to use the desired encoding. I am assuming you are doing what I previously explained, though.
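
For the opposite direction, here is a rough sketch of adding content to a node. The ISO-8859-1 input string and the 'p' element are made up for the example; the point is only that the text is converted to UTF-8 before it goes into the DOM, while the document itself carries the desired output encoding.

```php
<?php
// Hypothetical ISO-8859-1 input: "\xFC" is u-umlaut, "\xDF" is sharp s.
$latin1 = "Gr\xFC\xDFe";

// The document is set to the desired output encoding.
$doc = new DOMDocument('1.0', 'ISO-8859-1');
$root = $doc->createElement('p');
$doc->appendChild($root);

// The DOM functions work on UTF-8, so convert before adding the text.
$root->appendChild(
    $doc->createTextNode(iconv('ISO-8859-1', 'UTF-8', $latin1))
);

$xml  = $doc->saveXML();       // serialized using the document's encoding
$utf8 = $root->textContent;    // reading the node back still gives UTF-8
```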

Rob


Leonidas Safran wrote:
Hello Rob,

Thanks for answering (so fast)... :-)

Remember that most of the functionality, other than the saveXML() and saveHTML() functions, outputs UTF-8 (which you would need to convert to whatever encoding you need).

Well, I did try this before the loadHTML call:

$doc = new DomDocument('1.0', 'iso-8859-1');

This does nothing. loadHTML() creates a new underlying document, which replaces the one you created with the new DOMDocument() call. The encoding argument is only pertinent when you are manually building a document.
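
A quick way to see this for yourself is to compare the document's encoding property before and after the load. This is only a sketch with a made-up sample page; what the property holds after the load depends on what the parser found in the markup.

```php
<?php
// Hypothetical page used only for this demonstration.
$html = '<html><head>'
      . '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>'
      . '</head><body><p>test</p></body></html>';

$doc = new DOMDocument('1.0', 'iso-8859-1');
$before = $doc->encoding;   // the value passed to the constructor

$doc->loadHTML($html);      // swaps in a freshly parsed document
$after = $doc->encoding;    // whatever the parser determined from the markup

var_dump($before, $after);
```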


Maybe it's a problem that the source webpage I'm loading has no charset declaration. It solely uses:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html lang="de" xmlns="http://www.w3.org/1999/xhtml">

Don't know if that has an influence...

How are you getting that output?

About the output I make, I don't use the saveHTML function because I just cut some parts of the source (grabbed with getElementById() and other related functions) and only need them, so I just "echo" them into a new document.


LS

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

