Re: loadHTML/loadHTMLFile - DOM functions

Hello Rob,

Thank you for your advice... :-)

> Is there a meta tag that specifies the encoding? 
> When loading HTML that is also used to determine the encoding. 
> I think I need to clarify the encoding issue:
> I'll bet when the document is loading, the encoding is being properly
> detected. When working with the elements however you are getting 
> hung up on the UTF-8 factor....

> you probably do something like the following:

> $myelement = $doc->getElementById('someid');
> print $myelement->textContent;

> That right there will output the textual content in UTF-8 
> (the garbled characters). It does not take into consideration the
> encoding used in the original document. This is just how the XML
> functions work. Now...


> You really need to do something like:

> $text = $myelement->textContent;
> print iconv("UTF-8", <output encoding>, $text);
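
A minimal sketch of the conversion quoted above, for illustration only. The markup, the id "someid", and the ISO-8859-1 target are assumptions (the real page and output encoding are not shown in this thread), and it relies on the parser honouring the meta charset declaration:

```php
<?php
// Hypothetical sample markup; "\xC3\xA4" is the UTF-8 byte sequence for "ä".
$html = '<html><head>'
      . '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">'
      . '</head><body><p id="someid">' . "\xC3\xA4" . '</p></body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);

$myelement = $doc->getElementById('someid');

// The DOM functions hand back UTF-8, whatever the source encoding was.
$text = $myelement->textContent;

// Convert to the encoding the page is actually served in
// (ISO-8859-1 is assumed here purely as an example).
$latin1 = iconv('UTF-8', 'ISO-8859-1', $text);

// Dump the bytes rather than the characters, so the terminal's or
// browser's own encoding cannot blur the result.
echo bin2hex($text), ' -> ', bin2hex($latin1), "\n";
```

Comparing the hex dumps shows whether the DOM step or the output step is where the characters go wrong.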

> If the encoding is in the meta tag, typically encountered as:
> <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>

> If you add the content to a dom node, you do not change the encoding
> since the functions all work on UTF-8. The document to which 
> the content is being added however, must be set to use the desired
> encoding. I am assuming you are doing what I previously 
> explained though.
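
The point about the target document's encoding could be sketched as follows. The element name and the ISO-8859-1 choice are assumptions made for illustration; the content added to the node is UTF-8, and only the serialized output uses the document's encoding:

```php
<?php
// The second constructor argument sets the encoding used when the
// document is serialized, not the encoding of strings passed in.
$doc = new DOMDocument('1.0', 'ISO-8859-1');

$p = $doc->createElement('p');
// Content handed to the DOM must be UTF-8 ("\xC3\xA4" is "ä").
$p->appendChild($doc->createTextNode("\xC3\xA4"));
$doc->appendChild($p);

// On output the text is written in the document's encoding.
$out = $doc->saveXML();
echo bin2hex($out), "\n";
```

Here the string passed to createTextNode() stays UTF-8, while the bytes produced by saveXML() follow the encoding given to the constructor.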

I tried the following:

I downloaded the problematic HTML page, saved it as UTF-8 (using the text editor's encoding option) and added a meta tag declaring UTF-8 encoding:
<meta http-equiv = 'content-type' content = 'text/html; charset=UTF-8'>

I then checked that the special characters were written correctly (actually, I had to correct them). 

Then I created the document with:
$doc = new DomDocument('1.0', 'UTF-8');

The result is still the same: special characters are displayed incorrectly. Wrong in a different way than before :-) but still wrong... ("ä" is now "ä").

I tried the same approach with "ISO-8859-1", but it does not get any better...

So, in conclusion, even converting the whole document to UTF-8 and adding a UTF-8 charset declaration to it doesn't help me handle special characters...

And what about the img tags, which are converted into some kind of invisible characters (they show up as empty space in the source code)?


Thank you very much for your help!


LS