Re: Using SSI to include a UTF-8 encoded file causes a strange character to be sent to the browser

Hi André,

Firstly, thank you very much for your email - the speed with which you responded is much appreciated. 

I am using Notepad purely to simplify the test case and focus on the problem at hand. The actual HTML files are generated by a web publishing system that uses XML and XSL. The user populates the XML via an applet, and when they save the file it is automatically transformed into HTML using the XSL. These final pages exhibit the same problem I described when using Notepad.

And yes, the .shtml file does include the meta tag you describe!

Regards
Christopher Biggs

----- Original Message -----
From: "André Warnier" <aw@xxxxxxxxxx>
To: users@xxxxxxxxxxxxxxxx
Sent: Wednesday, 7 October, 2009 09:55:33 GMT +00:00 GMT Britain, Ireland, Portugal
Subject: Re:  Using SSI to include a UTF-8 encoded file causes a strange character to be sent to the browser

Hi.

Chris Biggs wrote:
...
>     When these files are saved as "ANSI" (using Notepad) 
(or rather in this case, as UTF-8)

Tips:
1) *Don't use Notepad to edit HTML pages*.  Use a real editor that is
properly aware of character sets and encodings, and that will highlight
invalid UTF-8 sequences.
Notepad has a big problem when saving UTF-8 encoded files: it writes a
"BOM" (byte order mark) at the beginning of the file, which is not only
unnecessary for UTF-8, but also confuses other programs.
A BOM is a short sequence of bytes (2 for UTF-16; 3 for UTF-8, namely
EF BB BF), meant in some cases to indicate the "byte order" of the file
that follows.
For UTF-8, there is only one valid byte order, so the BOM is not 
necessary and could/should be ignored.
However, when a file with a BOM prefix is included by some software in
the middle of another file (as you do with SSI), the BOM is no longer at
the start of the output, and it usually causes the kind of problem you
are seeing: "bizarre" characters in the middle of the page.
2) use a proper <meta http-equiv="Content-Type" content="text/html; 
charset=UTF-8" /> in the <head> section of your html files.  That should 
tell the browser what the encoding of the page is.
3) But this is really only a substitute for the real standard-conformant
way of indicating the encoding to the browser: the webserver should
send, with each html page, an HTTP header like:
Content-Type: text/html; charset=UTF-8
Unfortunately, MS's IE (all versions and sub-versions) has a long
history of ignoring or misinterpreting this part of the HTTP RFC, and
deciding for itself what content the document has.
This is *wrong*, but unfortunately IE is also widely used in the real
world, so one has to learn to work around it.
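
As a rough illustration of tip 1 (not something from the original thread), the following Python sketch checks whether a file's bytes start with the UTF-8 BOM (EF BB BF) and removes it before the file is used as an SSI include. The function names `has_utf8_bom` and `strip_utf8_bom` and the sample fragment are made up for this example. When those three bytes are decoded as Windows-1252 they show up as "ï»¿", which is one common form of the "bizarre" characters.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the byte string starts with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def strip_utf8_bom(data: bytes) -> bytes:
    """Return the bytes without a leading UTF-8 BOM; other input is unchanged."""
    if has_utf8_bom(data):
        return data[len(codecs.BOM_UTF8):]
    return data


if __name__ == "__main__":
    # Hypothetical include fragment, as an editor that writes a BOM would save it.
    fragment = codecs.BOM_UTF8 + b"<p>include me</p>"
    print(has_utf8_bom(fragment))    # the BOM is detected
    print(strip_utf8_bom(fragment))  # the fragment without the three BOM bytes
```

To clean a real file, one could read it in binary mode (`open(path, "rb")`), pass the contents through `strip_utf8_bom`, and write the result back in binary mode. Alternatively, decoding with Python's `utf-8-sig` codec drops a leading BOM if one is present.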


---------------------------------------------------------------------
The official User-To-User support forum of the Apache HTTP Server Project.
See <URL:http://httpd.apache.org/userslist.html> for more info.
To unsubscribe, e-mail: users-unsubscribe@xxxxxxxxxxxxxxxx
   "   from the digest: users-digest-unsubscribe@xxxxxxxxxxxxxxxx
For additional commands, e-mail: users-help@xxxxxxxxxxxxxxxx




