Re: [users@httpd] Russian Charset Problem

André Malo <nd@xxxxxxxxx> · Wed, 1 Jun 2005 13:43:41 +0200

* Arne Heizmann <Arne.Heizmann@xxxxxxx> wrote:

> > I can tell you the reasons for using koi8-r, euc-jp etc instead of utf-8 
> > for the httpd docs. The resulting documents are significantly smaller.
> 
> ru:        15169 => 20713
> ru+gzip:    5454 =>  6160
> 
> ja:        14063 => 16595
> ja+gzip:    4833 =>  5237

Uhm, what do these numbers refer to?

> Especially considering that you are limiting yourself to a very small 
> set of characters. As a result, you have to put the ugly hacky "ru" and 
> "ja" on the pages rather than the proper "Ð Ñ?Ñ?Ñ?ÐºÐ¸Ð¹" and "æ?¥æ?¬èª?" which 
> users are more likely to recognise.

Nope, the ISO tokens are chosen as link text on purpose. The native language
names are in the title (or at least they should be; that depends on the
translator).

[note that my mail client here can't recognize utf-8 properly, I'm leaving it
as is ...]

> Yes, I know you can use numerical 
> entities in HTML to achieve this nonetheless, but the more you use 
> those, the less of a "benefit" your legacy encoding becomes.

As a matter of fact, numeric character references are rare within the httpd docs.
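
For anyone who wants to check the size argument themselves, here is a rough
Python sketch. It is only an illustration under stated assumptions: the sample
string and the helper names (sizes, as_ncr) are made up for this mail and are
not taken from the httpd docs or their build system. It encodes the same text
as koi8-r and as utf-8, reports raw and gzipped byte counts, and also shows
how large the text gets if every non-ASCII character is written as a numeric
character reference.

```python
import gzip

# Hypothetical sample text, NOT taken from the httpd docs.
SAMPLE = "Русский текст документации. " * 50


def sizes(text, legacy):
    """Return {encoding: (raw_bytes, gzipped_bytes)} for legacy and utf-8."""
    result = {}
    for name in (legacy, "utf-8"):
        raw = text.encode(name)
        result[name] = (len(raw), len(gzip.compress(raw)))
    return result


def as_ncr(text):
    """Encode as ASCII, replacing each non-ASCII character with &#NNNN;."""
    return text.encode("ascii", errors="xmlcharrefreplace")


if __name__ == "__main__":
    for name, (raw, packed) in sizes(SAMPLE, "koi8-r").items():
        print(f"{name}: {raw} bytes raw, {packed} bytes gzipped")
    print(f"ncr: {len(as_ncr(SAMPLE))} bytes raw")
```

Real pages contain a lot of ASCII markup, so the ratio on an actual document
will be less extreme than on a pure Cyrillic string like this one.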

nd

---------------------------------------------------------------------
The official User-To-User support forum of the Apache HTTP Server Project.
See <URL:http://httpd.apache.org/userslist.html> for more info.