On Sat, Nov 09, 2002 at 09:07:22PM -0500, Michael Fratoni wrote:

> I still haven't heard a good reason to be using UTF-8 in the first place.

You can find some reasonably good answers in Bruno Haible's "The Unicode
HOWTO". Here is an excerpt:

----------------------------------------
There are far more than 256 characters in the world - think of Cyrillic,
Hebrew, Arabic, Chinese, Japanese, Korean and Thai - and new characters
are being invented now and then. The problems that come up for users are:

* It is impossible to store text with characters from different character
  sets in the same document. For example, I can cite Russian papers in a
  German or French publication if I use TeX, xdvi and PostScript, but I
  cannot do it in plain text.

* As long as every document has its own character set, and recognition of
  the character set is not automatic, manual user intervention is
  inevitable. For example, in order to view the homepage of the XTeamLinux
  distribution http://www.xteamlinux.com.cn/ I had to tell Netscape that
  the web page is coded in GB2312.

* New symbols like the Euro are being invented. ISO has issued a new
  standard, ISO-8859-15, which is mostly like ISO-8859-1 except that it
  removes some rarely used characters (the old currency sign among them)
  and replaces the currency sign with the Euro sign. If users adopt this
  standard, they have documents in different character sets on their disk,
  and they start having to think about it daily. But computers should make
  things simpler, not more complicated.

The solution to this problem is the adoption of a world-wide usable
character set.
-----------------------------------------

I would think that these reasons are forceful enough to switch to Unicode
uniformly. As a mathematician, I have often wanted to use equations in
plain-text email, without having to resort to some form of pseudo-LaTeX or
to attachments, and also to be able to quote from German, French and, say,
Russian sources.
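The Euro example from the excerpt can be made concrete with a small sketch
(Python 3 here, purely illustrative; this tooling is not what the HOWTO
uses): the very same character is unrepresentable in ISO-8859-1, occupies a
single charset-dependent byte in ISO-8859-15, and has one fixed three-byte
sequence in UTF-8 that works regardless of language.

```python
# Illustrative sketch: how the Euro sign fares in three encodings.
euro = "\u20ac"  # U+20AC EURO SIGN

# UTF-8 gives it a universal three-byte encoding.
assert euro.encode("utf-8") == b"\xe2\x82\xac"

# ISO-8859-15 reuses the slot (0xA4) of the old currency sign,
# so the byte's meaning depends on which charset the reader assumes.
assert euro.encode("iso-8859-15") == b"\xa4"

# ISO-8859-1 simply has no Euro sign at all.
try:
    euro.encode("iso-8859-1")
except UnicodeEncodeError:
    print("no Euro in ISO-8859-1")
```

This is exactly the "think about it daily" problem: the 0xA4 byte is the
old currency sign or the Euro depending on which ISO-8859 variant a
document happens to be in, while the UTF-8 bytes are unambiguous.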
Now, with Unicode, that is possible, because Unicode also contains
mathematical symbols. But it is not yet really practical:

Firstly, not all applications used under Linux are Unicode compatible.

Secondly, Psyche contains no fonts with a large enough character range
(only the ugly fixed iso10646-1 fonts under 'misc' have a reasonably large
range, including some math symbols). And as far as I have been able to
find out, there are no "free" quality scalable fonts with large Unicode
ranges available (i.e. containing characters for more than one language,
math symbols, etc.). Some of the recent TT fonts in OfficeXP also have
versions with Unicode extensions for languages like Russian, Arabic and
Hebrew, but as far as I know, no math symbols. Using these is probably the
best option for now (but then you need a license = $).

Thirdly, we have no halfway friendly tools under Linux for writing
multilingual documents (I think OfficeXP has a sort of virtual keyboard
for entering other languages). For example, in Vim and Yudit you have to
enter Unicode characters by typing their hex codes. (Yudit does have other
options, but none of them 'easy'.)

Fourthly, it is difficult to use TT fonts in, say, xterm due to their
'expansive' behaviour (I have not found a way to solve this yet).

Even given these obstacles, I personally think Unicode is the way to go,
and I applaud RH for introducing it.

Alexander

*The United States must fully disclose and destroy its Weapons of Mass
Destruction*

--
Psyche-list mailing list
Psyche-list@redhat.com
https://listman.redhat.com/mailman/listinfo/psyche-list
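P.S. The hex-code entry I mentioned for Vim and Yudit boils down to a
mapping from a typed code point to a character and its UTF-8 bytes. A tiny
sketch (Python 3, illustrative only; the helper name is my own invention,
not any editor's API), using the integral sign as the kind of math symbol
I would want in plain-text email:

```python
# Illustrative sketch: what an editor does internally when you type
# a Unicode code point as a hex code.
def from_hex_code(code: str) -> str:
    """Return the character for a Unicode code point given in hex."""
    return chr(int(code, 16))

integral = from_hex_code("222B")   # U+222B INTEGRAL, a math symbol
print(integral)                    # the integral sign
print(integral.encode("utf-8"))    # its three UTF-8 bytes
```

The unfriendliness is plain: nobody wants to memorize "222B" to write an
integral, which is why better input methods matter.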