Re: Can I display Chinese character filenames in an

Robin Rosenberg wrote:
On Monday 04 October 2004 18.35, James Richard Tyrer wrote:

Robin Rosenberg wrote:

On Monday 04 October 2004 04.56, James Richard Tyrer wrote:

Obviously, what I said is not Chinese specific.  It applies to any and
all UTF-8 encoded file names.  ISO-8859-1 is a subset of UTF-8 so Latin
characters will display just the same.

No. ASCII is a subset of UTF-8. ISO-8859-1 and UTF-8 are different and incompatible (or I would be using UTF-8 today).

I have: "LANG=en_us.utf8" and I have no problems. IIRC, that is what I have read at authoritative sources. But, do you mean that glyphs 128-255 are not the same in ISO-8859-1 and UTF-8? Perhaps there are some problems that I am not aware of since all I ever use (128-255) are Latin letters with diacritical marks. It does appear that odd combinations of characters could be interpreted as something other than ISO-8859-1.


ISO-8859-1 is both an encoding and a character set, while UTF-8 is only an encoding for the Unicode character set. The code points of the two character sets overlap at the first 256 positions. When looked upon as encodings, only the first 128 positions (0-127) are identical. UTF-8 can encode all characters in the ISO-8859-1 character set, but it does it differently: with a variable-length encoding.

The filename "åäö" can be stored as the byte sequence [e5 e4 f6] when my locale is set to ISO-8859-1 or [c3 a5 c3 a4 c3 b6] when using UTF-8. I can't
have it both ways. In an ISO-8859-1 locale the UTF-8 encoding shows up as "Ã¥Ã¤Ã¶" (unreadable garbage). In order to switch my locale from ISO-8859-1 to UTF-8 I have to convert my filenames, as most non-ASCII filenames would be illegal in UTF-8 (not that many programs care). The others (non-ASCII again) will look wrong.
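
A minimal Python sketch of the point above (not part of the original message; the variable names are only illustrative). It encodes the same three-character name both ways, then reads each byte sequence under the wrong encoding:

```python
# The three characters å, ä, ö, written as escapes so the source file's
# own encoding does not matter.
name = "\u00e5\u00e4\u00f6"

latin1_bytes = name.encode("iso-8859-1")
utf8_bytes = name.encode("utf-8")

print(latin1_bytes.hex(" "))  # e5 e4 f6
print(utf8_bytes.hex(" "))    # c3 a5 c3 a4 c3 b6

# Reading the UTF-8 bytes as ISO-8859-1 yields two characters per letter:
# misread is the six-character string Ã¥Ã¤Ã¶.
misread = utf8_bytes.decode("iso-8859-1")

# Going the other way, the ISO-8859-1 bytes are not valid UTF-8 at all.
try:
    latin1_bytes.decode("utf-8")
    latin1_is_valid_utf8 = True
except UnicodeDecodeError:
    latin1_is_valid_utf8 = False

print(latin1_is_valid_utf8)  # False
```

This is why the filenames have to be converted when the locale changes: the bytes on disk stay the same, only their interpretation moves.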


Do "ls filenamewithdiacriticalmarks|od -tx1" and you'll see a variable-length
encoding with one or two bytes depending on the character (Chinese characters are even longer). The original UTF-8 design allowed up to six bytes for a single character, but the current definition (RFC 3629) stops at four bytes, and the Unicode Consortium has not defined any character that needs more.
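
As a rough illustration of the variable-length encoding (a sketch only; the sample characters are my own choice, not from the thread), Python can report how many bytes UTF-8 needs for a few code points:

```python
# Sample characters chosen for illustration, one from each length class.
samples = {
    "ASCII letter A (U+0041)": "A",
    "a with ring (U+00E5)": "\u00e5",
    "CJK ideograph (U+4E2D)": "\u4e2d",
    "musical G clef (U+1D11E)": "\U0001d11e",
}

# Number of bytes each character occupies once encoded as UTF-8.
lengths = {label: len(ch.encode("utf-8")) for label, ch in samples.items()}

for label, n in lengths.items():
    print(f"{label}: {n} byte(s)")

# The highest Unicode code point, U+10FFFF, still fits in four bytes.
max_len = len(chr(0x10FFFF).encode("utf-8"))
print(f"U+10FFFF: {max_len} byte(s)")
```

The four samples come out at 1, 2, 3 and 4 bytes respectively, which matches what od shows for filenames on disk.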

I do note two things:

The first 256 glyphs of Unicode *are* the same as ISO8859-1.

It appears that KDE's clipboard converts to UTF-8 automatically.

--
JRT
___________________________________________________
Account management:  https://mail.kde.org/mailman/listinfo/kde.
Archives: http://lists.kde.org/.
More info: http://www.kde.org/faq.html.
