On Wed, 2007-03-14 at 10:11 -0700, Toshio Kuratomi wrote:
> On Wed, 2007-03-14 at 19:45 +0900, Mamoru Tasaka wrote:
> > Toshio Kuratomi wrote:
> > > Hi all,
> > >
> > > I'm thinking of writing a draft guideline for the packaging committee to
> > > mandate all filenames be in utf-8.
> >
> > This may be difficult when filename contains multibyte
> > characters (such as Japanese Kanji characters), although
> > I am not familiar with handling filenames with multibyte
> > characters.
> >
> I was under the impression that utf-8 was capable of storing Kanji,
> just not as efficiently as utf-16 or another encoding. (AIUI utf-8 uses
> three bytes instead of two.) Am I missing something important here?

UTF-8 is a multibyte encoding with a 1-byte base unit. IIRC a single
character can take up to 4 bytes (the original design allowed up to 6).
UTF-8 is ASCII-7 compatible and works with null-terminated strings,
because no character other than NUL itself encodes to a zero byte.

UTF-16 is also a multibyte encoding, but its base unit is 2 bytes long.
It is not ASCII-7 compatible and cannot be terminated by a single null
byte: ASCII characters translate into \00\XX (in big-endian order), with
XX the actual ASCII code.

UTF-16 also comes in LE and BE (little-endian and big-endian) variants,
depending on the byte order of the 2-byte base unit.

Both UTF-8 and UTF-16 are just encodings of the Unicode standard, and
you should be able to translate from UTF-8 to UTF-16 and vice versa with
no loss of information.

Then MS started using UCS2/UTF16 and ... well you can guess ...

Simo.

--
Fedora-maintainers mailing list
Fedora-maintainers@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/fedora-maintainers
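
To make the byte counts discussed in this thread concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the sample strings and the helper names (`encoded_sizes`, `contains_nul`, `round_trips`) are hypothetical and do not come from the thread, and the explicit `utf-16-le` / `utf-16-be` codecs are used so that no byte-order mark is added to the output.

```python
# Hypothetical sample strings chosen for illustration; not taken from the thread.
SAMPLES = {
    "ascii": "A",                    # U+0041
    "kanji": "\u6f22",               # U+6F22, inside the Basic Multilingual Plane
    "supplementary": "\U0002000b",   # U+2000B, outside the BMP
}

ENCODINGS = ("utf-8", "utf-16-le", "utf-16-be")


def encoded_sizes(text):
    """Return the encoded byte length of `text` for each encoding."""
    return {enc: len(text.encode(enc)) for enc in ENCODINGS}


def contains_nul(text, encoding):
    """Report whether the encoded form of `text` contains a zero byte."""
    return b"\x00" in text.encode(encoding)


def round_trips(text):
    """Check that UTF-8 -> UTF-16 -> UTF-8 conversion preserves `text`."""
    utf8 = text.encode("utf-8")
    utf16 = utf8.decode("utf-8").encode("utf-16-le")
    return utf16.decode("utf-16-le").encode("utf-8") == utf8


if __name__ == "__main__":
    for name, text in SAMPLES.items():
        print(name, encoded_sizes(text), "round-trips:", round_trips(text))
    print("NUL in UTF-8 'A':", contains_nul("A", "utf-8"))
    print("NUL in UTF-16BE 'A':", contains_nul("A", "utf-16-be"))
```

With these samples, the Kanji character should take three bytes in UTF-8 and two in UTF-16, which matches the impression quoted above, while a character outside the Basic Multilingual Plane needs four bytes in either encoding. The zero-byte check is the part that matters for filenames: UTF-8 output is safe to pass through APIs that expect null-terminated byte strings, whereas UTF-16 output is not.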