Re: UTF-8 package names

Peter Gordon <peter@xxxxxxxxxxxxxxxx> · Tue, 26 Feb 2008 20:05:01 -0800

On Tue, 2008-02-26 at 18:52 -0800, Toshio Kuratomi wrote:
> You do a wonderful job of explaining what's wrong with us trying to 
> adjust upstream's name to be ASCII but I just want to be certain we're 
> on the same page by the end:
> 
> Package names should follow upstream since attempting to transliterate 
> or translate upstream names can't be done sanely on our side.  For 
> things that map easily into the ASCii set (diacritic/accented 
> characters, for instance, as found in latin-1) a transliterated Provides 
> can be added to make installation easier for ASCii-conditioned users but 
> carrying this on to other scripts is a losing proposition.

That's exactly what I was trying to explain, yes. Sorry for the rambling
if it seemed that way.

Rethinking this, though, I feel that accented characters and ligatures
(á, ë, õ, ñ, æ, et al.) would be quite suitable for a transliterated
Provides, but we would have to pay attention to the language of origin.
For the examples here, the transliteration would probably just strip the
diacritic or split the ligature as appropriate: a, e, o, n, and ae.

In some languages, though, the diacritic differentiates the character
from the "plain" form. For example, a Spanish package name for similar
study software could be "¡Estudiará!" (third-person singular future
indicative; with the formal "usted," literally "You will study!").
However, we would need to be careful here because, without that accent,
this changes the conjugation to "estudiara," which is the first- and
third-person singular imperfect subjunctive (which really makes no sense
on its own, since the subjunctive mood is meant to be used in a
subjective or hypothetical clause of a sentence, such as one referring
to one's wants and desires for the future).

Also, transliterating ñ when we know the source language to be Spanish
(or a similar language) would change how the name reads, since it's
actually pronounced as "ny-": "ño" would be pronounced as "nyo"; "ñat"
would be "nyat" and so on. (Not that these are actual words, but they
are parts thereof.) If we allowed accents and whatnot, maybe we should
restrict the transliteration to simply removing all diacritics and
splitting "merged" letters (æ --> ae, etc.). That way we would not have
to worry about per-language rules.
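
To make the "remove diacritics and split merged letters" rule concrete,
here is a minimal Python sketch. It is only an illustration, not anything
Fedora tooling actually does: the function names, the LIGATURES table and
the "return None when the result is still not ASCII" policy are all my
own assumptions.

```python
import unicodedata

# Hypothetical table of "merged" letters. These have no Unicode
# decomposition, so they have to be split by hand.
LIGATURES = {"æ": "ae", "Æ": "AE", "œ": "oe", "Œ": "OE"}


def strip_diacritics(name):
    """Split known ligatures, then drop combining marks (accents etc.)."""
    split = "".join(LIGATURES.get(ch, ch) for ch in name)
    # NFD separates a base letter from its accent: "á" -> "a" + U+0301.
    decomposed = unicodedata.normalize("NFD", split)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def ascii_provides(name):
    """Return an ASCII alias for name, or None if there is no useful one.

    Assumed policy: no alias when the name is unchanged, or when the
    result is still not ASCII (other scripts, for instance).
    """
    candidate = strip_diacritics(name)
    if candidate == name or not candidate.isascii():
        return None
    return candidate
```

With this, ascii_provides("estudiará") gives "estudiara", while a name
written only in Han characters gives None, since there is no diacritic to
strip and the sketch makes no attempt at picking a reading.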

(Again, using Spanish as an example due to familiarity.)

However, trying to force things into ASCII or even some variation of
roman-alphabet text is going to be extremely difficult and quite
confusing for languages for which that is not the norm.

I just thought of another example for my previous message. (Gee, I seem
to love these. Apologies...) What happens when the package name is only
Han characters? Do we transliterate it to the Japanese (Kanji) reading,
or the Korean (Hanja) reading, or the native Chinese reading? :P

But I digress. Thanks for your time.
-- 
Peter Gordon (codergeek42)
GnuPG Public Key ID: 0xFFC19479 / Fingerprint:
  DD68 A414 56BD 6368 D957 9666 4268 CB7A FFC1 9479

--
Fedora-packaging mailing list
Fedora-packaging@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/fedora-packaging