Re: ISO 15924 font selection

Gerrit Sangel <z0idberg@xxxxxx> · Tue, 4 Dec 2007 11:06:51 +0100



On Tuesday, 4 December 2007, you wrote:
> I also had Unicode scripts in mind, instead of ISO 15924, and I had a
> user-readable version in mind, like "arabic" and "latin".  Pango already
> has that information and it can be deduced from standard Unicode script
> names.  Doesn't mean it can't be ISO 15924 names though, but the mapping
> is not one to one, and I really don't understand why Fraktur is a
> different script than Latin in there.  I don't think this feature if
> added should be used for things like Fraktur.

Well, in my opinion the case with Fraktur is more or less the same as with Han unification. Apart from the long s, Fraktur shares its code points with ordinary Latin, so it cannot be detected from the code points alone. It may only be a matter of different glyphs, but the appearance is (imho) far too different to call it just another style like serif or sans serif. The two are also used differently: foreign words are usually not set in Fraktur, so the script information sometimes has to change within a sentence.

Doing this via CSS would work, but it is not really flexible. First of all, as far as I know, there is no real “standard” Fraktur font available, so the web designer could not just specify one particular font. He would have to list several fonts in CSS, which I think would be a bit too much work. With a script tag, he could simply write

  <p xml:lang="de-Latf">Das iſt Fraktur <span xml:lang="de">und das Antiqua</span></p>

and leave it to the user to decide which font to use.

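To make the idea concrete, here is a rough Python sketch of how a consumer of such markup could work out which language tag applies to each run of text, so that a font could then be chosen per run. It only uses the standard library; the function name `text_runs` is made up for illustration, and it is not how any browser actually does this.

```python
import xml.etree.ElementTree as ET

# ElementTree reports the predefined xml: prefix with its full namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def text_runs(element, inherited=None):
    """Yield (language_tag, text) pairs; xml:lang is inherited from ancestors."""
    lang = element.get(XML_LANG, inherited)
    if element.text and element.text.strip():
        yield lang, element.text.strip()
    for child in element:
        yield from text_runs(child, lang)
        # Tail text follows the child but is governed by the parent's tag.
        if child.tail and child.tail.strip():
            yield lang, child.tail.strip()

fragment = (
    '<p xml:lang="de-Latf">Das iſt Fraktur '
    '<span xml:lang="de">und das Antiqua</span></p>'
)
runs = list(text_runs(ET.fromstring(fragment)))
for lang, text in runs:
    print(lang, "|", text)
```

The first run carries the tag de-Latf and the second plain de, so only the first would need a Fraktur font.
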
But what are the benefits of Unicode scripts? Is there a list available? As the Unicode website states, the Unicode Consortium was appointed to manage ISO 15924. So I would have guessed that this is the “official” script list for Unicode.
> > > In my opinion, it is much more flexible than defining fonts according
> > > to a specific region (e.g. TW or CN). In some cases, it is even
> > > necessary, because the region does not differ.
> >
> > Yeah, conflicts among multiple scripts used for the same language in the
> > same territory do exist, which fontconfig doesn't handle well at all.
>
> If we add script tags in addition to language tags, orthographies then can
> be extended to tell what script is used in them.  Matching can skip if
> script tags don't match.

Well, but why should script tags not match? I would guess (I’m no linguist) that every language can be written in every script, even though the result may not be quite correct most of the time. So I don’t think there should be a limitation. I think the main purpose of script tags is to let a script be specified for a language that is not usually written in it.

But the different ISO standards would not conflict, as far as I know. ISO 639 codes are written entirely in lowercase, ISO 3166 codes entirely in uppercase, and ISO 15924 codes have the first letter in uppercase and the other three in lowercase. And I guess the ordering would be from “biggest” to “smallest”, so language-region-script.

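As a small illustration of why the codes cannot collide, here is a Python sketch that sorts the subtags of a tag purely by length and letter case. The function names are invented for this example. It assumes the conventional casing described above (BCP 47 itself treats tags as case-insensitive), and it deliberately ignores position, since BCP 47 tags put the script before the region (zh-Hant-TW) rather than after it.

```python
def classify_subtag(subtag):
    """Guess the kind of a subtag from its length and letter case only."""
    if not subtag.isascii() or not subtag.isalpha():
        return "unknown"
    if len(subtag) in (2, 3) and subtag.islower():
        return "language"  # ISO 639: all lowercase
    if len(subtag) == 2 and subtag.isupper():
        return "region"    # ISO 3166: all uppercase
    if len(subtag) == 4 and subtag.istitle():
        return "script"    # ISO 15924: first letter uppercase
    return "unknown"

def split_tag(tag):
    """Split a hyphen-separated tag into a dict keyed by subtag kind."""
    parts = {}
    for subtag in tag.split("-"):
        # Keep the first subtag of each kind; later duplicates are ignored.
        parts.setdefault(classify_subtag(subtag), subtag)
    return parts

print(split_tag("de-Latf"))
print(split_tag("zh-Hant-TW"))
print(split_tag("zh-TW-Hant"))
```

Both orderings of the Chinese example give the same three parts, which is the point: the case convention alone is enough to tell the standards apart.
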
> > > Do I understand this correctly, that the user can specify a font in the
> > > config file according to a specific language?
> >
> > You can match on the language and prepend a family name to make that
> > preferred.
> >
> > > I see this in Firefox (even though it does not seem to use fontconfig,
> > > but I guess an addon could be written to solve it)
> >
> > firefox does use fontconfig, although the language-based selection is
> > internal, not based on modifying fontconfig matching rules.
> >
> > > So I think a possible way would be to define a general rule for a
> > > language (according to ISO-639) or a script (ISO 15924) at first and
> > > then a specific rule for a language or script which would override the
> > > general rule.
> >
> > The pattern matching and editing rules should be able to handle this
> > without change, except for the addition of ISO 15924 script codes to the
> > existing set of language/territory pairs.
>
> Another piece of information that can improve language matching is to
> use ISO 639-3 macrolanguage information.  That can tell fontconfig, for
> example, that Dari is a Persian language:
>
>   http://bugzilla.gnome.org/show_bug.cgi?id=470907

Well, but this is for *languages*, not *scripts*. Another example might be this: I have a Japanese text that I want to write in the old characters used before the simplification after WW2. Although some old characters are encoded separately, others were unified because there are only minor stylistic differences. I would have to use a higher-level protocol to state that the old forms are wanted. But the language itself does not differ. ISO 15924 has several tags for Han, namely Hani (Han ideographs), Hans (simplified Han) and Hant (traditional Han). So I would mark this text as “ja-Hant”, and the browser could select a font that has the old glyphs. In this case you cannot distinguish by language or region, because both are the same as for modern Japanese. *Only* the script differs.

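The “general rule first, specific rule overrides” idea from the quoted discussion, combined with the ja-Hant case, could look roughly like the following Python sketch. Everything in it is hypothetical: the rule table, the placeholder family names and the lookup order (exact pair, then script only, then language only) are my assumptions, not anything fontconfig does today.

```python
# Hypothetical preference table. The family names are placeholders,
# not real fonts. None means "any language" or "no script given".
RULES = {
    ("ja", None): "Example Modern Japanese",
    (None, "Hant"): "Example Traditional Han",
    ("ja", "Hant"): "Example Old-Style Japanese",
    ("de", "Latf"): "Example Fraktur",
}

def preferred_family(language, script=None, default="Example Fallback"):
    """Return the most specific matching family, or the default."""
    # The exact (language, script) rule overrides the general ones.
    candidates = [(language, script)]
    if script is not None:
        # Assumed order: a script-wide rule, then a language-wide rule.
        candidates += [(None, script), (language, None)]
    for key in candidates:
        if key in RULES:
            return RULES[key]
    return default

print(preferred_family("ja", "Hant"))  # specific rule wins
print(preferred_family("ja"))          # general language rule
print(preferred_family("ko", "Hant"))  # general script rule
print(preferred_family("de"))          # nothing matches, so the default
```

Note that language and region are identical for both Japanese cases; the script subtag is the only thing that selects the old-style glyphs.
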
So I would really argue for ISO 15924. In my opinion, this is the best solution, because
a) an established standard exists
b) it is consistent with ISO 639 and ISO 3166
c) it is managed by the Unicode Consortium
d) why reinvent the wheel?

And I do not think names like “arabic” or “latin” are that useful. First, they are explicitly aimed at English speakers, which I don’t like much, especially in this case. Second, the ISO 15924 tags are derived from more or less readable names, and with four letters they are still quite easy to read: Arabic is Arab and Latin is Latn. Third, if the web designer already has to look up the language or country code he needs, looking up a script code as well would not be much extra effort.

http://www.unicode.org/iso15924/iso15924-en.html

Gerrit