Re: Improving Latin font selection for CJK locales



Hi, Qianqian,
Latin digits are basically treated as "neutral" characters in a run of text -- I think that is pretty much "standard Unicode operating procedure" if you look at how the digits are categorized in UCD.
I don't know the internal details of how Pango itemizes a string of text, but using your "pngsBGtUJxMgD.png" as an example, we can see what is most likely occurring: First, it appears that Pango treats "1234A" as a run of "latn" text because of the presence of the letter "A" -- all characters preceding the "A" are "neutrals" which presumably don't influence the itemizer, but of course the letter "A" tells the itemizer that the current run of text is Latin script. Then of course the "我" starts a new run of text which gets classified as Han ("hani" if using the ISO 15924 code) script -- and the following neutrals "123" remain a part of that 2nd text segment. The final "ABC" however causes the itemizer to break out a 3rd segment -- and it is "latn".
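To make the guess above concrete, here is a minimal Python sketch of an itemizer that behaves the way I described. It is not Pango's actual code: the script_of() classifier is a crude stand-in of my own for the Unicode Script property (ASCII letters count as "Latn", one CJK block counts as "Hani", everything else as Common, "Zyyy"), and itemize() is a hypothetical name.

```python
def script_of(ch):
    """Crude stand-in for the Unicode Script property (an assumption)."""
    if ch.isascii() and ch.isalpha():
        return "Latn"
    if "\u4e00" <= ch <= "\u9fff":    # CJK Unified Ideographs block only
        return "Hani"
    return "Zyyy"                     # Common: digits, spaces, punctuation


def itemize(text):
    """Split text into (substring, script) runs.

    Common characters never start a new run: they join the run in
    progress, and leading Common characters take the script of the
    first non-Common character that follows them.
    """
    runs = []                         # each entry is [chars, script]
    for ch in text:
        sc = script_of(ch)
        if runs and (sc == "Zyyy" or runs[-1][1] == "Zyyy" or sc == runs[-1][1]):
            runs[-1][0] += ch
            if runs[-1][1] == "Zyyy":
                runs[-1][1] = sc      # resolve a pending Common run
        else:
            runs.append([ch, sc])
    return [(chars, sc) for chars, sc in runs]


print(itemize("1234A我123ABC"))
```

With this sketch, "1234A我123ABC" comes out as the three segments "1234A" (Latn), "我123" (Hani) and "ABC" (Latn), which matches what the screenshot seems to show.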
Pango presumably then talks to fontconfig to get the font assignments for each of the three segments. Behdad can confirm if this is in fact how the itemizer works or not.
So fixing this kind of "bug" or "feature" may require changing how the itemizer works. For example, what if digits were not categorized as "neutrals" but were instead assigned their own category of "Latin Digits"?
Then a text itemizer could break out "latin digits" into separate segments.
For a document with Latin script, maybe these "latin digit" segments eventually get merged back into the "latn" segments because it is not necessary to treat them any differently from how the "latn" segments are treated.
But if the main script is not Latin, then there may be some advantage to treating "latin digits" segments separately.
For example, it would allow your Chinese text to have latin digits rendered in DejaVu Sans because the "latin digits" segments could simply be treated as another special kind of "latn" segment.
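A rough Python sketch of this idea follows. Everything in it is hypothetical: the "LatnDigits" category does not exist in Unicode or in Pango, category_of() is a simplified classifier of my own, and the main_script parameter stands in for whatever the application knows about the document. Other Common characters (spaces, punctuation) are simply left as their own "Zyyy" runs to keep the sketch short.

```python
def category_of(ch):
    """Simplified classifier with a hypothetical 'LatnDigits' category."""
    if ch.isascii() and ch.isdigit():
        return "LatnDigits"           # hypothetical, not a real script code
    if ch.isascii() and ch.isalpha():
        return "Latn"
    if "\u4e00" <= ch <= "\u9fff":    # CJK Unified Ideographs block only
        return "Hani"
    return "Zyyy"


def itemize_with_digits(text, main_script):
    """Split text into runs, keeping ASCII digits as their own segments.

    When the main script is Latin, digit segments are merged back into
    'Latn' because they need no special treatment there.
    """
    runs = []                         # each entry is [chars, category]
    for ch in text:
        cat = category_of(ch)
        if cat == "LatnDigits" and main_script == "Latn":
            cat = "Latn"              # merge back for Latin documents
        if runs and runs[-1][1] == cat:
            runs[-1][0] += ch
        else:
            runs.append([ch, cat])
    return [(chars, cat) for chars, cat in runs]


print(itemize_with_digits("1234A我123ABC", main_script="Hani"))
print(itemize_with_digits("1234A我123ABC", main_script="Latn"))
```

In the Hani case the "123" after "我" is no longer absorbed into the Han segment, so a font selector could hand it to the same font as the "latn" segments.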
There might also be some benefit to doing this in Arabic texts since the "latin digits" and even the "Arabic digits" need to be rendered as runs of LTR text embedded in surrounding RTL text.
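For the directional side of this, the bidirectional classes are easy to inspect from Python's standard unicodedata module. The small helper below (bidi_classes() is just a name I made up) lists the class of each character. Note that for directionality the UCD does not mark digits as neutral: ASCII digits are "EN" (European Number) and Arabic-Indic digits are "AN" (Arabic Number), both of which the bidi algorithm calls weak types.

```python
import unicodedata


def bidi_classes(text):
    """Return the Unicode bidirectional class of each character in text."""
    return [unicodedata.bidirectional(ch) for ch in text]


# Latin letter, ASCII digit, Arabic letter alef, Arabic-Indic digit one
sample = "A1\u0627\u0661"
print(bidi_classes(sample))
```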
Of course there may be other issues and cases which I have not thought of yet, but this is not the first time that I have thought about treating segments of "latin digits" as some non-neutral category for the purposes of enhanced itemization.
(I am actually currently working on writing some C++ UnicodeText classes of my own -- and just recently was playing around with these issues of text itemization, so I am very interested to learn what people *really* want to have). Is it possible that what people really want may *differ* in some details from the status-quo standard Unicode practices?
Best Wishes - Ed

> the second point currently is not possible, because Pango labels the Common
> scripts (digits) near Chinese text as Chinese, and in fontconfig, we never
> know if it is a common-script or Chinese Hanzi. This caused problems
> like this:
>
> Seems to me that the proposed methods will still assign lang=zh for Common
> scripts between Chinese Hanzi if locale=zh. So, it may still not likely
> that we can force to use smooth Latin fonts for Common via fontconfig,
> is my understanding correct?
>
>> --Pat
