Re: Improving Latin font selection for CJK locales

Hi Ed,
If you are interested in the details, read the entire thread here:
I'm trying to avoid repeating the same reasoning again and again, and it's really not quite on topic for the fontconfig list anyway.
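Ed's message, quoted below, describes how Pango appears to itemize mixed Latin/Han text and proposes giving digits their own "Latin Digits" category. The toy model below is a minimal sketch of both behaviours, not Pango's code. `script_of` is a hypothetical stand-in for the Unicode Script property that only knows ASCII letters (Latn), the CJK Unified Ideographs block (Hani), and everything else as Common (Zyyy). `itemize` follows the current behaviour as Ed describes it, and `itemize_with_digits` models his proposal, with an assumed `main_script` parameter standing in for however the document's main script would actually be determined.

```python
# Toy model of script itemization, following the behaviour described in the
# quoted message. This is NOT Pango's implementation. script_of() is a
# hypothetical stand-in for the Unicode Script property that only knows
# ASCII letters, the CJK Unified Ideographs block, and "everything else".
from typing import List, Tuple

Run = Tuple[str, str]  # (script label, text)

COMMON = "Zyyy"        # ISO 15924 code for Common (digits, spaces, punctuation)
DIGITS = "LatnDigits"  # hypothetical "Latin Digits" category from the proposal


def script_of(ch: str) -> str:
    """Hypothetical classifier covering only the characters in the example."""
    if "A" <= ch <= "Z" or "a" <= ch <= "z":
        return "Latn"
    if "\u4e00" <= ch <= "\u9fff":  # CJK Unified Ideographs
        return "Hani"
    return COMMON


def itemize(text: str) -> List[Run]:
    """Current behaviour as described: Common characters stay in the current
    run, and leading Common characters adopt the first strong script."""
    runs: List[Run] = []
    buf, run_script = "", COMMON
    for ch in text:
        sc = script_of(ch)
        if sc == COMMON or sc == run_script:
            buf += ch
        elif run_script == COMMON:
            # First strong character: the neutrals before it take its script.
            buf += ch
            run_script = sc
        else:
            # A different strong script closes the current run.
            runs.append((run_script, buf))
            buf, run_script = ch, sc
    if buf:
        runs.append((run_script, buf))
    return runs


def _append(runs: List[Run], label: str, chunk: str) -> None:
    """Append chunk, extending the last run if it has the same label."""
    if runs and runs[-1][0] == label:
        runs[-1] = (label, runs[-1][1] + chunk)
    else:
        runs.append((label, chunk))


def itemize_with_digits(text: str, main_script: str) -> List[Run]:
    """Proposed variant: ASCII digits get their own DIGITS runs. If the
    (assumed) main script of the document is Latin, they are merged back
    into ordinary Latn runs."""
    runs: List[Run] = []
    for script, chunk in itemize(text):
        for ch in chunk:
            _append(runs, DIGITS if "0" <= ch <= "9" else script, ch)
    if main_script == "Latn":
        merged: List[Run] = []
        for label, chunk in runs:
            _append(merged, "Latn" if label == DIGITS else label, chunk)
        runs = merged
    return runs
```

On `1234A我123ABC`, a string reconstructed from Ed's description of the screenshot, `itemize` yields `1234A` (Latn), `我123` (Hani) and `ABC` (Latn). In this model that is exactly Qianqian's complaint: the `123` travels with the Hanzi, so the font matcher only sees a Chinese run. With `main_script="Hani"`, the digit-aware variant splits out `1234` and `123` as separate runs that could be sent to a Latin face such as DejaVu Sans; with `main_script="Latn"`, they fold back into ordinary Latin runs.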

On Tue, 2008-01-29 at 20:56 -0500, Ed Trager wrote:
> Hi, Qianqian,
>
> Latin digits are basically treated as "neutral" characters in a run of
> text. I think that is pretty much "standard Unicode operating
> procedure" if you look at how the digits are categorized in the UCD.
>
> I don't know the internal details of how Pango itemizes a string of
> text, but using your "pngsBGtUJxMgD.png" as an example, we can see
> what is most likely occurring. First, it appears that Pango treats
> "1234A" as a run of "latn" text because of the presence of the letter
> "A": all characters preceding the "A" are "neutrals", which presumably
> don't influence the itemizer, but the letter "A" tells the itemizer
> that the current run of text is Latin script. Then the "我" starts a
> new run of text, which gets classified as Han ("hani" if using the
> ISO 15924 code) script, and the following neutrals "123" remain part
> of that second segment. The final "ABC", however, causes the itemizer
> to break out a third segment, and it is "latn".
>
> Pango presumably then talks to fontconfig to get the font assignments
> for each of the three segments. Behdad can confirm whether this is in
> fact how the itemizer works.
>
> So fixing this kind of "bug" or "feature" may require changing how the
> itemizer works. For example, what if digits were not categorized as
> "neutrals" but were instead assigned their own category of "Latin
> Digits"?
>
> Then a text itemizer could break out "latin digits" into separate
> segments.
>
> For a document in Latin script, these "latin digit" segments could
> eventually be merged back into the "latn" segments, because there is
> no need to treat them any differently from how the "latn" segments
> are treated.
>
> But if the main script is not Latin, there may be some advantage to
> treating "latin digits" segments separately. For example, it would
> allow your Chinese text to have Latin digits rendered in DejaVu Sans,
> because the "latin digits" segments could simply be treated as another
> special kind of "latn" segment.
>
> There might also be some benefit to doing this in Arabic texts, since
> the "latin digits" and even the "Arabic digits" need to be rendered as
> runs of LTR text embedded in surrounding RTL text.
>
> Of course there may be other issues and cases which I have not thought
> of yet, but this is not the first time that I have thought about
> treating segments of "latin digits" as some non-neutral category for
> the purposes of enhanced itemization.
>
> (I am actually currently writing some C++ UnicodeText classes of my
> own, and was just recently playing around with these issues of text
> itemization, so I am very interested to learn what people *really*
> want.) Is it possible that what people really want may *differ* in
> some details from the status-quo standard Unicode practices?
>
> Best Wishes - Ed
>
> > the second point currently is not possible, because Pango labels the
> > Common scripts (digits) near Chinese text as Chinese, and in
> > fontconfig we never know whether a character is Common script or a
> > Chinese Hanzi. This caused problems like this:
> >
> > Seems to me that the proposed methods will still assign lang=zh for
> > Common scripts between Chinese Hanzi if locale=zh. So it may still not
> > be possible to force the use of smooth Latin fonts for Common via
> > fontconfig. Is my understanding correct?
> >
> > > --Pat

-- behdad
"Those who would give up Essential Liberty to purchase a little Temporary Safety, deserve neither Liberty nor Safety."        -- Benjamin Franklin, 1759