I worked up the last two patches [1,2] on the road toward understanding fontconfig's view of charsets, with the goal being: Which installed fonts contain code point 0xXXXX? Now I understand the (base-code-point, bitmap) structure (as documented in [2]), and I can use this:

  $ fc-list -v 'URW Chancery L:style=Medium Italic'
  …
  charset:
  0000: 00000000 ffffffff ffffffff 7fffffff 00000000 ffffffff ffffffff ffffffff
  0001: ffffffff ffffffff fffff3ff ffffffff 00040000 00000000 00000000 00000000
  0002: 03000000 00000000 00000000 00000000 00000000 00000000 3f0002c0 00000000
  0003: 00000000 00000000 00000000 00000000 00100000 10000000 00000000 00000000
  0004: ffffffff ffffffff ffffffff 00000000 00000000 0c00c000 faff0007 033ffffc
  0020: 77180000 06010047 00000010 00000000 00000000 00001000 00000000 00000000
  0021: 00400000 00000004 00000000 00000000 00000000 00000000 00000000 00000000
  0022: 46260044 00000000 00000000 00000031 00000000 00000000 00000000 00000000
  0025: 00000000 00000000 00000000 00000000 00000000 00000000 00000400 00000000
  00f6: 00000000 00000000 00000000 00000000 00000000 00000000 000001f8 00000000
  00fb: 00000006 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  (s)

However, I'm still stuck on the base-85 formatting for the user-facing charsets (and I'm not alone: [3]):

  $ fc-list 'URW Chancery L:style=Medium Italic' charset
  :charset= |>^1!|>^1!P0oWQ |>^1!|>^1!|>^1!!!!%#|>^1!|>^1!|>]fs|>^1!!!K?& !!!)$!{{B% 9;*l$ !!!.% !#f05(1+e5 !!!1&|>^1!|>^1!|>^1! %rw)IzbyU$#%lqi!!#0GM>RAd#y#fx!!!!5 !!!W5 !!#3H!)pSj!!!!& !!#6I<UG/) !!!!X !!#AL !!!1& !!+fv !!!(y !!+u{!!!!)

Is code point 0x2202 in the first?
Yes:

* 0x2202 / 0x100 = 0x22, so it's in the "0022:" row, with a remainder of 0x2202 & 0xff = 0x02
* 0x02 / 32 = 0, so it's in the first block (map[0] = 0x46260044), with a remainder of 0x02 % 32 = 2
* 2 / 4 = 0 (four bits per hex digit), so it's in the least significant hex digit of the block (map[0] & 0xf = 4), with a remainder of 2 % 4 = 2
* The remainder-2 entry is the third bit (2+1) in the digit, because the remainder-0 entry gets the first bit. The third bit is in the 4s column, and that's set in the digit 4 ;).

To do the same with the second format, I had to fiddle with the valueToChar table and Python to determine that 0x2200 is 0:0:0x1:0x11:0x22 in base 85, which should be represented by '!!#6I'. The next five characters are '<UG/)', which decode to the base-85 digits 0x16:0x2e:0x20:0xa:0x6, and that is indeed 0x46260044.

I don't think saving three characters per 32-bit block (five base-85 digits instead of eight hex digits, 37.5%) is worth the hassle of learning a fontconfig-specific set of digits for base 85. If I convert the parse/unparse code in fccharset.c to use hex, would that be mergeable? The only problem I can see would be for folks scripting fc-list who have already written parsers for the current format (a null set?).

Alternatively, perhaps there is another way to look up fonts containing a character, and I've just missed it. In that case I don't care how ugly the charset serialization is :p.

Cheers,
Trevor

[1]: http://thread.gmane.org/gmane.comp.fonts.fontconfig/4914
[2]: http://thread.gmane.org/gmane.comp.fonts.fontconfig/4915
[3]: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=498039#5

-- 
This email may be signed or encrypted with GnuPG (http://www.gnupg.org).
For more information, see http://en.wikipedia.org/wiki/Pretty_Good_Privacy
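
For anyone who wants to re-check the arithmetic in this message, here is a minimal Python sketch of both lookups. It is not fontconfig code. The function names are made up for illustration, the row layout (eight 32-bit words per row, word 0 first, bit 0 being the lowest code point) is assumed from the verbose listing, and the base-85 digit values cover only the nine characters worked out in the text, not the full table in fccharset.c.

```python
def parse_verbose_charset(text):
    """Parse 'PPPP: w0 w1 ... w7' rows into {page: [eight ints]}.

    Assumes each row is keyed by (code point >> 8) and holds eight
    32-bit words in hex, as in the `fc-list -v` output quoted above.
    """
    pages = {}
    for line in text.strip().splitlines():
        page, words = line.split(":")
        pages[int(page, 16)] = [int(w, 16) for w in words.split()]
    return pages


def has_code_point(pages, cp):
    """Return True if the bit for code point `cp` is set."""
    words = pages.get(cp >> 8)      # row, e.g. 0x22 for 0x2202
    if words is None:
        return False
    offset = cp & 0xFF              # position inside the 256-entry row
    block = words[offset // 32]     # which 32-bit word
    return bool((block >> (offset % 32)) & 1)


def base85_digits(value, width=5):
    """Split `value` into `width` base-85 digits, most significant first."""
    digits = []
    for _ in range(width):
        value, digit = divmod(value, 85)
        digits.append(digit)
    return digits[::-1]


# Partial digit table: only the characters decoded by hand in this message.
KNOWN_DIGITS = {
    "!": 0x00, "#": 0x01, "6": 0x11, "I": 0x22,
    "<": 0x16, "U": 0x2E, "G": 0x20, "/": 0x0A, ")": 0x06,
}


def decode_known(chars):
    """Decode a base-85 group using only the digits in KNOWN_DIGITS."""
    value = 0
    for c in chars:
        value = value * 85 + KNOWN_DIGITS[c]  # KeyError for unknown digits
    return value


ROW_0022 = ("0022: 46260044 00000000 00000000 00000031 "
            "00000000 00000000 00000000 00000000")
```

With these definitions, `has_code_point(parse_verbose_charset(ROW_0022), 0x2202)` checks the same bit as the list above, and `decode_known('<UG/)')` can be compared against `0x46260044`.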