"Contains" matching issues.

The contains operator is currently used in font listing and can be used in
match/edit rules.

LISTING FONTS

When listing fonts, contains should have "obvious" semantics; I suggest that
those semantics depend on the type of the value:

    string, number, boolean:

        The font has an equal value for every value in the pattern.  This
        means that using 'times,courier' for the family will result in no
        fonts being listed, as no font has both times and courier family
        names.  In fact, I can't see a good use for multiple values here as
        it would require multiple values in the fonts; let's see if that is
        broken.

        For strings, the change here is that 'contains' does not mean
        substring -- list 'courier' and you won't see 'courier 10 pitch'.
        I think strings should be treated as atomic values in this context;
        fontconfig doesn't have string operators, which is at least
        consistent.

    charset:

        The font contains the listed Unicode codepoints; in other words, the
        charset provided by the font 'contains' all of the glyphs requested
        by the application.

    lang:

        (Remember that 'lang' is a composite value consisting of a language
        value and a territory value.  The list of lang values in a font is
        computed from Unicode coverage ranges based on orthographies.
        Except for Chinese, all of these coverage ranges are (currently)
        associated only with a language and not a territory.  Chinese is
        (currently) split into three territory groups (mainland China and
        Singapore, Hong Kong, Taiwan and Macau).  So, most language
        comparisons will be done with a language/territory pair supplied by
        the application (often from the current locale) against fonts which
        know only languages and not territories.  However, applications will
        also provide only languages at times to be matched against fonts
        which have languages and territories.)

        The font supports all of the langs requested by the application.
        I think this means that the font 'contains' all of the langs
        requested by the application (remember, we're talking about LISTING
        here).

        Now, the tricky part is defining what 'support' means for a
        specific lang entry.  When the application provides a
        language/territory pair, the font must provide either a matching
        language/territory pair or a bare language entry.  When the
        application provides a bare language, the font must provide either a
        matching bare language entry or a language/territory pair with *any*
        territory:

            application     font    "supports"
            -----------     ----    ----------
            zh              zh_cn   YES
            zh_tw           zh_cn   NO
            en_gb           en      YES
            en              en      YES

MATCHING

The LISTING algorithm is designed to sharply restrict the set of provided
fonts; an empty list is often the result of overspecified patterns.  That
matches the expected usage of providing precise information to users about
what actual fonts are available, rather than what font will be used when a
specific pattern is matched.

In contrast, MATCHING is designed to always provide a font, and in fact to
provide a score measuring how accurate that match is so that the set of
available fonts can be sorted by this metric and returned to the
application.

When matching fonts, we're not using the boolean 'contains' operators, but
rather measuring distance from the pattern to the font (in CS terms, LISTING
is a constraint satisfaction problem while MATCHING is a constraint
optimization problem).

    string, boolean:

        Distance in these objects is measured with only two values --
        matching and nonmatching.  Matching strings or booleans have
        distance 0 while mismatching values have distance 1.

    number:

        Distance between two numbers is just the absolute value of their
        difference (the obvious value).  This is used for things like weight
        and slant; the numeric values for those constants were carefully
        chosen to prefer reasonable substitutions (italic and oblique are
        closer together than either is to roman).

    charset:

        Distance between two charsets is the count of characters requested
        by the pattern but not provided by the font.  This means that a font
        which fully covers the requested characters has distance '0'.

    lang:

        Distance has three values:

            0: Pattern and font have equal language/country, or pattern has
               only language and font has language with any country.

            1: Pattern and font have equal language and different country
               (zh_CN vs zh_TW).

            2: Pattern and font have different language.

EDITING

The EDITING algorithm needs a method for matching patterns for each edit
operation; this is another constraint satisfaction problem as the edit rules
are either applied or not applied.

Match rules in edit instructions can use many different operators to
constrain pattern selection:

    eq not_eq less less_eq more more_eq contains not_contains

Each of these operators behaves differently for each datatype.  For
datatypes which aren't ordered, I've defined the ordered operators to always
return false.

    string:

        I think these should be treated as unordered objects so that
        collation isn't visible to the user.  The remaining question is
        whether the 'contains' operator should be used to detect substring
        presence.  The LISTING operation above should not do this, as the
        operator is not selectable there; but allowing 'contains' to do
        substring detection in an EDITING context means that LISTING won't
        use Contains, but rather some Contains-like analog which is actually
        Equal for strings.

        Hmm.  Permitting Contains for EDITING would probably be useful,
        especially for FC_STYLE pattern elements.

    boolean, number:

        These have obvious semantics for all of the operators if
        contains/not_contains are allowed to be synonyms for eq/not_eq.

    charset, lang:

        I think the semantics described above for LISTING should apply here.
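
As a rough illustration of the EDITING operator rules for strings, booleans
and numbers, here is a minimal Python sketch.  It is not fontconfig code:
the function name edit_test and the ORDERED table are hypothetical, the
string comparison is assumed to be case-sensitive, and charset and lang
values are left out.  It assumes that 'contains' is permitted to do
substring detection in an EDITING context.

```python
import operator

# Hypothetical helper for illustration only; none of this is fontconfig API.
ORDERED = {
    "less": operator.lt,
    "less_eq": operator.le,
    "more": operator.gt,
    "more_eq": operator.ge,
}


def edit_test(op, value, operand):
    """Return True if `value op operand` holds under the rules sketched here.

    `value` is the pattern element, `operand` is the value in the edit rule.
    Only str, bool and numeric values are handled.
    """
    if op == "not_eq":
        return not edit_test("eq", value, operand)
    if op == "not_contains":
        return not edit_test("contains", value, operand)
    if op == "eq":
        return value == operand
    if op == "contains":
        # Strings: substring detection (assumed case-sensitive here).
        if isinstance(value, str) and isinstance(operand, str):
            return operand in value
        # Booleans and numbers: contains is a synonym for eq.
        return value == operand
    if op in ORDERED:
        # Strings are treated as unordered, so ordered operators are false.
        if isinstance(value, str) or isinstance(operand, str):
            return False
        return ORDERED[op](value, operand)
    raise ValueError("unknown operator: " + op)
```

With this sketch, edit_test("contains", "Bold Oblique", "Oblique") is true
while edit_test("eq", "Bold Oblique", "Oblique") is false, and
edit_test("less", "a", "b") is false because strings are unordered.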

PROPOSED CHANGES

I believe the only changes necessary to implement these semantics are:

    1)  Use a Contains-alike operator for LISTING which does exact matching
        for strings; permit Contains for EDITING to do substring matching.

    2)  Change lang Contains semantics to make ll_xx contain ll and ll
        contain ll_xx (currently, I believe ll_xx does not contain ll).
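
The lang semantics in change 2 can be sketched in a few lines of Python.
This is only an illustration under stated assumptions, not the fontconfig
implementation: split_lang, lang_supports and font_listed_for are
hypothetical names, and lang tags are assumed to be plain 'll' or 'll_xx'
strings (a '-' separator is accepted as well).

```python
def split_lang(tag):
    """Split 'll' or 'll_xx' into (language, territory or None)."""
    language, _, territory = tag.lower().replace("-", "_").partition("_")
    return language, territory or None


def lang_supports(requested, provided):
    """True if one font lang entry 'supports' one requested lang."""
    req_lang, req_terr = split_lang(requested)
    font_lang, font_terr = split_lang(provided)
    if req_lang != font_lang:
        return False
    # A bare language on either side matches any territory on the other.
    if req_terr is None or font_terr is None:
        return True
    return req_terr == font_terr


def font_listed_for(requested_langs, font_langs):
    """LISTING: every requested lang must be supported by some font lang."""
    return all(
        any(lang_supports(req, have) for have in font_langs)
        for req in requested_langs
    )
```

Under these assumptions, lang_supports reproduces the four rows of the
"supports" table: ('zh', 'zh_cn'), ('en_gb', 'en') and ('en', 'en') are
supported, while ('zh_tw', 'zh_cn') is not.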