On 1 May 2013, Behdad Esfahbod verbalised:
> On 13-05-01 05:29 PM, Nick Alcock wrote:
>>
>> Under the circumstances, it is unfortunate that _FcStrRegexCmp() is called so
>> very often by fontconfig. On my system (with ~5000 fonts, and a stripped-down
>> fonts.conf), mapping a new Emacs frame makes twelve calls to FcFontMatch()...
>> and those twelve calls proceed to call _FcStrRegexCmp() 175171 times, with a
>> call to regcomp() every time! This then proceeds to invoke malloc() on the
>> order of three million times. It is fairly easy to turn this into a
>> pathological slowdown at runtime. My Emacsen are not small (arena sizes of a
>> gigabyte-plus are common, with a highly fragmented heap), and in that situation
>> those three million malloc() and free() calls can easily consume in excess of
>> fifty seconds (though on a fresh startup it takes 'only' a quarter of a second
>> or so).
>
> It's really strange that that function is called at all. Can you figure out
> why that is? Does your configuration have anything involving the "file" element?

My config is almost totally the same as upstream, but with only these
conf.d files listed:

  10-autohint.conf
  20-unhint-small-dejavu-experimental.conf
  20-unhint-small-dejavu-lgc.conf
  20-unhint-small-dejavu.conf
  20-unhint-small-vera.conf
  25-unhint-nonlatin.conf
  30-metric-aliases.conf
  30-urw-aliases.conf
  40-nonlatin.conf
  45-latin.conf
  49-sansserif.conf
  50-user.conf
  51-local.conf
  57-dejavu.conf
  58-dejavu-lgc.conf
  60-latin.conf
  61-dejavu-experimental.conf
  65-fonts-persian.conf
  65-nonlatin.conf
  69-unifont.conf
  70-yes-bitmaps.conf
  80-delicious.conf
  90-synthetic.conf

The call stack of all these regex calls is the same:

  FcFontMatch
  FcFontSetMatchInternal
  FcCompare
  FcCompareValueList
  FcCompareFilename
  FcStrRegexCmp

I'm sure some FC_DEBUG value can give useful output to say what value
list this is coming from, but the only one I've found that gives
anything (FC_DEBUG=2) just gives huge sprays of stuff that's useless for
this purpose :/

> Akira, regardless, I think we should remove the Regex and replace it with Glob
> matching that is already in fccfg.c.
>
> I know you want to extend regex to other elements, but for files I think globs
> are just fine.

Quite. Regexes are more powerful in general, but the prevalence of dots
in filenames suggests that they're the wrong tool here (certainly if you
use a filename as a regex and don't regex-escape the filenames first).

Even for other things, I think you need *some* sort of compiled-regex
cache (a small LRU cache, or something?) to try to prevent insane sprays
of regcomp() calls causing massive performance degradation. Perhaps a
simple static variable as here is not ideal, since it appears as a false
leak in valgrind, but... *something*. glibc regcomp() is much more
expensive now than it used to be before regex understood UTF-8, so the
old tradeoffs don't quite apply.

(Sizing the cache might be interesting. I suspect a one-item cache will
only work in cases that shouldn't be using regex at all, like this one.
Perhaps tracking the percentage of cache misses and increasing the cache
size as long as misses continue, up to some sanity bound? Your regexes
should either be coming from the font info, which is effectively fixed,
or the config, which is effectively fixed, so in the end all your
regexes should be regexes you've used before as long as the cache is big
enough. Recompiling them repeatedly is just a waste of time in that
situation, unless the regexec() is *very* infrequent.)

_______________________________________________
Fontconfig mailing list
Fontconfig@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/fontconfig
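
For illustration only, the "small LRU cache" of compiled regexes
discussed above might look roughly like the sketch below. This is not
fontconfig code and not the patch under discussion: the names
cached_regcomp(), cached_regex_match() and CACHE_SLOTS are hypothetical,
the size is fixed rather than grown on misses, and the only real APIs
assumed are POSIX regcomp(), regexec() and regfree().

```c
#define _POSIX_C_SOURCE 200809L  /* for strdup() under strict -std= modes */
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

#define CACHE_SLOTS 8  /* hypothetical fixed size, chosen arbitrarily */

struct cache_entry {
    char          *pattern;    /* owned copy of the pattern, NULL if unused */
    regex_t        compiled;   /* regcomp() result for that pattern */
    unsigned long  last_used;  /* logical clock value of the latest use */
};

static struct cache_entry cache[CACHE_SLOTS];
static unsigned long clock_tick;
static unsigned long cache_hits, cache_misses;  /* counters for sizing */

/* Return a compiled regex for `pattern`, calling regcomp() only on a
 * cache miss.  Returns NULL if compilation or allocation fails. */
static const regex_t *
cached_regcomp(const char *pattern)
{
    struct cache_entry *victim = &cache[0];
    size_t i;

    assert(pattern != NULL);

    for (i = 0; i < CACHE_SLOTS; i++) {
        struct cache_entry *e = &cache[i];

        if (e->pattern && strcmp(e->pattern, pattern) == 0) {
            e->last_used = ++clock_tick;
            cache_hits++;
            return &e->compiled;
        }
        /* Prefer an empty slot, otherwise the least recently used one. */
        if (victim->pattern &&
            (!e->pattern || e->last_used < victim->last_used))
            victim = e;
    }

    cache_misses++;
    if (victim->pattern) {          /* evict the old occupant */
        regfree(&victim->compiled);
        free(victim->pattern);
        victim->pattern = NULL;
    }
    if (regcomp(&victim->compiled, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return NULL;                /* slot stays marked as unused */
    victim->pattern = strdup(pattern);
    if (!victim->pattern) {
        regfree(&victim->compiled);
        return NULL;
    }
    victim->last_used = ++clock_tick;
    return &victim->compiled;
}

/* Hypothetical stand-in for a regex comparison helper: nonzero on match. */
static int
cached_regex_match(const char *pattern, const char *s)
{
    const regex_t *re = cached_regcomp(pattern);

    return re != NULL && regexec(re, s, 0, NULL, 0) == 0;
}
```

The hit and miss counters are kept only because the sizing idea above
depends on watching the miss rate; actually growing the table, and
making any of this thread-safe, is left out of the sketch.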