Re: MD5 checksum? Why? & mmap alignment on same machine in 32.v.64 mode

Raimund Steger wrote:

 I think you'd have a hard time even measuring the impact of MD5 in
 fontconfig.  I just tried with my profiler and it only shows up as 0,
 where the whole caching process is usually well in the seconds.
---
Caching for 32+64 bit on my machine takes 540-860 seconds x 2 ~= 1400 seconds,
or easily over 100 times longer.  At ~6000-9000 fonts, doing it once is
painful.  Doing it twice seems a bit masochistic or sadistic, depending on
which end you're on... ;-/  If I get caught doing it on Windows, it takes
about 25-30% more (cygwin on Windows; not a recent timing, as I uninstalled
the problematic SW).  I am trying to keep 32-bit GUI software off of my
machine because of this, and it's still unacceptable.  I also try not to use
cygwin's native X clients because of it.  My machines are a few years old
(3.4GHz x 6 on Win, 2.8GHz x 12 on Linux).

X11 doesn't take long, and neither does Windows (nor does it need two
separate versions of its font dir on a bi-arch machine).  Granted, the fonts
on Linux have better coverage, but not by 100x: my Linux and X11 GUIs mostly
use the same font directories (or copies of them).  I mostly try to keep
arch-specific fonts (like the 75/100dpi fonts) to a minimum and stick to
TTF/OTF/TTC fonts on both, but that 6000-9000 figure comes from my active
font list on Windows.  Under X, I prune out the alt-lang/charset aliases for
the fonts, usually accepting only UTF-8-compatible (10646-1) and
iso-8859-1/15 encodings for compatibility with non-Unicode font use.

Simply including the code is wasteful, and it makes it less likely that the
code stays in the memory cache when executing.


 it's fed are a few bytes of pathnames, that's not surprising. To create
 a bunch of not-too-long filenames from pathnames, that have a good chance
 of not hitting an existing one, I imagine that many other systems (e. g.,
HTTP caches...) use something very similar.
---
Not even close: squid's cache levels are configurable, and it has a few
plugin cache systems as well, but on my system:

2 levels of dirs numbered in hex:

   /var/cache/squid/{00..3F}/{00..3F}/

64 dirs at each level give 4096 cache dirs that take only 4096 bytes/dir
(only 1 read/dir).  File names are 8-digit hex strings, giving a 32-bit
filename space; add 12 bits for the dirs and you get about a 16-Tera-name
file space.  It's currently holding 444093 files spread over the 4096
directories, taking a total of 87G of space -- not optimal for font usage,
but good for squid's random access needs.
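
To make the arithmetic above concrete, here is a minimal sketch in C.  It is
NOT squid's actual allocation code; cache_path(), l1_index() and l2_index()
are hypothetical helpers that only illustrate how a 44-bit id (12 bits of
directory + 32 bits of file name) could map onto the layout described above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical: bits 38..43 of the id pick the first-level dir (00..3F). */
static unsigned l1_index(uint64_t id) { return (unsigned)((id >> 38) & 0x3F); }

/* Hypothetical: bits 32..37 of the id pick the second-level dir (00..3F). */
static unsigned l2_index(uint64_t id) { return (unsigned)((id >> 32) & 0x3F); }

/* Build "<root>/<L1>/<L2>/<8-hex-digit name>" from a 44-bit id.
 * Returns what snprintf() returns (the length of the full path). */
static int cache_path(char *buf, size_t len, const char *root, uint64_t id)
{
    assert(id < (UINT64_C(1) << 44));   /* 12 dir bits + 32 name bits */
    unsigned long name = (unsigned long)(id & 0xFFFFFFFFu);
    return snprintf(buf, len, "%s/%02X/%02X/%08lX",
                    root, l1_index(id), l2_index(id), name);
}
```

With root "/var/cache/squid", the largest id, (1 << 44) - 1, lands in
3F/3F/FFFFFFFF, which is where the "about 16 Tera names" figure comes from:
64 x 64 x 2^32 = 2^44.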

Compare that to my Linux fonts dir of 7.9G, with the two largest dirs being:

2.0G    /usr/share/fonts/OTF
5.2G    /usr/share/fonts/TTF
----    -----
7.9G    TOTAL

squid normally starts up in a few seconds, but it has radically different
needs than the font lib, so its startup time isn't comparable, BUT the point
was that it does manage A LOT of unique filenames w/o a need for anything
complicated.





 About the 32 vs. 64 bit issue, and leaving API considerations aside,
 doesn't fontconfig's serialization format use intptr_t sized offsets? If
 yes, I think it's not smart to cast these to non-native sizes.


I'd agree, and on further examination I see it wouldn't be
an identical format, but it would be either adaptable in 64-bit mode
(not ideal) or easily convertible.
Either way, limiting the data to <4G seems uncertain or unlikely
to hold: I have something like 8G of fonts NOW, and in 5-10
years that size DB MIGHT be considered small.  If 32-bit needs
4G data sizes, that's already a problem.  (Not that the in-memory
size needs to reflect exactly what is on disk, but if it does, and
what is on disk is > 4G, some measures could conceivably get
tight during the lifetime of this format.)



I thought the first and primary data object was representative of the included data. It shows the way (from http://freedesktop.org/software/fontconfig/fontconfig-devel/x31.html):


An FcValue object holds a single value with one of a number of different types. The 'type' tag indicates which member is valid.

       typedef struct _FcValue {
               FcType type;
               union {
                       const FcChar8 *s;
                       int i;
                       FcBool b;
                       double d;
                       const FcMatrix *m;
                       const FcCharSet *c;
                       void *f;
                       const FcLangSet *l;
               } u;
       } FcValue;


The union is the key -- since it's the size of its largest member (the
double), it's 8 bytes on both 32- and 64-bit archs.

If the pointers in the other structs were in unions w/a double
(or an 8-byte character string), those would all be compatible as well.
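
A quick way to check the size claim is a compile-time assertion.  The sketch
below is an assumption-laden stand-in, not fontconfig code: union value_u
mimics the union in the quoted typedef with plain C types, and union
padded_ptr is a hypothetical example of pairing a pointer with a double.
It assumes an 8-byte double and pointers of at most 8 bytes, as on the usual
32- and 64-bit x86 ABIs.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>

/* Stand-in for the union inside FcValue; the fontconfig pointee types are
 * replaced by plain C types for illustration. */
union value_u {
    const unsigned char *s;
    int                  i;
    double               d;
    void                *f;
};

/* Hypothetical: a pointer padded out to 8 bytes by sharing a double. */
union padded_ptr {
    void  *p;
    double pad;
};

/* Both hold on ILP32 and LP64 as long as double is 8 bytes. */
static_assert(sizeof(union value_u) == 8, "value union is not 8 bytes");
static_assert(sizeof(union padded_ptr) == sizeof(double),
              "padded pointer is not the size of a double");

/* Runtime accessor, handy for printing from a 32- and a 64-bit build. */
static size_t value_union_size(void) { return sizeof(union value_u); }
```

This only checks the union itself.  Padding and alignment of an enclosing
struct can still differ between ABIs, and a stored pointer value means
nothing outside the process that wrote it, so matching sizes are a first
step toward a shared on-disk format rather than the whole job.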

But getting both archs for the price of 1 is only a small part of the problem
with a ~10+ minute rebuild time.


_______________________________________________
Fontconfig mailing list
Fontconfig@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/fontconfig



