On Wed, Sep 12, 2007 at 09:29:45AM +0200, Harald Hoyer wrote: > Jakub Jelinek schrieb: > >BTW, each read_depends below will scan the whole modules.dep (if > >unsuccessful) > >or first part thereof. In the modprobes that take most of the time, is > >modules.dep read just once or multiple times? > > Hmm, don't know exactly what you mean... attached are the times. > > modalias;real;user;sys > > First modprobe takes the longest, as it fills the cache. E.g. strace /sbin/modprobe pci:v0000104Cd0000803Bsv0000104Dsd000081EFbc01sc80i00 shows open("/lib/modules/2.6.22.1-27.fc7/modules.dep", O_RDONLY) = 3 + ~ 65 x 4KB read + parsing it in userspace open("/lib/modules/2.6.22.1-27.fc7/modules.alias", O_RDONLY) = 3 + ~ 65 x 4KB read + parsing it in userspace open("/lib/modules/2.6.22.1-27.fc7/modules.dep", O_RDONLY) = 3 + ~ 40 x 4KB read + parsing it in userspace While both the big files are probably cached already, even the reads from cache and especially parsing can mean significant time spent in it. I have ran 50 times /sbin/modprobe pci:v0000104Cd0000803Bsv0000104Dsd000081EFbc01sc80i00 under oprofile, 37.9% is spent in the kernel which I didn't have debuginfo for. But then we have: 0000003a628672a0 7031 24.6745 libc-2.6.so fgetc 0000000000402370 2974 10.4369 modprobe getline_wrapped 0000003a62877580 1299 4.5587 libc-2.6.so strpbrk 0000000000401310 871 3.0567 modprobe .plt 0000000000402d70 808 2.8356 modprobe underscores 0000003a628778c0 567 1.9898 libc-2.6.so strspn 0000003a62876a20 492 1.7266 libc-2.6.so strchr 0000003a6286e360 441 1.5476 libc-2.6.so malloc_consolidate 0000003a628705b0 396 1.3897 libc-2.6.so _int_malloc static int fgetc_wrapped(FILE *file, unsigned int *linenum) { for (;;) { int ch = fgetc(file); if (ch != '\\') return ch; ch = fgetc(file); if (ch != '\n') return ch; if (linenum) (*linenum)++; } } static char *getline_wrapped(FILE *file, unsigned int *linenum) { int size = 1024; int i = 0; char *buf = NOFAIL(malloc(size)); for(;;) { int ch = fgetc_wrapped(file, linenum); if (i == size) { size *= 2; buf = NOFAIL(realloc(buf, size)); } if (ch < 0 && i == 0) { free(buf); return NULL; } if (ch < 0 || ch == '\n') { if (linenum) (*linenum)++; buf[i] = '\0'; return NOFAIL(realloc(buf, i+1)); } buf[i++] = ch; } } Even just replacing fgetc with fgetc_unlocked should help tremendously, fgetc needs to lock the stream, so 2 atomic instructions per character plus it is a library call. fgetc_unlocked is inlined, needs no locking (modprobe isn't threaded, right?) and if there is anything buffered already (4095 out of 4096 times) it is just dereferencing a few pointers. The malloc/realloc/free per each line in the file (in my case 1486 malloc/frees times 2 for modules.dep and 5872 for modules.alias can be also avoided, from what I see all users of getline_wrapped always free what it returned and it is never called recursively. Which means there is no reason to malloc/free this every time. If instead this does: static int size = 0; static char *buf = NULL; ... if (i == size) { size = size ? size * 2 : 1024; buf = NOFAIL(realloc(buf, size)); } ... return buf; /* No second realloc. */ and never frees what getline_wrapped returned, you can always use the same buffer. Still, it is processing a significant amount of data, so while the above two proposed simple changes could give 10% or so, they won't kill the cost altogether. There are other inefficiencies all over, like: len = strlen(modname); if (strchr(modname, '.')) len = strchr(modname, '.') - modname; which gcc is at least ATM not able to optimize (while the duplicate strchr is called just once, gcc will call strlen unconditionally, even when it should be only called if strchr failed - though best is just use len = strchrnul(modname, '.') - modname; here. With a prepared binary blob which would include a hash table for easy finding of dependencies for a module we could just open that, and only once. If the binary blob starts with a magic number and then has st_ino/st_dev/st_ctime of both modules.dep and modules.aliases in the same dir, then you can easily keep the text files for handediting purposes and only use the binary blob if the text files haven't changed since the binary blob was recreated, and let the depmod or whatever creates the binary files also create the binary blob and have a mode in which it creates the blob from the text files rather than from whatever it found on disk. modules.dep searches just for the basenames of modules without extension and dir component, and ignores differences between - and _. So, just create a hash function which handles - and _ the same and gives good results on the usual module names and stick a hash table (e.g. with chains) after the file header, then the chains would contain the actual module name's basename for module_equal comparison, full 32-bit hash and pointer to where the full strings are found in the string table (which should probably be compressed using a directory table). module.aliases is harder, because it contains wildcards, so hash table isn't useful, but some kind of tree, where each node would contain a length, sorted prefixes of that length plus leaf nodes that contain wildcards at that point already and thus need to be tested all by fnmatch. Searching would just in each node test all the wildcardish leaf nodes, if none matched binary search for what matches the prefix and recurse into that node. E.g. root node could have length of minimum of smallest alias name and smallest number of chars before first wildcard in any alias name (unless that is zero, otherwise the alias names starting with wildcard would go into the set needed to go through fnmatch at that level). Jakub -- fedora-devel-list mailing list fedora-devel-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/fedora-devel-list