On Tue, 23 May 2006, Martin Langhoff wrote:
>
> The dev machine where I am running the import is a slug! It's still
> working on it, only gotten to 7700 commits, with the cvsimport process
> stable at 28MB RAM and cvs stable at 4MB.

I have to say, that cvsimport script really does do horrible things. It's
basically a fork/exec/exit benchmark, as far as I can tell.

Running oprofile on the thing, the top offenders are (ignore the 45% idle
thing: it's just because this was run on a dual-cpu system, so since it's
almost completely single-threaded you get ~50% idle by default).

	3117654  45.8708  vmlinux         vmlinux         .power4_idle
	 802313  11.8046  vmlinux         vmlinux         .unmap_vmas
	 632913   9.3122  vmlinux         vmlinux         .copy_page_range
	 150359   2.2123  vmlinux         vmlinux         .release_pages
	 131330   1.9323  vmlinux         vmlinux         .vm_normal_page
	 117836   1.7337  libperl.so      libperl.so      (no symbols)
	  74098   1.0902  libgklayout.so  libgklayout.so  (no symbols)
	  54680   0.8045  vmlinux         vmlinux         .free_pages_and_swap_cache
	  54300   0.7989  libfb.so        libfb.so        (no symbols)
	  49052   0.7217  vmlinux         vmlinux         .copy_4K_page
	  46559   0.6850  libc-2.4.so     libc-2.4.so     getc
	  42677   0.6279  vmlinux         vmlinux         .page_remove_rmap
	  41133   0.6052  libc-2.4.so     libc-2.4.so     ferror

.. those kernel functions are all about process create/exit, and COW
faulting after the fork.

Now, this is on ppc, so process creation is likely slower (idiotic PPC VM
page table hashes), but Linux is actually very good at doing this, and the
fact that process create/exit is so high is a very big sign that the
script just ends up executing a _ton_ of small simple processes that do
almost nothing.

I wonder why those "git-update-index" calls seem to be (assuming I read
the perl correctly) done only a few files at a time. We can do hundreds
in one go, but it seems to want to do just ten files or something at the
same time. Although since most commits should hopefully just modify a
couple of files, that probably isn't a big deal.
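
To make the batching point concrete, here is a rough Python sketch (not
taken from git-cvsimport, which is Perl) of how the batch size determines
how many "git update-index" processes get spawned, plus a single-process
variant that feeds every path on stdin. The helper names chunked,
update_index_commands and update_index_stdin are made up for illustration;
only the --add, -z and --stdin options of git update-index are assumed to
be available.

```python
import subprocess
from typing import Iterator, List


def chunked(paths: List[str], size: int) -> Iterator[List[str]]:
    """Yield successive slices of at most `size` paths."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(paths), size):
        yield paths[start:start + size]


def update_index_commands(paths: List[str], batch_size: int) -> List[List[str]]:
    """Build one `git update-index` argv per batch of paths.

    Each returned argv would cost one fork/exec/exit when run, so the
    length of the result is the number of processes spawned.
    """
    base = ["git", "update-index", "--add", "--"]
    return [base + batch for batch in chunked(paths, batch_size)]


def update_index_stdin(paths: List[str], run=subprocess.run):
    """Hypothetical single-process variant: pass NUL-separated paths on stdin.

    `run` is injectable so the function can be exercised without a
    repository; by default it really invokes git.
    """
    payload = b"".join(p.encode() + b"\0" for p in paths)
    return run(["git", "update-index", "--add", "-z", "--stdin"],
               input=payload, check=True)
```

With 25 changed paths, a batch size of 10 gives three processes and a batch
size of 100 gives one; the stdin variant is always one process, however
many paths there are.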

That thing would probably be an order of magnitude faster if written to
use the git library interfaces directly.

Of course, the CVS part is probably a big overhead, so it might not help
much (I would not be surprised at all if a number of the fork/exec/exit
things are due to the CVS server starting RCS or something, not due to
git-cvsimport itself).

		Linus