Since I reboot fairly regularly to test new kernels, I don't *always* have the kernel source tree in my caches, so I still care about some cold-cache performance. And "git grep" tends to be the most noticeable one.

Now, I have an SSD, and even the cold-cache case takes just five seconds or so, but that's still something I react to, since the normal "kernel tree in cache" case ends up being close enough to instantaneous (half a second) that I notice when it takes longer.

And I started thinking about it, and our "git grep" parallelism seems to be limited to 8. Which is fine most of the time for CPU parallelism (although maybe some people with big machines would prefer higher numbers), but for IO parallelism I thought that maybe we'd like a higher number...

So I tried it out, and with THREADS set to 32, I get a roughly 15% performance boost for the cold-cache case (the error bar is high enough to not give a very precise number, but I see it going from ~4.8-4.9s on my machine down to ~4.2-4.6s).

That's on an SSD, though - the performance implications might be very different for other use cases (NFS would likely prefer higher IO parallelism and might show a bigger improvement, while a rotational disk might end up just thrashing more).

Is this a big deal? Probably not. But I did react to how annoying it was to set the parallelism factor (recompile git with a new number). Wouldn't it be lovely if it was slightly smarter (something more akin to the index preloading, which takes the number of files into account) or at least allowed people to set the parallelism explicitly with a command line switch?

Right now it disables the parallel grep entirely for UP, for example. Which makes perfect sense if grep is all about CPU use. But even UP might benefit from parallel IO.

                Linus
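
As a rough sketch of the "slightly smarter" idea above: the thread count could scale with the number of files, accept an explicit override, and ignore the CPU count so that a UP machine still gets parallel IO. The function name, both constants, and the `requested` parameter are made up for illustration; this is not git's actual code, and the values are assumptions that would need measuring.

```c
#include <stddef.h>

/* Hypothetical tunables, NOT git's actual values. */
#define GREP_FILES_PER_THREAD 500	/* assumed: files needed to justify one more thread */
#define GREP_MAX_THREADS 32		/* assumed cap; 32 is the value tried in the mail */

/*
 * Pick a worker count for a grep over nr_files files.
 *
 * requested > 0:  the user asked for an explicit count (say, through a
 *                 hypothetical command line switch); honour it up to the cap.
 * requested <= 0: "auto", scale with the number of files.
 *
 * The CPU count is deliberately not consulted, so a uniprocessor
 * machine can still overlap IO. Returns 1 when threading is not
 * worthwhile.
 */
static int pick_grep_threads(size_t nr_files, int requested)
{
	size_t threads;

	if (requested > 0)
		return requested > GREP_MAX_THREADS ? GREP_MAX_THREADS : requested;

	threads = nr_files / GREP_FILES_PER_THREAD;
	if (threads > GREP_MAX_THREADS)
		threads = GREP_MAX_THREADS;
	if (threads < 2)
		return 1;	/* too few files to be worth a second thread */
	return (int)threads;
}
```

With these assumed constants, a tree of 40000 files in auto mode would get the full 32 threads, while a 100-file project would stay single-threaded. Whether a fixed cap is right for NFS or rotational disks is exactly the open question raised above.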