On Thu, Oct 08 2009, Nick Piggin wrote:
> On Thu, Oct 08, 2009 at 02:57:46PM +0200, Jens Axboe wrote:
> > On Thu, Oct 08 2009, Nick Piggin wrote:
> > > On Wed, Oct 07, 2009 at 07:56:33AM -0700, Linus Torvalds wrote:
> > > > On Wed, 7 Oct 2009, Nick Piggin wrote:
> > > > >
> > > > > OK, I have a really basic patch that does store-free path walking
> > > > > (except on the final element).
> > > >
> > > > Yay!
> > > >
> > > > > dbench is pretty nasty still because it seems to do a lot of stupid
> > > > > things like reading from /proc/mounts all the time.
> > > >
> > > > You should largely forget about dbench, it can certainly be a useful
> > > > benchmark, but at the same time it's certainly not a _meaningful_ one.
> > > > There are better things to try.
> > >
> > > OK, here's one you might find interesting. It is a cached git diff
> > > workload in a linux kernel tree. I actually ran it in a loop 100
> > > times in order to get some reasonable sample sizes, then I ran
> > > parallel and serial configs (PreloadIndex = true/false). Compared
> > > plain kernel with all vfs patches to now.
> > >
> > > 2.6.32-rc3 serial
> > > 5.35user 7.12system 0:12.47elapsed 100%CPU
> > >
> > > 2.6.32-rc3 parallel
> > > 5.79user 17.69system 0:09.41elapsed 249%CPU
> > >
> > > vfs serial
> > > 5.30user 5.62system 0:10.92elapsed 100%CPU
> > >
> > > vfs parallel
> > > 4.86user 0.68system 0:06.82elapsed 81%CPU
> >
> > Since the box was booted anyway, I tried the git test too. Results are
> > with 2.6.32-rc3 serial being the baseline 1.00 scores, smaller than 1.00
> > are faster and vice versa.
> >
> > 2.6.32-rc3 serial
> > real 1.00
> > user 1.00
> > sys 1.00
> >
> > 2.6.32-rc3 parallel
> > real 0.80
> > user 0.83
> > sys 8.38
> >
> > sys time, auch...
> >
> > vfs serial
> > real 0.86
> > user 0.93
> > sys 0.84
>
> This is actually nice too. My tests were on a 2s8c Barcelona system,
> but this is showing we have a nice serial win on Nehalem as well.
> Actually K8 CPUs have a bit faster lock primitives than earlier
> Intel CPUs I think (closer to Nehalem), so we might see an even
> bigger win with a Core2.

Yes, this is just as interesting as the parallel results imho. I don't
have a core 2 test box, so I cannot test that.

> > vfs parallel
> > real 0.43
> > user 0.72
> > sys 0.13
> >
> > Let me know if you want profiles or anything like that. I'd say that
> > looks veeeery tasty.
>
> It doesn't look all that different to mine, so profiles probably
> not required at this point. Is the CPU accounting going wrong? It
> looks like thread times are not being accumulated back properly,
> which IIRC they should be. But 'real' time should be accurate, so
> it is going a lot faster.

Yes, it looks very similar, the higher CPU count just makes the parallel
git preload on -rc3 stock look even more crappy (you had roughly 2x
number of sys time, I have roughly 8x) when compared to the serialized
approach.

IIRC, there was a bug with thread accounting very recently. Why would it
not hit -rc3 alone, though? Does look fishy, though.

--
Jens Axboe
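
For a side-by-side with the normalized scores quoted above, Nick's raw timings can be put on the same scale by dividing each figure by the 2.6.32-rc3 serial baseline. This is only a minimal sketch: it assumes the `elapsed` figure is treated as `real` time, and the helper name `normalize` is hypothetical, not something from the thread.

```python
# Raw timings in seconds, copied from Nick's quoted results above.
# "elapsed" is taken as "real" time (an assumption for this sketch).
nick = {
    "2.6.32-rc3 serial":   {"real": 12.47, "user": 5.35, "sys": 7.12},
    "2.6.32-rc3 parallel": {"real": 9.41,  "user": 5.79, "sys": 17.69},
    "vfs serial":          {"real": 10.92, "user": 5.30, "sys": 5.62},
    "vfs parallel":        {"real": 6.82,  "user": 4.86, "sys": 0.68},
}


def normalize(results, baseline="2.6.32-rc3 serial"):
    """Scale every timing by the baseline config; < 1.00 means faster."""
    base = results[baseline]
    return {
        config: {key: round(value / base[key], 2) for key, value in times.items()}
        for config, times in results.items()
    }


scores = normalize(nick)
for config, times in scores.items():
    print(config)
    for key in ("real", "user", "sys"):
        print(f"  {key} {times[key]:.2f}")
```

On these numbers the stock parallel run comes out at about 2.5x the baseline sys time, which matches the "roughly 2x" remark above, and the vfs parallel run at about 0.55 of the baseline real time.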