On Wed, Oct 07 2009, Nick Piggin wrote:
> On Tue, Oct 06, 2009 at 02:49:41PM +0200, Jens Axboe wrote:
> > On Tue, Oct 06 2009, Nick Piggin wrote:
> > > It's a copout, but you could try running multiple dbenches under
> > > different working directories (or actually, IIRC dbench does root
> > > based path lookups so maybe that won't help you much).
> >
> > Yeah, it's hitting dentry->d_lock pretty hard so basically
> > spin-serialized at that point.
> >
> > > > Anyway, below are the results. Seem very stable.
> > > >
> > > >                         throughput
> > > > ------------------------------------------------
> > > > 2.6.32-rc3-git          | 561.218 MB/sec
> > > > 2.6.32-rc3-git+patch    | 627.022 MB/sec
> > >
> > > Well it's good to see you got some improvement.
> >
> > Yes, it's an improvement though the results are still pretty abysmal :-)
>
> OK, I have a really basic patch that does store-free path walking
> (except on the final element). dbench is pretty nasty still because
> it seems to do a lot of stupid things like reading from /proc/mounts
> all the time.
>
> Actually it isn't quite store-free because it still takes mnt ref
> (which is hard to avoid, but at least it is a per-cpu data). So
> this patch applies on top of my recent patchset. At least it does
> not store to the dentries.
>
> Basically it is holding rcu_read_lock for the entire walk, it uses
> inode RCU from my earlier patches to check inode permissions, and
> it bails out at the slightest sign of trouble :) (actually I think
> I have got it to walk mountpoints which is nice).
>
> The seqlock in the dentry is for getting consistent name,len pointer,
> and also not giving a false positive if a rename has partially
> overwritten the name string (false negatives are always fine in the
> lock free lookup path but false positives are not). Possibly we
> could make do with a per-sb seqlock for this (or just rename_lock).
>
> I have duplicated most of the lookup code, altering it to the lock
> free version, which returns -EAGAIN if it gets in trouble and the
> regular path walk starts up. I don't know if this is the best way
> to go... it's fairly easy but a bit ugly. But complexifying the
> existing path walk any more would not be nice either. It might be
> nice to try locking the dentry that we're having trouble with and
> continuing from there, rather than redoing the whole walk with
> locks...
>
> Anyway, this is the basics working for now, microbenchmark shows
> same-cwd lookups scale linearly now too. We can probably slowly
> tackle more cases if they come up as being important, simply by
> auditing filesystems etc.
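To pin down the seqlock-for-name scheme you describe, here's a minimal
sketch of the consistent read I think it implies. The ->d_seq field and
the helper name are assumptions for illustration, not taken from your
patch:

#include <linux/dcache.h>
#include <linux/seqlock.h>
#include <linux/string.h>

/*
 * Compare a dentry's name against <str, len> without taking d_lock.
 * The read is retried if a rename raced with us, so a partially
 * overwritten name can only produce a false negative, never a false
 * positive.
 */
static bool d_name_matches_rcu(struct dentry *dentry,
                               const char *str, unsigned int len)
{
        const unsigned char *name;
        unsigned int dlen, seq;
        bool match;

        do {
                /* ->d_seq: assumed per-dentry seqcount bumped by rename */
                seq = read_seqcount_begin(&dentry->d_seq);
                name = dentry->d_name.name;
                dlen = dentry->d_name.len;
                /* may scan a torn name string; the retry catches that */
                match = dlen == len && memcmp(name, str, len) == 0;
        } while (read_seqcount_retry(&dentry->d_seq, seq));

        return match;
}

A per-sb seqlock (or rename_lock) would work the same way, just with
more read-side retries under unrelated renames.

Results with the incremental patch applied on top: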
                        throughput
------------------------------------------------
2.6.32-rc3-git          | 561.218 MB/sec
2.6.32-rc3-git+patch    | 627.022 MB/sec
2.6.32-rc3-git+patch+inc| 969.761 MB/sec

So better, quite a bit too. Latencies are not listed here, but they are
also a lot better. Perf top still shows ~95% spinlock time. I did a
shorter run (the above are full 600 second runs) of 60s with profiling
and the full 64 clients, this time using -a as well (which generated
9.4GB of trace data!). The top is now:

_spin_lock (92%)
        path_get (39%)
                d_path (59%)
                path_init (26%)
                path_walk (13%)
        dput (37%)
                path_put (86%)
                link_path_walk (13%)
        __d_path (23%)

And finally, this:

> +	if (!nd->dentry->d_inode) {
> +		spin_unlock(&nd->path.dentry->d_lock);
> +		return -EAGAIN;
> +	}

doesn't compile, it wants to be

	if (!nd->path.dentry->d_inode) {
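As for where that -EAGAIN ends up: the way I read the patch, the
lock-free walk bails and the whole lookup is redone with the regular,
lock-taking walk, roughly like this (path_walk_rcu() is a stand-in
name for the duplicated lock-free walker, not the real entry point):

#include <linux/errno.h>
#include <linux/namei.h>
#include <linux/rcupdate.h>

/* hypothetical lock-free walker; returns -EAGAIN on any trouble */
static int path_walk_rcu(const char *name, struct nameidata *nd);

/* sketch assumes this sits in fs/namei.c next to the existing path_walk() */
static int do_path_lookup_sketch(const char *name, struct nameidata *nd)
{
        int err;

        rcu_read_lock();
        err = path_walk_rcu(name, nd);
        rcu_read_unlock();

        /* bailed out: retry from scratch with the locked walk */
        if (err == -EAGAIN)
                err = path_walk(name, nd);

        return err;
}

Locking the troublesome dentry and continuing from there, as you
suggest, would avoid this full restart.

--
Jens Axboe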