Mike Waychison wrote:
> We've noticed that at times it can become very easy to have a system begin to
> livelock on dcache_lock/inode_lock (specifically in atomic_dec_and_lock())
> when a lot of dentries are getting finalized at the same time (massive
> deletes and large fdtable destructions are two paths I've seen cause
> problems).
>
> This patchset is an attempt to reduce the locking overheads associated with
> final dput() and final iput(). This is done by batching dentries and inodes
> into per-process queues and processing them in 'parallel' to consolidate some
> of the locking.
>
> Besides various workload testing, I threw together a load (at the end of this
> email) that causes massive fdtables (50K sockets by default) to get destroyed
> on each CPU in the system. It also populates the dcache for procfs on those
> tasks for good measure. Comparing lock_stat results (hardware is a Sun x4600
> M2 populated with 8 4-core 2.3GHz packages (32 CPUs) + 128GiB RAM):

Hello Mike,

Seems like quite a large/intrusive infrastructure for a well-known problem. I
spent some time on this myself, but it seems nobody cared too much, or people
were too busy:

https://kerneltrap.org/mailarchive/linux-netdev/2008/12/11/4397594

(Patch 6 of that series should be discarded, as followups showed it was wrong:
[PATCH 6/7] fs: struct file move from call_rcu() to SLAB_DESTROY_BY_RCU.)

Sockets and pipes don't need dcache_lock or inode_lock at all, and I am sure
Google machines also use sockets :)

Your test/bench program is quite biased (populating the dcache for procfs and
using 50k file descriptors on 32 CPUs is not very realistic, IMHO). I had a
workload with processes using 1,000,000 file descriptors (mainly sockets) and
hit latency problems when they had to exit(). That problem was addressed by a
single cond_resched() added in close_files()
(commit 944be0b224724fcbf63c3a3fe3a5478c325a6547).
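For reference, the fix really is a single line; here is a condensed sketch of
what close_files() does after that commit (simplified from the real function,
which walks the open_fds bitmap a word at a time, so this is not verbatim
kernel code):

	/* Condensed sketch of close_files(), not a verbatim copy. */
	static void close_files(struct files_struct *files)
	{
		struct fdtable *fdt = files_fdtable(files);
		unsigned int fd;

		for (fd = 0; fd < fdt->max_fds; fd++) {
			struct file *file;

			if (!test_bit(fd, fdt->open_fds->fds_bits))
				continue;	/* descriptor not open */
			file = xchg(&fdt->fd[fd], NULL);
			if (file) {
				filp_close(file, files);
				/*
				 * The one-line fix from commit 944be0b2:
				 * yield between final fputs so that a task
				 * closing a million descriptors cannot hog
				 * its CPU for the whole teardown.
				 */
				cond_resched();
			}
		}
	}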
> BEFORE PATCHSET:
> ---------------------------------------------------------------------------------------------------------------------------------------
> class name     con-bounces  contentions  waittime-min  waittime-max  waittime-total  acq-bounces  acquisitions  holdtime-min  holdtime-max  holdtime-total
> ---------------------------------------------------------------------------------------------------------------------------------------
>
> dcache_lock:       5666485     10086172         11.16    2274036.47   3821690090.32     11042789      14926411          2.78      42138.28     65887138.46
> -----------
> dcache_lock        3016578  [<ffffffff810bd7e7>] d_alloc+0x190/0x1e1
> dcache_lock        1844569  [<ffffffff810bcdfc>] d_instantiate+0x2c/0x51
> dcache_lock        4223784  [<ffffffff8119bd88>] _atomic_dec_and_lock+0x3c/0x5c
> dcache_lock         207544  [<ffffffff810bc36a>] d_rehash+0x1b/0x43
> -----------
> dcache_lock        2305449  [<ffffffff810bcdfc>] d_instantiate+0x2c/0x51
> dcache_lock        1954160  [<ffffffff810bd7e7>] d_alloc+0x190/0x1e1
> dcache_lock        4169163  [<ffffffff8119bd88>] _atomic_dec_and_lock+0x3c/0x5c
> dcache_lock         866247  [<ffffffff810bc36a>] d_rehash+0x1b/0x43
>
> .......................................................................................................................................
>
> inode_lock:         202813       203064         12.91     376655.85     37696597.26      5136095       8001156          3.37      15142.77      5033144.13
> ----------
> inode_lock           20192  [<ffffffff810bfdbc>] new_inode+0x27/0x63
> inode_lock               1  [<ffffffff810c6f58>] sync_inode+0x19/0x39
> inode_lock               5  [<ffffffff810c76b5>] __mark_inode_dirty+0xe2/0x165
> inode_lock               2  [<ffffffff810c6e92>] __writeback_single_inode+0x1bf/0x26c
> ----------
> inode_lock           20198  [<ffffffff810bfdbc>] new_inode+0x27/0x63
> inode_lock               1  [<ffffffff810c76b5>] __mark_inode_dirty+0xe2/0x165
> inode_lock               5  [<ffffffff810c70fc>] generic_sync_sb_inodes+0x33/0x31a
> inode_lock          165016  [<ffffffff8119bd88>] _atomic_dec_and_lock+0x3c/0x5c
>
> .......................................................................................................................................
>
> &sbsec->isec_lock:   34428        34431         16.77      67593.54      3780131.15      1415432       3200261          4.06      13357.96      1180726.21
>
>
> AFTER PATCHSET [Note that inode_lock doesn't even show up anymore]:
> ---------------------------------------------------------------------------------------------------------------------------------------
> class name     con-bounces  contentions  waittime-min  waittime-max  waittime-total  acq-bounces  acquisitions  holdtime-min  holdtime-max  holdtime-total
> ---------------------------------------------------------------------------------------------------------------------------------------
>
> dcache_lock:       2732103      5254824         11.04    1314926.88   2187466928.10      5551173       9751332          2.43      78012.24     42830806.29
> -----------
> dcache_lock        3015590  [<ffffffff810bdb04>] d_alloc+0x190/0x1e1
> dcache_lock        1966978  [<ffffffff810bd118>] d_instantiate+0x2c/0x51
> dcache_lock         258657  [<ffffffff810bc3cd>] d_rehash+0x1b/0x43
> dcache_lock             30  [<ffffffff810b7ad4>] __link_path_walk+0x42d/0x665
> -----------
> dcache_lock        2493589  [<ffffffff810bd118>] d_instantiate+0x2c/0x51
> dcache_lock        1862921  [<ffffffff810bdb04>] d_alloc+0x190/0x1e1
> dcache_lock         882054  [<ffffffff810bc3cd>] d_rehash+0x1b/0x43
> dcache_lock          15504  [<ffffffff810bc5e1>] process_postponed_dentries+0x3e/0x29f
>
> .......................................................................................................................................
>
>
> ChangeLog:
> [v1] - Jan 16, 2009
>   - Initial version
>
> Patches in series:
>
> 1) Deferred batching of dput()
> 2) Parallel dput()
> 3) Deferred batching of iput()
> 4) Fixing iput called from the put_super path
> 5) Parallelize iput()
> 6) hugetlbfs drop_inode update
> 7) Make drop_caches flush pending dput()s and iput()s
> 8) Make the sync path drain dentries and inodes
>
> Caveats:
>
> * It's not clear to me how to make sync(2) do what it should. Currently we
>   flush pending dputs and iputs before writing anything out, but presumably
>   there may be hidden iputs in the do_sync path that really ought to be
>   synchronous.
> * I don't like the changes in patch 4; they should probably be reworked so
>   that the ->put_super callback paths call a synchronous version of iput()
>   (perhaps sync_iput()?) instead of having to tag the inodes with S_SYNCIPUT.
> * cramfs, gfs2, and ocfs2 need to be updated for the new drop_inode
>   semantics.
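To make the batching idea from the cover letter concrete: a final dput()
normally goes through atomic_dec_and_lock(&dentry->d_count, &dcache_lock),
which is exactly the hot spot in the BEFORE numbers above. The series instead
queues dentries whose count is about to reach zero and disposes of the whole
batch under a single dcache_lock acquisition. A rough sketch of that shape
(hypothetical names, refcount races glossed over; the actual patches differ in
detail):

	/* Sketch only -- not the code from the patches. */
	void dput_batched(struct dentry *dentry, struct list_head *pending)
	{
		/* Not the last reference: drop it without dcache_lock. */
		if (atomic_add_unless(&dentry->d_count, -1, 1))
			return;
		/* Probably the last reference: postpone the expensive part
		 * (this sketch reuses d_lru to link the pending batch). */
		list_add(&dentry->d_lru, pending);
	}

	void drain_pending_dputs(struct list_head *pending)
	{
		struct dentry *dentry, *next;

		spin_lock(&dcache_lock);   /* one acquisition for N dentries */
		list_for_each_entry_safe(dentry, next, pending, d_lru) {
			list_del_init(&dentry->d_lru);
			/* Re-check the count: someone may have re-grabbed
			 * the dentry while it sat on the pending list. */
			if (atomic_dec_and_test(&dentry->d_count))
				dentry_kill_final(dentry); /* hypothetical:
							    * unhash, d_free,
							    * iput the inode */
		}
		spin_unlock(&dcache_lock);
	}

The AFTER numbers show the intended effect: _atomic_dec_and_lock() disappears
from the dcache_lock contention list, replaced by a much smaller
process_postponed_dentries() entry.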
> ---
>
>  fs/block_dev.c         |    2
>  fs/dcache.c            |  374 +++++++++++++++++++++++++++++++++++++----
>  fs/drop_caches.c       |    1
>  fs/ext2/super.c        |    2
>  fs/gfs2/glock.c        |    2
>  fs/gfs2/ops_fstype.c   |    2
>  fs/hugetlbfs/inode.c   |   72 +++++---
>  fs/inode.c             |  441 +++++++++++++++++++++++++++++++++++++++++-------
>  fs/ntfs/super.c        |    2
>  fs/smbfs/inode.c       |    2
>  fs/super.c             |    6 -
>  fs/sync.c              |    5 +
>  include/linux/dcache.h |    1
>  include/linux/fs.h     |   18 ++
>
>  14 files changed, 787 insertions(+), 143 deletions(-)