On Fri, May 06, 2011 at 04:42:38PM +0800, Wu Fengguang wrote:
> > patched trace-tar-dd-ext4-2.6.39-rc3+
> > > flush-8:0-3048 [004] 1929.981734: writeback_queue_io: bdi 8:0: older=4296600898 age=2 enqueue=13227
> > > vanilla trace-tar-dd-ext4-2.6.39-rc3
> > > flush-8:0-2911 [004] 77.158312: writeback_queue_io: bdi 8:0: older=0 age=-1 enqueue=18938
> > > flush-8:0-2911 [000] 82.461064: writeback_queue_io: bdi 8:0: older=0 age=-1 enqueue=6957
>
> It looks too much to move 13227 and 18938 inodes at once. So I tried
> arbitrarily limiting the max move number to 1000 and it helps reduce
> the lock hold time and contentions a lot.

Oh, it seems 1000 is too small, at least for this workload: it hurts the
dd+tar+sync total elapsed time.

no limit:    avg 167.486   stddev 8.996
limit=1000:  avg 171.222   stddev 5.588
limit=3000:  avg 165.335   stddev 5.503

So use 3000 as the new limit.

Thanks,
Fengguang
---
Subject: writeback: limit number of moved inodes in queue_io()
Date: Fri May 06 13:34:08 CST 2011

Only move 3000 inodes from b_dirty to b_io at one time. This reduces the
maximum lock hold time and the lock contentions by many times in a simple
dd+tar workload on an 8p test box. This workload was observed to move
10000+ inodes in one shot on ext4, which was obviously too much.

class name                    con-bounces    contentions   waittime-min   waittime-max waittime-total    acq-bounces   acquisitions   holdtime-min   holdtime-max holdtime-total
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

vanilla 2.6.39-rc3:

inode_wb_list_lock:                  2063           2065           0.12        2648.66        5948.99          27475         943778           0.09        2704.76      498340.24
------------------
inode_wb_list_lock                     89  [<ffffffff8115cf3a>] sync_inode+0x28/0x5f
inode_wb_list_lock                     38  [<ffffffff8115ccab>] inode_wait_for_writeback+0xa8/0xc6
inode_wb_list_lock                    629  [<ffffffff8115da35>] __mark_inode_dirty+0x170/0x1d0
inode_wb_list_lock                    842  [<ffffffff8115d334>] writeback_sb_inodes+0x10f/0x157
------------------
inode_wb_list_lock                    891  [<ffffffff8115ce3e>] writeback_single_inode+0x175/0x249
inode_wb_list_lock                     13  [<ffffffff8115dc4e>] writeback_inodes_wb+0x3a/0x143
inode_wb_list_lock                    499  [<ffffffff8115da35>] __mark_inode_dirty+0x170/0x1d0
inode_wb_list_lock                    617  [<ffffffff8115d334>] writeback_sb_inodes+0x10f/0x157

limit=1000: dd+tar+sync total elapsed time (10 runs): avg 171.222 stddev 5.588

&(&wb->list_lock)->rlock:             842            842           0.14         101.10        1013.34          20489         970892           0.09         234.11      509829.79
------------------------
&(&wb->list_lock)->rlock              275  [<ffffffff8115db09>] __mark_inode_dirty+0x173/0x1cf
&(&wb->list_lock)->rlock              114  [<ffffffff8115cdd3>] writeback_single_inode+0x18a/0x27e
&(&wb->list_lock)->rlock               56  [<ffffffff8115cc29>] inode_wait_for_writeback+0xac/0xcc
&(&wb->list_lock)->rlock              132  [<ffffffff8115cf2a>] sync_inode+0x63/0xa2
------------------------
&(&wb->list_lock)->rlock                2  [<ffffffff8115dfea>] inode_wb_list_del+0x5f/0x85
&(&wb->list_lock)->rlock               33  [<ffffffff8115cf2a>] sync_inode+0x63/0xa2
&(&wb->list_lock)->rlock                9  [<ffffffff8115cc29>] inode_wait_for_writeback+0xac/0xcc
&(&wb->list_lock)->rlock              430  [<ffffffff8115cdd3>] writeback_single_inode+0x18a/0x27e

limit=3000: dd+tar+sync total elapsed time (10 runs): avg 165.335 stddev 5.503

&(&wb->list_lock)->rlock:            1088           1092           0.11         245.08        3268.75          21124        1718636           0.09         384.53      849827.20
------------------------
&(&wb->list_lock)->rlock              518  [<ffffffff8115db09>] __mark_inode_dirty+0x173/0x1cf
&(&wb->list_lock)->rlock                3  [<ffffffff8115dfea>] inode_wb_list_del+0x5f/0x85
&(&wb->list_lock)->rlock               54  [<ffffffff8115cf2a>] sync_inode+0x63/0xa2
&(&wb->list_lock)->rlock               10  [<ffffffff8115cc29>] inode_wait_for_writeback+0xac/0xcc
------------------------
&(&wb->list_lock)->rlock                4  [<ffffffff8115dfea>] inode_wb_list_del+0x5f/0x85
&(&wb->list_lock)->rlock              379  [<ffffffff8115db09>] __mark_inode_dirty+0x173/0x1cf
&(&wb->list_lock)->rlock                4  [<ffffffff8115cc29>] inode_wait_for_writeback+0xac/0xcc
&(&wb->list_lock)->rlock              446  [<ffffffff8115cdd3>] writeback_single_inode+0x18a/0x27e

Signed-off-by: Wu Fengguang <fengguang.wu@xxxxxxxxx>
---
 fs/fs-writeback.c |    2 ++
 1 file changed, 2 insertions(+)

--- linux-next.orig/fs/fs-writeback.c	2011-05-06 13:32:41.000000000 +0800
+++ linux-next/fs/fs-writeback.c	2011-05-06 16:44:58.000000000 +0800
@@ -279,6 +279,8 @@ static int move_expired_inodes(struct li
 		sb = inode->i_sb;
 		list_move(&inode->i_wb_list, &tmp);
 		moved++;
+		if (unlikely(moved >= 3000))	/* limit spinlock hold time */
+			break;
 	}
 
 	/* just one sb in list, splice to dispatch_queue and we're done */
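
For readers without the kernel tree at hand, here is a minimal userspace sketch of the bounded-move idea. It is not the kernel code: struct node, list_init(), list_empty(), list_del_node(), list_add_head() and move_batch() are hypothetical stand-ins loosely modelled on list_head, and the locking is only indicated in a comment.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list, loosely modelled on list_head. */
struct node {
	struct node *prev, *next;
};

static void list_init(struct node *head)
{
	head->prev = head->next = head;
}

static int list_empty(const struct node *head)
{
	return head->next == head;
}

/* Unlink n from whatever list it is on. */
static void list_del_node(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
}

/* Insert n right after head. */
static void list_add_head(struct node *n, struct node *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

/*
 * Move at most max_batch entries from the tail of src to the head of dst
 * and return how many were moved. In a real setting the caller would hold
 * the list lock around this call, so max_batch bounds the hold time.
 */
static size_t move_batch(struct node *src, struct node *dst, size_t max_batch)
{
	size_t moved = 0;

	assert(max_batch > 0);
	while (!list_empty(src)) {
		struct node *n = src->prev;	/* take from the tail */

		list_del_node(n);
		list_add_head(n, dst);
		moved++;
		if (moved >= max_batch)		/* limit time spent in one call */
			break;
	}
	return moved;
}
```

Entries beyond max_batch simply stay on src and are picked up by a later call, so the cap trades a few more calls for a shorter worst-case time under the lock.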