On Sat, Dec 18, 2010 at 04:40:23AM +1100, Nick Piggin wrote:
> With the iprune_sem and iolock lock order warnings taken care of,
> lockdep soon after chokes on i_lock

What kernel are you running? It does not appear to be vanilla XFS, as:

> [ 716.364005] inconsistent {RECLAIM_FS-ON-R} -> {IN-RECLAIM_FS-W} usage.
> [ 716.364005] cp/8370 [HC0[0]:SC0[0]:HE1:SE1] takes:
> [ 716.364005] (&(&ip->i_lock)->mr_lock){++++-?}, at: [<ffffffffa005537c>] xfs_ilock+0x8c/0x150 [xfs]
> [ 716.364005] {RECLAIM_FS-ON-R} state was registered at:
> [ 716.364005] [<ffffffff8108392b>] mark_held_locks+0x6b/0xa0
> [ 716.364005] [<ffffffff810839f1>] lockdep_trace_alloc+0x91/0xd0
> [ 716.364005] [<ffffffff811104fa>] __kmalloc+0x5a/0x220
> [ 716.364005] [<ffffffffa0078717>] kmem_alloc+0x87/0xd0 [xfs]
> [ 716.364005] [<ffffffffa002b8fb>] xfs_attr_shortform_list+0xfb/0x480 [xfs]
> [ 716.364005] [<ffffffffa0027ab8>] xfs_attr_list_int+0xd8/0xe0 [xfs]
> [ 716.364005] [<ffffffffa0088b5f>] xfs_vn_listxattr+0x7f/0x160 [xfs]
> [ 716.364005] [<ffffffff811396ef>] vfs_listxattr+0x1f/0x30
> [ 716.364005] [<ffffffff81139b1f>] listxattr+0x3f/0xf0
> [ 716.364005] [<ffffffff81139c14>] sys_flistxattr+0x44/0x70
> [ 716.364005] [<ffffffff810030bb>] system_call_fastpath+0x16/0x1b
> [ 716.364005] irq event stamp: 322521151
> [ 716.364005] hardirqs last enabled at (322521151): [<ffffffff81601cbd>] mutex_trylock+0x11d/0x190
> [ 716.364005] hardirqs last disabled at (322521150): [<ffffffff81601bde>] mutex_trylock+0x3e/0x190
> [ 716.364005] softirqs last enabled at (322518910): [<ffffffff81050d0e>] __do_softirq+0x16e/0x360
> [ 716.364005] softirqs last disabled at (322518881): [<ffffffff81003f8c>] call_softirq+0x1c/0x50
> [ 716.364005]
> [ 716.364005] other info that might help us debug this:
> [ 716.364005] 3 locks held by cp/8370:
> [ 716.364005] #0: (xfs_iolock_active){++++++}, at:
                      ^^^^^^^^^^^^^^^^^

This patch is not yet mainline. If you really want to do significant
XFS scalability testing for .38, you should probably pull these
branches in for testing:

git://git.kernel.org/pub/scm/linux/dgc/xfsdev.git inode-scale
git://git.kernel.org/pub/scm/linux/dgc/xfsdev.git xfs-for-2.6.38

> [<ffffffffa0055395>] xfs_ilock+0xa5/0x150 [xfs]
> [ 716.364005] #1: (shrinker_rwsem){++++..}, at: [<ffffffff810d91c8>] shrink_slab+0x38/0x190
> [ 716.364005] #2: (&pag->pag_ici_reclaim_lock){+.+...}, at: [<ffffffffa00875c4>] xfs_reclaim_inodes_ag+0xa4/0x360 [xfs]
> [ 716.364005]
> [ 716.364005] stack backtrace:
> [ 716.364005] Pid: 8370, comm: cp Not tainted 2.6.37-rc6+ #116
> [ 716.364005] Call Trace:
> [ 716.364005] [<ffffffff81082a10>] print_usage_bug+0x170/0x180
> [ 716.364005] [<ffffffff810836d1>] mark_lock+0x211/0x400
> [ 716.364005] [<ffffffff810842ce>] __lock_acquire+0x40e/0x1490
> [ 716.364005] [<ffffffff810853e5>] lock_acquire+0x95/0x1b0
> [ 716.364005] [<ffffffffa005537c>] ? xfs_ilock+0x8c/0x150 [xfs]
> [ 716.364005] [<ffffffff8127c35c>] ? rcu_read_lock_held+0x2c/0x30
> [ 716.364005] [<ffffffff81073d5a>] down_write_nested+0x4a/0x70
> [ 716.364005] [<ffffffffa005537c>] ? xfs_ilock+0x8c/0x150 [xfs]
> [ 716.364005] [<ffffffffa005537c>] xfs_ilock+0x8c/0x150 [xfs]
> [ 716.364005] [<ffffffffa00872e6>] xfs_reclaim_inode+0x36/0x270 [xfs]
> [ 716.364005] [<ffffffffa008772f>] xfs_reclaim_inodes_ag+0x20f/0x360 [xfs]
> [ 716.364005] [<ffffffffa00878f8>] xfs_reclaim_inode_shrink+0x78/0x80 [xfs]
> [ 716.364005] [<ffffffff810d92b7>] shrink_slab+0x127/0x190
> [ 716.364005] [<ffffffff810dc189>] zone_reclaim+0x349/0x420
>
> I assume this should be a false positive too, for the same reason,
> and could be handled the same way as iolock.

The ilock is very different to the iolock in terms of usage - the ilock
is required in the writeback path (for block mapping, allocation and
file size updates) while the iolock is not. Hence this is indicative of
a potential deadlock, and we shouldn't be doing memory allocation with
the ilock held outside a transaction. Allocations inside transactions
are transformed to GFP_NOFS and so are safe against such lock
recursion, but outside transactions we need to use KM_NOFS directly.

I'll send out a patch on Monday after I've looked at the code in more
detail....

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs
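
For illustration only, a minimal sketch of the kind of change described
above, assuming the allocation flagged in the lockdep trace is the
temporary sort buffer that xfs_attr_shortform_list() allocates while
holding the ilock; the variable names are paraphrased from that context
and the actual patch may well look different:

	/*
	 * Sketch, not the actual patch: this allocation runs with the
	 * ilock held but outside a transaction.  KM_NOFS makes the XFS
	 * kmem wrapper drop __GFP_FS, so direct reclaim triggered by
	 * this allocation cannot re-enter XFS inode reclaim and
	 * deadlock trying to take the ilock again.
	 */
	sbuf = kmem_alloc(sbsize, KM_SLEEP | KM_NOFS);

Allocations made inside a transaction run with PF_FSTRANS set, which
the kmem flag conversion treats the same way as KM_NOFS, so only the
non-transactional paths need the explicit flag.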