On Wed, Apr 05, 2017 at 04:52:11PM -0400, Vivek Goyal wrote: > Hi, > > I am running 4.11.0-rc5 kernel and did a kernel build and noticed > following lockdep warning on console. Have not analyzed it. Lots of > xfs in backtrace, so sending it to xfs mailing list. > Darrick pointed out on irc yesterday that this is likely due to the lock_inode() call in chmod_common(). I was confused as to where the iolock came into play here, but apparently we now reuse the core inode->i_rwsem for that. In any event, I was playing around with this and reproduce pretty easily by populating an fs with a bunch files with speculative preallocation and then generating some memory pressure. I reproduce with the following stack, however: [ 434.220605] ================================= [ 434.222286] [ INFO: inconsistent lock state ] [ 434.224092] 4.11.0-rc4+ #36 Tainted: G OE [ 434.225839] --------------------------------- [ 434.227587] inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage. [ 434.229995] kswapd0/59 [HC0[0]:SC0[0]:HE1:SE1] takes: [ 434.231851] (&sb->s_type->i_mutex_key#17){+.+.?.}, at: [<ffffffffc078e0aa>] xfs_ilock+0x20a/0x300 [xfs] [ 434.235473] {RECLAIM_FS-ON-W} state was registered at: [ 434.237427] mark_held_locks+0x76/0xa0 [ 434.238840] lockdep_trace_alloc+0x7d/0xe0 [ 434.240362] kmem_cache_alloc+0x2f/0x2d0 [ 434.241871] kmem_zone_alloc+0x81/0x120 [xfs] [ 434.243559] xfs_trans_alloc+0x6c/0x130 [xfs] [ 434.245233] xfs_vn_update_time+0x75/0x230 [xfs] [ 434.247031] file_update_time+0xbc/0x110 [ 434.248593] xfs_file_aio_write_checks+0x19b/0x1c0 [xfs] [ 434.250762] xfs_file_buffered_aio_write+0x75/0x350 [xfs] [ 434.252978] xfs_file_write_iter+0x103/0x150 [xfs] [ 434.254935] __vfs_write+0xe8/0x160 [ 434.256325] vfs_write+0xcb/0x1f0 [ 434.257625] SyS_pwrite64+0x98/0xc0 [ 434.258963] entry_SYSCALL_64_fastpath+0x1f/0xc2 ... so this isn't just a chmod thing. OTOH, I think we agree that this is not a real deadlock vector because the iolock is taken in the destroy_inode() path and so there should be no other reference to the inode. That aside, the IOLOCK_EXCL was added to xfs_inactive() in commit a36b926180 ("xfs: pull up iolock from xfs_free_eofblocks()") purely to honor the cleaner call semantics that patch defined for xfs_free_eofblocks(). We could probably either drop the iolock from here (though we would then have to kill the assert in xfs_free_eofblocks()), or use something like the diff below that quiets the lockdep splat for me. Thoughts? Brian diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c index 7605d83..eb80d31 100644 --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -1908,7 +1908,11 @@ xfs_inactive( * broken free space accounting. */ if (xfs_can_free_eofblocks(ip, true)) { - xfs_ilock(ip, XFS_IOLOCK_EXCL); + /* trylock to quiet lockdep, iolock should be free */ + if (!xfs_ilock_nowait(ip, XFS_IOLOCK_EXCL)) { + ASSERT(0); + xfs_ilock(ip, XFS_IOLOCK_EXCL); + } xfs_free_eofblocks(ip); xfs_iunlock(ip, XFS_IOLOCK_EXCL); } > Thanks > Vivek > > login: [ 4931.174758] > [ 4931.175065] ================================= > [ 4931.175731] [ INFO: inconsistent lock state ] > [ 4931.176365] 4.11.0-rc5+ #87 Not tainted > [ 4931.176920] --------------------------------- > [ 4931.177537] inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage. > [ 4931.178463] kswapd0/128 [HC0[0]:SC0[0]:HE1:SE1] takes: > [ 4931.179198] (&sb->s_type->i_mutex_key#12){++++?+}, at: [<ffffffffa01fcb0a>] xfs_ilock+0x13a/0x210 [xfs] > [ 4931.180584] {RECLAIM_FS-ON-W} state was registered at: > [ 4931.181320] mark_held_locks+0x6f/0xa0 > [ 4931.181878] lockdep_trace_alloc+0x7d/0xe0 > [ 4931.182474] kmem_cache_alloc+0x2f/0x2a0 > [ 4931.183083] kmem_zone_alloc+0x81/0x120 [xfs] > [ 4931.183739] xfs_trans_alloc+0x6c/0x130 [xfs] > [ 4931.184407] xfs_setattr_nonsize+0x239/0x560 [xfs] > [ 4931.185135] xfs_vn_setattr_nonsize+0x59/0x150 [xfs] > [ 4931.185890] xfs_vn_setattr+0x22/0x70 [xfs] > [ 4931.186503] notify_change+0x2ee/0x440 > [ 4931.187058] chmod_common+0xc5/0x150 > [ 4931.187582] SyS_fchmod+0x53/0x90 > [ 4931.188077] do_syscall_64+0x6c/0x1f0 > [ 4931.188616] return_from_SYSCALL_64+0x0/0x7a > [ 4931.189238] irq event stamp: 397343 > [ 4931.189739] hardirqs last enabled at (397343): [<ffffffff81135cca>] __call_rcu+0x1fa/0x340 > [ 4931.190909] hardirqs last disabled at (397342): [<ffffffff81135b21>] __call_rcu+0x51/0x340 > [ 4931.192070] softirqs last enabled at (397192): [<ffffffff818fb86d>] __do_softirq+0x38d/0x4c3 > [ 4931.193263] softirqs last disabled at (397185): [<ffffffff810bae27>] irq_exit+0xf7/0x100 > [ 4931.194397] > [ 4931.194397] other info that might help us debug this: > [ 4931.195318] Possible unsafe locking scenario: > [ 4931.195318] > [ 4931.196155] CPU0 > [ 4931.196511] ---- > [ 4931.196874] lock(&sb->s_type->i_mutex_key#12); > [ 4931.197526] <Interrupt> > [ 4931.197912] lock(&sb->s_type->i_mutex_key#12); > [ 4931.198591] > [ 4931.198591] *** DEADLOCK *** > [ 4931.198591] > [ 4931.199429] 2 locks held by kswapd0/128: > [ 4931.199990] #0: (shrinker_rwsem){++++..}, at: [<ffffffff8121a16e>] shrink_slab.part.46+0x5e/0x600 > [ 4931.201261] #1: (&type->s_umount_key#48){++++++}, at: [<ffffffff812aa54b>] trylock_super+0x1b/0x50 > [ 4931.202832] > [ 4931.202832] stack backtrace: > [ 4931.203878] CPU: 2 PID: 128 Comm: kswapd0 Not tainted 4.11.0-rc5+ #87 > [ 4931.204998] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.2-20150714_191134- 04/01/2014 > [ 4931.206533] Call Trace: > [ 4931.207114] dump_stack+0x86/0xc3 > [ 4931.207807] print_usage_bug+0x1d0/0x1e0 > [ 4931.208573] mark_lock+0x559/0x5c0 > [ 4931.209274] ? print_shortest_lock_dependencies+0x1a0/0x1a0 > [ 4931.210274] __lock_acquire+0x6ce/0x13c0 > [ 4931.211049] lock_acquire+0xe3/0x1d0 > [ 4931.211776] ? lock_acquire+0xe3/0x1d0 > [ 4931.212556] ? xfs_ilock+0x13a/0x210 [xfs] > [ 4931.213373] ? xfs_inactive+0xec/0x130 [xfs] > [ 4931.214231] down_write_nested+0x46/0x80 > [ 4931.215038] ? xfs_ilock+0x13a/0x210 [xfs] > [ 4931.215851] xfs_ilock+0x13a/0x210 [xfs] > [ 4931.216634] xfs_inactive+0xec/0x130 [xfs] > [ 4931.217699] xfs_fs_destroy_inode+0xbb/0x2d0 [xfs] > [ 4931.218594] destroy_inode+0x3b/0x60 > [ 4931.219314] evict+0x139/0x1c0 > [ 4931.220061] dispose_list+0x56/0x80 > [ 4931.220765] prune_icache_sb+0x5a/0x80 > [ 4931.221498] super_cache_scan+0x14e/0x1a0 > [ 4931.222269] shrink_slab.part.46+0x216/0x600 > [ 4931.223075] shrink_slab+0x29/0x30 > [ 4931.223883] shrink_node+0x108/0x320 > [ 4931.224588] kswapd+0x391/0x990 > [ 4931.225246] kthread+0x10c/0x140 > [ 4931.225902] ? mem_cgroup_shrink_node+0x300/0x300 > [ 4931.226760] ? kthread_create_on_node+0x70/0x70 > [ 4931.227579] ret_from_fork+0x31/0x40 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@xxxxxxxxxxxxxxx > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html