On 10/05, Oleg Nesterov wrote:
>
> On 10/05, Dave Chinner wrote:
> >
> > On Tue, Oct 04, 2016 at 01:43:43PM +0200, Oleg Nesterov wrote:
> >
> > > plus the following warnings:
> > >
> > > [ 1894.500040] run fstests generic/070 at 2016-10-04 05:03:39
> > > [ 1895.076655] =================================
> > > [ 1895.077136] [ INFO: inconsistent lock state ]
> > > [ 1895.077574] 4.8.0 #1 Not tainted
> > > [ 1895.077900] ---------------------------------
> > > [ 1895.078330] inconsistent {IN-RECLAIM_FS-W} -> {RECLAIM_FS-ON-W} usage.
> > > [ 1895.078993] fsstress/18239 [HC0[0]:SC0[0]:HE1:SE1] takes:
> > > [ 1895.079522] (&xfs_nondir_ilock_class){++++?-}, at: [<ffffffffc049ad45>] xfs_ilock+0x165/0x210 [xfs]
> > > [ 1895.080529] {IN-RECLAIM_FS-W} state was registered at:
> >
> > And that is a bug in the lockdep annotations for memory allocation because it
> > fails to take into account the current task flags that are set via
> > memalloc_noio_save() to prevent vmalloc from doing GFP_KERNEL allocations. i.e.
> > in _xfs_buf_map_pages():
>
> OK, I see...
>
> I'll re-test with the following change:
>
> --- a/kernel/locking/lockdep.c
> +++ b/kernel/locking/lockdep.c
> @@ -2867,7 +2867,7 @@ static void __lockdep_trace_alloc(gfp_t gfp_mask, unsigned long flags)
>  		return;
>
>  	/* We're only interested __GFP_FS allocations for now */
> -	if (!(gfp_mask & __GFP_FS))
> +	if ((curr->flags & PF_MEMALLOC_NOIO) || !(gfp_mask & __GFP_FS))
>  		return;
>

and with the change above "./check -b auto" finishes without lockdep warnings,
probably I should send this patch to lockdep maintainers.

Now, with 2/2 applied I got the following:

 [ INFO: inconsistent lock state ]
 4.8.0+ #4 Tainted: G W
 ---------------------------------
 inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-R} usage.
 kswapd0/32 [HC0[0]:SC0[0]:HE1:SE1] takes:
  (sb_internal){+++++?}, at: [<ffffffff91292557>] __sb_start_write+0xb7/0xf0
 {RECLAIM_FS-ON-W} state was registered at:
   [<ffffffff9110735f>] mark_held_locks+0x6f/0xa0
   [<ffffffff9110a5f3>] lockdep_trace_alloc+0xd3/0x120
   [<ffffffff9126034f>] kmem_cache_alloc+0x2f/0x280
   [<ffffffffc023a251>] kmem_zone_alloc+0x81/0x120 [xfs]
   [<ffffffffc02398bc>] xfs_trans_alloc+0x6c/0x130 [xfs]
   [<ffffffffc020a2c9>] xfs_sync_sb+0x39/0x80 [xfs]
   [<ffffffffc02332fd>] xfs_log_sbcount+0x4d/0x50 [xfs]
   [<ffffffffc02348d7>] xfs_quiesce_attr+0x57/0xb0 [xfs]
   [<ffffffffc0234951>] xfs_fs_freeze+0x21/0x40 [xfs]
   [<ffffffff91291e8f>] freeze_super+0xcf/0x190
   [<ffffffff912a521f>] do_vfs_ioctl+0x55f/0x6c0
   [<ffffffff912a53f9>] SyS_ioctl+0x79/0x90
   [<ffffffff918af23c>] entry_SYSCALL_64_fastpath+0x1f/0xbd
 irq event stamp: 36471805
 hardirqs last enabled at (36471805): [<ffffffff911f9c8d>] clear_page_dirty_for_io+0x1ed/0x2e0
 hardirqs last disabled at (36471804): [<ffffffff911f9c5d>] clear_page_dirty_for_io+0x1bd/0x2e0
 softirqs last enabled at (36468590): [<ffffffff918b24ea>] __do_softirq+0x37a/0x44d
 softirqs last disabled at (36468579): [<ffffffff910b2f15>] irq_exit+0xe5/0xf0

 other info that might help us debug this:
  Possible unsafe locking scenario:

        CPU0
        ----
   lock(sb_internal);
   <Interrupt>
     lock(sb_internal);

  *** DEADLOCK ***

 no locks held by kswapd0/32.

 stack backtrace:
 CPU: 0 PID: 32 Comm: kswapd0 Tainted: G W 4.8.0+ #4
 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
  0000000000000086 00000000028a3434 ffff880139b2b520 ffffffff91449193
  ffff880139b1a680 ffffffff928c1e70 ffff880139b2b570 ffffffff91106c75
  0000000000000000 0000000000000001 ffff880100000001 000000000000000a
 Call Trace:
  [<ffffffff91449193>] dump_stack+0x85/0xc2
  [<ffffffff91106c75>] print_usage_bug+0x215/0x240
  [<ffffffff9110722b>] mark_lock+0x58b/0x650
  [<ffffffff91106080>] ? print_shortest_lock_dependencies+0x1a0/0x1a0
  [<ffffffff91107c4d>] __lock_acquire+0x36d/0x1870
  [<ffffffff911097dd>] lock_acquire+0x10d/0x200
  [<ffffffff91292557>] ? __sb_start_write+0xb7/0xf0
  [<ffffffff91102ecc>] percpu_down_read+0x3c/0x90
  [<ffffffff91292557>] ? __sb_start_write+0xb7/0xf0
  [<ffffffff91292557>] __sb_start_write+0xb7/0xf0
  [<ffffffffc0239933>] xfs_trans_alloc+0xe3/0x130 [xfs]
  [<ffffffffc0227dd7>] xfs_iomap_write_allocate+0x1f7/0x380 [xfs]
  [<ffffffffc020c333>] ? xfs_map_blocks+0xe3/0x380 [xfs]
  [<ffffffff911268b8>] ? rcu_read_lock_sched_held+0x58/0x60
  [<ffffffffc020c47a>] xfs_map_blocks+0x22a/0x380 [xfs]
  [<ffffffffc020dbf8>] xfs_do_writepage+0x188/0x6c0 [xfs]
  [<ffffffffc020e16b>] xfs_vm_writepage+0x3b/0x70 [xfs]
  [<ffffffff912049b0>] pageout.isra.46+0x190/0x380
  [<ffffffff91207cab>] shrink_page_list+0x9ab/0xa70
  [<ffffffff91208592>] shrink_inactive_list+0x252/0x5d0
  [<ffffffff9120921f>] shrink_node_memcg+0x5af/0x790
  [<ffffffff912094e1>] shrink_node+0xe1/0x320
  [<ffffffff9120a9d7>] kswapd+0x387/0x8b0

Probably a false positive? Although when I look at the comment above
xfs_sync_sb() I think that maybe something like the patch below makes sense,
but I know absolutely nothing about fs/ and XFS in particular.

Oleg.

--- x/fs/xfs/xfs_trans.c
+++ x/fs/xfs/xfs_trans.c
@@ -245,7 +245,8 @@ xfs_trans_alloc(
 	atomic_inc(&mp->m_active_trans);
 
 	tp = kmem_zone_zalloc(xfs_trans_zone,
-		(flags & XFS_TRANS_NOFS) ? KM_NOFS : KM_SLEEP);
+		(flags & (XFS_TRANS_NOFS | XFS_TRANS_NO_WRITECOUNT))
+			? KM_NOFS : KM_SLEEP);
 	tp->t_magic = XFS_TRANS_HEADER_MAGIC;
 	tp->t_flags = flags;
 	tp->t_mountp = mp;
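
For readers following the two diffs in this message, here is a rough userspace sketch of the two decisions they change: whether lockdep records an allocation as able to recurse into filesystem reclaim, and which allocation mode xfs_trans_alloc() picks. Everything here is an assumption made for illustration. The MODEL_* constants are hypothetical stand-ins (not the kernel's real bit values), the function names are invented, and the real __lockdep_trace_alloc() has other early returns that are not modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in bits; the kernel's real definitions differ. */
#define MODEL_GFP_FS              0x01u /* stands in for __GFP_FS */
#define MODEL_PF_MEMALLOC_NOIO    0x02u /* stands in for PF_MEMALLOC_NOIO */
#define MODEL_TRANS_NOFS          0x04u /* stands in for XFS_TRANS_NOFS */
#define MODEL_TRANS_NO_WRITECOUNT 0x08u /* stands in for XFS_TRANS_NO_WRITECOUNT */

enum model_km { MODEL_KM_SLEEP, MODEL_KM_NOFS };

/* Lockdep filter before the change: only the gfp mask is consulted. */
static bool traced_before(unsigned gfp_mask, unsigned task_flags)
{
	(void)task_flags; /* task flags are ignored in the old check */
	return (gfp_mask & MODEL_GFP_FS) != 0;
}

/* Lockdep filter after the change: a NOIO task context also skips tracing. */
static bool traced_after(unsigned gfp_mask, unsigned task_flags)
{
	if ((task_flags & MODEL_PF_MEMALLOC_NOIO) || !(gfp_mask & MODEL_GFP_FS))
		return false;
	return true;
}

/* Allocation mode in xfs_trans_alloc() before the proposed change. */
static enum model_km km_before(unsigned trans_flags)
{
	return (trans_flags & MODEL_TRANS_NOFS) ? MODEL_KM_NOFS : MODEL_KM_SLEEP;
}

/* Allocation mode after the proposed change: NO_WRITECOUNT implies NOFS. */
static enum model_km km_after(unsigned trans_flags)
{
	return (trans_flags & (MODEL_TRANS_NOFS | MODEL_TRANS_NO_WRITECOUNT))
		? MODEL_KM_NOFS : MODEL_KM_SLEEP;
}

/* Small self-check showing the cases the two diffs are meant to change. */
static void model_self_check(void)
{
	/* A GFP_FS request from a NOIO task: traced before, skipped after. */
	assert(traced_before(MODEL_GFP_FS, MODEL_PF_MEMALLOC_NOIO));
	assert(!traced_after(MODEL_GFP_FS, MODEL_PF_MEMALLOC_NOIO));

	/* A NO_WRITECOUNT transaction: sleeping allocation before, NOFS after. */
	assert(km_before(MODEL_TRANS_NO_WRITECOUNT) == MODEL_KM_SLEEP);
	assert(km_after(MODEL_TRANS_NO_WRITECOUNT) == MODEL_KM_NOFS);
}
```

In this model, the freeze path in the trace corresponds to the NO_WRITECOUNT case: xfs_trans_alloc() called via xfs_sync_sb() would no longer register the RECLAIM_FS-ON state against sb_internal. Whether that is the right fix for XFS is exactly the open question above.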