Re: lockdep splat on 4.18.0

Dave Chinner <david@xxxxxxxxxxxxx> · Fri, 7 Sep 2018 09:40:45 +1000

On Thu, Sep 06, 2018 at 11:44:07AM -0400, Jeff Mahoney wrote:
> Hi folks -
> 
> I hit this lockdep splat on 4.18.0 this morning (the + in the version is
> due to btrfs patch; xfs is unmodified).  In my experience lockdep splats
> involving mount are false positive, but Eric suggested I drop it here
> just the same.

Thanks Jeff!

tl;dr looks like a false positive we might be able to shut up by
changing the order of code in xfs_trans_alloc().

> 
> -Jeff
> 
> ======================================================
> WARNING: possible circular locking dependency detected
> 4.18.0-vanilla+ #8 Not tainted
> ------------------------------------------------------
> kswapd0/56 is trying to acquire lock:
> 000000002f3c47dc (sb_internal){.+.+}, at: xfs_trans_alloc+0x19d/0x250 [xfs]
> 
> but task is already holding lock:
> 00000000e0553233 (fs_reclaim){+.+.}, at: __fs_reclaim_acquire+0x5/0x40
> 
> which lock already depends on the new lock.
> 
> 
> the existing dependency chain (in reverse order) is:
> 
> -> #1 (fs_reclaim){+.+.}:
>        lock_acquire+0xbd/0x220
>        __fs_reclaim_acquire+0x2c/0x40
>        kmem_cache_alloc+0x2b/0x320
>        kmem_zone_alloc+0x95/0x100 [xfs]
>        xfs_trans_alloc+0x6f/0x250 [xfs]
>        xlog_recover_process_intents+0x1f6/0x300 [xfs]
>        xlog_recover_finish+0x18/0xa0 [xfs]
>        xfs_log_mount_finish+0x6d/0x110 [xfs]
>        xfs_mountfs+0x6f0/0xa40 [xfs]
>        xfs_fs_fill_super+0x520/0x6e0 [xfs]
>        mount_bdev+0x187/0x1c0
>        mount_fs+0x3a/0x160
>        vfs_kern_mount+0x66/0x150
>        do_mount+0x1d9/0xcf0
>        ksys_mount+0x7e/0xd0
>        __x64_sys_mount+0x21/0x30
>        do_syscall_64+0x5d/0x1a0
>        entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 
> -> #0 (sb_internal){.+.+}:
>        __lock_acquire+0x436/0x770
>        lock_acquire+0xbd/0x220
>        __sb_start_write+0x166/0x1d0
>        xfs_trans_alloc+0x19d/0x250 [xfs]
>        xfs_iomap_write_allocate+0x1d7/0x330 [xfs]
>        xfs_map_blocks+0x2d7/0x550 [xfs]
>        xfs_do_writepage+0x26b/0x7a0 [xfs]
>        xfs_vm_writepage+0x28/0x50 [xfs]
>        pageout.isra.51+0x1ca/0x450
>        shrink_page_list+0x811/0xe30
>        shrink_inactive_list+0x2e2/0x770
>        shrink_node_memcg+0x32d/0x750
>        shrink_node+0xc9/0x470
>        balance_pgdat+0x175/0x360
>        kswapd+0x181/0x5d0
>        kthread+0xf8/0x130
>        ret_from_fork+0x3a/0x50
> 
> other info that might help us debug this:
> 
>  Possible unsafe locking scenario:
> 
>        CPU0                    CPU1
>        ----                    ----
>   lock(fs_reclaim);
>                                lock(sb_internal);
>                                lock(fs_reclaim);
>   lock(sb_internal);

Ok, looks like kswapd is doing direct writeback from reclaim (why
hasn't that been killed already?), which takes a freeze reference
before we start the transaction. Then, elsewhere, we do the normal
thing of taking a freeze reference, then allocating the transaction
structure via GFP_KERNEL, triggering the "memory reclaim lock
inversion".
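
For anyone following along, the ordering lockdep is complaining about
can be reproduced with a toy model. This is a minimal Python sketch,
not how lockdep is implemented: `ToyLockdep` and `acquire()` are
made-up names, and the model only records "class B was taken while
holding class A" edges, ignoring read/write state and IRQ context.

```python
from collections import defaultdict


class ToyLockdep:
    """Toy model of lock-class ordering; not the kernel's lockdep."""

    def __init__(self):
        # held class -> set of classes acquired while holding it
        self.after = defaultdict(set)

    def _reachable(self, src, dst):
        """Depth-first search: is dst reachable from src via recorded edges?"""
        stack, seen = [src], set()
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self.after[node])
        return False

    def acquire(self, held, new):
        """Record 'new taken while holding held'.

        Returns True if an existing chain already orders new before held,
        i.e. this acquisition closes a cycle.
        """
        inversion = self._reachable(new, held)
        self.after[held].add(new)
        return inversion


dep = ToyLockdep()
# Chain #1 (mount / log recovery): freeze reference held, then a
# GFP_KERNEL allocation enters fs_reclaim.
mount_inversion = dep.acquire(held="sb_internal", new="fs_reclaim")
# Chain #0 (kswapd): already in reclaim, then xfs_trans_alloc() takes
# the freeze reference.
kswapd_inversion = dep.acquire(held="fs_reclaim", new="sb_internal")
print(mount_inversion, kswapd_inversion)  # False True
```

The second acquisition is the one that gets reported, which matches
the splat: kswapd is the task that trips the warning, and the mount
path is the "existing dependency".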

It's not a deadlock - for anything to deadlock in this path, we have
to be in the middle of a freeze and have frozen the transaction
subsystem. Which we cannot do until we've cleaned all the dirty
cached pages in the filesystem and frozen all new writes. Which means
kswapd cannot enter this direct writeback path because we can't have
dirty pages on the filesystem.
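
The ordering argument above can be written down as a small state
model. This is a sketch under heavy simplification: the level names
mirror the kernel's SB_FREEZE_* levels, but `ToySuperblock` and its
methods are hypothetical, and "sync all dirty data" is reduced to
zeroing a counter.

```python
from enum import IntEnum


class Freeze(IntEnum):
    UNFROZEN = 0
    WRITE = 1      # new data writes are blocked
    PAGEFAULT = 2  # page faults that would dirty pages are blocked
    FS = 3         # internal writers (transactions) are blocked


class ToySuperblock:
    def __init__(self, dirty_pages=0):
        self.level = Freeze.UNFROZEN
        self.dirty_pages = dirty_pages

    def dirty_a_page(self):
        """Dirty one page; refused once new writes are frozen."""
        if self.level >= Freeze.WRITE:
            return False
        self.dirty_pages += 1
        return True

    def freeze(self):
        """Walk the freeze levels in order, cleaning pages before FS."""
        self.level = Freeze.WRITE
        self.level = Freeze.PAGEFAULT
        self.dirty_pages = 0  # stand-in for syncing all dirty data
        self.level = Freeze.FS

    def kswapd_would_block(self):
        """kswapd needs a dirty page to write AND transactions frozen."""
        return self.dirty_pages > 0 and self.level >= Freeze.FS


sb = ToySuperblock(dirty_pages=3)
before = sb.kswapd_would_block()  # transactions are not frozen yet
sb.freeze()
refused = not sb.dirty_a_page()   # no new dirty pages once frozen
after = sb.kswapd_would_block()   # nothing left to write back
print(before, refused, after)  # False True False
```

In this model the two conditions needed for kswapd to block are never
true at the same time, which is the shape of the false-positive
argument.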

So, yeah, yet another false positive.

I suspect we can shut it up by changing the order of operations in
xfs_trans_alloc(). I'll have a look at that.
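
To illustrate why reordering could silence the report, here is a toy
Python model of one possible reordering. It assumes the change is to
do the transaction structure allocation before taking the freeze
reference; this message does not say what the eventual patch looks
like, so treat that as an assumption. The function names and the event
lists are made up for illustration, and the check only looks for two
classes taken in opposite orders, not longer cycles.

```python
def ordering_edges(events):
    """Return (held, new) pairs for every acquisition made while holding a class."""
    held, edges = [], set()
    for op, cls in events:
        if op == "acquire":
            edges.update((h, cls) for h in held)
            held.append(cls)
        else:  # "release"
            held.remove(cls)
    return edges


def has_inversion(paths):
    """True if any two lock classes are taken in opposite orders across paths."""
    edges = set()
    for events in paths:
        edges |= ordering_edges(events)
    return any((b, a) in edges for (a, b) in edges)


# kswapd: in reclaim context, then takes the freeze reference.
kswapd = [("acquire", "fs_reclaim"), ("acquire", "sb_internal")]

# Order described above: freeze reference first, then the allocation.
current = [
    ("acquire", "sb_internal"),
    ("acquire", "fs_reclaim"),
    ("release", "fs_reclaim"),
]

# Assumed reordering: the allocation finishes before the freeze reference.
reordered = [
    ("acquire", "fs_reclaim"),
    ("release", "fs_reclaim"),
    ("acquire", "sb_internal"),
]

print(has_inversion([kswapd, current]))    # True
print(has_inversion([kswapd, reordered]))  # False
```

With the allocation done first, the sb_internal -> fs_reclaim edge is
never recorded on this path, so there is nothing for the kswapd path
to invert against.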

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx