On 07/27/2015 12:52 PM, Mel Gorman wrote:
> On Wed, Jul 22, 2015 at 03:46:16PM +0200, Jerome Marchand wrote:
>> On 07/22/2015 02:23 PM, Trond Myklebust wrote:
>>> On Wed, Jul 22, 2015 at 4:10 AM, Jerome Marchand <jmarchan@xxxxxxxxxx> wrote:
>>>>
>>>> Lockdep warns about an inconsistent {RECLAIM_FS-ON-W} ->
>>>> {IN-RECLAIM_FS-W} usage. The culprit is the inode->i_mutex taken in
>>>> nfs_file_direct_write(). This code was introduced by commit a9ab5e840669
>>>> ("nfs: page cache invalidation for dio").
>>>> This naive test patch avoids taking the mutex on a swapfile and makes
>>>> lockdep happy again. However, I don't know much about the NFS code and I
>>>> assume it's probably not the proper solution. Any thoughts?
>>>>
>>>> Signed-off-by: Jerome Marchand <jmarchan@xxxxxxxxxx>
>>>
>>> NFS is not the only O_DIRECT implementation to take the inode->i_mutex.
>>> Why can't this be fixed in the generic swap code instead of adding
>>> yet-another-exception-for-IS_SWAPFILE?
>>
>> I meant to cc Mel. Just added him.
>>
>
> Can the full lockdep warning be included? It'll be easier to see then
> whether the generic swap code can somehow special-case this. Currently,
> generic swapping does not need to care about how the filesystem is
> locked. For most filesystems, it writes directly to the blocks on disk,
> bypassing the FS. In the NFS case, it would be surprising to find dirty
> pages in the page cache that belong to the swap file, as that is going to
> cause corruption. If there is any special-casing, it would be to attempt
> the invalidation only in the !swap case and to warn if mapping->nrpages is
> non-zero. It would still look a bit weird, but it would be safer than just
> not acquiring the mutex and then potentially attempting an invalidation.
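The special case Mel describes (invalidate only in the !swap case, warn if mapping->nrpages is non-zero) can be sketched as a small userspace Python model. This is not NFS or kernel code: `Inode`, `AddressSpace`, `invalidate_pages()` and `direct_write()` are hypothetical stand-ins for the inode, its address_space, the page cache invalidation and the O_DIRECT write path, and a `threading.Lock` stands in for inode->i_mutex. Under those assumptions, a swap-file write never takes the mutex or invalidates; it only warns when cached pages are found.

```python
import threading
import warnings
from dataclasses import dataclass, field


@dataclass
class AddressSpace:
    """Hypothetical stand-in for the kernel's struct address_space."""
    nrpages: int = 0  # number of pages currently in the page cache


@dataclass
class Inode:
    """Hypothetical stand-in for the kernel's struct inode."""
    is_swapfile: bool = False  # models IS_SWAPFILE(inode)
    i_mutex: threading.Lock = field(default_factory=threading.Lock)
    mapping: AddressSpace = field(default_factory=AddressSpace)


def invalidate_pages(mapping: AddressSpace) -> None:
    """Hypothetical: drop every cached page before a direct write."""
    mapping.nrpages = 0


def direct_write(inode: Inode, data: bytes) -> int:
    """Model of the suggested special case in an O_DIRECT write path.

    Swap-out writes take neither i_mutex nor the invalidation path; they
    only warn if the page cache unexpectedly holds pages of the swap file.
    Returns the number of bytes "written".
    """
    if inode.is_swapfile:
        if inode.mapping.nrpages:
            warnings.warn(f"swap file has {inode.mapping.nrpages} cached pages")
        return len(data)  # the write itself would be issued here, lock-free
    with inode.i_mutex:
        invalidate_pages(inode.mapping)
        return len(data)  # the write itself would be issued here, under i_mutex
```

Keeping the warning, rather than silently skipping the invalidation, follows Mel's reasoning that cached pages for a swap file would already indicate corruption.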

[ 6819.501009] =================================
[ 6819.501009] [ INFO: inconsistent lock state ]
[ 6819.501009] 4.2.0-rc1-shmacct-babka-v2-next-20150709+ #255 Not tainted
[ 6819.501009] ---------------------------------
[ 6819.501009] inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
[ 6819.501009] kswapd0/38 [HC0[0]:SC0[0]:HE1:SE1] takes:
[ 6819.501009]  (&sb->s_type->i_mutex_key#17){+.+.?.}, at: [<ffffffffa03772a5>] nfs_file_direct_write+0x85/0x3f0 [nfs]
[ 6819.501009] {RECLAIM_FS-ON-W} state was registered at:
[ 6819.501009]   [<ffffffff81107f51>] mark_held_locks+0x71/0x90
[ 6819.501009]   [<ffffffff8110b775>] lockdep_trace_alloc+0x75/0xe0
[ 6819.501009]   [<ffffffff81245529>] kmem_cache_alloc_node_trace+0x39/0x440
[ 6819.501009]   [<ffffffff81225b8f>] __get_vm_area_node+0x7f/0x160
[ 6819.501009]   [<ffffffff81226eb2>] __vmalloc_node_range+0x72/0x2c0
[ 6819.501009]   [<ffffffff81227424>] vzalloc+0x54/0x60
[ 6819.501009]   [<ffffffff8122c7c8>] SyS_swapon+0x628/0xfc0
[ 6819.501009]   [<ffffffff81867772>] entry_SYSCALL_64_fastpath+0x12/0x76
[ 6819.501009] irq event stamp: 163459
[ 6819.501009] hardirqs last enabled at (163459): [<ffffffff81866c66>] _raw_spin_unlock_irqrestore+0x36/0x60
[ 6819.501009] hardirqs last disabled at (163458): [<ffffffff8186747b>] _raw_spin_lock_irqsave+0x2b/0x90
[ 6819.501009] softirqs last enabled at (162966): [<ffffffff810b13d3>] __do_softirq+0x363/0x630
[ 6819.501009] softirqs last disabled at (162961): [<ffffffff810b1a03>] irq_exit+0xf3/0x100
[ 6819.501009] other info that might help us debug this:
[ 6819.501009]  Possible unsafe locking scenario:
[ 6819.501009]        CPU0
[ 6819.501009]        ----
[ 6819.501009]   lock(&sb->s_type->i_mutex_key#17);
[ 6819.501009]   <Interrupt>
[ 6819.501009]     lock(&sb->s_type->i_mutex_key#17);
[ 6819.501009]  *** DEADLOCK ***
[ 6819.501009] no locks held by kswapd0/38.
[ 6819.501009] stack backtrace:
[ 6819.501009] CPU: 1 PID: 38 Comm: kswapd0 Not tainted 4.2.0-rc1-shmacct-babka-v2-next-20150709+ #255
[ 6819.501009] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 6819.501009]  0000000000000000 00000000cca71737 ffff880033f374d8 ffffffff8185ce5b
[ 6819.501009]  0000000000000000 ffff880033f30000 ffff880033f37538 ffffffff8185732d
[ 6819.501009]  0000000000000000 ffff880000000001 ffff880000000001 ffffffff8102f49f
[ 6819.501009] Call Trace:
[ 6819.501009]  [<ffffffff8185ce5b>] dump_stack+0x4c/0x65
[ 6819.501009]  [<ffffffff8185732d>] print_usage_bug+0x1f2/0x203
[ 6819.501009]  [<ffffffff8102f49f>] ? save_stack_trace+0x2f/0x50
[ 6819.501009]  [<ffffffff81107430>] ? check_usage_backwards+0x150/0x150
[ 6819.501009]  [<ffffffff81107e52>] mark_lock+0x212/0x2a0
[ 6819.501009]  [<ffffffff81108d73>] __lock_acquire+0x8d3/0x1f40
[ 6819.501009]  [<ffffffff8110953e>] ? __lock_acquire+0x109e/0x1f40
[ 6819.501009]  [<ffffffff8110ac92>] lock_acquire+0xc2/0x280
[ 6819.501009]  [<ffffffffa03772a5>] ? nfs_file_direct_write+0x85/0x3f0 [nfs]
[ 6819.501009]  [<ffffffff818641bf>] mutex_lock_nested+0x7f/0x3f0
[ 6819.501009]  [<ffffffffa03772a5>] ? nfs_file_direct_write+0x85/0x3f0 [nfs]
[ 6819.501009]  [<ffffffff81105328>] ? __lock_is_held+0x58/0x80
[ 6819.501009]  [<ffffffffa03772a5>] ? nfs_file_direct_write+0x85/0x3f0 [nfs]
[ 6819.501009]  [<ffffffff8122a500>] ? get_swap_bio+0x90/0x90
[ 6819.501009]  [<ffffffffa03772a5>] nfs_file_direct_write+0x85/0x3f0 [nfs]
[ 6819.501009]  [<ffffffff8122a500>] ? get_swap_bio+0x90/0x90
[ 6819.501009]  [<ffffffffa0377640>] nfs_direct_IO+0x30/0x50 [nfs]
[ 6819.501009]  [<ffffffff8122a9b5>] __swap_writepage+0x105/0x270
[ 6819.501009]  [<ffffffff8122ab59>] swap_writepage+0x39/0x70
[ 6819.501009]  [<ffffffff811fbef2>] shmem_writepage+0x1f2/0x330
[ 6819.501009]  [<ffffffff811f3319>] pageout.isra.48+0x189/0x4a0
[ 6819.501009]  [<ffffffff811f5497>] shrink_page_list+0x9b7/0xc80
[ 6819.501009]  [<ffffffff811f60a8>] shrink_inactive_list+0x3a8/0x800
[ 6819.501009]  [<ffffffff810e72f5>] ? local_clock+0x15/0x30
[ 6819.501009]  [<ffffffff811f6f10>] shrink_lruvec+0x610/0x800
[ 6819.501009]  [<ffffffff811f71e7>] shrink_zone+0xe7/0x2d0
[ 6819.501009]  [<ffffffff811f8ddd>] kswapd+0x55d/0xd30
[ 6819.501009]  [<ffffffff811f8880>] ? mem_cgroup_shrink_node_zone+0x490/0x490
[ 6819.501009]  [<ffffffff810d1a74>] kthread+0x104/0x120
[ 6819.501009]  [<ffffffff810d1970>] ? kthread_create_on_node+0x250/0x250
[ 6819.501009]  [<ffffffff81867aef>] ret_from_fork+0x3f/0x70
[ 6819.501009]  [<ffffffff810d1970>] ? kthread_create_on_node+0x250/0x250
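To make the {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} report above easier to read, here is a toy Python model of the two usage states. It is not lockdep's implementation: `LockClass`, `Task`, `alloc()` and `InconsistentLockState` are hypothetical names. The model only records, per lock class, whether it was ever held across an allocation that may recurse into filesystem reclaim (the trace shows this for the i_mutex class during vzalloc() in swapon) and whether it was ever acquired from inside reclaim (kswapd0 does this via swap_writepage() and nfs_file_direct_write()). Seeing both on one class is the inconsistency being reported.

```python
from dataclasses import dataclass


class InconsistentLockState(Exception):
    """Raised when one lock class shows both conflicting usage states."""


@dataclass
class LockClass:
    """Toy lock class carrying two lockdep-style usage bits."""
    name: str
    reclaim_fs_on_w: bool = False  # held across an allocation that may enter FS reclaim
    in_reclaim_fs_w: bool = False  # acquired while already inside FS reclaim

    def check(self) -> None:
        if self.reclaim_fs_on_w and self.in_reclaim_fs_w:
            raise InconsistentLockState(
                f"inconsistent RECLAIM_FS usage of {self.name}")


class Task:
    """Toy task that remembers which lock classes it currently holds."""

    def __init__(self, name: str, in_reclaim: bool = False) -> None:
        self.name = name
        self.in_reclaim = in_reclaim  # True for a reclaimer such as kswapd
        self.held: list[LockClass] = []

    def lock(self, lc: LockClass) -> None:
        if self.in_reclaim:
            lc.in_reclaim_fs_w = True  # IN-RECLAIM_FS-W
        lc.check()
        self.held.append(lc)

    def unlock(self, lc: LockClass) -> None:
        self.held.remove(lc)

    def alloc(self, may_enter_fs: bool = True) -> None:
        """Model an allocation; GFP_KERNEL-like ones may recurse into FS reclaim."""
        if not may_enter_fs:
            return  # GFP_NOFS-like: reclaim cannot call back into the filesystem
        for lc in self.held:
            lc.reclaim_fs_on_w = True  # RECLAIM_FS-ON-W
            lc.check()
```

Replaying the two paths from the trace against the same lock class triggers the report:

```python
i_mutex = LockClass("&sb->s_type->i_mutex_key#17")
swapon = Task("swapon")
swapon.lock(i_mutex)
swapon.alloc()          # vzalloc()-style allocation marks RECLAIM_FS-ON-W
swapon.unlock(i_mutex)

kswapd = Task("kswapd0", in_reclaim=True)
kswapd.lock(i_mutex)    # raises InconsistentLockState
```

In this model, an allocation made with `may_enter_fs=False` never marks the held locks, which mirrors why code holding filesystem locks on a reclaim-reachable path typically allocates with GFP_NOFS.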