Re: [PATCH v2] dm-zoned: Avoid triggering reclaim from inside dmz_map()

Mikulas Patocka <mpatocka@xxxxxxxxxx> · Wed, 27 Jun 2018 11:14:03 -0400 (EDT)

OK - but I think a proper fix would be to preallocate the chunks and the 
radix tree when the device is created.

If the system is highly stressed, the GFP_NOIO allocation may wait for some
data to be written back, and that writeback may be directed at the dm-zoned
device itself, which is in turn waiting for the GFP_NOIO allocation to
succeed.
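
To illustrate the preallocation idea, here is a minimal userspace sketch.
It is not dm-zoned code: struct target, struct chunk_work, target_ctr(),
target_map() and target_dtr() are hypothetical names, and a flat array
stands in for the radix tree. The point is only that every allocation
happens in the constructor, so the map path never allocates and cannot
end up waiting on reclaim or writeback.

```c
#include <stdlib.h>

/* Hypothetical per-chunk work state; not the real dm-zoned structure. */
struct chunk_work {
	unsigned int chunk;	/* chunk number this slot serves */
	int in_use;		/* set on first use by the map path */
	unsigned int nr_bios;	/* number of requests routed to this chunk */
};

/* Hypothetical target: one preallocated slot per chunk. */
struct target {
	unsigned int nr_chunks;
	struct chunk_work *works;
};

/* Constructor: the only place that allocates; failure is reported here. */
static int target_ctr(struct target *t, unsigned int nr_chunks)
{
	t->works = calloc(nr_chunks, sizeof(*t->works));
	if (!t->works)
		return -1;
	t->nr_chunks = nr_chunks;
	return 0;
}

/* Map path: plain array lookup, no allocation of any kind. */
static struct chunk_work *target_map(struct target *t, unsigned int chunk)
{
	struct chunk_work *cw;

	if (chunk >= t->nr_chunks)
		return NULL;

	cw = &t->works[chunk];
	if (!cw->in_use) {
		cw->in_use = 1;
		cw->chunk = chunk;
	}
	cw->nr_bios++;
	return cw;
}

/* Destructor: release everything the constructor allocated. */
static void target_dtr(struct target *t)
{
	free(t->works);
	t->works = NULL;
	t->nr_chunks = 0;
}
```

The trade-off in this sketch is memory: the array costs space proportional
to the number of chunks whether or not they are ever used, in exchange for
a map path that cannot block on an allocation.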

Mikulas

On Fri, 22 Jun 2018, Bart Van Assche wrote:

> This patch prevents lockdep from reporting the following:
> 
> ======================================================
> WARNING: possible circular locking dependency detected
> 4.18.0-rc1 #62 Not tainted
> ------------------------------------------------------
> kswapd0/84 is trying to acquire lock:
> 00000000c313516d (&xfs_nondir_ilock_class){++++}, at: xfs_free_eofblocks+0xa2/0x1e0
> 
> but task is already holding lock:
> 00000000591c83ae (fs_reclaim){+.+.}, at: __fs_reclaim_acquire+0x5/0x30
> 
> which lock already depends on the new lock.
> 
> the existing dependency chain (in reverse order) is:
> 
> -> #2 (fs_reclaim){+.+.}:
>   kmem_cache_alloc+0x2c/0x2b0
>   radix_tree_node_alloc.constprop.19+0x3d/0xc0
>   __radix_tree_create+0x161/0x1c0
>   __radix_tree_insert+0x45/0x210
>   dmz_map+0x245/0x2d0 [dm_zoned]
>   __map_bio+0x40/0x260
>   __split_and_process_non_flush+0x116/0x220
>   __split_and_process_bio+0x81/0x180
>   __dm_make_request.isra.32+0x5a/0x100
>   generic_make_request+0x36e/0x690
>   submit_bio+0x6c/0x140
>   mpage_readpages+0x19e/0x1f0
>   read_pages+0x6d/0x1b0
>   __do_page_cache_readahead+0x21b/0x2d0
>   force_page_cache_readahead+0xc4/0x100
>   generic_file_read_iter+0x7c6/0xd20
>   __vfs_read+0x102/0x180
>   vfs_read+0x9b/0x140
>   ksys_read+0x55/0xc0
>   do_syscall_64+0x5a/0x1f0
>   entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 
> -> #1 (&dmz->chunk_lock){+.+.}:
>   dmz_map+0x133/0x2d0 [dm_zoned]
>   __map_bio+0x40/0x260
>   __split_and_process_non_flush+0x116/0x220
>   __split_and_process_bio+0x81/0x180
>   __dm_make_request.isra.32+0x5a/0x100
>   generic_make_request+0x36e/0x690
>   submit_bio+0x6c/0x140
>   _xfs_buf_ioapply+0x31c/0x590
>   xfs_buf_submit_wait+0x73/0x520
>   xfs_buf_read_map+0x134/0x2f0
>   xfs_trans_read_buf_map+0xc3/0x580
>   xfs_read_agf+0xa5/0x1e0
>   xfs_alloc_read_agf+0x59/0x2b0
>   xfs_alloc_pagf_init+0x27/0x60
>   xfs_bmap_longest_free_extent+0x43/0xb0
>   xfs_bmap_btalloc_nullfb+0x7f/0xf0
>   xfs_bmap_btalloc+0x428/0x7c0
>   xfs_bmapi_write+0x598/0xcc0
>   xfs_iomap_write_allocate+0x15a/0x330
>   xfs_map_blocks+0x1cf/0x3f0
>   xfs_do_writepage+0x15f/0x7b0
>   write_cache_pages+0x1ca/0x540
>   xfs_vm_writepages+0x65/0xa0
>   do_writepages+0x48/0xf0
>   __writeback_single_inode+0x58/0x730
>   writeback_sb_inodes+0x249/0x5c0
>   wb_writeback+0x11e/0x550
>   wb_workfn+0xa3/0x670
>   process_one_work+0x228/0x670
>   worker_thread+0x3c/0x390
>   kthread+0x11c/0x140
>   ret_from_fork+0x3a/0x50
> 
> -> #0 (&xfs_nondir_ilock_class){++++}:
>   down_read_nested+0x43/0x70
>   xfs_free_eofblocks+0xa2/0x1e0
>   xfs_fs_destroy_inode+0xac/0x270
>   dispose_list+0x51/0x80
>   prune_icache_sb+0x52/0x70
>   super_cache_scan+0x127/0x1a0
>   shrink_slab.part.47+0x1bd/0x590
>   shrink_node+0x3b5/0x470
>   balance_pgdat+0x158/0x3b0
>   kswapd+0x1ba/0x600
>   kthread+0x11c/0x140
>   ret_from_fork+0x3a/0x50
> 
> other info that might help us debug this:
> 
> Chain exists of:
>   &xfs_nondir_ilock_class --> &dmz->chunk_lock --> fs_reclaim
> 
> Possible unsafe locking scenario:
> 
>      CPU0                    CPU1
>      ----                    ----
> lock(fs_reclaim);
>                              lock(&dmz->chunk_lock);
>                              lock(fs_reclaim);
> lock(&xfs_nondir_ilock_class);
> 
> *** DEADLOCK ***
> 
> 3 locks held by kswapd0/84:
>  #0: 00000000591c83ae (fs_reclaim){+.+.}, at: __fs_reclaim_acquire+0x5/0x30
>  #1: 000000000f8208f5 (shrinker_rwsem){++++}, at: shrink_slab.part.47+0x3f/0x590
>  #2: 00000000cacefa54 (&type->s_umount_key#43){.+.+}, at: trylock_super+0x16/0x50
> 
> stack backtrace:
> CPU: 7 PID: 84 Comm: kswapd0 Not tainted 4.18.0-rc1 #62
> Hardware name: Supermicro Super Server/X10SRL-F, BIOS 2.0 12/17/2015
> Call Trace:
>  dump_stack+0x85/0xcb
>  print_circular_bug.isra.36+0x1ce/0x1db
>  __lock_acquire+0x124e/0x1310
>  lock_acquire+0x9f/0x1f0
>  down_read_nested+0x43/0x70
>  xfs_free_eofblocks+0xa2/0x1e0
>  xfs_fs_destroy_inode+0xac/0x270
>  dispose_list+0x51/0x80
>  prune_icache_sb+0x52/0x70
>  super_cache_scan+0x127/0x1a0
>  shrink_slab.part.47+0x1bd/0x590
>  shrink_node+0x3b5/0x470
>  balance_pgdat+0x158/0x3b0
>  kswapd+0x1ba/0x600
>  kthread+0x11c/0x140
>  ret_from_fork+0x3a/0x50
> 
> Reported-by: Masato Suzuki <masato.suzuki@xxxxxxx>
> Fixes: 4218a9554653 ("dm zoned: use GFP_NOIO in I/O path")
> Signed-off-by: Bart Van Assche <bart.vanassche@xxxxxxx>
> Cc: Damien Le Moal <Damien.LeMoal@xxxxxxx>
> Cc: Mikulas Patocka <mpatocka@xxxxxxxxxx>
> Cc: <stable@xxxxxxxxxxxxxxx>
> ---
> 
> Changes compared to v1: added "Cc: stable"
> 
>  drivers/md/dm-zoned-target.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/md/dm-zoned-target.c b/drivers/md/dm-zoned-target.c
> index 3c0e45f4dcf5..a44183ff4be0 100644
> --- a/drivers/md/dm-zoned-target.c
> +++ b/drivers/md/dm-zoned-target.c
> @@ -787,7 +787,7 @@ static int dmz_ctr(struct dm_target *ti, unsigned int argc, char **argv)
>  
>  	/* Chunk BIO work */
>  	mutex_init(&dmz->chunk_lock);
> -	INIT_RADIX_TREE(&dmz->chunk_rxtree, GFP_KERNEL);
> +	INIT_RADIX_TREE(&dmz->chunk_rxtree, GFP_NOIO);
>  	dmz->chunk_wq = alloc_workqueue("dmz_cwq_%s", WQ_MEM_RECLAIM | WQ_UNBOUND,
>  					0, dev->name);
>  	if (!dmz->chunk_wq) {
> -- 
> 2.17.1
>