> On Sep 2, 2022, at 3:34 PM, Dai Ngo <dai.ngo@xxxxxxxxxx> wrote:
>
> On 9/2/22 10:58 AM, Chuck Lever III wrote:
>>>> Or, nfsd_courtesy_client_count() could return
>>>> nfsd_couresy_client_count. nfsd_courtesy_client_scan() could
>>>> then look something like this:
>>>>
>>>>	if ((sc->gfp_mask & GFP_KERNEL) != GFP_KERNEL)
>>>>		return SHRINK_STOP;
>>>>
>>>>	nfsd_get_client_reaplist(nn, reaplist, lt);
>>>>	list_for_each_safe(pos, next, &reaplist) {
>>>>		clp = list_entry(pos, struct nfs4_client, cl_lru);
>>>>		trace_nfsd_clid_purged(&clp->cl_clientid);
>>>>		list_del_init(&clp->cl_lru);
>>>>		expire_client(clp);
>>>>		count++;
>>>>	}
>>>>	return count;
>>>
>>> This does not work; we cannot expire clients in the context of
>>> the scan callback due to a deadlock problem.
>>
>> Correct, we don't want to start shrinker laundromat activity if
>> the allocation request specified that it cannot wait. But are
>> you sure it doesn't work if sc_gfp_flags is set to GFP_KERNEL?
>
> It can deadlock due to a lock ordering problem:
>
> ======================================================
> WARNING: possible circular locking dependency detected
> 5.19.0-rc2_sk+ #1 Not tainted
> ------------------------------------------------------
> lck/31847 is trying to acquire lock:
> ffff88811d268850 (&sb->s_type->i_mutex_key#16){+.+.}-{3:3}, at: btrfs_inode_lock+0x38/0x70
>
> but task is already holding lock:
> ffffffffb41848c0 (fs_reclaim){+.+.}-{0:0}, at: __alloc_pages_slowpath.constprop.0+0x506/0x1db0
>
> which lock already depends on the new lock.
>
> the existing dependency chain (in reverse order) is:
>
> -> #1 (fs_reclaim){+.+.}-{0:0}:
>        fs_reclaim_acquire+0xc0/0x100
>        __kmalloc+0x51/0x320
>        btrfs_buffered_write+0x2eb/0xd90
>        btrfs_do_write_iter+0x6bf/0x11c0
>        do_iter_readv_writev+0x2bb/0x5a0
>        do_iter_write+0x131/0x630
>        nfsd_vfs_write+0x4da/0x1900 [nfsd]
>        nfsd4_write+0x2ac/0x760 [nfsd]
>        nfsd4_proc_compound+0xce8/0x23e0 [nfsd]
>        nfsd_dispatch+0x4ed/0xc10 [nfsd]
>        svc_process_common+0xd3f/0x1b00 [sunrpc]
>        svc_process+0x361/0x4f0 [sunrpc]
>        nfsd+0x2d6/0x570 [nfsd]
>        kthread+0x2a1/0x340
>        ret_from_fork+0x22/0x30
>
> -> #0 (&sb->s_type->i_mutex_key#16){+.+.}-{3:3}:
>        __lock_acquire+0x318d/0x7830
>        lock_acquire+0x1bb/0x500
>        down_write+0x82/0x130
>        btrfs_inode_lock+0x38/0x70
>        btrfs_sync_file+0x280/0x1010
>        nfsd_file_flush.isra.0+0x1b/0x220 [nfsd]
>        nfsd_file_put+0xd4/0x110 [nfsd]
>        release_all_access+0x13a/0x220 [nfsd]
>        nfs4_free_ol_stateid+0x40/0x90 [nfsd]
>        free_ol_stateid_reaplist+0x131/0x210 [nfsd]
>        release_openowner+0xf7/0x160 [nfsd]
>        __destroy_client+0x3cc/0x740 [nfsd]
>        nfsd_cc_lru_scan+0x271/0x410 [nfsd]
>        shrink_slab.constprop.0+0x31e/0x7d0
>        shrink_node+0x54b/0xe50
>        try_to_free_pages+0x394/0xba0
>        __alloc_pages_slowpath.constprop.0+0x5d2/0x1db0
>        __alloc_pages+0x4d6/0x580
>        __handle_mm_fault+0xc25/0x2810
>        handle_mm_fault+0x136/0x480
>        do_user_addr_fault+0x3d8/0xec0
>        exc_page_fault+0x5d/0xc0
>        asm_exc_page_fault+0x27/0x30

Is this deadlock possible only via the filecache close paths?

I can't think of a reason the laundromat has to wait for each file
to be flushed and closed -- the laundromat should be able to "put"
these files and allow the filecache LRU to flush and close in the
background.

--
Chuck Lever
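
[Editorial note: a minimal sketch of the direction discussed above, not the
patch under review. It assumes, for illustration only, nfsd_net fields named
nfsd_client_shrinker, nfsd_courtesy_clients, and laundromat_work, plus the
laundry_wq workqueue; the count callback only reports how many courtesy
clients are reclaimable, and the scan callback never calls expire_client()
itself but kicks deferred work, so client teardown (and any nfsd_file
flush/close it triggers) happens outside the fs_reclaim context shown in the
lockdep chain.]

	/*
	 * Illustrative sketch only -- assumes fs/nfsd/nfs4state.c context
	 * ("nfsd.h", "netns.h", laundry_wq) and hypothetical field names.
	 */
	static unsigned long
	nfsd_courtesy_client_count(struct shrinker *shrink,
				   struct shrink_control *sc)
	{
		struct nfsd_net *nn = container_of(shrink, struct nfsd_net,
						   nfsd_client_shrinker);

		/* Report how many courtesy clients could be reclaimed. */
		return atomic_read(&nn->nfsd_courtesy_clients);
	}

	static unsigned long
	nfsd_courtesy_client_scan(struct shrinker *shrink,
				  struct shrink_control *sc)
	{
		struct nfsd_net *nn = container_of(shrink, struct nfsd_net,
						   nfsd_client_shrinker);

		/*
		 * Expiring clients here could take inode locks while
		 * fs_reclaim is already held (see the lockdep chain above).
		 * Defer the real work to the laundromat instead, and tell
		 * the shrinker nothing was freed in this call.
		 */
		mod_delayed_work(laundry_wq, &nn->laundromat_work, 0);
		return SHRINK_STOP;
	}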