Hello,

We're seeing a deadlock on NFSv4.1. The sequence of events is roughly as
follows:

- task 1: prunes the icache, marks inode_A and inode_B as freeing, then
  starts evicting inode_A, but waits for inode_A's delegation to be
  returned to the server
- task 2: opens a file, has already obtained the fh from the server, and
  waits for inode_B, which has the same file handle, to finish being freed
- task 3: the state manager is draining the session, but one slot is still
  held by task 2
- task 4: runs the delegreturn rpc_task, but the session is draining, so
  the delegreturn sleeps in the rpc layer

Task 1 is therefore blocked, and the tasks deadlock.

Commit 244fcd2f9a90 ("NFS: Ensure we time out if a delegreturn does not
complete") already ensures the delegreturn task can time out while trying
to get a slot from the session, but it cannot time out when the task
sleeps in the rpc layer because the session is draining (see the
simplified sketch after the stacks below).

I think commit 5fcdfacc01f3 ("NFSv4: Return delegations synchronously in
evict_inode") introduced this problem. However, if we revert it, there may
be another deadlock, because task 1 may then be waiting for inode_A's
writeback to complete. If we make the delegreturn privileged in the rpc
layer, the result is the same as above.

I think task 2 should free its slot as soon as possible once its rpc task
completes, but commit ae55e59da0e4 ("pnfs: Don't release the sequence slot
until we've processed layoutget on open") made the slot be released later.

Any idea about this problem is welcome.

Stacks of the problem:

# task1:
rpc_wait_bit_killable
__rpc_wait_for_completion_task
_nfs4_proc_delegreturn
nfs4_proc_delegreturn
nfs_do_return_delegation
nfs_inode_return_delegation_noreclaim
nfs4_evict_inode
evict
dispose_list
prune_icache_sb
super_cache_scan
do_shrink_slab
shrink_slab
shrink_node
kswapd
kthread
ret_from_fork

# task2:
__wait_on_freeing_inode
find_inode
ilookup5_nowait
ilookup5
iget5_locked
nfs_fhget
_nfs4_opendata_to_nfs4_state
nfs4_do_open
nfs4_atomic_open
nfs_atomic_open
path_openat
do_filp_open
do_sys_open
__x64_sys_open
do_syscall_64
entry_SYSCALL_64_after_hwframe

# task3:
nfs4_drain_slot_tbl
nfs4_begin_drain_session
nfs4_run_state_manager
kthread
ret_from_fork

(Note: the stack labels above have been ordered to match the task numbers
used in the description: task 1 is the icache-pruning/evict path, task 2 is
the open path.)
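For reference, below is a minimal sketch of the part of the sequence setup
that parks a non-privileged task while the slot table is draining. This is
my own simplified rendering, not the exact mainline source, and the
function name is made up; it is only meant to illustrate why the timeout
added by 244fcd2f9a90 does not help here: the delegreturn never reaches
slot allocation at all, it just sleeps on slot_tbl_waitq with no timeout.

    /*
     * Simplified sketch (illustrative only, not the exact mainline code).
     * A non-privileged task that hits a draining slot table is parked on
     * slot_tbl_waitq without a timeout.
     */
    static int nfs4_setup_sequence_sketch(struct nfs4_slot_table *tbl,
                                          struct nfs4_sequence_args *args,
                                          struct rpc_task *task)
    {
            /* The state manager (task 3) has started draining the session. */
            if (test_bit(NFS4_SLOT_TBL_DRAINING, &tbl->slot_tbl_state) &&
                !args->sa_privileged) {
                    /*
                     * The delegreturn (task 4) sleeps here indefinitely,
                     * which in turn leaves task 1 stuck in evict_inode.
                     */
                    rpc_sleep_on(&tbl->slot_tbl_waitq, task, NULL);
                    return -EAGAIN;
            }

            /*
             * ... normal slot allocation continues here; this is the path
             * that 244fcd2f9a90 covers with a timeout ...
             */
            return 0;
    }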