On Thu, Feb 28, 2019 at 5:11 AM Jiufei Xue
<jiufei.xue@xxxxxxxxxxxxxxxxx> wrote:
>
> Hi,
>
> when I tested xfstests/generic/323 with NFSv4.1 and v4.2, the task
> changed to zombie occasionally while a thread is hanging with the
> following stack:
>
> [<0>] rpc_wait_bit_killable+0x1e/0xa0 [sunrpc]
> [<0>] nfs4_do_close+0x21b/0x2c0 [nfsv4]
> [<0>] __put_nfs_open_context+0xa2/0x110 [nfs]
> [<0>] nfs_file_release+0x35/0x50 [nfs]
> [<0>] __fput+0xa2/0x1c0
> [<0>] task_work_run+0x82/0xa0
> [<0>] do_exit+0x2ac/0xc20
> [<0>] do_group_exit+0x39/0xa0
> [<0>] get_signal+0x1ce/0x5d0
> [<0>] do_signal+0x36/0x620
> [<0>] exit_to_usermode_loop+0x5e/0xc2
> [<0>] do_syscall_64+0x16c/0x190
> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [<0>] 0xffffffffffffffff
>
> Since commit 12f275cdd163(NFSv4: Retry CLOSE and DELEGRETURN on
> NFS4ERR_OLD_STATEID), the client will retry to close the file when
> stateid generation number in client is lower than server.
>
> The original intention of this commit is retrying the operation while
> racing with an OPEN. However, in this case the stateid generation remains
> mismatch forever.
>
> Any suggestions?

Can you include a network trace of the failure? Is it possible that
the server crashed while replying to the CLOSE and that's why the task
is hung? What server are you testing against?

I have seen a trace where the CLOSE would get NFS4ERR_OLD_STATEID and
the client would keep retrying with the same open stateid until it got
a reply to the OPEN that changed the state. Once the client received
that reply, it retried the CLOSE with the updated stateid.
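
To make the two cases concrete, here is a toy userspace model of the
retry behaviour described above. It is only a sketch and not the kernel
code: every name in it (toy_server, toy_client, toy_close_with_retry,
open_reply_after) is made up for illustration, and the attempt cap
exists only so the model terminates. It assumes the server rejects a
CLOSE whose seqid is lower than its own, and that a pending OPEN reply,
if there is one, brings the client's seqid up to date.

```c
#include <assert.h>
#include <stdint.h>

/* Toy status codes; illustrative only, not the kernel's definitions. */
enum toy_status { TOY_OK = 0, TOY_OLD_STATEID = 1 };

/* Hypothetical server state: the current generation of the open stateid. */
struct toy_server {
    uint32_t seqid;
};

/* Hypothetical client state. */
struct toy_client {
    uint32_t seqid;        /* generation the client currently holds */
    int open_reply_after;  /* failed CLOSE attempt after which the pending
                              OPEN reply is processed; -1 means no reply
                              ever arrives */
};

/* The toy server rejects a CLOSE that carries an older generation. */
static enum toy_status toy_server_close(const struct toy_server *srv,
                                        uint32_t seqid)
{
    return seqid < srv->seqid ? TOY_OLD_STATEID : TOY_OK;
}

/*
 * Retry CLOSE while the server answers "old stateid".
 * Returns the number of attempts used, or -1 if the cap was reached,
 * which stands in for the task that never makes progress.
 */
static int toy_close_with_retry(struct toy_client *clp,
                                const struct toy_server *srv,
                                int max_attempts)
{
    assert(max_attempts > 0);

    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        if (toy_server_close(srv, clp->seqid) == TOY_OK)
            return attempt;

        /* The racing OPEN reply lands and updates the client's seqid. */
        if (clp->open_reply_after == attempt)
            clp->seqid = srv->seqid;
    }
    return -1;
}
```

With a server at seqid 2 and a client at seqid 1, setting
open_reply_after = 3 makes the call return 4: three rejected attempts,
then success once the OPEN reply has been processed. With
open_reply_after = -1 nothing ever updates the client's seqid, so the
call returns -1 at the cap. That second case is what the report looks
like from the outside, which is why the network trace matters.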