On 2019/3/4 下午11:50, Trond Myklebust wrote: > On Mon, 2019-03-04 at 23:20 +0800, Jiufei Xue wrote: >> Hi Trond, >> >> On 2019/3/1 下午9:08, Trond Myklebust wrote: >>> On Fri, 2019-03-01 at 16:49 +0800, Jiufei Xue wrote: >>>> On 2019/3/1 下午1:08, Jiufei Xue wrote: >>>>> Hi Olga, >>>>> >>>>> On 2019/3/1 上午6:26, Olga Kornievskaia wrote: >>>>>> On Thu, Feb 28, 2019 at 5:11 AM Jiufei Xue < >>>>>> jiufei.xue@xxxxxxxxxxxxxxxxx> wrote: >>>>>>> Hi, >>>>>>> >>>>>>> when I tested xfstests/generic/323 with NFSv4.1 and v4.2, >>>>>>> the >>>>>>> task >>>>>>> changed to zombie occasionally while a thread is hanging >>>>>>> with >>>>>>> the >>>>>>> following stack: >>>>>>> >>>>>>> [<0>] rpc_wait_bit_killable+0x1e/0xa0 [sunrpc] >>>>>>> [<0>] nfs4_do_close+0x21b/0x2c0 [nfsv4] >>>>>>> [<0>] __put_nfs_open_context+0xa2/0x110 [nfs] >>>>>>> [<0>] nfs_file_release+0x35/0x50 [nfs] >>>>>>> [<0>] __fput+0xa2/0x1c0 >>>>>>> [<0>] task_work_run+0x82/0xa0 >>>>>>> [<0>] do_exit+0x2ac/0xc20 >>>>>>> [<0>] do_group_exit+0x39/0xa0 >>>>>>> [<0>] get_signal+0x1ce/0x5d0 >>>>>>> [<0>] do_signal+0x36/0x620 >>>>>>> [<0>] exit_to_usermode_loop+0x5e/0xc2 >>>>>>> [<0>] do_syscall_64+0x16c/0x190 >>>>>>> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 >>>>>>> [<0>] 0xffffffffffffffff >>>>>>> >>>>>>> Since commit 12f275cdd163(NFSv4: Retry CLOSE and >>>>>>> DELEGRETURN on >>>>>>> NFS4ERR_OLD_STATEID), the client will retry to close the >>>>>>> file >>>>>>> when >>>>>>> stateid generation number in client is lower than server. >>>>>>> >>>>>>> The original intention of this commit is retrying the >>>>>>> operation >>>>>>> while >>>>>>> racing with an OPEN. However, in this case the stateid >>>>>>> generation remains >>>>>>> mismatch forever. >>>>>>> >>>>>>> Any suggestions? >>>>>> >>>>>> Can you include a network trace of the failure? Is it >>>>>> possible >>>>>> that >>>>>> the server has crashed on reply to the close and that's why >>>>>> the >>>>>> task >>>>>> is hung? What server are you testing against? >>>>>> >>>>> I am testing the local server, the testcase is attached below. >>>>> >>>>> # uname -r >>>>> 5.0.0-rc8 >>>>> >>>>> # cat /etc/exports >>>>> /mnt/vdd *(rw,no_subtree_check,no_root_squash) >>>>> >>>>> # cat test.sh >>>>> mount /dev/vdd /mnt/vdd >>>>> systemctl start nfs >>>>> mount -t nfs -o rw,relatime,vers=4.1 127.0.0.1:/mnt/vdd >>>>> /mnt/nfsscratch >>>>> >>>>> for ((i=0;i<10000;i++)) >>>>> do >>>>> echo "loop $i started" >>>>> /usr/local/xfstests/src/aio-dio-regress/aio-last-ref- >>>>> held- >>>>> by-io 1 100 /mnt/nfsscratch/aio-testfile >>>>> sleep 5 >>>>> done >>>>> >>>>> The command aio-last-ref-held-by-io from xfstests will spawn >>>>> 100 >>>>> threads. Each thread >>>>> repeated doing open/io_submit/close/io_getevents/io_destroy >>>>> until >>>>> the parent exist >>>>> because of timeout. >>>>> >>>>> >>>>> Here is the log when the task hung. >>>>> >>>>> [10841.121523] nfs4_close_prepare: begin! >>>>> [10841.121526] --> nfs4_alloc_slot used_slots=0000 >>>>> highest_used=4294967295 max_slots=10 >>>>> [10841.121528] <-- nfs4_alloc_slot used_slots=0001 >>>>> highest_used=0 >>>>> slotid=0 >>>>> [10841.121530] nfs4_close_prepare: done! >>>>> [10841.121536] encode_sequence: >>>>> sessionid=1551405749:3822331530:1:0 >>>>> seqid=59495845 slotid=0 max_slotid=0 cache_this=1 >>>>> [10841.121583] nfsd_dispatch: vers 4 proc 1 >>>>> [10841.121588] __find_in_sessionid_hashtbl: >>>>> 1551405749:3822331530:1:0 >>>>> [10841.121590] nfsd4_sequence: slotid 0 >>>>> [10841.121592] check_slot_seqid enter. seqid 59495845 >>>>> slot_seqid >>>>> 59495844 >>>>> [10841.121596] nfsd: fh_verify(28: 01060001 0c66c6db d74ea4c9 >>>>> 61e12d9e c03beba6 0000000b) >>>>> [10841.121607] nfsd: fh_verify(28: 01060001 0c66c6db d74ea4c9 >>>>> 61e12d9e c03beba6 0000000b) >>>>> [10841.121613] NFSD: nfsd4_close on file aio-testfile >>>>> [10841.121615] NFSD: nfs4_preprocess_seqid_op: seqid=0 stateid >>>>> = >>>>> (5c7892b5/e3d4268a/000ab905/0000746b) >>>>> [10841.121619] nfsv4 compound returned 10024 >>>>> [10841.121621] --> nfsd4_store_cache_entry slot >>>>> 00000000f540ddd9 >>>>> [10841.121625] renewing client (clientid 5c7892b5/e3d4268a) >>>>> [10841.122526] decode_attr_type: type=00 >>>>> ...... >>>>> [10841.122558] nfs4_close_done: begin! >>>>> [10841.122561] --> nfs4_alloc_slot used_slots=0001 >>>>> highest_used=0 >>>>> max_slots=10 >>>>> [10841.122564] <-- nfs4_alloc_slot used_slots=0003 >>>>> highest_used=1 >>>>> slotid=1 >>>>> [10841.122566] nfs4_free_slot: slotid 1 highest_used_slotid 0 >>>>> [10841.122568] nfs41_sequence_process: Error 0 free the slot >>>>> [10841.122571] nfs4_free_slot: slotid 0 highest_used_slotid >>>>> 4294967295 >>>>> [10841.122575] nfs4_close_done: done, ret = 0! >>>>> [10841.122584] nfs4_close_prepare: begin! >>>>> ...... >>>>> Client keep retry the CLOSE. >>>>> >>>>>> I have seen trace where close would get ERR_OLD_STATEID and >>>>>> would >>>>>> still retry with the same open state until it got a reply to >>>>>> the >>>>>> OPEN >>>>>> which changed the state and when the client received reply to >>>>>> that, >>>>>> it'll retry the CLOSE with the updated stateid. >>>>>> >>>>> >>>>> Yes, you are right. I have also noticed that. But what happened >>>>> if >>>>> the >>>>> OPEN task exit receiving a signal? How can a client update the >>>>> stateid? >>>>> >>> >>> static void nfs4_open_release(void *calldata) >>> { >>> struct >>> nfs4_opendata *data = calldata; >>> struct nfs4_state *state = NULL; >>> >>> /* If this request hasn't been cancelled, do nothing */ >>> if (!data->cancelled) >>> goto out_free; >>> /* In case of error, no cleanup! */ >>> if (data->rpc_status != 0 || !data->rpc_done) >>> goto out_free; >>> /* In case we need an open_confirm, no cleanup! */ >>> if (data->o_res.rflags & NFS4_OPEN_RESULT_CONFIRM) >>> goto out_free; >>> state = nfs4_opendata_to_nfs4_state(data); >>> if (!IS_ERR(state)) >>> nfs4_close_state(state, data->o_arg.fmode); >>> out_free: >>> nfs4_opendata_put(data); >>> } >>> >>> Note what happens when the 'data->cancelled' flag is set. This is >>> why >>> we always ensure that stateful operations such as OPEN, and >>> LOCK,... >>> are run as asynchronous RPC tasks. >> >> I noticed that when receiving a signal, the rpc_status will be set to >> -ERESTARTSYS in rpc_exit->rpc_exit_task->nfs4_open_done, which made >> stateid >> not updated in nfs4_open_release(). >> >> So is it possible that the stateid between the client and the server >> remain >> mismatch if the OPEN task received the signal right after >> transmitting the >> message or receiving the response, but before calling rpc_exit_task >> normally? > > If the rpc_task could be signalled, yes. However I don't see how that > can happen to an async RPC call. Have you found such a case? > I haven't found such a case, just guess :) But now I have found something new that may helpful. When the problem happened, rpc_status was set to -121 which returned from nfs4_xdr_dec_open->decode_putfh. I think it is the cause that client can not update the stateid. kworker/u4:0-7 [001] .... 559.920542: nfs4_xdr_dec_open: status=0 openstateid=40289:0xd4bdfeb7 kworker/1:0-2929 [001] .... 559.920550: nfs4_open_stateid_update: error=0 (OK) fileid=00:2b:12 fhandle=0x1e4566cb stateid=40289:0xd4bdfeb7 kworker/1:0-2929 [001] .... 559.920551: nfs4_open_release: cancel=1 rpc_status=0 rpc_done=1 openstateid=40289:0xd4bdfeb7 nfsd-2741 [001] .... 559.920571: nfsd_compound_status: op=1/6 OP_SEQUENCE status=0 nfsd-2741 [001] .... 559.920575: nfsd_compound_status: op=2/6 OP_PUTFH status=0 nfsd-2741 [001] .... 559.920581: nfsd_inc_and_copy_stateid: client 5c7df867:90134041 stateid 00031b7d:00009d62 // 40290 nfsd-2741 [001] .... 559.920584: nfsd_compound_status: op=3/6 OP_OPEN status=0 nfsd-2741 [001] .... 559.920584: nfsd_compound_status: op=4/6 OP_GETFH status=0 nfsd-2741 [001] .... 559.920585: nfsd_compound_status: op=5/6 OP_ACCESS status=0 nfsd-2741 [001] .... 559.920588: nfsd_compound_status: op=6/6 OP_GETATTR status=0 kworker/u4:0-7 [001] .... 559.920622: nfs4_xdr_dec_open: status=-121 openstateid=0:0x7bd5c66f // added for debug kworker/1:0-2929 [001] .... 559.920628: nfs4_open_release: cancel=1 rpc_status=-121 rpc_done=1 openstateid=0:0x7bd5c66f // added for debug And we got this error in dmesg: [ 559.834808] nfs: Server returned operation 53 but we issued a request for 22 [ 559.835741] nfs: Server returned operation 53 but we issued a request for 22 However, I don't kown why the client received two OP_SEQUENCE response. Any ideas? Thanks, Jiufei