Re: [bug report] task hang while testing xfstests generic/323

Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> · Mon, 4 Mar 2019 15:50:10 +0000

On Mon, 2019-03-04 at 23:20 +0800, Jiufei Xue wrote:
> Hi Trond,
> 
> On 2019/3/1 下午9:08, Trond Myklebust wrote:
> > On Fri, 2019-03-01 at 16:49 +0800, Jiufei Xue wrote:
> > > On 2019/3/1 下午1:08, Jiufei Xue wrote:
> > > > Hi Olga,
> > > > 
> > > > On 2019/3/1 上午6:26, Olga Kornievskaia wrote:
> > > > > On Thu, Feb 28, 2019 at 5:11 AM Jiufei Xue <
> > > > > jiufei.xue@xxxxxxxxxxxxxxxxx> wrote:
> > > > > > Hi,
> > > > > > 
> > > > > > when I tested xfstests/generic/323 with NFSv4.1 and v4.2,
> > > > > > the
> > > > > > task
> > > > > > changed to zombie occasionally while a thread is hanging
> > > > > > with
> > > > > > the
> > > > > > following stack:
> > > > > > 
> > > > > > [<0>] rpc_wait_bit_killable+0x1e/0xa0 [sunrpc]
> > > > > > [<0>] nfs4_do_close+0x21b/0x2c0 [nfsv4]
> > > > > > [<0>] __put_nfs_open_context+0xa2/0x110 [nfs]
> > > > > > [<0>] nfs_file_release+0x35/0x50 [nfs]
> > > > > > [<0>] __fput+0xa2/0x1c0
> > > > > > [<0>] task_work_run+0x82/0xa0
> > > > > > [<0>] do_exit+0x2ac/0xc20
> > > > > > [<0>] do_group_exit+0x39/0xa0
> > > > > > [<0>] get_signal+0x1ce/0x5d0
> > > > > > [<0>] do_signal+0x36/0x620
> > > > > > [<0>] exit_to_usermode_loop+0x5e/0xc2
> > > > > > [<0>] do_syscall_64+0x16c/0x190
> > > > > > [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > > > > > [<0>] 0xffffffffffffffff
> > > > > > 
> > > > > > Since commit 12f275cdd163(NFSv4: Retry CLOSE and
> > > > > > DELEGRETURN on
> > > > > > NFS4ERR_OLD_STATEID), the client will retry to close the
> > > > > > file
> > > > > > when
> > > > > > stateid generation number in client is lower than server.
> > > > > > 
> > > > > > The original intention of this commit is retrying the
> > > > > > operation
> > > > > > while
> > > > > > racing with an OPEN. However, in this case the stateid
> > > > > > generation remains
> > > > > > mismatch forever.
> > > > > > 
> > > > > > Any suggestions?
> > > > > 
> > > > > Can you include a network trace of the failure? Is it
> > > > > possible
> > > > > that
> > > > > the server has crashed on reply to the close and that's why
> > > > > the
> > > > > task
> > > > > is hung? What server are you testing against?
> > > > > 
> > > > I am testing the local server, the testcase is attached below.
> > > > 
> > > > # uname -r
> > > > 5.0.0-rc8
> > > > 
> > > > # cat /etc/exports
> > > > /mnt/vdd *(rw,no_subtree_check,no_root_squash)
> > > > 
> > > > # cat test.sh
> > > > mount /dev/vdd /mnt/vdd
> > > > systemctl start nfs
> > > > mount -t nfs -o rw,relatime,vers=4.1 127.0.0.1:/mnt/vdd
> > > > /mnt/nfsscratch
> > > > 
> > > > for ((i=0;i<10000;i++))
> > > > do
> > > >         echo "loop $i started"
> > > >         /usr/local/xfstests/src/aio-dio-regress/aio-last-ref-
> > > > held-
> > > > by-io 1 100 /mnt/nfsscratch/aio-testfile
> > > > 	sleep 5
> > > > done
> > > > 
> > > > The command aio-last-ref-held-by-io from xfstests will spawn
> > > > 100
> > > > threads. Each thread
> > > > repeated doing open/io_submit/close/io_getevents/io_destroy
> > > > until
> > > > the parent exist
> > > > because of timeout.
> > > > 
> > > > 
> > > > Here is the log when the task hung.
> > > > 
> > > > [10841.121523] nfs4_close_prepare: begin!
> > > > [10841.121526] --> nfs4_alloc_slot used_slots=0000
> > > > highest_used=4294967295 max_slots=10
> > > > [10841.121528] <-- nfs4_alloc_slot used_slots=0001
> > > > highest_used=0
> > > > slotid=0
> > > > [10841.121530] nfs4_close_prepare: done!
> > > > [10841.121536] encode_sequence:
> > > > sessionid=1551405749:3822331530:1:0
> > > > seqid=59495845 slotid=0 max_slotid=0 cache_this=1
> > > > [10841.121583] nfsd_dispatch: vers 4 proc 1
> > > > [10841.121588] __find_in_sessionid_hashtbl:
> > > > 1551405749:3822331530:1:0
> > > > [10841.121590] nfsd4_sequence: slotid 0
> > > > [10841.121592] check_slot_seqid enter. seqid 59495845
> > > > slot_seqid
> > > > 59495844
> > > > [10841.121596] nfsd: fh_verify(28: 01060001 0c66c6db d74ea4c9
> > > > 61e12d9e c03beba6 0000000b)
> > > > [10841.121607] nfsd: fh_verify(28: 01060001 0c66c6db d74ea4c9
> > > > 61e12d9e c03beba6 0000000b)
> > > > [10841.121613] NFSD: nfsd4_close on file aio-testfile
> > > > [10841.121615] NFSD: nfs4_preprocess_seqid_op: seqid=0 stateid
> > > > =
> > > > (5c7892b5/e3d4268a/000ab905/0000746b)
> > > > [10841.121619] nfsv4 compound returned 10024
> > > > [10841.121621] --> nfsd4_store_cache_entry slot
> > > > 00000000f540ddd9
> > > > [10841.121625] renewing client (clientid 5c7892b5/e3d4268a)
> > > > [10841.122526] decode_attr_type: type=00
> > > > ......
> > > > [10841.122558] nfs4_close_done: begin!
> > > > [10841.122561] --> nfs4_alloc_slot used_slots=0001
> > > > highest_used=0
> > > > max_slots=10
> > > > [10841.122564] <-- nfs4_alloc_slot used_slots=0003
> > > > highest_used=1
> > > > slotid=1
> > > > [10841.122566] nfs4_free_slot: slotid 1 highest_used_slotid 0
> > > > [10841.122568] nfs41_sequence_process: Error 0 free the slot
> > > > [10841.122571] nfs4_free_slot: slotid 0 highest_used_slotid
> > > > 4294967295
> > > > [10841.122575] nfs4_close_done: done, ret = 0!
> > > > [10841.122584] nfs4_close_prepare: begin!
> > > > ......
> > > > Client keep retry the CLOSE.
> > > > 
> > > > > I have seen trace where close would get ERR_OLD_STATEID and
> > > > > would
> > > > > still retry with the same open state until it got a reply to
> > > > > the
> > > > > OPEN
> > > > > which changed the state and when the client received reply to
> > > > > that,
> > > > > it'll retry the CLOSE with the updated stateid.
> > > > > 
> > > > 
> > > > Yes, you are right. I have also noticed that. But what happened
> > > > if
> > > > the
> > > > OPEN task exit receiving a signal? How can a client update the
> > > > stateid?
> > > > 
> > 
> > static void nfs4_open_release(void *calldata)
> > {
> >         struct
> > nfs4_opendata *data = calldata;
> >         struct nfs4_state *state = NULL;
> > 
> >         /* If this request hasn't been cancelled, do nothing */
> >         if (!data->cancelled)
> >                 goto out_free;
> >         /* In case of error, no cleanup! */
> >         if (data->rpc_status != 0 || !data->rpc_done)
> >                 goto out_free;
> >         /* In case we need an open_confirm, no cleanup! */
> >         if (data->o_res.rflags & NFS4_OPEN_RESULT_CONFIRM)
> >                 goto out_free;
> >         state = nfs4_opendata_to_nfs4_state(data);
> >         if (!IS_ERR(state))
> >                 nfs4_close_state(state, data->o_arg.fmode);
> > out_free:
> >         nfs4_opendata_put(data);
> > }
> > 
> > Note what happens when the 'data->cancelled' flag is set. This is
> > why
> > we always ensure that stateful operations such as OPEN, and
> > LOCK,...
> > are run as asynchronous RPC tasks.
> 
> I noticed that when receiving a signal, the rpc_status will be set to
> -ERESTARTSYS in rpc_exit->rpc_exit_task->nfs4_open_done, which made
> stateid
> not updated in nfs4_open_release().
>
> So is it possible that the stateid between the client and the server
> remain
> mismatch if the OPEN task received the signal right after
> transmitting the
> message or receiving the response, but before calling rpc_exit_task
> normally?

If the rpc_task could be signalled, yes. However I don't see how that
can happen to an async RPC call. Have you found such a case?

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx