Re: [bug report] task hang while testing xfstests generic/323

On Fri, Mar 15, 2019 at 2:31 AM Jiufei Xue <jiufei.xue@xxxxxxxxxxxxxxxxx> wrote:
>
> Hi Olga,
>
> On 2019/3/11 11:13 PM, Olga Kornievskaia wrote:
> > Let me double check that. I have reproduced the "infinite loop" of
> > CLOSE on upstream (I'm looking through the trace points from Friday).
>
> Did you try to capture the packets when you reproduced this issue on
> upstream? I still lost packets in the kernel after some adjustments
> according to bfield's suggestion :(

Hi Jiufei,

Yes, I have network trace captures, but they are too big to post to the
mailing list. I have reproduced the problem on the latest upstream
origin/testing branch, at commit "SUNRPC: Take the transport send lock
before binding+connecting". As you noted before, the infinite loop is
due to the client "losing" an update to the seqid.

In the capture, one packet is a (recovery) OPEN sent with slot=0 seqid=Y.
The tracepoint (nfs4_open_file) logs status=ERESTARTSYS for it. The rpc
task is sent and the rpc task receives a reply, but there is
nobody there to receive it... The OPEN that got a reply carries an
updated stateid seqid, which the client never records. When CLOSE is sent,
it's sent with the "old" stateid, and that puts the client in an infinite
loop. Btw, the CLOSE is sent on the interrupted slot, which should get
FALSE_RETRY, which causes the client to terminate the session. But it
would still keep sending the CLOSE with the old stateid.
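
To make that sequence concrete, here is a toy model of how I understand
the lost update. It is plain Python, not kernel code: the ToyServer and
ToyClient classes, the starting seqid, and the retry cap are all made up
for illustration, and I am assuming the server rejects the stale stateid
with NFS4ERR_OLD_STATEID. Slots and FALSE_RETRY are not modelled at all.

```python
class ToyServer:
    """Hypothetical server that tracks the seqid of a single open stateid."""

    def __init__(self):
        self.seqid = 1

    def open(self):
        # Each OPEN on the same open state bumps the stateid seqid.
        self.seqid += 1
        return self.seqid

    def close(self, seqid):
        # Assumption: a stale seqid is rejected with NFS4ERR_OLD_STATEID.
        return "NFS4_OK" if seqid == self.seqid else "NFS4ERR_OLD_STATEID"


class ToyClient:
    """Hypothetical client that remembers the last stateid seqid it saw."""

    def __init__(self, server):
        self.server = server
        self.seqid = server.seqid

    def open(self, interrupted=False):
        # The RPC goes out and the server answers in both cases.
        reply_seqid = self.server.open()
        if interrupted:
            # The waiter was signalled and is gone, so nobody processes
            # the reply and the new seqid is never recorded.
            return "ERESTARTSYS"
        self.seqid = reply_seqid
        return "NFS4_OK"

    def close(self, max_attempts=5):
        # The real client retries without a cap; the cap exists only so
        # that this sketch terminates.
        attempts = 0
        status = None
        while attempts < max_attempts:
            attempts += 1
            status = self.server.close(self.seqid)
            if status == "NFS4_OK":
                break
        return status, attempts


server = ToyServer()
client = ToyClient(server)
open_status = client.open(interrupted=True)
close_status, close_attempts = client.close()
print(open_status, close_status, close_attempts)
```

With the interrupted OPEN the client stays one seqid behind the server,
so every CLOSE attempt fails the same way until the cap is hit. Without
the interruption the same CLOSE succeeds on the first try.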

One thing I've noticed is that the TEST_STATE op (as part of
nfs41_test_and_free_expired_stateid()) for some reason always has a
signal set, even before issuing an RPC task, so the task never
completes (ever).

I always thought that OPENs can't be interrupted, but I guess they can be,
since they call rpc_wait_for_completion_task() and that's a killable
wait. But I don't know how to find out what's sending a signal to the
process, so I'm rather stuck here. I'm still trying to figure out what's
causing the signal, or alternatively how to recover from it so that the
client doesn't lose that seqid.

>
> Thanks,
> Jiufei



