Re: User process NFS write hang in wait_on_commit with kworker

"Benjamin Coddington" <bcodding@xxxxxxxxxx> · Tue, 02 Jul 2019 05:55:10 -0400

On 28 Jun 2019, at 14:33, Alan Post wrote:

> On Fri, Jun 21, 2019 at 02:47:23PM -0600, Alan Post wrote:
>>> Verifying this is the problem could be done by setting up some rolling
>>> network captures, but sometimes it can be hard to keep the capture from
>>> filling up with continuing traffic from other processes.
>>>
>>
>> I did go ahead and set up a rolling capture between this NFS
>> server and one rack of clients--I hope I can catch the event as
>> it happens.  Time will tell.
>>
>
> I've run this rolling capture and did catch four candidate events.
> I haven't confirmed any of them are real--I don't really know
> what it is I'm looking for, so I've been approaching the problem
> by incrementally/recursively throwing stuff out and manually
> working through what's left.
>
> As far as I understand it, for a particular xid, there should be a
> call and a reply.  The approach I took then was to pull out these
> fields from my capture and ignore RPC calls where both are present
> in my capture.  It seems this is simplistic, as the number of RPC
> calls I have without an attendant reply isn't lining up with my
> incident window.

Does your capture report dropped packets?  If so, maybe you need to increase
the capture buffer.
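
On pairing calls with replies: an xid is chosen by the client, so it is only
meaningful per client endpoint, and a retransmitted call reuses the xid of the
original. A minimal sketch of the matching, assuming the capture has already
been exported to records with hypothetical fields "time", "msg_type", "src",
"dst" and "xid" (these names are mine, not from any particular tool):

```python
def unanswered_calls(records):
    """Return the first copy of each call that never saw a reply.

    Each record is a dict with hypothetical keys:
      time     - capture timestamp (float)
      msg_type - "call" or "reply"
      src, dst - "ip:port" strings for sender and receiver
      xid      - RPC transaction id (int)
    """
    pending = {}      # (client endpoint, xid) -> first call seen
    answered = set()  # keys that already received a reply

    for rec in records:
        if rec["msg_type"] == "call":
            key = (rec["src"], rec["xid"])
            # Ignore retransmissions: keep only the first copy, and skip
            # calls whose xid was already answered earlier in the capture.
            if key not in answered:
                pending.setdefault(key, rec)
        else:
            # A reply travels server -> client, so the client is "dst".
            key = (rec["dst"], rec["xid"])
            answered.add(key)
            pending.pop(key, None)

    return sorted(pending.values(), key=lambda r: r["time"])
```

If the capture dropped packets, some of the calls this reports will be false
positives, since the reply may have been sent but never recorded.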

There are also the sunrpc:xprt_transmit and sunrpc:xprt_complete_rqst
tracepoints, which should show the xids.
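
If the trace output is saved to a file, the same pairing can be done on the
client side without a packet capture. A rough sketch, assuming each trace line
contains the event name followed by a colon and an "xid=0x..." field somewhere
after it; the exact line layout varies by kernel version, so treat the pattern
as an assumption to adjust. It keys on the xid alone, which is only safe when
the trace covers a single transport:

```python
import re

# Assumed line shape: "... xprt_transmit: ... xid=0x0000002a ..."
_EVENT_RE = re.compile(
    r"\b(xprt_transmit|xprt_complete_rqst):.*?\bxid=0x([0-9a-fA-F]+)"
)

def outstanding_xids(lines):
    """Map each transmitted xid with no completion to its first trace line."""
    sent = {}
    for line in lines:
        match = _EVENT_RE.search(line)
        if match is None:
            continue  # unrelated event or unparseable line
        event = match.group(1)
        xid = int(match.group(2), 16)
        if event == "xprt_transmit":
            # Keep the first transmit; later ones are retransmissions.
            sent.setdefault(xid, line.rstrip("\n"))
        else:
            sent.pop(xid, None)
    return sent
```

Anything left in the returned dict at the end of the incident window is a
request that was transmitted but never completed.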

> In one example, I have a series of READ calls which cease
> generating RPC reply messages as the offset for the file continues
> to increase.  After a couple/few dozen messages, the RPC replies
> continue as they were.  Is there a normal or routine explanation
> for this?
>
> RFC 5531 and the NetworkTracing page on wiki.linux-nfs.org have
> been quite helpful bringing me up to speed.  If any of you have
> advice or guidance or can clarify my understanding of how the
> call/reply RPC mechanism works I appreciate it.

Seems like you understand it.  Do you have specific questions?

Ben