Re: NFS server regression in kernel 5.13 (tested w/ 5.13.9)

On Mon, 09 Aug 2021, Mike Javorski wrote:
> I have been experiencing nfs file access hangs with multiple release
> versions of the 5.13.x linux kernel. In each case, all file transfers
> freeze for 5-10 seconds and then resume. This seems worse when reading
> through many files sequentially.

A particularly useful debugging tool for NFS freezes is to run

  rpcdebug -m rpc -c all

while the system appears frozen.  As you only have a 5-10 second window
this might be tricky.
Setting or clearing debug flags in the rpc module (whether they are
already set or not) has a side effect of listing all RPC "tasks" which
are waiting for a reply.  Seeing that task list can often be useful.

The task list appears in "dmesg" output.  If there are no tasks
waiting, nothing will be written, which might lead you to think it
didn't work.
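
Since the window is so short, one option is to run the command in a
loop rather than trying to catch the hang by hand.  The following is
only a rough sketch: the helper name poll_rpc_tasks and its defaults
are made up for illustration, and it assumes you are root and that
rpcdebug (from nfs-utils) is on your PATH.

```shell
# Sketch only: assumes root and that rpcdebug (from nfs-utils) is on PATH.
# poll_rpc_tasks is a hypothetical helper name, not an existing tool.
poll_rpc_tasks() {
    count=${1:-60}     # how many times to poll (arbitrary default)
    interval=${2:-1}   # seconds between polls (arbitrary default)
    i=0
    while [ "$i" -lt "$count" ]; do
        # Print wall-clock time so each poll can be matched against dmesg.
        echo "poll $(date +%T)"
        # Clearing the flags triggers the task listing in the kernel log.
        rpcdebug -m rpc -c all
        sleep "$interval"
        i=$((i + 1))
    done
}

# Usage (as root): poll_rpc_tasks 60 1; dmesg -T | tail -n 200
```

With a one second interval, several polls should land inside a 5-10
second freeze, and "dmesg -T" prints human-readable timestamps to
compare against the "poll" lines.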

As Chuck hinted, tcpdump is invaluable for this sort of problem.
  tcpdump -s 0 -w /tmp/somefile.pcap port 2049

will capture NFS traffic.  If this can start before a hang, and finish
after, it may contain useful information.  Doing that in a way that
doesn't create an enormous file might be a challenge.  It would help if
you found a way to trigger the problem.  Take note of the circumstances
when it seems to happen the most.  If you can only produce a large file,
we can probably still work with it.
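
One way to keep the capture bounded is tcpdump's file rotation: -C
limits the size of each file and -W limits how many files are kept, so
the oldest data is overwritten in a ring.  This is a sketch, not a
tested recipe for your setup; the helper name capture_nfs_ring, the
file prefix, and the sizes are all assumptions you should adjust.

```shell
# Sketch only: capture_nfs_ring is a hypothetical wrapper around tcpdump.
# Assumes root (or capture privileges) and enough free space for
# size * files of capture data under the chosen prefix.
capture_nfs_ring() {
    prefix=${1:-/tmp/nfs-ring.pcap}  # output file prefix (assumed path)
    size=${2:-100}                   # per-file size passed to -C (about 100 MB)
    files=${3:-10}                   # number of files kept by -W
    # -s 0 captures full packets; -C/-W rotate through a fixed set of files.
    tcpdump -s 0 -C "$size" -W "$files" -w "$prefix" port 2049
}

# Usage (as root): capture_nfs_ring /tmp/nfs-ring.pcap 100 10
# Stop it with Ctrl-C shortly after a hang has been observed.
```

tcpdump appends a number to the prefix for each file in the ring, so
after stopping the capture the most recently modified file is the one
most likely to cover the hang.
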
  tshark -r /tmp/somefile.pcap
will report the capture one line per packet.  You can look for the
appropriate timestamp, note the frame numbers, and use "editcap"
to extract a suitable range of packets.
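
As a sketch of that last step: the helper names below are invented for
illustration and the frame numbers in the usage line are placeholders,
not values from any real capture.  "-t ad" asks tshark for absolute
date and time stamps, and "editcap -r" keeps the listed frames instead
of deleting them.

```shell
# Sketch only: list_packets and extract_frames are hypothetical helpers.
# Assumes tshark and editcap (both shipped with Wireshark) are installed.

# Print one line per packet with absolute date/time stamps.
list_packets() {
    tshark -r "$1" -t ad
}

# Copy frames first..last (inclusive) from infile into outfile.
extract_frames() {
    infile=$1
    outfile=$2
    first=$3
    last=$4
    # -r keeps the selected range rather than removing it.
    editcap -r "$infile" "$outfile" "$first-$last"
}

# Usage (placeholder frame numbers):
#   list_packets /tmp/somefile.pcap | less
#   extract_frames /tmp/somefile.pcap /tmp/hang.pcap 12000 15000
```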

NeilBrown


> 
> My server:
> - Archlinux w/ a distribution provided kernel package
> - filesystems exported with "rw,sync,no_subtree_check,insecure" options
> 
> Client:
> - Archlinux w/ latest distribution provided kernel (5.13.9-arch1-1 at writing)
> - nfs mounted via /net autofs with "soft,nodev,nosuid" options
> (ver=4.2 is indicated in mount)
> 
> I have tried the 5.13.x kernel several times since the first arch
> release (most recently with 5.13.9-arch1-1), all with similar results.
> Each time, I am forced to downgrade the linux package to a 5.12.x
> kernel (5.12.15-arch1 as of writing) to clear up the transfer issues
> and stabilize performance. No other changes are made between tests. I
> have confirmed the freezing behavior using both ext4 and btrfs
> filesystems exported from this server.
> 
> At this point I would appreciate some guidance in what to provide in
> order to diagnose and resolve this issue. I don't have a lot of kernel
> debugging experience, so instruction would be helpful.
> 
> - mike
> 
> 



