Re: NFS server regression in kernel 5.13 (tested w/ 5.13.9)

On Fri, 13 Aug 2021, Mike Javorski wrote:
> Neil:
> 
> Apologies for the delay, your message didn't get properly flagged in my email.

:-)

> 
> To answer your questions, both client (my Desktop PC) and server (my
> NAS) are running ArchLinux; client w/ current kernel (5.13.9), server
> w/ current or alternate testing kernels (see below).

So the bug could be in the server or the client.  I assume you are
careful to test a client against a known-good server, or a server against
a known-good client.

> I intend to spend some time this weekend attempting to get the tcpdump.
> My initial attempts wound up with 400+Mb files which would be
> difficult to ship and use for diagnostics.

Rather than you sending me the dump, I'll send you the code.

Run
  tshark -r filename -d tcp.port==2049,rpc -Y 'tcp.port==2049 && rpc.time > 1'

This will ensure the NFS traffic is actually decoded as NFS and then
report only NFS(rpc) replies that arrive more than 1 second after the
request.
You can add

    -T fields -e frame.number -e rpc.time

to find out what the actual delay was.
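
If there are many hits, ranking them by delay makes it easier to pick a
sensible threshold.  A minimal sketch, assuming the two-column output of
the -T fields options above; the helper name top_delays and its default
of 10 rows are my own invention, not something tshark provides:

```shell
# Hypothetical helper: reads "frame.number<TAB>rpc.time" lines on stdin
# and prints the N largest delays first (N defaults to 10).
top_delays() {
    n="${1:-10}"
    tab="$(printf '\t')"
    # Sort numerically on the second (rpc.time) column, largest first.
    sort -t "$tab" -k2,2nr | head -n "$n"
}

# Usage, with "filename" standing in for the capture file:
#   tshark -r filename -d tcp.port==2049,rpc \
#     -Y 'tcp.port==2049 && rpc.time > 1' \
#     -T fields -e frame.number -e rpc.time | top_delays 20
```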

If it reports any, that will be interesting.  Try with a larger time if
necessary to get a modest number of hits.  Using editcap and the given
frame number you can select out 1000 packets either side of the problem
and that should compress to be small enough to transport.
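
The extraction step could look something like this.  It is only a
sketch: frame_window is a made-up helper, and the frame number 52341 and
the output name excerpt.pcap are placeholders.  editcap's -r option
keeps the listed frames instead of deleting them.

```shell
# Hypothetical helper: print the "start-end" frame range covering
# $2 packets (default 1000) either side of frame $1, never below frame 1.
frame_window() {
    frame="$1"
    margin="${2:-1000}"
    start=$((frame - margin))
    if [ "$start" -lt 1 ]; then
        start=1
    fi
    echo "${start}-$((frame + margin))"
}

# Usage, assuming tshark reported frame 52341 in capture "filename":
#   editcap -r filename excerpt.pcap "$(frame_window 52341)"
#   xz excerpt.pcap
```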

However it might not find anything.  If the reply never arrives, there
is no reply for rpc.time to be measured on, so the filter above cannot
match it.  So we need to check that everything got a reply...

 tshark -r filename -d tcp.port==2049,rpc  \
   -Y 'tcp.port==2049 && rpc.msgtyp == 0' -T fields \
   -e rpc.xid -e frame.number | sort > /tmp/requests

 tshark -r filename -d tcp.port==2049,rpc  \
   -Y 'tcp.port==2049 && rpc.msgtyp == 1' -T fields \
   -e rpc.xid -e frame.number | sort > /tmp/replies

 join -a1 /tmp/requests /tmp/replies | awk 'NF==2'

This should list the xid and frame number of all requests that didn't
get a reply.  Again, editcap can extract a range of frames into a file of
manageable size.
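
To see why NF==2 picks out the unanswered requests, here is a small
sketch using made-up xids and frame numbers (none of this is from a real
capture, and the function name unanswered is hypothetical):

```shell
# Hypothetical wrapper around the join step: $1 and $2 are sorted files
# of "xid frame" lines for requests and replies respectively.
unanswered() {
    # -a1 also prints request lines that have no matching reply.  Those
    # lines keep just 2 fields; matched lines gain the reply's frame
    # number as a third field and are dropped by the awk filter.
    join -a1 "$1" "$2" | awk 'NF==2'
}

# Made-up sample data: three requests, replies for only two of them.
demo_dir="$(mktemp -d)"
printf '0x0a1\t10\n0x0a2\t12\n0x0a3\t15\n' > "$demo_dir/requests"
printf '0x0a1\t11\n0x0a3\t16\n' > "$demo_dir/replies"
unanswered "$demo_dir/requests" "$demo_dir/replies"   # prints "0x0a2 12"
rm -rf "$demo_dir"
```

Note that a retransmitted request reuses its xid, so on a real capture
the same xid may appear on more than one line.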

Another possibility is that requests are getting replies, but the reply
says "NFS4ERR_DELAY":

 tshark -r filename -d tcp.port==2049,rpc -Y 'nfs.nfsstat4==10008'

should report any reply with that error code.

Hopefully something there will be interesting.

NeilBrown



