On Fri, Jun 21, 2019 at 02:47:23PM -0600, Alan Post wrote:
> > Verifying this is the problem could be done by setting up some rolling
> > network captures.. but sometimes it can be hard to not have the capture
> > fill up with continuing traffic from other processes.
> >
>
> I did go ahead and set up a rolling capture between this NFS
> server and one rack of clients--I hope I can catch the event as
> it happens.  Time will tell.
>

I've run this rolling capture and did catch four candidate events.
I haven't confirmed that any of them are real--I don't really know
what I'm looking for, so I've been approaching the problem by
incrementally/recursively throwing stuff out and manually working
through what's left.

As far as I understand it, for a particular xid there should be a
call and a reply.  The approach I took, then, was to pull the xid and
message type out of my capture and ignore any xid for which both a
call and a reply are present.

It seems this is too simplistic, as the RPC calls I have without an
attendant reply don't line up with my incident window.

In one example, I have a series of READ calls which cease generating
RPC reply messages while the file offset continues to increase.
After a few dozen messages, the RPC replies resume as before.

Is there a normal or routine explanation for this?

RFC 5531 and the NetworkTracing page on wiki.linux-nfs.org have been
quite helpful in bringing me up to speed.  If any of you have advice
or guidance, or can clarify my understanding of how the call/reply
RPC mechanism works, I'd appreciate it.

-A
-- 
Alan Post | Xen VPS hosting for the technically adept
PO Box 61688 | Sunnyvale, CA 94088-1681 | https://prgmr.com/
email: adp@xxxxxxxxx
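
A minimal sketch, in Python, of the kind of call/reply matching by xid
described in the message. The RpcMsg record, its field names, and the
unanswered_calls helper are all assumptions made for illustration and
are not the format of any particular capture tool. Keying on the
client and server as well as the xid is also an assumption of this
sketch, made because each client chooses its own xids.

```python
from collections import namedtuple

# One RPC message pulled out of a capture. This layout is hypothetical.
# "client" and "server" are logical roles: for a reply, "client" is the
# packet's destination address, not its source.
RpcMsg = namedtuple("RpcMsg", "time client server xid is_reply")


def unanswered_calls(messages):
    """Return the calls that never see a reply later in the capture.

    messages must be in capture order. Calls are keyed by
    (client, server, xid) so that two clients that happen to pick the
    same xid are not confused with each other.
    """
    pending = {}
    for msg in messages:
        key = (msg.client, msg.server, msg.xid)
        if msg.is_reply:
            # A reply clears the matching call; a reply with no call in
            # the capture (e.g. the call predates it) is ignored.
            pending.pop(key, None)
        else:
            # A retransmitted call reuses its xid; keep the first one.
            pending.setdefault(key, msg)
    return sorted(pending.values(), key=lambda m: m.time)
```

Comparing the timestamps of whatever this returns against the incident
window is one way to see whether the unanswered calls cluster there. A
call sent just before the capture ends will also show up as unanswered,
so the tail of each capture file needs to be read with that in mind.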