On 7/3/2019 5:32 PM, Alan Post wrote:
On Tue, Jul 02, 2019 at 05:55:10AM -0400, Benjamin Coddington wrote:
As far as I understand it, for a particular xid, there should be a
call and a reply. The approach I took then was to pull out these
fields from my capture and ignore RPC calls where both a call and a
reply are present. It seems this is simplistic, as the number of RPC
calls I have without an attendant reply isn't lining up with my
incident window.
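
For reference, the approach described above can be sketched roughly as
follows. This is only a minimal sketch, assuming the capture has already
been exported to (xid, msg_type) records; the record layout and the
function name unanswered_xids are hypothetical, and only the msg_type
values (CALL = 0, REPLY = 1) come from RFC 5531.

```python
CALL, REPLY = 0, 1  # msg_type values defined in RFC 5531


def unanswered_xids(records):
    """Return the sorted xids that appear in a call but in no reply.

    records: iterable of (xid, msg_type) tuples. This layout is an
    assumption for illustration, not the output of any particular tool.
    """
    called = set()
    replied = set()
    for xid, msg_type in records:
        if msg_type == CALL:
            called.add(xid)
        elif msg_type == REPLY:
            replied.add(xid)
    # Any xid with a call but no reply is a candidate for investigation.
    return sorted(called - replied)


records = [
    (0x10, CALL), (0x10, REPLY),
    (0x11, CALL),                 # no reply captured for this one
    (0x12, CALL), (0x12, REPLY),
]
print(unanswered_xids(records))
```

Note that this ignores which connection each xid was seen on and when,
so it can both miss and over-report unanswered calls in a long capture.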
Does your capture report dropped packets? If so, maybe you need to increase
the capture buffer.
I'm not certain, but I do have a capture on both the NFS server and
the NFS client--comparing them would show me whether I was dropping
packets, under most circumstances. Good catch.
In one example, I have a series of READ calls which cease
generating RPC reply messages as the offset for the file continues
to increase. After a few dozen messages, the RPC replies
continue as they were. Is there a normal or routine explanation
for this?
RFC 5531 and the NetworkTracing page on wiki.linux-nfs.org have
been quite helpful bringing me up to speed. If any of you have
advice or guidance or can clarify my understanding of how the
call/reply RPC mechanism works I appreciate it.
Seems like you understand it. Do you have specific questions?
Is it true that for each RPC call there is an RPC reply with the
same xid? Is it a priori an error if an otherwise correct RPC
call is not eventually paired with an RPC reply?
Absolutely yes. Not replying would be like a local procedure never
returning.
But remember XIDs are not globally unique. They are only unique within
some limited span of time for the connection they were issued on. This
is typically only a problem on very high IOPS workloads, or over long
spans of time.
Tom.
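
To illustrate the point about XID uniqueness, here is a rough sketch of
pairing calls and replies per connection and within a time window,
rather than by xid alone. It assumes the capture has been reduced to
(timestamp, conn, xid, msg_type) records sorted by timestamp; that
record layout, the function name pair_rpc, and the 60-second max_wait
default are all assumptions chosen for illustration, not values taken
from any RFC or tool.

```python
CALL, REPLY = 0, 1  # msg_type values defined in RFC 5531


def pair_rpc(records, max_wait=60.0):
    """Pair calls with replies per (connection, xid).

    records: iterable of (timestamp, conn, xid, msg_type), sorted by
    timestamp. conn is any hashable connection identifier, for example
    a (client_ip, client_port, server_ip, server_port) tuple.
    max_wait is an arbitrary window (seconds) after which a repeated
    xid on the same connection is treated as reuse, not retransmission.

    Returns (pairs, unanswered):
      pairs      -> list of (conn, xid, call_ts, reply_ts)
      unanswered -> list of (conn, xid, call_ts)
    """
    pending = {}  # (conn, xid) -> timestamp of the outstanding call
    pairs = []
    unanswered = []
    for ts, conn, xid, msg_type in records:
        key = (conn, xid)
        if msg_type == CALL:
            if key not in pending:
                pending[key] = ts
            elif ts - pending[key] > max_wait:
                # The earlier call aged out; treat this one as xid reuse.
                unanswered.append((conn, xid, pending[key]))
                pending[key] = ts
            # Otherwise it is a retransmission: keep the first timestamp.
        elif msg_type == REPLY:
            call_ts = pending.get(key)
            if call_ts is not None and ts - call_ts <= max_wait:
                del pending[key]
                pairs.append((conn, xid, call_ts, ts))
            # Replies with no matching recent call are left unpaired.
    # Whatever is still outstanding at the end never got a reply.
    unanswered.extend((c, x, t) for (c, x), t in pending.items())
    return pairs, unanswered


records = [
    (0.0, "A", 1, CALL),
    (0.1, "A", 1, REPLY),
    (0.2, "B", 1, CALL),    # same xid, different connection
    (0.3, "A", 2, CALL),
    (100.0, "A", 2, CALL),  # same xid reused long after the first call
    (100.1, "A", 2, REPLY),
]
pairs, unanswered = pair_rpc(records)
print(pairs)
print(unanswered)
```

Keying on the connection keeps xid 1 on "B" from being matched against
the reply on "A", and the window keeps the late reply for xid 2 from
being credited to the call made 100 seconds earlier.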