Hi Bruce,

I’m investigating a failure from the Bakeathon. It was run against the redhat-75 server; I’m not sure who was running that server, but I believe it was a RHEL 7.5 server. I’m attaching a network trace, and I was wondering if you could help explain what the server is doing. The parts I’m interested in have to do with the kerberized backchannel for NFSv4.0.

The setup is a client doing a v3 and a v4 mount, opening a file with one version and appending to it with the other. Here the file is opened with 4.0 and the client gets a delegation; it then tries to write with v3, and the server recalls the delegation.

The server issues a CB_NULL with gss_init, trying to establish a gss context, but it does so twice, in frame 259 and frame 261. It’s weird that it does it twice, but OK.

Then, in frame 283, it sends a CB_COMPOUND with CB_RECALL, and in frame 285 it sends a CB_NULL with gss_data, with the CB_NULL as the payload. I think this is to establish the callback. In frame 287, the client responds with an RPC accept state of 6000 (which I believe is “drop reply”). I believe what’s happening is that, because the client got a CB_RECALL before it received the CB_NULL that establishes the callback channel, it is just ignoring the CB_RECALL.

What happens later is that the server retransmits the CB_COMPOUND, but the client replies out of the cache. What’s interesting is that by this time the CB_NULL that came after the CB_COMPOUND should have established the callback, so if the retransmission had instead been a new CB_RECALL, I would think it would have succeeded. The server tries twice and then finally sets CB_PATH_DOWN on the RENEW that the client sends.

Questions:
1. Do you see how a CB_RECALL can travel before the callback is established?
2. Should the server do something other than retransmitting the CB_RECALL when it gets this “drop reply” error code back?
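To make my theory about the reply cache concrete, here is a toy model in Python. It is only a sketch of what I think the client is doing, not the actual client code: the class name, the XID values, and the rule that a dropped reply is cached per XID are all my assumptions for illustration.

```python
from dataclasses import dataclass, field

DROP = "drop reply"  # stand-in for the accept state seen in frame 287
OK = "ok"


@dataclass
class ToyCallbackClient:
    """Hypothetical model of the client's callback service."""
    context_established: bool = False
    reply_cache: dict = field(default_factory=dict)  # xid -> cached reply

    def handle(self, xid, proc):
        # A retransmission reuses the XID, so it is answered from the cache
        # without being processed again.
        if xid in self.reply_cache:
            return self.reply_cache[xid]
        if proc == "CB_NULL":
            # Assumption: this CB_NULL is what establishes the callback.
            self.context_established = True
            reply = OK
        else:
            # Assumption: a callback that arrives too early is dropped.
            reply = OK if self.context_established else DROP
        self.reply_cache[xid] = reply
        return reply


client = ToyCallbackClient()
trace = [
    client.handle(1, "CB_RECALL"),  # like frame 283: arrives before the context exists
    client.handle(2, "CB_NULL"),    # like frame 285: callback is established now
    client.handle(1, "CB_RECALL"),  # retransmission, same XID: cached reply
    client.handle(3, "CB_RECALL"),  # hypothetical new request with a fresh XID
]
print(trace)  # ['drop reply', 'ok', 'drop reply', 'ok']
```

If the model is right, the retransmission can never succeed because it keeps hitting the cached reply, while a fresh CB_RECALL would go through.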
Attachment:
nfstest_interop_20180329095958_2.cap
Description: Binary data