On Mon, Apr 16, 2018 at 03:48:49PM -0400, Olga Kornievskaia wrote: > I have a failure that I’m investigating from the Bakeathon (this was > going against redhat-75 server. Not sure who was running that server. > But I believe that was RHEL7.5 server). I have a network trace and I > was wondering if you could help with what the server is doing. > > I’m attaching a network trace. The parts I’m interested in explaining > have to do with the kerberized backchannel for NFS4.0. > > A setup is client doing v3 and v4 mount and opening file with one > version and appending to it with a different version. Its opened with > 4.0 and got a delegation and it’s trying to write with v3 and server > is recalling a delegation > > Server is issuing CB_NULL gss_init trying to establish a gss context. > But it’s doing it twice in frame 259 and frame 261. It’s weird that > it’s doing it twice. But Ok. I also wonder why the client sent two sets of SETCLIENTID/SETCLIENTID_CONFIRM calls. The second gets back the same clientid as the first, so I think the only thing the server might do is update the callback information--but the callback information is the same in both cases. Maybe some server bug is causing it not to handle that update correctly? I also expect the server to start a CB_NULL as soon as it gets the setclientid_confirm, so I would have expected to see that sooner. > Now in frame, 283 it sends CB_COMPOUND CB_RECALL And in frame 285 it > sends CB_NULL with gss_data with the CB_NULL as the payload. I think > this is to establish the callback. > > In frame 287, client responds with RPC accept state of 6000 (which I > believe is "drop reply"). That value shouldn't ever appear on the wire. Looks like RHEL7 may need 0533b13072f4 "svc: Avoid garbage replies when pc_func() returns rpc_drop_reply". >From nfs4_callback_compound, there are a couple reasons it might return rpc_drop_reply--might be a failure to recognize the callback principal? > I believe what’s happening is that because the client hasn’t received > CB_NULL that establishes a callback channel but got a CB_RECALL it’s > just ignoring it. I see two succesful CB_NULL calls and replies, so I think the context establishment worked. I don't know why there's a third CB_NULL in frame 285. > What happens later is that server re-transmits the CB_COMPOUND but > client replies out of the cache. What’s interesting is that by this > time since CB_NULL that came after the CB_COMPOUND should have > established the callback and if the re-trasmission was instead a new > CB_RECALL, then it would have succeeded I would think. Server tries > twice and then finally, the sets the CB_PATH_DOWN on the RENEW that > client sends. > > Questions: 1. Do you see how CB_RECALL can travel before the callback > is established? I assume it's using the context established by one of the previous CB_NULLs. I guess you could check the context handle to confirm this. > 2. Should the server do something else beside > re-transmitting the CB_RECALL because it got this “drop reply” error > code back? Since the reply isn't really valid, I think retrying is the server's only option. --b. -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html