Re: nfsd issue with a kerberized callback

On Mon, Apr 16, 2018 at 03:48:49PM -0400, Olga Kornievskaia wrote:
> I have a failure that I’m investigating from the Bakeathon (this was
> going against redhat-75 server. Not sure who was running that server.
> But I believe that was RHEL7.5 server). I have a network trace and I
> was wondering if you could help with what the server is doing.
>
> I’m attaching a network trace. The parts I’m interested in explaining
> have to do with the kerberized backchannel for NFS4.0.
> 
> A setup is client doing v3 and v4 mount and opening file with one
> version and appending to it with a different version. Its opened with
> 4.0 and got a delegation and it’s trying to write with v3 and server
> is recalling a delegation
> 
> Server is issuing CB_NULL gss_init trying to establish a gss context.
> But it’s doing it twice in frame 259 and frame 261. It’s weird that
> it’s doing it twice. But Ok.

I also wonder why the client sent two sets of
SETCLIENTID/SETCLIENTID_CONFIRM calls.  The second gets back the same
clientid as the first, so I think the only thing the server might do is
update the callback information--but the callback information is the
same in both cases.  Maybe some server bug is causing it not to handle
that update correctly?

I also expect the server to start a CB_NULL as soon as it gets the
setclientid_confirm, so I would have expected to see that sooner.

> Now in frame, 283 it sends CB_COMPOUND CB_RECALL And in frame 285 it
> sends CB_NULL with gss_data with the CB_NULL as the payload. I think
> this is to establish the callback.
> 
> In frame 287, client responds with RPC accept state of 6000 (which I
> believe is "drop reply").

That value shouldn't ever appear on the wire.
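
For reference, RFC 5531 only defines accept_stat values 0 through 5, so anything else in that field is not a legal accepted reply. A minimal Python sketch of that check (the helper name is only illustrative, it is not anything from the kernel or from Wireshark):

```python
from enum import IntEnum


class AcceptStat(IntEnum):
    """accept_stat values defined by RFC 5531 (ONC RPC version 2)."""
    SUCCESS = 0        # RPC executed successfully
    PROG_UNAVAIL = 1   # remote hasn't exported the program
    PROG_MISMATCH = 2  # remote can't support the version number
    PROC_UNAVAIL = 3   # program can't support the procedure
    GARBAGE_ARGS = 4   # procedure can't decode the arguments
    SYSTEM_ERR = 5     # e.g. memory allocation failure


def is_wire_valid_accept_stat(value: int) -> bool:
    """Return True only if value is one of the RFC 5531 accept_stat codes."""
    try:
        AcceptStat(value)
    except ValueError:
        return False
    return True


if __name__ == "__main__":
    # 6000 is the value reported for frame 287 in this thread.
    for observed in (0, 4, 6000):
        print(observed, is_wire_valid_accept_stat(observed))
```

A receiver has no defined way to interpret a value outside that set, which is why it reads as a garbage reply rather than as a real error.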

Looks like RHEL7 may need 0533b13072f4 "svc: Avoid garbage replies when
pc_func() returns rpc_drop_reply".

From nfs4_callback_compound, there are a couple of reasons it might return
rpc_drop_reply--might be a failure to recognize the callback principal?

> I believe what’s happening is that because the client hasn’t received
> CB_NULL that establishes a callback channel but got a CB_RECALL it’s
> just ignoring it.

I see two successful CB_NULL calls and replies, so I think the context
establishment worked.  I don't know why there's a third CB_NULL in frame
285.

> What happens later is that server re-transmits the CB_COMPOUND but
> client replies out of the cache. What’s interesting is that by this
> time since CB_NULL that came after the CB_COMPOUND should have
> established the callback and if the re-transmission was instead a new
> CB_RECALL, then it would have succeeded I would think. Server tries
> twice and then finally sets the CB_PATH_DOWN on the RENEW that
> client sends.
> 
> Questions: 1. Do you see how CB_RECALL can travel before the callback
> is established?

I assume it's using the context established by one of the previous
CB_NULLs.  I guess you could check the context handle to confirm this.
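
If it helps, here is a rough sketch of how one might compare the handles, assuming you've copied the raw RPCSEC_GSS credential body bytes out of each frame by hand. The layout follows the rpc_gss_cred_t definition in RFC 2203 (version 1); the class and function names are made up for illustration.

```python
import struct
from typing import NamedTuple

RPCSEC_GSS_VERS_1 = 1  # only credential version defined by RFC 2203


class GssCred(NamedTuple):
    version: int
    gss_proc: int   # 0=DATA, 1=INIT, 2=CONTINUE_INIT, 3=DESTROY
    seq_num: int
    service: int    # 1=none, 2=integrity, 3=privacy
    handle: bytes   # opaque context handle


def parse_rpcsec_gss_cred(body: bytes) -> GssCred:
    """Decode an XDR-encoded rpc_gss_cred_t (version 1) credential body."""
    if len(body) < 20:
        raise ValueError("credential body too short")
    # Five big-endian 32-bit words: version, gss_proc, seq_num,
    # service, and the length of the opaque handle that follows.
    version, gss_proc, seq_num, service, hlen = struct.unpack_from(">5I", body, 0)
    if version != RPCSEC_GSS_VERS_1:
        raise ValueError(f"unexpected RPCSEC_GSS version {version}")
    handle = body[20:20 + hlen]
    if len(handle) != hlen:
        raise ValueError("truncated context handle")
    return GssCred(version, gss_proc, seq_num, service, handle)


def same_context(cred_a: bytes, cred_b: bytes) -> bool:
    """True if both credential bodies carry the same context handle."""
    return parse_rpcsec_gss_cred(cred_a).handle == parse_rpcsec_gss_cred(cred_b).handle
```

The handle in the CB_RECALL credential would be compared against the handle the client returned in each CB_NULL init reply. A match would show which context the server is using.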

> 2. Should the server do something else beside
> re-transmitting the CB_RECALL because it got this “drop reply” error
> code back? 

Since the reply isn't really valid, I think retrying is the server's
only option.

--b.