Re: client's caching of server-side capabilities

Chuck Lever III <chuck.lever@xxxxxxxxxx> · Tue, 29 Jun 2021 13:51:43 +0000

> On Jun 29, 2021, at 9:48 AM, Olga Kornievskaia <aglo@xxxxxxxxx> wrote:
> 
> On Tue, Jun 29, 2021 at 8:58 AM Chuck Lever III <chuck.lever@xxxxxxxxxx> wrote:
>> 
>> 
>> 
>>> On Jun 28, 2021, at 6:06 PM, Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote:
>>> 
>>> On Mon, 2021-06-28 at 16:23 -0400, Olga Kornievskaia wrote:
>>>> Hi folks,
>>>> 
>>>> I have a general question about why the client doesn't throw away the
>>>> cached server capabilities on server reboot. Say a client mounted a
>>>> server when the server didn't support security_labels, then the server
>>>> was rebooted and support was enabled. The client re-establishes its
>>>> clientid/session and recovers state, but assumes all the old
>>>> capabilities still apply. A remount is required to clear the old
>>>> capabilities and discover new ones. The opposite can also happen: a
>>>> capability could be removed (but I'm assuming that's a less practical
>>>> example).
>>>> 
>>>> I'm curious what the problems would be with clearing server
>>>> capabilities and rediscovering them on reboot. Is it because a local
>>>> filesystem could never have its attributes changed, and thus a network
>>>> file system can't either?
>>>> 
>>>> Thank you.
>>> 
>>> In my opinion, the client should aim for the absolute minimum overhead
>>> on a server reboot. The goal should be to recover state and get I/O
>>> started again as quickly as possible.
>> 
>> I 100% agree with the above. However...
>> 
>> 
>>> Detection of new features, etc., can wait until the client needs to
>>> restart.
>> 
>> A server reboot can be part of a failover to a different server. I
>> think capability discovery needs to happen as part of server reboot
>> recovery; it can't be optimized away.
> 
> Can you clarify what you mean by a "failover to a different server"?

IP-based failover means that a server can crash, and its partner can
detect that and take over the IP address and exports of the failed
server. The replacement server doesn't have to have exactly the same
set of capabilities.
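To make the failover point concrete, here is a minimal Python sketch, assuming a client that keeps a cached capability set per server and can re-query it once state recovery is done. ServerCaps, refresh_caps, and the probe callable are hypothetical names used only for illustration; this is not the Linux NFS client's code. The idea is that the cache is replaced rather than trusted, so a partner server with a different capability set is noticed.

```python
from dataclasses import dataclass, fields
from typing import Callable, Set, Tuple


@dataclass(frozen=True)
class ServerCaps:
    # Hypothetical capability flags a client might cache per server.
    security_labels: bool
    acls: bool


def refresh_caps(cached: ServerCaps,
                 probe: Callable[[], ServerCaps]) -> Tuple[ServerCaps, Set[str]]:
    # Assumed to run after clientid/session and open/lock state have been
    # recovered, so re-probing does not delay the reclaim itself.
    fresh = probe()
    # Names of capabilities that differ from what was cached before the
    # reboot or failover.
    changed = {f.name for f in fields(ServerCaps)
               if getattr(cached, f.name) != getattr(fresh, f.name)}
    # The fresh set replaces the cache instead of being merged into it, so
    # removed capabilities are dropped as well as new ones picked up.
    return fresh, changed


# Usage: the partner that took over the IP address supports security
# labels but not ACLs.
cached = ServerCaps(security_labels=False, acls=True)
fresh, changed = refresh_caps(
    cached, lambda: ServerCaps(security_labels=True, acls=False))
print(sorted(changed))  # ['acls', 'security_labels']
```

Placing the probe after reclaim is one way to reconcile the two goals in this thread: state recovery stays as fast as possible, while capability discovery still happens as part of recovery rather than waiting for a remount.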

> To do reboot recovery it has to be the "same" server (by the
> definitions of the RFC). The use case I was thinking of was a reboot
> of the "same" server (same major ID, minor ID, and scope) but with new
> features, though of course one could argue that if it has new features
> it's no longer the "same" server. I think you are probably thinking
> about migration. Or are you thinking of distinguishing between
> session-trunkable servers, which are considered to be the same, but
> which might have different capabilities because each has a different
> IP address?
> 
> Thank you for the feedback!
> 

--
Chuck Lever