Re: client's caching of server-side capabilities

bfields@xxxxxxxxxxxx (J. Bruce Fields) · Wed, 30 Jun 2021 11:22:58 -0400

On Tue, Jun 29, 2021 at 01:51:43PM +0000, Chuck Lever III wrote:
> 
> 
> > On Jun 29, 2021, at 9:48 AM, Olga Kornievskaia <aglo@xxxxxxxxx> wrote:
> > 
> > On Tue, Jun 29, 2021 at 8:58 AM Chuck Lever III <chuck.lever@xxxxxxxxxx> wrote:
> >> 
> >> 
> >> 
> >>> On Jun 28, 2021, at 6:06 PM, Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote:
> >>> 
> >>> On Mon, 2021-06-28 at 16:23 -0400, Olga Kornievskaia wrote:
> >>>> Hi folks,
> >>>> 
> >>>> I have a general question of why the client doesn't throw away the
> >>>> cached server's capabilities on server reboot. Say a client mounted a
> >>>> server when the server didn't support security_labels, then the server
> >>>> was rebooted and support was enabled. Client re-establishes its
> >>>> clientid/session, recovers state, but assumes all the old capabilities
> >>>> apply. A remount is required to clear old/find new capabilities. The
> >>>> opposite is true that a capability could be removed (but I'm assuming
> >>>> that's a less practical example).
> >>>> 
> >>>> I'm curious what are the problems of clearing server capabilities and
> >>>> rediscovering them on reboot? Is it because a local filesystem could
> >>>> never have its attributes changed and thus a network file system can't
> >>>> either?
> >>>> 
> >>>> Thank you.
> >>> 
> >>> In my opinion, the client should aim for the absolute minimum overhead
> >>> on a server reboot. The goal should be to recover state and get I/O
> >>> started again as quickly as possible.
> >> 
> >> I 100% agree with the above. However...
> >> 
> >> 
> >>> Detection of new features, etc
> >>> can wait until the client needs to restart.
> >> 
> >> A server reboot can be part of a failover to a different server. I
> >> think capability discovery needs to happen as part of server reboot
> >> recovery, it can't be optimized away.
> > 
> > Can you clarify what you mean by a "failover to a different server"?
> 
> IP-based failover means that a server can crash, and its partner can
> detect that and take over the IP address and exports of the failed
> server. The replacement server doesn't have to have exactly the same
> set of capabilities.

So it could also lose capabilities?

I'm a little nervous about server features being changed out from under
the client while the client has the server mounted.

But I don't know. Looking quickly through the list of NFS_CAP_*
definitions in nfs_fs_sb.h, I'm not coming up with a case where we
couldn't handle it, so maybe it's OK.
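
To make the "gained vs. lost" question concrete, here is a minimal
sketch of comparing a cached capability mask against a freshly probed
one after reboot recovery. Everything in it is hypothetical: the EX_CAP_*
bits, struct ex_cap_delta and ex_cap_diff() are made up for illustration
and are not the kernel's NFS_CAP_* definitions or any existing client
code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical capability bits, for illustration only. These are NOT
 * the NFS_CAP_* values from nfs_fs_sb.h. */
#define EX_CAP_ACLS           (1U << 0)
#define EX_CAP_SECURITY_LABEL (1U << 1)
#define EX_CAP_READ_PLUS      (1U << 2)

/* Result of comparing the cached mask with a re-probed one. */
struct ex_cap_delta {
	uint32_t gained; /* set in the probed mask, clear in the cached one */
	uint32_t lost;   /* set in the cached mask, clear in the probed one */
};

/* Compute which capabilities appeared and which disappeared between
 * the mask cached at mount time and the one probed after recovery. */
struct ex_cap_delta ex_cap_diff(uint32_t cached, uint32_t probed)
{
	struct ex_cap_delta d;

	d.gained = probed & ~cached;
	d.lost = cached & ~probed;

	/* A bit cannot be both gained and lost. */
	assert((d.gained & d.lost) == 0);
	return d;
}
```

A nonzero "lost" mask is the case that makes me nervous: the client
would have to stop relying on those features mid-mount. A nonzero
"gained" mask is the case Olga described, where support was enabled
across the reboot.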

--b.