Re: client's caching of server-side capabilities

Olga Kornievskaia <aglo@xxxxxxxxx> · Wed, 30 Jun 2021 12:48:47 -0400

On Wed, Jun 30, 2021 at 11:23 AM J. Bruce Fields <bfields@xxxxxxxxxxxx> wrote:
>
> On Tue, Jun 29, 2021 at 01:51:43PM +0000, Chuck Lever III wrote:
> >
> >
> > > On Jun 29, 2021, at 9:48 AM, Olga Kornievskaia <aglo@xxxxxxxxx> wrote:
> > >
> > > On Tue, Jun 29, 2021 at 8:58 AM Chuck Lever III <chuck.lever@xxxxxxxxxx> wrote:
> > >>
> > >>
> > >>
> > >>> On Jun 28, 2021, at 6:06 PM, Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote:
> > >>>
> > >>> On Mon, 2021-06-28 at 16:23 -0400, Olga Kornievskaia wrote:
> > >>>> Hi folks,
> > >>>>
> > >>>> I have a general question of why the client doesn't throw away the
> > >>>> cached server's capabilities on server reboot. Say a client mounted a
> > >>>> server when the server didn't support security_labels, then the
> > >>>> server
> > >>>> was rebooted and support was enabled. Client re-establishes its
> > >>>> clientid/session, recovers state, but assumes all the old
> > >>>> capabilities
> > >>>> apply. A remount is required to clear old/find new capabilities. The
> > >>>> opposite is true that a capability could be removed (but I'm assuming
> > >>>> that's a less practical example).
> > >>>>
> > >>>> I'm curious what are the problems of clearing server capabilities and
> > >>>> rediscovering them on reboot? Is it because a local filesystem could
> > >>>> never have its attributes changed and thus a network file system
> > >>>> can't
> > >>>> either?
> > >>>>
> > >>>> Thank you.
> > >>>
> > >>> In my opinion, the client should aim for the absolute minimum overhead
> > >>> on a server reboot. The goal should be to recover state and get I/O
> > >>> started again as quickly as possible.
> > >>
> > >> I 100% agree with the above. However...
> > >>
> > >>
> > >>> Detection of new features, etc
> > >>> can wait until the client needs to restart.
> > >>
> > >> A server reboot can be part of a failover to a different server. I
> > >> think capability discovery needs to happen as part of server reboot
> > >> recovery, it can't be optimized away.
> > >
> > > Can you clarify what you mean by a "failover to a different server"?
> >
> > IP-based failover means that a server can crash, and its partner can
> > detect that and take over the IP address and exports of the failed
> > server. The replacement server doesn't have to have exactly the same
> > set of capabilities.
>
> So it could also lose capabilities?

Well, wouldn't the client lose capabilities even now? Operations
relying on those capabilities wouldn't work (e.g., a security label
wouldn't be returned, or an operation would fail with ENOTSUPP). And I
think when it comes to operations, that's fine, as the client would
then adjust (remove) the capability.
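
To illustrate what I mean by "adjusted", here is a rough userspace
sketch of the pattern, not the actual client code. All of the demo_*
names and DEMO_CAP_* bits are made up for illustration (they only mimic
the NFS_CAP_* style), and demo_get_label() is a stand-in for the real
RPC. ENOTSUPP is defined by hand because it is a kernel-internal errno
that userspace errno.h does not provide.

```c
#include <errno.h>

#ifndef ENOTSUPP
#define ENOTSUPP 524	/* kernel-internal value; absent from userspace errno.h */
#endif

/* Hypothetical capability bits, styled after NFS_CAP_*. */
#define DEMO_CAP_SECURITY_LABEL	(1U << 0)
#define DEMO_CAP_OTHER		(1U << 1)

/* Hypothetical per-mount record of what the client believes the server supports. */
struct demo_server {
	unsigned int caps;
};

/* Stand-in for the RPC: fails once the server no longer supports labels. */
static int demo_get_label(int server_supports_labels)
{
	return server_supports_labels ? 0 : -ENOTSUPP;
}

/*
 * Try the operation only if the cached capability is set. If the server
 * rejects it with ENOTSUPP, drop the cached bit so later callers skip
 * the round trip.
 */
static int demo_fetch_label(struct demo_server *server, int server_supports_labels)
{
	int err;

	if (!(server->caps & DEMO_CAP_SECURITY_LABEL))
		return -ENOTSUPP;

	err = demo_get_label(server_supports_labels);
	if (err == -ENOTSUPP)
		server->caps &= ~DEMO_CAP_SECURITY_LABEL;
	return err;
}
```

Note that this only covers losing a capability. Nothing in the sketch
sets a bit again once it has been cleared, which is the "gaining a
capability" half of my original question.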

To be clear again, I'm not suggesting doing this at server reboot,
since it was pointed out that doing so would cause performance problems.

> I'm a little nervous about server features being changed out from under
> the client while the client has the server mounted.
>
> But, I don't know, looking quickly through the list of NFS_CAP_*
> definitions in nfs_fs_sb.h, I'm not coming up with a case where we
> couldn't handle it, maybe it's OK.
>
> --b.