On Tue, Jun 29, 2021 at 01:51:43PM +0000, Chuck Lever III wrote:
>
>
> > On Jun 29, 2021, at 9:48 AM, Olga Kornievskaia <aglo@xxxxxxxxx> wrote:
> >
> > On Tue, Jun 29, 2021 at 8:58 AM Chuck Lever III <chuck.lever@xxxxxxxxxx> wrote:
> >>
> >>
> >>
> >>> On Jun 28, 2021, at 6:06 PM, Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote:
> >>>
> >>> On Mon, 2021-06-28 at 16:23 -0400, Olga Kornievskaia wrote:
> >>>> Hi folks,
> >>>>
> >>>> I have a general question of why the client doesn't throw away the
> >>>> cached server's capabilities on server reboot. Say a client mounted a
> >>>> server when the server didn't support security_labels, then the server
> >>>> was rebooted and support was enabled. Client re-establishes its
> >>>> clientid/session, recovers state, but assumes all the old capabilities
> >>>> apply. A remount is required to clear old/find new capabilities. The
> >>>> opposite is true that a capability could be removed (but I'm assuming
> >>>> that's a less practical example).
> >>>>
> >>>> I'm curious what are the problems of clearing server capabilities and
> >>>> rediscovering them on reboot? Is it because a local filesystem could
> >>>> never have its attributes changed and thus a network file system can't
> >>>> either?
> >>>>
> >>>> Thank you.
> >>>
> >>> In my opinion, the client should aim for the absolute minimum overhead
> >>> on a server reboot. The goal should be to recover state and get I/O
> >>> started again as quickly as possible.
> >>
> >> I 100% agree with the above. However...
> >>
> >>
> >>> Detection of new features, etc
> >>> can wait until the client needs to restart.
> >>
> >> A server reboot can be part of a failover to a different server. I
> >> think capability discovery needs to happen as part of server reboot
> >> recovery, it can't be optimized away.
> >
> > Can you clarify what you mean by a "failover to a different server"?
>
> IP-based failover means that a server can crash, and its partner can
> detect that and take over the IP address and exports of the failed
> server. The replacement server doesn't have to have exactly the same
> set of capabilities.

So it could also lose capabilities?

I'm a little nervous about server features being changed out from under
the client while the client has the server mounted.

But, I don't know, looking quickly through the list of NFS_CAP_*
definitions in nfs_fs_sb.h, I'm not coming up with a case where we
couldn't handle it, maybe it's OK.

--b.
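
To make the "gain or lose capabilities" case concrete, here is a minimal
sketch of re-probing a capability bitmask after recovery and working out
which bits appeared or disappeared. Everything in it is an assumption for
illustration: the DEMO_CAP_* bits, struct demo_server, and the demo_*
helpers are hypothetical, and they are not the real NFS_CAP_* values or
client code paths.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical capability bits, for illustration only. These are NOT the
 * real NFS_CAP_* values from nfs_fs_sb.h.
 */
#define DEMO_CAP_SECURITY_LABEL	(1u << 0)
#define DEMO_CAP_ACLS		(1u << 1)
#define DEMO_CAP_READDIRPLUS	(1u << 2)

/* Hypothetical stand-in for the per-server state that caches capabilities. */
struct demo_server {
	uint32_t caps;	/* capability bits the client currently assumes */
};

/* Bits offered after recovery that were not cached before. */
static inline uint32_t demo_caps_gained(uint32_t old_caps, uint32_t new_caps)
{
	return new_caps & ~old_caps;
}

/* Bits that were cached before but are no longer offered. */
static inline uint32_t demo_caps_lost(uint32_t old_caps, uint32_t new_caps)
{
	return old_caps & ~new_caps;
}

/*
 * Replace the cached capabilities with freshly probed ones and return the
 * bits that were lost, so the caller can decide how to handle features
 * that went away while the filesystem stayed mounted.
 */
static inline uint32_t demo_refresh_caps(struct demo_server *srv,
					 uint32_t probed_caps)
{
	uint32_t lost = demo_caps_lost(srv->caps, probed_caps);

	srv->caps = probed_caps;
	return lost;
}
```

A caller would run demo_refresh_caps() once state recovery has finished.
A non-zero return value is the interesting case raised above: a feature
the client was relying on is no longer there.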