On Wed, 2021-06-30 at 12:48 -0400, Olga Kornievskaia wrote:
> On Wed, Jun 30, 2021 at 11:23 AM J. Bruce Fields
> <bfields@xxxxxxxxxxxx> wrote:
> >
> > On Tue, Jun 29, 2021 at 01:51:43PM +0000, Chuck Lever III wrote:
> > >
> > > > On Jun 29, 2021, at 9:48 AM, Olga Kornievskaia <aglo@xxxxxxxxx>
> > > > wrote:
> > > >
> > > > On Tue, Jun 29, 2021 at 8:58 AM Chuck Lever III
> > > > <chuck.lever@xxxxxxxxxx> wrote:
> > > > >
> > > > > > On Jun 28, 2021, at 6:06 PM, Trond Myklebust
> > > > > > <trondmy@xxxxxxxxxxxxxxx> wrote:
> > > > > >
> > > > > > On Mon, 2021-06-28 at 16:23 -0400, Olga Kornievskaia wrote:
> > > > > > > Hi folks,
> > > > > > >
> > > > > > > I have a general question of why the client doesn't throw
> > > > > > > away the cached server's capabilities on server reboot. Say
> > > > > > > a client mounted a server when the server didn't support
> > > > > > > security_labels, then the server was rebooted and support
> > > > > > > was enabled. Client re-establishes its clientid/session,
> > > > > > > recovers state, but assumes all the old capabilities apply.
> > > > > > > A remount is required to clear old/find new capabilities.
> > > > > > > The opposite is true that a capability could be removed
> > > > > > > (but I'm assuming that's a less practical example).
> > > > > > >
> > > > > > > I'm curious what are the problems of clearing server
> > > > > > > capabilities and rediscovering them on reboot? Is it because
> > > > > > > a local filesystem could never have its attributes changed
> > > > > > > and thus a network file system can't either?
> > > > > > >
> > > > > > > Thank you.
> > > > > >
> > > > > > In my opinion, the client should aim for the absolute minimum
> > > > > > overhead on a server reboot.
> > > > > > The goal should be to recover state and get I/O started again
> > > > > > as quickly as possible.
> > > > >
> > > > > I 100% agree with the above. However...
> > > > >
> > > > > > Detection of new features, etc can wait until the client
> > > > > > needs to restart.
> > > > >
> > > > > A server reboot can be part of a failover to a different server.
> > > > > I think capability discovery needs to happen as part of server
> > > > > reboot recovery, it can't be optimized away.
> > > >
> > > > Can you clarify what you mean by a "failover to a different
> > > > server"?
> > >
> > > IP-based failover means that a server can crash, and its partner can
> > > detect that and take over the IP address and exports of the failed
> > > server. The replacement server doesn't have to have exactly the same
> > > set of capabilities.
> >
> > So it could also lose capabilities?
>
> Well, wouldn't the client lose capabilities even now? Operations
> relying on those capabilities wouldn't work (ie., say security label
> wouldn't be returned or an operation would error with ENOTSUPP). And I
> think when it comes to operations, that's fine as the capability would
> then be adjusted (removed).
>
> To make it clear again, I'm not suggesting to do it at server reboot
> as it was pointed out to cause performance problems.
>

Yep.

The reason why I'd be more tolerant of this in the case of
migration/server failover is because in that case, the client is
already expected to trawl the various mountpoints for NFS4ERR_MOVED
errors, and running fs_locations probes anyway. The process is already
slow and disruptive, so throwing in an fsinfo probe to the new server
isn't really a big deal.

> > I'm a little nervous about server features being changed out from
> > under the client while the client has the server mounted.
> >
> > But, I don't know, looking quickly through the list of NFS_CAP_*
> > definitions in nfs_fs_sb.h, I'm not coming up with a case where we
> > couldn't handle it, maybe it's OK.
> >
> > --b.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx