Re: client's caching of server-side capabilities

Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> · Wed, 30 Jun 2021 15:52:13 +0000

On Wed, 2021-06-30 at 11:22 -0400, J. Bruce Fields wrote:
> On Tue, Jun 29, 2021 at 01:51:43PM +0000, Chuck Lever III wrote:
> > 
> > 
> > > On Jun 29, 2021, at 9:48 AM, Olga Kornievskaia <aglo@xxxxxxxxx> wrote:
> > > 
> > > On Tue, Jun 29, 2021 at 8:58 AM Chuck Lever III
> > > <chuck.lever@xxxxxxxxxx> wrote:
> > > > 
> > > > 
> > > > 
> > > > > On Jun 28, 2021, at 6:06 PM, Trond Myklebust
> > > > > <trondmy@xxxxxxxxxxxxxxx> wrote:
> > > > > 
> > > > > On Mon, 2021-06-28 at 16:23 -0400, Olga Kornievskaia wrote:
> > > > > > Hi folks,
> > > > > > 
> > > > > > > I have a general question of why the client doesn't throw away the
> > > > > > > cached server's capabilities on server reboot. Say a client mounted a
> > > > > > > server when the server didn't support security_labels, then the server
> > > > > > > was rebooted and support was enabled. Client re-establishes its
> > > > > > > clientid/session, recovers state, but assumes all the old capabilities
> > > > > > > apply. A remount is required to clear old/find new capabilities. The
> > > > > > > opposite is true that a capability could be removed (but I'm assuming
> > > > > > > that's a less practical example).
> > > > > > 
> > > > > > > I'm curious what are the problems of clearing server capabilities and
> > > > > > > rediscovering them on reboot? Is it because a local filesystem could
> > > > > > > never have its attributes changed and thus a network file system can't
> > > > > > > either?
> > > > > > 
> > > > > > Thank you.
> > > > > 
> > > > > > In my opinion, the client should aim for the absolute minimum overhead
> > > > > > on a server reboot. The goal should be to recover state and get I/O
> > > > > > started again as quickly as possible.
> > > > 
> > > > I 100% agree with the above. However...
> > > > 
> > > > 
> > > > > Detection of new features, etc
> > > > > can wait until the client needs to restart.
> > > > 
> > > > > A server reboot can be part of a failover to a different server. I
> > > > > think capability discovery needs to happen as part of server reboot
> > > > > recovery, it can't be optimized away.
> > > 
> > > > Can you clarify what you mean by a "failover to a different server"?
> > 
> > IP-based failover means that a server can crash, and its partner can
> > detect that and take over the IP address and exports of the failed
> > server. The replacement server doesn't have to have exactly the same
> > set of capabilities.
> 
> So it could also lose capabilities?
> 
> I'm a little nervous about server features being changed out from under
> the client while the client has the server mounted.
> 
> But, I don't know, looking quickly through the list of NFS_CAP_*
> definitions in nfs_fs_sb.h, I'm not coming up with a case where we
> couldn't handle it, maybe it's OK.
> 
> --b.

I'm not taking any patches for the server reboot case. If someone wants
to do it for the migration case, then fine: that's not a case that is
common or that requires performance. However, reprobing all mounted
filesystems on every server reboot is NACKed.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx