On Wed, Feb 06, 2013 at 08:25:10PM +0100, Niels de Vos wrote:
> On Wed, Feb 06, 2013 at 01:54:28PM -0500, J. Bruce Fields wrote:
> > On Wed, Feb 06, 2013 at 06:19:56PM +0100, Niels de Vos wrote:
> > > On Thu, Jan 31, 2013 at 03:19:28PM -0500, J. Bruce Fields wrote:
> > > > On Thu, Jan 31, 2013 at 10:20:27AM +0100, Niels de Vos wrote:
> > > > > Well, the NFS-server dynamically gets exports (GlusterFS volumes)
> > > > > added when these are started or newly created. There is no hard
> > > > > requirement that a specific volume is available for the NFS-server
> > > > > to place a shared file with a list of NFS-clients.
> > > >
> > > > I'm not sure what you mean by "there is no hard requirement ...".
> > > >
> > > > Surely it's a requirement that an NFS server have available at
> > > > startup, at a minimum:
> > > >
> > > >   - all exported volumes
> > > >   - whichever volume contains /var/lib/nfs/statd/, if that's on
> > > >     glusterfs.
> > > >
> > > > Otherwise reboot recovery won't work. (And failover definitely
> > > > won't work.)
> > >
> > > Well, with the current state of things, the GlusterFS NFS-server
> > > (gNFS) does not enforce that there are any volumes available to
> > > export. These can be added dynamically (similar to calling exportfs
> > > for Linux nfsd).
> >
> > Sure, it's fine to allow that, just as long as we make sure that
> > anything already exported is available before server start.
> >
> > > When an NFS-client tries to mount an export immediately after gNFS
> > > has been started, the MOUNT will return ENOENT :-/
> >
> > It's not new mounts that are the problem, it's preexisting mounts
> > after a server reboot:
> >
> > An application already has a file open. The server reboots, and as
> > soon as it's back up the client sends an operation using that
> > filehandle. If the server fails to recognize the filehandle and
> > returns ESTALE, that ESTALE gets returned to the
> > application--definitely a bug.
>
> Yes, I understand that now.
> Currently the NFS server starts listening, and volumes to export will
> be added a little later...
>
> > So for correct reboot recovery support, any export in use on the
> > previous boot has to be back up before the NFS server starts
> > listening for rpc's.
>
> Which is not the case at the moment.
>
> > (Alternatively the server could look at the filehandle, recognize
> > that it's for a volume that hasn't come up yet, and return EJUKEBOX.
> > I don't think gluster does that.)
>
> I very much doubt that as well. The error is defined in the sources,
> but I do not see any usages. IMHO it is easier to make the exports
> available in the NFS-server before listening for incoming RPC
> connections. Trying to return EJUKEBOX would probably require
> knowledge of the volumes anyway.

Yes, agreed.

> Thanks for these detailed explanations; at least I think I understand
> what needs to be done before GlusterFS can offer a truly
> highly-available NFS-server. (And hopefully I will find some time to
> extend/improve the behaviour bit by bit.)

Note that reboot recovery is something that users generally expect to
Just Work on any NFS server, so this is a bug to fix even before making
HA work.

--b.