Re: CephFS with active-active NFS Ganesha

On Wed, 2020-05-06 at 11:58 -0700, Patrick Donnelly wrote:
> Hello Michael,
> 
> On Wed, Mar 11, 2020 at 1:24 AM Michael Bisig <michael.bisig@xxxxxxxxx> wrote:
> > Hi all,
> > 
> > I am trying to set up an active-active NFS Ganesha cluster (two Ganesha v3.0 daemons running in Docker containers). I managed to get both daemons running using the rados_cluster recovery backend for active-active deployment. The grace db lives in its own namespace within the CephFS metadata pool and keeps track of the node status.
> > Now I can mount the exposed filesystem over NFS (v4.1, v4.2) through both daemons. So far so good.
> > 
> > Testing high availability, however, resulted in unexpected behavior, and I am not sure whether it is intentional or a configuration problem.
> > 
> > Problem:
> > If both daemons are running, no E or N flags are set in the grace db, as I expect. Once one host goes down (or is taken down), ALL clients can neither read from nor write to the mounted filesystem, even the clients that are not connected to the dead Ganesha. In the db, I see that the dead Ganesha has state NE and the active one has E. This is the state I expect from the Ganesha documentation. Nevertheless, I would assume that clients connected to the active daemon would not be blocked. This state is not cleaned up by itself (e.g. after the grace period).
> > I can unlock this situation by "lifting" the dead node with a direct db call (using the ganesha-rados-grace tool), but within an active-active deployment this is not suitable.
> > 
> > The ganesha config looks like:
> > 
> > ------------
> > NFS_CORE_PARAM
> > {
> >         Enable_NLM = false;
> >         Protocols = 4;
> > }
> > NFSv4
> > {
> >         RecoveryBackend = rados_cluster;
> >         Minor_Versions =  1,2;
> > }
> > RADOS_KV
> > {
> >     pool = "cephfsmetadata";
> >     nodeid = "a" ;
> >     namespace = "grace";
> >     UserId = "ganesha";
> >     Ceph_Conf = "/etc/ceph/ceph.conf";
> > }
> > MDCACHE {
> >         Dir_Chunk = 0;
> >         NParts = 1;
> >         Cache_Size = 1;
> > }
> > EXPORT
> > {
> >         Export_ID=101;
> >         Protocols = 4;
> >         Transports = TCP;
> >         Path = PATH;
> >         Pseudo = PSEUDO_PATH;
> >         Access_Type = RW;
> >         Attr_Expiration_Time = 0;
> >         Squash = no_root_squash;
> > 
> >         FSAL {
> >                 Name = CEPH;
> >                 User_Id = "ganesha";
> >                 Secret_Access_Key = CEPHXKEY;
> >         }
> > }
> > LOG {
> >         Default_Log_Level = "FULL_DEBUG";
> > }
> > ------------
> > 
> > Does anyone have similar problems? Or, if this behavior is intentional, can you explain why that is the case?
> > Thank you in advance for your time and thoughts.
> 
> Here's what Jeff Layton had to say (he didn't get the mail posting somehow):
> 
> "Yes that is expected. Either the node needs to come back or you have
> to take the dead node out of the cluster using ganesha-rados-grace.
> 
> [You] mention that doing the latter is "not suitable" for some
> reason, but I don't get why. If the node is down and not coming back,
> why wouldn't you declare it dead and just remove it?"
> 

Yeah, not sure why I didn't get this message before, but...

To be clear, you wouldn't want to "lift" the grace period for the node;
you'd want to "remove" it from the cluster in this situation. Given that
the now-dead node has its N (need) flag set, I'm assuming you shut it
down gracefully too (it didn't just fall over and die).
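
For reference, with the pool, namespace and cephx user from the RADOS_KV
block in your config, removing the dead node and checking the result
would look roughly like this (the node id "b" is just a placeholder for
whatever nodeid the dead daemon was using):

    # take the dead node out of the cluster entirely, rather than just
    # lifting its grace request
    ganesha-rados-grace --pool cephfsmetadata --ns grace --userid ganesha remove b

    # confirm that no E/N flags remain and the grace period has ended
    ganesha-rados-grace --pool cephfsmetadata --ns grace --userid ganesha dump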

If you do the remove before you shut down the node, the node being shut
down would see that it's no longer part of the cluster and would avoid
requesting a grace period as it goes down.
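
So for a planned shutdown of node "a" the sequence would be something
like the following (the container name is just a placeholder for however
you run the daemon):

    # drop the node from the grace db first...
    ganesha-rados-grace --pool cephfsmetadata --ns grace --userid ganesha remove a

    # ...then stop the daemon; it sees it is no longer a cluster member
    # and won't request a new grace period on the way down
    docker stop ganesha-a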

Furthermore, the node being removed would also release its state with
the MDSs, which would ensure that the other ganesha nodes don't end up
waiting on caps that the dying node held as it went down.

-- 
Jeff Layton <jlayton@xxxxxxxxxx>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


