Thanks, Soumya, for confirming the expected time for the takeover process. That's very helpful.
'showmount' hung on the destination host, even though I could see that ganesha-nfsd was still running.
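A minimal sketch of how such a hang can be probed without blocking the shell, assuming coreutils `timeout` is available on the node. The helper name `probe` and the 10-second limit are illustrative choices, not part of nfs-ganesha or ganesha-ha.sh.

```shell
# probe SECONDS CMD [ARGS...]
# Run a command under a time limit so a hung RPC call cannot block the shell.
# Prints one status line; returns 0 on a timely success, 1 otherwise.
# "probe" is a hypothetical helper name used only for this sketch.
probe() {
    local limit="$1"; shift
    local rc=0
    timeout "$limit" "$@" >/dev/null 2>&1 || rc=$?
    if [ "$rc" -eq 124 ]; then
        # 124 is the exit status timeout(1) uses when the limit was hit
        echo "HUNG: $* (no answer within ${limit}s)"
        return 1
    elif [ "$rc" -ne 0 ]; then
        echo "FAILED: $* (exit status ${rc})"
        return 1
    fi
    echo "OK: $*"
}

# Intended use on the destination node after the VIP has moved
# (the 10-second limit is an arbitrary assumption):
#   probe 10 showmount -e localhost
#   probe 10 rpcinfo -p localhost
```

A "HUNG" result for showmount while the ganesha-nfsd process is still listed would match the symptom described here; it does not by itself say why the daemon stopped answering.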
On that note, one out-of-the-ordinary thing I tried was to set up a second set of *cluster_ip resources for the storage nodes using 'pcs'. That is, in addition to the VIPs I have in 'ganesha-ha.conf', I used 'pcs' to create another set of VIPs, and planned to use 'constraint colocation' to keep both sets of VIPs and NFSd together. Both sets of VIPs did fail over to the destination node, but ganesha-nfsd hung on the destination node when the source node was brought down. I am guessing this setup confused nfs-ganesha? Is it a no-no to have more than one set of VIPs for NFS-Ganesha in different subnets?
Any suggestions or advice will be appreciated.
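For readers trying to picture the extra-VIP setup described above, here is a rough dry-run sketch. It only prints pcs commands and does not run them. The exact commands used on this cluster are not given in the thread, so the resource names, the second-subnet address, the netmask, the monitor interval, and the INFINITY score are all assumptions for illustration, and nothing here implies the layout is supported by ganesha-ha.sh.

```shell
# extra_vip_cmds NEW_RESOURCE IP NETMASK_BITS EXISTING_RESOURCE
# Dry run: print (do not execute) the pcs commands that would create one
# extra IPaddr2 VIP and colocate it with an existing VIP resource.
# The function name and every value passed to it are hypothetical.
extra_vip_cmds() {
    local new_res="$1" ip="$2" bits="$3" existing="$4"
    echo "pcs resource create ${new_res} ocf:heartbeat:IPaddr2 ip=${ip} cidr_netmask=${bits} op monitor interval=15s"
    echo "pcs constraint colocation add ${new_res} with ${existing} INFINITY"
}

# Example values only: the second-subnet address and both resource names
# are assumed, not taken from the cluster in this thread.
extra_vip_cmds vm-fusion1-extra_ip-1 192.168.31.211 24 vm-fusion1-cluster_ip-1
```

Comparing the printed commands with the output of 'pcs constraint' on the real cluster is one way to see how the hand-made resources relate to the ones ganesha-ha.sh created.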
On Sun, Feb 5, 2017 at 11:50 PM, Soumya Koduri <skoduri@xxxxxxxxxx> wrote:
On 02/04/2017 06:20 AM, ML Wong wrote:
Thanks so much for your prompt response, Soumya.
That helps clear up one of my questions. I am trying to figure out
why the NFS service did not fail over and pick up the NFS clients last time,
when one of our cluster nodes failed.
I could see in corosync.log that a notification was sent to the cluster about
the failed node, and the election and the IP failover all seem to have
finished within about a minute. However, after the IP failed over to the
destination node, I ran "showmount -e localhost" and the command
hung. But I could still see ganesha-nfsd running on the host.
Is it on the destination node? I mean, does 'showmount -e localhost' hang on the destination node after the IP failover?
From your
experience, if I understand the process correctly, and given that I keep all
the default timeout/interval settings for nfs-mon and nfs-grace, the entire
IP failover and NFS service failover process should complete within
2 minutes. Am I correct?
That's right.
Thanks,
Soumya
Your help is again appreciated.
On Thu, Feb 2, 2017 at 11:42 PM, Soumya Koduri <skoduri@xxxxxxxxxx> wrote:
Hi,
On 02/03/2017 07:52 AM, ML Wong wrote:
Hello All,
Any pointers will be very much appreciated. Thanks in advance!
Environment:
Running CentOS 7.2.511
Gluster: 3.7.16, with nfs-ganesha on 2.3.0.1 from
centos-gluster37 repo
sha1: cab5df4064e3a31d1d92786d91bd41d91517fba8 ganesha-ha.sh
We have used this setup in 3 different gluster/nfs-ganesha
environments. The cluster gets set up when we run 'gluster nfs-ganesha
enable', and we can serve NFS without issues. I see that all the
resources got created except the *hostname*-trigger_ip-1
resources. Is that normal?
Yes, it is normal. With change [1], new resource agent attributes
have been introduced in place of *-trigger_ip-1 to monitor and move the
VIP and put the cluster in grace. More details are in the change's
commit message.
Thanks,
Soumya
[1] https://github.com/gluster/glusterfs/commit/e8121c4afb3680f532b450872b5a3ffcb3766a97
Without *hostname*-trigger_ip-1, according to ganesha-ha.sh,
wouldn't that affect NFS going into grace and the transition of the NFS
service to other member nodes when a node fails? Please correct me
if I have misunderstood.
I tried issuing both 'gluster nfs-ganesha enable' and 'bash -x
/usr/libexec/ganesha/ganesha-ha.sh --setup'. In both scenarios, I still
don't see *hostname*-trigger_ip-1 being created.
Below is my ganesha-ha.conf:
HA_NAME="ganesha-ha-01"
HA_VOL_SERVER="vm-fusion1"
HA_CLUSTER_NODES="vm-fusion1,vm-fusion3"
VIP_vm-fusion1="192.168.30.211"
VIP_vm-fusion3="192.168.30.213"
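As a quick sanity check on a file like the one above, the following sketch lists each node in HA_CLUSTER_NODES with its VIP and complains if one is missing. The function name `list_vips` is hypothetical, and the parsing assumes the exact `KEY="value"` layout shown here. The file is read with sed rather than sourced, because keys such as VIP_vm-fusion1 contain a hyphen and are not valid shell variable names.

```shell
# list_vips CONF_FILE
# Print "node vip" for every node named in HA_CLUSTER_NODES.
# Assumes one KEY="value" pair per line, as in the config shown above.
# "list_vips" is a hypothetical helper, not part of ganesha-ha.sh.
list_vips() {
    local conf="$1" nodes node vip
    # Extract the comma-separated node list
    nodes=$(sed -n 's/^HA_CLUSTER_NODES="\([^"]*\)".*$/\1/p' "$conf")
    for node in $(printf '%s' "$nodes" | tr ',' ' '); do
        # Look up the VIP_<node> line for this node
        vip=$(sed -n "s/^VIP_${node}=\"\([^\"]*\)\".*\$/\1/p" "$conf")
        if [ -z "$vip" ]; then
            echo "missing VIP for ${node}" >&2
            return 1
        fi
        echo "${node} ${vip}"
    done
}

# Usage: pass the path to your ganesha-ha.conf, e.g.
#   list_vips ./ganesha-ha.conf
```

For the configuration above this prints `vm-fusion1 192.168.30.211` and `vm-fusion3 192.168.30.213`, one pair per line.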
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users