Another update: the fact that I was unable to use "vol set ganesha.enable" was due to another bug in the ganesha scripts.
In short, they all use the following line to get the location of the conf file:

CONF=$(cat /etc/sysconfig/ganesha | grep "CONFFILE" | cut -f 2 -d "=")

There are two problems with this. First, by default /etc/sysconfig/ganesha contains no CONFFILE line at all. Second, the parsing is fragile: it works if I add

CONFFILE=/etc/ganesha/ganesha.conf

to /etc/sysconfig/ganesha, but it fails if the same value is quoted:

CONFFILE="/etc/ganesha/ganesha.conf"

It would be much better to use something like the following, which also falls back to a default:

eval $(grep -F CONFFILE= /etc/sysconfig/ganesha)
CONF=${CONFFILE:-/etc/ganesha/ganesha.conf}
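If one prefers to avoid eval'ing lines from the sysconfig file, a slightly more defensive variant along these lines should behave the same way (just a sketch, not what the packaged scripts currently do; the awk program keeps the last CONFFILE assignment found and strips optional double quotes):

    # sketch: tolerate a missing or quoted CONFFILE line and fall back to the default
    sysconf=/etc/sysconfig/ganesha
    if [ -r "$sysconf" ]; then
        CONFFILE=$(awk '/^CONFFILE=/ {sub(/^CONFFILE=/,""); gsub(/"/,""); v=$0} END {print v}' "$sysconf")
    fi
    CONF=${CONFFILE:-/etc/ganesha/ganesha.conf}
    echo "using ganesha config file: $CONF"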
I'll update the bug report.
Having said this... the last issue to tackle is the real problem with the ganesha.nfsd :-(
Cheers,

    Alessandro

On Tue, 2015-06-09 at 14:25 +0200, Alessandro De Salvo wrote:
> OK, I can confirm that the ganesha.nfsd process is actually not answering the calls. Here is what I see:
> 
> # rpcinfo -p
>    program vers proto   port  service
>     100000    4   tcp    111  portmapper
>     100000    3   tcp    111  portmapper
>     100000    2   tcp    111  portmapper
>     100000    4   udp    111  portmapper
>     100000    3   udp    111  portmapper
>     100000    2   udp    111  portmapper
>     100024    1   udp  41594  status
>     100024    1   tcp  53631  status
>     100003    3   udp   2049  nfs
>     100003    3   tcp   2049  nfs
>     100003    4   udp   2049  nfs
>     100003    4   tcp   2049  nfs
>     100005    1   udp  58127  mountd
>     100005    1   tcp  56301  mountd
>     100005    3   udp  58127  mountd
>     100005    3   tcp  56301  mountd
>     100021    4   udp  46203  nlockmgr
>     100021    4   tcp  41798  nlockmgr
>     100011    1   udp    875  rquotad
>     100011    1   tcp    875  rquotad
>     100011    2   udp    875  rquotad
>     100011    2   tcp    875  rquotad
> 
> # netstat -lpn | grep ganesha
> tcp6      14      0 :::2049     :::*    LISTEN    11937/ganesha.nfsd
> tcp6       0      0 :::41798    :::*    LISTEN    11937/ganesha.nfsd
> tcp6       0      0 :::875      :::*    LISTEN    11937/ganesha.nfsd
> tcp6      10      0 :::56301    :::*    LISTEN    11937/ganesha.nfsd
> tcp6       0      0 :::564      :::*    LISTEN    11937/ganesha.nfsd
> udp6       0      0 :::2049     :::*              11937/ganesha.nfsd
> udp6       0      0 :::46203    :::*              11937/ganesha.nfsd
> udp6       0      0 :::58127    :::*              11937/ganesha.nfsd
> udp6       0      0 :::875      :::*              11937/ganesha.nfsd
> 
> I'm attaching the strace of a showmount from one node to the other.
> This machinery was working with nfs-ganesha 2.1.0, so it must be something introduced with 2.2.0.
> Cheers,
> 
>     Alessandro
> 
> 
> On Tue, 2015-06-09 at 15:16 +0530, Soumya Koduri wrote:
> > 
> > On 06/09/2015 02:48 PM, Alessandro De Salvo wrote:
> > > Hi,
> > > OK, the problem with the VIPs not starting is due to the ganesha_mon heartbeat script looking for a pid file called /var/run/ganesha.nfsd.pid, while by default ganesha.nfsd v2.2.0 creates /var/run/ganesha.pid; this needs to be corrected. The file is in glusterfs-ganesha-3.7.1-1.el7.x86_64, in my case.
> > > For the moment I have created a symlink in this way and it works:
> > > 
> > > ln -s /var/run/ganesha.pid /var/run/ganesha.nfsd.pid
> > > 
> > Thanks. Please update this as well in the bug.
> > 
> > > So far so good, the VIPs are up and pingable, but still there is the problem of the hanging showmount (i.e. hanging RPC).
> > > Still, I see a lot of errors like this in /var/log/messages:
> > > 
> > > Jun  9 11:15:20 atlas-node1 lrmd[31221]: notice: operation_finished: nfs-mon_monitor_10000:29292:stderr [ Error: Resource does not exist. ]
> > > 
> > > While ganesha.log shows the server is not in grace:
> > > 
> > > 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29964[main] main :MAIN :EVENT :ganesha.nfsd Starting: Ganesha Version /builddir/build/BUILD/nfs-ganesha-2.2.0/src, built at May 18 2015 14:17:18 on buildhw-09.phx2.fedoraproject.org
> > > 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_set_param_from_conf :NFS STARTUP :EVENT :Configuration file successfully parsed
> > > 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] init_server_pkgs :NFS STARTUP :EVENT :Initializing ID Mapper.
> > > 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] init_server_pkgs :NFS STARTUP :EVENT :ID Mapper successfully initialized.
> > > 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] main :NFS STARTUP :WARN :No export entries found in configuration file !!!
> > > 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] config_errs_to_log :CONFIG :WARN :Config File ((null):0): Empty configuration file
> > > 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] lower_my_caps :NFS STARTUP :EVENT :CAP_SYS_RESOURCE was successfully removed for proper quota management in FSAL
> > > 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] lower_my_caps :NFS STARTUP :EVENT :currenty set capabilities are: = cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap+ep
> > > 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_Init_svc :DISP :CRIT :Cannot acquire credentials for principal nfs
> > > 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_Init_admin_thread :NFS CB :EVENT :Admin thread initialized
> > > 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs4_start_grace :STATE :EVENT :NFS Server Now IN GRACE, duration 60
> > > 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_rpc_cb_init_ccache :NFS STARTUP :EVENT :Callback creds directory (/var/run/ganesha) already exists
> > > 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_rpc_cb_init_ccache :NFS STARTUP :WARN :gssd_refresh_krb5_machine_credential failed (2:2)
> > > 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :Starting delayed executor.
> > > 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :9P/TCP dispatcher thread was started successfully
> > > 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[_9p_disp] _9p_dispatcher_thread :9P DISP :EVENT :9P dispatcher started
> > > 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :gsh_dbusthread was started successfully
> > > 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :admin thread was started successfully
> > > 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :reaper thread was started successfully
> > > 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[reaper] nfs_in_grace :STATE :EVENT :NFS Server Now IN GRACE
> > > 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :General fridge was started successfully
> > > 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_start :NFS STARTUP :EVENT :-------------------------------------------------
> > > 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_start :NFS STARTUP :EVENT :             NFS SERVER INITIALIZED
> > > 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_start :NFS STARTUP :EVENT :-------------------------------------------------
> > > 09/06/2015 11:17:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[reaper] nfs_in_grace :STATE :EVENT :NFS Server Now NOT IN GRACE
> > > 
> > > 
> > Please check the status of nfs-ganesha
> > $service nfs-ganesha status
> > 
> > Could you try taking a packet trace (during showmount or mount) and check the server responses.
> > 
> > Thanks,
> > Soumya
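(A capture along these lines might be enough for that check; this is only a sketch, and the non-standard ports should be adjusted to the mountd/nlockmgr ports reported by rpcinfo above:

    tcpdump -i any -s 0 -w /tmp/ganesha-rpc.pcap port 111 or port 2049 or port 56301 or port 58127

The resulting pcap can then be inspected with wireshark/tshark to see whether ganesha.nfsd replies at all.)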
> > > Cheers,
> > > Alessandro
> > > 
> > >> On 09 Jun 2015, at 10:36, Alessandro De Salvo <alessandro.desalvo@xxxxxxxxxxxxx> wrote:
> > >> 
> > >> Hi Soumya,
> > >> 
> > >>> On 09 Jun 2015, at 08:06, Soumya Koduri <skoduri@xxxxxxxxxx> wrote:
> > >>> 
> > >>> 
> > >>> On 06/09/2015 01:31 AM, Alessandro De Salvo wrote:
> > >>>> OK, I found at least one of the bugs.
> > >>>> The /usr/libexec/ganesha/ganesha.sh has the following lines:
> > >>>> 
> > >>>> if [ -e /etc/os-release ]; then
> > >>>>     RHEL6_PCS_CNAME_OPTION=""
> > >>>> fi
> > >>>> 
> > >>>> This is OK for RHEL < 7, but does not work for >= 7. I have changed it to the following, to make it work:
> > >>>> 
> > >>>> if [ -e /etc/os-release ]; then
> > >>>>     eval $(grep -F "REDHAT_SUPPORT_PRODUCT=" /etc/os-release)
> > >>>>     [ "$REDHAT_SUPPORT_PRODUCT" == "Fedora" ] && RHEL6_PCS_CNAME_OPTION=""
> > >>>> fi
> > >>>> 
> > >>> Oh.. Thanks for the fix. Could you please file a bug for the same (and probably submit your fix as well). We shall have it corrected.
> > >> 
> > >> Just did it, https://bugzilla.redhat.com/show_bug.cgi?id=1229601
> > >> 
> > >>> 
> > >>>> Apart from that, the VIP_<node> entries I was using were wrong: I should have converted all the "-" to underscores. Maybe this could be mentioned in the documentation when you have it ready.
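(To make the naming rule explicit: the dashes in the host name become underscores in the variable name. For a node called atlas-node1 the entry in ganesha-ha.conf would therefore be written as VIP_atlas_node1="x.x.x.1" -- placeholder address, as in the rest of this thread.)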
> > >>>> Now the cluster starts, but the VIPs apparently do not:
> > >>>> 
> > >>> Sure. Thanks again for pointing it out. We shall make a note of it.
> > >>> 
> > >>>> Online: [ atlas-node1 atlas-node2 ]
> > >>>> 
> > >>>> Full list of resources:
> > >>>> 
> > >>>> Clone Set: nfs-mon-clone [nfs-mon]
> > >>>>     Started: [ atlas-node1 atlas-node2 ]
> > >>>> Clone Set: nfs-grace-clone [nfs-grace]
> > >>>>     Started: [ atlas-node1 atlas-node2 ]
> > >>>> atlas-node1-cluster_ip-1 (ocf::heartbeat:IPaddr): Stopped
> > >>>> atlas-node1-trigger_ip-1 (ocf::heartbeat:Dummy): Started atlas-node1
> > >>>> atlas-node2-cluster_ip-1 (ocf::heartbeat:IPaddr): Stopped
> > >>>> atlas-node2-trigger_ip-1 (ocf::heartbeat:Dummy): Started atlas-node2
> > >>>> atlas-node1-dead_ip-1 (ocf::heartbeat:Dummy): Started atlas-node1
> > >>>> atlas-node2-dead_ip-1 (ocf::heartbeat:Dummy): Started atlas-node2
> > >>>> 
> > >>>> PCSD Status:
> > >>>>   atlas-node1: Online
> > >>>>   atlas-node2: Online
> > >>>> 
> > >>>> Daemon Status:
> > >>>>   corosync: active/disabled
> > >>>>   pacemaker: active/disabled
> > >>>>   pcsd: active/enabled
> > >>>> 
> > >>>> 
> > >>> Here corosync and pacemaker show 'disabled' state. Can you check the status of their services? They should be running prior to cluster creation. We need to include that step in the document as well.
> > >> 
> > >> Ah, OK, you're right, I have added it to my puppet modules (we install and configure ganesha via puppet, I'll put the module on puppetforge soon, in case anyone is interested).
> > >> 
> > >>> 
> > >>>> But the issue that is puzzling me more is the following:
> > >>>> 
> > >>>> # showmount -e localhost
> > >>>> rpc mount export: RPC: Timed out
> > >>>> 
> > >>>> And when I try to enable the ganesha exports on a volume I get this error:
> > >>>> 
> > >>>> # gluster volume set atlas-home-01 ganesha.enable on
> > >>>> volume set: failed: Failed to create NFS-Ganesha export config file.
> > >>>> 
> > >>>> But I see the file created in /etc/ganesha/exports/*.conf
> > >>>> Still, showmount hangs and times out.
> > >>>> Any help?
> > >>>> Thanks,
> > >>> 
> > >>> Hmm, that's strange. Sometimes, when no proper cleanup was done while trying to re-create the cluster, we have seen such issues.
> > >>> 
> > >>> https://bugzilla.redhat.com/show_bug.cgi?id=1227709
> > >>> 
> > >>> http://review.gluster.org/#/c/11093/
> > >>> 
> > >>> Can you please unexport all the volumes, teardown the cluster using 'gluster vol set <volname> ganesha.enable off'
> > >> 
> > >> OK:
> > >> 
> > >> # gluster vol set atlas-home-01 ganesha.enable off
> > >> volume set: failed: ganesha.enable is already 'off'.
> > >> 
> > >> # gluster vol set atlas-data-01 ganesha.enable off
> > >> volume set: failed: ganesha.enable is already 'off'.
> > >> 
> > >>> 'gluster ganesha disable' command.
> > >> 
> > >> I'm assuming you wanted to write nfs-ganesha instead?
> > >> 
> > >> # gluster nfs-ganesha disable
> > >> ganesha enable : success
> > >> 
> > >> A side note (not really important): it's strange that when I do a disable the message is "ganesha enable" :-)
> > >> 
> > >>> Verify if the following files have been deleted on all the nodes-
> > >>> '/etc/cluster/cluster.conf'
> > >> 
> > >> this file is not present at all, I think it's not needed in CentOS 7
> > >> 
> > >>> '/etc/ganesha/ganesha.conf',
> > >> 
> > >> it's still there, but empty, and I guess it should be OK, right?
> > >> 
> > >>> '/etc/ganesha/exports/*'
> > >> 
> > >> no more files there
> > >> 
> > >>> '/var/lib/pacemaker/cib'
> > >> 
> > >> it's empty
> > >> 
> > >>> 
> > >>> Verify if the ganesha service is stopped on all the nodes.
> > >> 
> > >> nope, it's still running, I will stop it.
> > >> 
> > >>> 
> > >>> start/restart the services - corosync, pcs.
> > >> 
> > >> On the node where I issued the nfs-ganesha disable there is no longer any /etc/corosync/corosync.conf, so corosync won't start. The other node instead still has the file, which is strange.
> > >> 
> > >>> 
> > >>> And re-try the HA cluster creation
> > >>> 'gluster ganesha enable'
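(For reference, the whole cleanup-and-retry sequence above collected in one place -- a sketch only, using the volume names from this thread and assuming the services are managed through systemd:

    # on one node: unexport the volumes and tear the HA cluster down
    gluster volume set atlas-home-01 ganesha.enable off
    gluster volume set atlas-data-01 ganesha.enable off
    gluster nfs-ganesha disable

    # on every node: remove leftovers and stop ganesha
    rm -f /etc/cluster/cluster.conf /etc/ganesha/exports/*.conf
    rm -rf /var/lib/pacemaker/cib/*
    systemctl stop nfs-ganesha
    systemctl enable corosync pacemaker pcsd
    systemctl restart corosync pacemaker pcsd

    # on one node: re-create the HA cluster
    gluster nfs-ganesha enable
)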
> > >> 
> > >> This time (repeated twice) it did not work at all:
> > >> 
> > >> # pcs status
> > >> Cluster name: ATLAS_GANESHA_01
> > >> Last updated: Tue Jun  9 10:13:43 2015
> > >> Last change: Tue Jun  9 10:13:22 2015
> > >> Stack: corosync
> > >> Current DC: atlas-node1 (1) - partition with quorum
> > >> Version: 1.1.12-a14efad
> > >> 2 Nodes configured
> > >> 6 Resources configured
> > >> 
> > >> 
> > >> Online: [ atlas-node1 atlas-node2 ]
> > >> 
> > >> Full list of resources:
> > >> 
> > >> Clone Set: nfs-mon-clone [nfs-mon]
> > >>     Started: [ atlas-node1 atlas-node2 ]
> > >> Clone Set: nfs-grace-clone [nfs-grace]
> > >>     Started: [ atlas-node1 atlas-node2 ]
> > >> atlas-node2-dead_ip-1 (ocf::heartbeat:Dummy): Started atlas-node1
> > >> atlas-node1-dead_ip-1 (ocf::heartbeat:Dummy): Started atlas-node2
> > >> 
> > >> PCSD Status:
> > >>   atlas-node1: Online
> > >>   atlas-node2: Online
> > >> 
> > >> Daemon Status:
> > >>   corosync: active/enabled
> > >>   pacemaker: active/enabled
> > >>   pcsd: active/enabled
> > >> 
> > >> 
> > >> I then tried "pcs cluster destroy" on both nodes, and then nfs-ganesha enable again, but now I'm back to the old problem:
> > >> 
> > >> # pcs status
> > >> Cluster name: ATLAS_GANESHA_01
> > >> Last updated: Tue Jun  9 10:22:27 2015
> > >> Last change: Tue Jun  9 10:17:00 2015
> > >> Stack: corosync
> > >> Current DC: atlas-node2 (2) - partition with quorum
> > >> Version: 1.1.12-a14efad
> > >> 2 Nodes configured
> > >> 10 Resources configured
> > >> 
> > >> 
> > >> Online: [ atlas-node1 atlas-node2 ]
> > >> 
> > >> Full list of resources:
> > >> 
> > >> Clone Set: nfs-mon-clone [nfs-mon]
> > >>     Started: [ atlas-node1 atlas-node2 ]
> > >> Clone Set: nfs-grace-clone [nfs-grace]
> > >>     Started: [ atlas-node1 atlas-node2 ]
> > >> atlas-node1-cluster_ip-1 (ocf::heartbeat:IPaddr): Stopped
> > >> atlas-node1-trigger_ip-1 (ocf::heartbeat:Dummy): Started atlas-node1
> > >> atlas-node2-cluster_ip-1 (ocf::heartbeat:IPaddr): Stopped
> > >> atlas-node2-trigger_ip-1 (ocf::heartbeat:Dummy): Started atlas-node2
> > >> atlas-node1-dead_ip-1 (ocf::heartbeat:Dummy): Started atlas-node1
> > >> atlas-node2-dead_ip-1 (ocf::heartbeat:Dummy): Started atlas-node2
> > >> 
> > >> PCSD Status:
> > >>   atlas-node1: Online
> > >>   atlas-node2: Online
> > >> 
> > >> Daemon Status:
> > >>   corosync: active/enabled
> > >>   pacemaker: active/enabled
> > >>   pcsd: active/enabled
> > >> 
> > >> 
> > >> Cheers,
> > >> 
> > >>     Alessandro
> > >> 
> > >>> 
> > >>> 
> > >>> Thanks,
> > >>> Soumya
> > >>> 
> > >>>>     Alessandro
> > >>>> 
> > >>>>> On 08 Jun 2015, at 20:00, Alessandro De Salvo <Alessandro.DeSalvo@xxxxxxxxxxxxx> wrote:
> > >>>>> 
> > >>>>> Hi,
> > >>>>> indeed, it does not work :-)
> > >>>>> OK, this is what I did, with 2 machines, running CentOS 7.1, Glusterfs 3.7.1 and nfs-ganesha 2.2.0:
> > >>>>> 
> > >>>>> 1) ensured that the machines are able to resolve their IPs (but this was already true since they were in the DNS);
> > >>>>> 2) disabled NetworkManager and enabled network on both machines;
> > >>>>> 3) created a gluster shared volume 'gluster_shared_storage' and mounted it on '/run/gluster/shared_storage' on all the cluster nodes using glusterfs native mount (on CentOS 7.1 there is a link by default /var/run -> ../run)
> > >>>>> 4) created an empty /etc/ganesha/ganesha.conf;
> > >>>>> 5) installed pacemaker pcs resource-agents corosync on all cluster machines;
> > >>>>> 6) set the 'hacluster' user the same password on all machines;
> > >>>>> 7) pcs cluster auth <hostname> -u hacluster -p <pass> on all the nodes (on both nodes I issued the commands for both nodes)
> > >>>>> 8) IPv6 is configured by default on all nodes, although the infrastructure is not ready for IPv6
> > >>>>> 9) enabled pcsd and started it on all nodes
> > >>>>> 10) populated /etc/ganesha/ganesha-ha.conf with the following contents, one per machine:
> > >>>>> 
> > >>>>> ===> atlas-node1
> > >>>>> # Name of the HA cluster created.
> > >>>>> HA_NAME="ATLAS_GANESHA_01"
> > >>>>> # The server from which you intend to mount
> > >>>>> # the shared volume.
> > >>>>> HA_VOL_SERVER="atlas-node1"
> > >>>>> # The subset of nodes of the Gluster Trusted Pool
> > >>>>> # that forms the ganesha HA cluster. IP/Hostname
> > >>>>> # is specified.
> > >>>>> HA_CLUSTER_NODES="atlas-node1,atlas-node2"
> > >>>>> # Virtual IPs of each of the nodes specified above.
> > >>>>> VIP_atlas-node1="x.x.x.1"
> > >>>>> VIP_atlas-node2="x.x.x.2"
> > >>>>> 
> > >>>>> ===> atlas-node2
> > >>>>> # Name of the HA cluster created.
> > >>>>> HA_NAME="ATLAS_GANESHA_01"
> > >>>>> # The server from which you intend to mount
> > >>>>> # the shared volume.
> > >>>>> HA_VOL_SERVER="atlas-node2"
> > >>>>> # The subset of nodes of the Gluster Trusted Pool
> > >>>>> # that forms the ganesha HA cluster. IP/Hostname
> > >>>>> # is specified.
> > >>>>> HA_CLUSTER_NODES="atlas-node1,atlas-node2"
> > >>>>> # Virtual IPs of each of the nodes specified above.
> > >>>>> VIP_atlas-node1="x.x.x.1"
> > >>>>> VIP_atlas-node2="x.x.x.2"
> > >>>>> 
> > >>>>> 11) issued gluster nfs-ganesha enable, but it fails with a cryptic message:
> > >>>>> 
> > >>>>> # gluster nfs-ganesha enable
> > >>>>> Enabling NFS-Ganesha requires Gluster-NFS to be disabled across the trusted pool. Do you still want to continue? (y/n) y
> > >>>>> nfs-ganesha: failed: Failed to set up HA config for NFS-Ganesha. Please check the log file for details
> > >>>>> 
> > >>>>> Looking at the logs I found nothing really special but this:
> > >>>>> 
> > >>>>> ==> /var/log/glusterfs/etc-glusterfs-glusterd.vol.log <==
> > >>>>> [2015-06-08 17:57:15.672844] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: nfs already stopped
> > >>>>> [2015-06-08 17:57:15.675395] I [glusterd-ganesha.c:386:check_host_list] 0-management: ganesha host found Hostname is atlas-node2
> > >>>>> [2015-06-08 17:57:15.720692] I [glusterd-ganesha.c:386:check_host_list] 0-management: ganesha host found Hostname is atlas-node2
> > >>>>> [2015-06-08 17:57:15.721161] I [glusterd-ganesha.c:335:is_ganesha_host] 0-management: ganesha host found Hostname is atlas-node2
> > >>>>> [2015-06-08 17:57:16.633048] E [glusterd-ganesha.c:254:glusterd_op_set_ganesha] 0-management: Initial NFS-Ganesha set up failed
> > >>>>> [2015-06-08 17:57:16.641563] E [glusterd-syncop.c:1396:gd_commit_op_phase] 0-management: Commit of operation 'Volume (null)' failed on localhost : Failed to set up HA config for NFS-Ganesha. Please check the log file for details
> > >>>>> 
> > >>>>> ==> /var/log/glusterfs/cmd_history.log <==
> > >>>>> [2015-06-08 17:57:16.643615]  : nfs-ganesha enable : FAILED : Failed to set up HA config for NFS-Ganesha. Please check the log file for details
> > >>>>> 
> > >>>>> ==> /var/log/glusterfs/cli.log <==
> > >>>>> [2015-06-08 17:57:16.643839] I [input.c:36:cli_batch] 0-: Exiting with: -1
> > >>>>> 
> > >>>>> 
> > >>>>> Also, pcs seems to be fine for the auth part, although it obviously tells me the cluster is not running.
> > >>>>> 
> > >>>>> I, [2015-06-08T19:57:16.305323 #7223]  INFO -- : Running: /usr/sbin/corosync-cmapctl totem.cluster_name
> > >>>>> I, [2015-06-08T19:57:16.345457 #7223]  INFO -- : Running: /usr/sbin/pcs cluster token-nodes
> > >>>>> ::ffff:141.108.38.46 - - [08/Jun/2015 19:57:16] "GET /remote/check_auth HTTP/1.1" 200 68 0.1919
> > >>>>> ::ffff:141.108.38.46 - - [08/Jun/2015 19:57:16] "GET /remote/check_auth HTTP/1.1" 200 68 0.1920
> > >>>>> atlas-node1.mydomain - - [08/Jun/2015:19:57:16 CEST] "GET /remote/check_auth HTTP/1.1" 200 68
> > >>>>> - -> /remote/check_auth
> > >>>>> 
> > >>>>> 
> > >>>>> What am I doing wrong?
> > >>>>> Thanks,
> > >>>>> 
> > >>>>>     Alessandro
> > >>>>> 
> > >>>>>> On 08 Jun 2015, at 19:30, Soumya Koduri <skoduri@xxxxxxxxxx> wrote:
> > >>>>>> 
> > >>>>>> 
> > >>>>>> 
> > >>>>>> On 06/08/2015 08:20 PM, Alessandro De Salvo wrote:
> > >>>>>>> Sorry, just another question:
> > >>>>>>> 
> > >>>>>>> - in my installation of gluster 3.7.1 the command gluster features.ganesha enable does not work:
> > >>>>>>> 
> > >>>>>>> # gluster features.ganesha enable
> > >>>>>>> unrecognized word: features.ganesha (position 0)
> > >>>>>>> 
> > >>>>>>> Which version has full support for it?
> > >>>>>> 
> > >>>>>> Sorry. This option has recently been changed. It is now
> > >>>>>> 
> > >>>>>> $ gluster nfs-ganesha enable
> > >>>>>> 
> > >>>>>> 
> > >>>>>>> 
> > >>>>>>> - in the documentation the ccs and cman packages are required, but they seem not to be available anymore on CentOS 7 and similar; I guess they are not really required anymore, as pcs should do the full job
> > >>>>>>> 
> > >>>>>>> Thanks,
> > >>>>>>> 
> > >>>>>>>     Alessandro
> > >>>>>> 
> > >>>>>> Looks like so from http://clusterlabs.org/quickstart-redhat.html. Let us know if it doesn't work.
> > >>>>>> 
> > >>>>>> Thanks,
> > >>>>>> Soumya
> > >>>>>> 
> > >>>>>>> 
> > >>>>>>>> On 08 Jun 2015, at 15:09, Alessandro De Salvo <alessandro.desalvo@xxxxxxxxxxxxx> wrote:
> > >>>>>>>> 
> > >>>>>>>> Great, many thanks Soumya!
> > >>>>>>>> Cheers,
> > >>>>>>>> 
> > >>>>>>>>     Alessandro
> > >>>>>>>> 
> > >>>>>>>>> On 08 Jun 2015, at 13:53, Soumya Koduri <skoduri@xxxxxxxxxx> wrote:
> > >>>>>>>>> 
> > >>>>>>>>> Hi,
> > >>>>>>>>> 
> > >>>>>>>>> Please find the slides of the demo video at [1]
> > >>>>>>>>> 
> > >>>>>>>>> We recommend to have a distributed replica volume as a shared volume for better data-availability.
> > >>>>>>>>> 
> > >>>>>>>>> Size of the volume depends on the workload you may have. Since it is used to maintain the states of NLM/NFSv4 clients, you may calculate the size of the volume to be at minimum the aggregate of (typical_size_of_'/var/lib/nfs'_directory + ~4k * no_of_clients_connected_to_each_of_the_nfs_servers_at_any_point)
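(A worked example with made-up numbers, only to give an order of magnitude: with 2 servers, ~500 clients per server at any time and /var/lib/nfs around 10 MB per server, that rule gives roughly 2 x (10 MB + 4 KB x 500) = 2 x 12 MB, i.e. about 24 MB, so even a very small shared volume would be more than enough in such a setup.)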
> > >>>>>>>>> 
> > >>>>>>>>> We shall document this feature soon in the gluster docs as well.
> > >>>>>>>>> 
> > >>>>>>>>> Thanks,
> > >>>>>>>>> Soumya
> > >>>>>>>>> 
> > >>>>>>>>> [1] - http://www.slideshare.net/SoumyaKoduri/high-49117846
> > >>>>>>>>> 
> > >>>>>>>>> On 06/08/2015 04:34 PM, Alessandro De Salvo wrote:
> > >>>>>>>>>> Hi,
> > >>>>>>>>>> I have seen the demo video on ganesha HA, https://www.youtube.com/watch?v=Z4mvTQC-efM
> > >>>>>>>>>> However there is no advice on the appropriate size of the shared volume. How is it really used, and what should be a reasonable size for it?
> > >>>>>>>>>> Also, are the slides from the video available somewhere, as well as documentation on all this? I did not manage to find them.
> > >>>>>>>>>> Thanks,
> > >>>>>>>>>> 
> > >>>>>>>>>>     Alessandro

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users