Re: Questions on ganesha HA and shared storage size

Alessandro De Salvo <Alessandro.DeSalvo@xxxxxxxxxxxxx> · Tue, 9 Jun 2015 12:27:41 +0200

Hi,

> Il giorno 09/giu/2015, alle ore 11:46, Soumya Koduri <skoduri@xxxxxxxxxx> ha scritto:
> 
> 
> 
> On 06/09/2015 02:48 PM, Alessandro De Salvo wrote:
>> Hi,
>> OK, the problem with the VIPs not starting is due to the ganesha_mon
>> heartbeat script looking for a pid file called
>> /var/run/ganesha.nfsd.pid, while by default ganesha.nfsd v.2.2.0 is
>> creating /var/run/ganesha.pid, this needs to be corrected. The file is
>> in glusterfs-ganesha-3.7.1-1.el7.x86_64, in my case.
>> For the moment I have created a symlink in this way and it works:
>> 
>> ln -s /var/run/ganesha.pid /var/run/ganesha.nfsd.pid
>> 
> Thanks. Please update this as well in the bug.

Done :-)

> 
>> So far so good, the VIPs are up and pingable, but still there is the
>> problem of the hanging showmount (i.e. hanging RPC).
>> Still, I see a lot of errors like this in /var/log/messages:
>> 
>> Jun  9 11:15:20 atlas-node1 lrmd[31221]:   notice: operation_finished:
>> nfs-mon_monitor_10000:29292:stderr [ Error: Resource does not exist. ]
>> 
>> While ganesha.log shows the server is not in grace:
>> 
>> 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 :
>> ganesha.nfsd-29964[main] main :MAIN :EVENT :ganesha.nfsd Starting:
>> Ganesha Version /builddir/build/BUILD/nfs-ganesha-2.2.0/src, built at
>> May 18 2015 14:17:18 on buildhw-09.phx2.fedoraproject.org
>> <http://buildhw-09.phx2.fedoraproject.org>
>> 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 :
>> ganesha.nfsd-29965[main] nfs_set_param_from_conf :NFS STARTUP :EVENT
>> :Configuration file successfully parsed
>> 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 :
>> ganesha.nfsd-29965[main] init_server_pkgs :NFS STARTUP :EVENT
>> :Initializing ID Mapper.
>> 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 :
>> ganesha.nfsd-29965[main] init_server_pkgs :NFS STARTUP :EVENT :ID Mapper
>> successfully initialized.
>> 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 :
>> ganesha.nfsd-29965[main] main :NFS STARTUP :WARN :No export entries
>> found in configuration file !!!
>> 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 :
>> ganesha.nfsd-29965[main] config_errs_to_log :CONFIG :WARN :Config File
>> ((null):0): Empty configuration file
>> 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 :
>> ganesha.nfsd-29965[main] lower_my_caps :NFS STARTUP :EVENT
>> :CAP_SYS_RESOURCE was successfully removed for proper quota management
>> in FSAL
>> 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 :
>> ganesha.nfsd-29965[main] lower_my_caps :NFS STARTUP :EVENT :currenty set
>> capabilities are: =
>> cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap+ep
>> 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 :
>> ganesha.nfsd-29965[main] nfs_Init_svc :DISP :CRIT :Cannot acquire
>> credentials for principal nfs
>> 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 :
>> ganesha.nfsd-29965[main] nfs_Init_admin_thread :NFS CB :EVENT :Admin
>> thread initialized
>> 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 :
>> ganesha.nfsd-29965[main] nfs4_start_grace :STATE :EVENT :NFS Server Now
>> IN GRACE, duration 60
>> 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 :
>> ganesha.nfsd-29965[main] nfs_rpc_cb_init_ccache :NFS STARTUP :EVENT
>> :Callback creds directory (/var/run/ganesha) already exists
>> 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 :
>> ganesha.nfsd-29965[main] nfs_rpc_cb_init_ccache :NFS STARTUP :WARN
>> :gssd_refresh_krb5_machine_credential failed (2:2)
>> 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 :
>> ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :Starting
>> delayed executor.
>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 :
>> ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :9P/TCP
>> dispatcher thread was started successfully
>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 :
>> ganesha.nfsd-29965[_9p_disp] _9p_dispatcher_thread :9P DISP :EVENT :9P
>> dispatcher started
>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 :
>> ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT
>> :gsh_dbusthread was started successfully
>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 :
>> ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :admin thread
>> was started successfully
>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 :
>> ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :reaper thread
>> was started successfully
>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 :
>> ganesha.nfsd-29965[reaper] nfs_in_grace :STATE :EVENT :NFS Server Now IN
>> GRACE
>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 :
>> ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :General
>> fridge was started successfully
>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 :
>> ganesha.nfsd-29965[main] nfs_start :NFS STARTUP :EVENT
>> :-------------------------------------------------
>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 :
>> ganesha.nfsd-29965[main] nfs_start :NFS STARTUP :EVENT :             NFS
>> SERVER INITIALIZED
>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 :
>> ganesha.nfsd-29965[main] nfs_start :NFS STARTUP :EVENT
>> :-------------------------------------------------
>> 09/06/2015 11:17:22 : epoch 5576aee4 : atlas-node1 :
>> ganesha.nfsd-29965[reaper] nfs_in_grace :STATE :EVENT :NFS Server Now
>> NOT IN GRACE
>> 
>> 
> Please check the status of nfs-ganesha
> $service nfs-ganesha status

It’s fine:

# service nfs-ganesha status
Redirecting to /bin/systemctl status  nfs-ganesha.service
nfs-ganesha.service - NFS-Ganesha file server
   Loaded: loaded (/usr/lib/systemd/system/nfs-ganesha.service; enabled)
   Active: active (running) since Tue 2015-06-09 11:54:39 CEST; 32min ago
     Docs: http://github.com/nfs-ganesha/nfs-ganesha/wiki
  Process: 28081 ExecStop=/bin/dbus-send --system --dest=org.ganesha.nfsd --type=method_call /org/ganesha/nfsd/admin org.ganesha.nfsd.admin.shutdown (code=exited, status=0/SUCCESS)
  Process: 28425 ExecStartPost=/bin/bash -c prlimit --pid $MAINPID --nofile=$NOFILE:$NOFILE (code=exited, status=0/SUCCESS)
  Process: 28423 ExecStart=/usr/bin/ganesha.nfsd $OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 28424 (ganesha.nfsd)
   CGroup: /system.slice/nfs-ganesha.service
           ââ28424 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT -p /var/run/ganesha.nfsd.pid

> 
> Could you try taking a packet trace (during showmount or mount) and check the server responses.

The problem is that the portmapper seems to be working but the nothing happens:

3785   0.652843 x.x.x.2 -> x.x.x.1 Portmap 98 V2 GETPORT Call MOUNT(100005) V:3 TCP
3788   0.653339 x.x.x.1 -> x.x.x.2 Portmap 70 V2 GETPORT Reply (Call In 3785) Port:33645
3789   0.653756 x.x.x.2 -> x.x.x.1 TCP 74 50774 > 33645 [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=73312128 TSecr=0 WS=128
3790   0.653784 x.x.x.1 -> x.x.x.2 TCP 74 33645 > 50774 [SYN, ACK] Seq=0 Ack=1 Win=14480 Len=0 MSS=1460 SACK_PERM=1 TSval=132248576 TSecr=73312128 WS=128
3791   0.654004 x.x.x.2 -> x.x.x.1 TCP 66 50774 > 33645 [ACK] Seq=1 Ack=1 Win=29312 Len=0 TSval=73312128 TSecr=132248576
3793   0.654174 x.x.x.2 -> x.x.x.1 MOUNT 158 V3 EXPORT Call
3794   0.654184 x.x.x.1 -> x.x.x.2 TCP 66 33645 > 50774 [ACK] Seq=1 Ack=93 Win=14592 Len=0 TSval=132248576 TSecr=73312129
86065  20.674219 x.x.x.2 -> x.x.x.1 TCP 66 50774 > 33645 [FIN, ACK] Seq=93 Ack=1 Win=29312 Len=0 TSval=73332149 TSecr=132248576
86247  20.713745 x.x.x.1 -> x.x.x.2 TCP 66 33645 > 50774 [ACK] Seq=1 Ack=94 Win=14592 Len=0 TSval=132268636 TSecr=73332149

Cheers,

	Alessandro

> 
> Thanks,
> Soumya
> 
>> Cheers,
>> 
>> Alessandro
>> 
>> 
>>> Il giorno 09/giu/2015, alle ore 10:36, Alessandro De Salvo
>>> <alessandro.desalvo@xxxxxxxxxxxxx
>>> <mailto:alessandro.desalvo@xxxxxxxxxxxxx>> ha scritto:
>>> 
>>> Hi Soumya,
>>> 
>>>> Il giorno 09/giu/2015, alle ore 08:06, Soumya Koduri
>>>> <skoduri@xxxxxxxxxx <mailto:skoduri@xxxxxxxxxx>> ha scritto:
>>>> 
>>>> 
>>>> 
>>>> On 06/09/2015 01:31 AM, Alessandro De Salvo wrote:
>>>>> OK, I found at least one of the bugs.
>>>>> The /usr/libexec/ganesha/ganesha.sh has the following lines:
>>>>> 
>>>>>   if [ -e /etc/os-release ]; then
>>>>>       RHEL6_PCS_CNAME_OPTION=""
>>>>>   fi
>>>>> 
>>>>> This is OK for RHEL < 7, but does not work for >= 7. I have changed
>>>>> it to the following, to make it working:
>>>>> 
>>>>>   if [ -e /etc/os-release ]; then
>>>>>       eval $(grep -F "REDHAT_SUPPORT_PRODUCT=" /etc/os-release)
>>>>>       [ "$REDHAT_SUPPORT_PRODUCT" == "Fedora" ] &&
>>>>> RHEL6_PCS_CNAME_OPTION=""
>>>>>   fi
>>>>> 
>>>> Oh..Thanks for the fix. Could you please file a bug for the same (and
>>>> probably submit your fix as well). We shall have it corrected.
>>> 
>>> Just did it,https://bugzilla.redhat.com/show_bug.cgi?id=1229601
>>> 
>>>> 
>>>>> Apart from that, the VIP_<node> I was using were wrong, and I should
>>>>> have converted all the “-“ to underscores, maybe this could be
>>>>> mentioned in the documentation when you will have it ready.
>>>>> Now, the cluster starts, but the VIPs apparently not:
>>>>> 
>>>> Sure. Thanks again for pointing it out. We shall make a note of it.
>>>> 
>>>>> Online: [ atlas-node1 atlas-node2 ]
>>>>> 
>>>>> Full list of resources:
>>>>> 
>>>>> Clone Set: nfs-mon-clone [nfs-mon]
>>>>>    Started: [ atlas-node1 atlas-node2 ]
>>>>> Clone Set: nfs-grace-clone [nfs-grace]
>>>>>    Started: [ atlas-node1 atlas-node2 ]
>>>>> atlas-node1-cluster_ip-1  (ocf::heartbeat:IPaddr):        Stopped
>>>>> atlas-node1-trigger_ip-1  (ocf::heartbeat:Dummy): Started atlas-node1
>>>>> atlas-node2-cluster_ip-1  (ocf::heartbeat:IPaddr):        Stopped
>>>>> atlas-node2-trigger_ip-1  (ocf::heartbeat:Dummy): Started atlas-node2
>>>>> atlas-node1-dead_ip-1     (ocf::heartbeat:Dummy): Started atlas-node1
>>>>> atlas-node2-dead_ip-1     (ocf::heartbeat:Dummy): Started atlas-node2
>>>>> 
>>>>> PCSD Status:
>>>>> atlas-node1: Online
>>>>> atlas-node2: Online
>>>>> 
>>>>> Daemon Status:
>>>>> corosync: active/disabled
>>>>> pacemaker: active/disabled
>>>>> pcsd: active/enabled
>>>>> 
>>>>> 
>>>> Here corosync and pacemaker shows 'disabled' state. Can you check the
>>>> status of their services. They should be running prior to cluster
>>>> creation. We need to include that step in document as well.
>>> 
>>> Ah, OK, you’re right, I have added it to my puppet modules (we install
>>> and configure ganesha via puppet, I’ll put the module on puppetforge
>>> soon, in case anyone is interested).
>>> 
>>>> 
>>>>> But the issue that is puzzling me more is the following:
>>>>> 
>>>>> # showmount -e localhost
>>>>> rpc mount export: RPC: Timed out
>>>>> 
>>>>> And when I try to enable the ganesha exports on a volume I get this
>>>>> error:
>>>>> 
>>>>> # gluster volume set atlas-home-01 ganesha.enable on
>>>>> volume set: failed: Failed to create NFS-Ganesha export config file.
>>>>> 
>>>>> But I see the file created in /etc/ganesha/exports/*.conf
>>>>> Still, showmount hangs and times out.
>>>>> Any help?
>>>>> Thanks,
>>>>> 
>>>> Hmm that's strange. Sometimes, in case if there was no proper cleanup
>>>> done while trying to re-create the cluster, we have seen such issues.
>>>> 
>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1227709
>>>> 
>>>> http://review.gluster.org/#/c/11093/
>>>> 
>>>> Can you please unexport all the volumes, teardown the cluster using
>>>> 'gluster vol set <volname> ganesha.enable off’
>>> 
>>> OK:
>>> 
>>> # gluster vol set atlas-home-01 ganesha.enable off
>>> volume set: failed: ganesha.enable is already 'off'.
>>> 
>>> # gluster vol set atlas-data-01 ganesha.enable off
>>> volume set: failed: ganesha.enable is already 'off'.
>>> 
>>> 
>>>> 'gluster ganesha disable' command.
>>> 
>>> I’m assuming you wanted to write nfs-ganesha instead?
>>> 
>>> # gluster nfs-ganesha disable
>>> ganesha enable : success
>>> 
>>> 
>>> A side note (not really important): it’s strange that when I do a
>>> disable the message is “ganesha enable” :-)
>>> 
>>>> 
>>>> Verify if the following files have been deleted on all the nodes-
>>>> '/etc/cluster/cluster.conf’
>>> 
>>> this file is not present at all, I think it’s not needed in CentOS 7
>>> 
>>>> '/etc/ganesha/ganesha.conf’,
>>> 
>>> it’s still there, but empty, and I guess it should be OK, right?
>>> 
>>>> '/etc/ganesha/exports/*’
>>> 
>>> no more files there
>>> 
>>>> '/var/lib/pacemaker/cib’
>>> 
>>> it’s empty
>>> 
>>>> 
>>>> Verify if the ganesha service is stopped on all the nodes.
>>> 
>>> nope, it’s still running, I will stop it.
>>> 
>>>> 
>>>> start/restart the services - corosync, pcs.
>>> 
>>> In the node where I issued the nfs-ganesha disable there is no more
>>> any /etc/corosync/corosync.conf so corosync won’t start. The other
>>> node instead still has the file, it’s strange.
>>> 
>>>> 
>>>> And re-try the HA cluster creation
>>>> 'gluster ganesha enable’
>>> 
>>> This time (repeated twice) it did not work at all:
>>> 
>>> # pcs status
>>> Cluster name: ATLAS_GANESHA_01
>>> Last updated: Tue Jun  9 10:13:43 2015
>>> Last change: Tue Jun  9 10:13:22 2015
>>> Stack: corosync
>>> Current DC: atlas-node1 (1) - partition with quorum
>>> Version: 1.1.12-a14efad
>>> 2 Nodes configured
>>> 6 Resources configured
>>> 
>>> 
>>> Online: [ atlas-node1 atlas-node2 ]
>>> 
>>> Full list of resources:
>>> 
>>> Clone Set: nfs-mon-clone [nfs-mon]
>>>    Started: [ atlas-node1 atlas-node2 ]
>>> Clone Set: nfs-grace-clone [nfs-grace]
>>>    Started: [ atlas-node1 atlas-node2 ]
>>> atlas-node2-dead_ip-1     (ocf::heartbeat:Dummy): Started atlas-node1
>>> atlas-node1-dead_ip-1     (ocf::heartbeat:Dummy): Started atlas-node2
>>> 
>>> PCSD Status:
>>> atlas-node1: Online
>>> atlas-node2: Online
>>> 
>>> Daemon Status:
>>> corosync: active/enabled
>>> pacemaker: active/enabled
>>> pcsd: active/enabled
>>> 
>>> 
>>> 
>>> I tried then "pcs cluster destroy" on both nodes, and then again
>>> nfs-ganesha enable, but now I’m back to the old problem:
>>> 
>>> # pcs status
>>> Cluster name: ATLAS_GANESHA_01
>>> Last updated: Tue Jun  9 10:22:27 2015
>>> Last change: Tue Jun  9 10:17:00 2015
>>> Stack: corosync
>>> Current DC: atlas-node2 (2) - partition with quorum
>>> Version: 1.1.12-a14efad
>>> 2 Nodes configured
>>> 10 Resources configured
>>> 
>>> 
>>> Online: [ atlas-node1 atlas-node2 ]
>>> 
>>> Full list of resources:
>>> 
>>> Clone Set: nfs-mon-clone [nfs-mon]
>>>    Started: [ atlas-node1 atlas-node2 ]
>>> Clone Set: nfs-grace-clone [nfs-grace]
>>>    Started: [ atlas-node1 atlas-node2 ]
>>> atlas-node1-cluster_ip-1       (ocf::heartbeat:IPaddr):        Stopped
>>> atlas-node1-trigger_ip-1       (ocf::heartbeat:Dummy): Started atlas-node1
>>> atlas-node2-cluster_ip-1       (ocf::heartbeat:IPaddr):        Stopped
>>> atlas-node2-trigger_ip-1       (ocf::heartbeat:Dummy): Started atlas-node2
>>> atlas-node1-dead_ip-1  (ocf::heartbeat:Dummy): Started atlas-node1
>>> atlas-node2-dead_ip-1  (ocf::heartbeat:Dummy): Started atlas-node2
>>> 
>>> PCSD Status:
>>> atlas-node1: Online
>>> atlas-node2: Online
>>> 
>>> Daemon Status:
>>> corosync: active/enabled
>>> pacemaker: active/enabled
>>> pcsd: active/enabled
>>> 
>>> 
>>> Cheers,
>>> 
>>> Alessandro
>>> 
>>>> 
>>>> 
>>>> Thanks,
>>>> Soumya
>>>> 
>>>>> Alessandro
>>>>> 
>>>>>> Il giorno 08/giu/2015, alle ore 20:00, Alessandro De Salvo
>>>>>> <Alessandro.DeSalvo@xxxxxxxxxxxxx
>>>>>> <mailto:Alessandro.DeSalvo@xxxxxxxxxxxxx>> ha scritto:
>>>>>> 
>>>>>> Hi,
>>>>>> indeed, it does not work :-)
>>>>>> OK, this is what I did, with 2 machines, running CentOS 7.1,
>>>>>> Glusterfs 3.7.1 and nfs-ganesha 2.2.0:
>>>>>> 
>>>>>> 1) ensured that the machines are able to resolve their IPs (but
>>>>>> this was already true since they were in the DNS);
>>>>>> 2) disabled NetworkManager and enabled network on both machines;
>>>>>> 3) created a gluster shared volume 'gluster_shared_storage' and
>>>>>> mounted it on '/run/gluster/shared_storage' on all the cluster
>>>>>> nodes using glusterfs native mount (on CentOS 7.1 there is a link
>>>>>> by default /var/run -> ../run)
>>>>>> 4) created an empty /etc/ganesha/ganesha.conf;
>>>>>> 5) installed pacemaker pcs resource-agents corosync on all cluster
>>>>>> machines;
>>>>>> 6) set the ‘hacluster’ user the same password on all machines;
>>>>>> 7) pcs cluster auth <hostname> -u hacluster -p <pass> on all the
>>>>>> nodes (on both nodes I issued the commands for both nodes)
>>>>>> 8) IPv6 is configured by default on all nodes, although the
>>>>>> infrastructure is not ready for IPv6
>>>>>> 9) enabled pcsd and started it on all nodes
>>>>>> 10) populated /etc/ganesha/ganesha-ha.conf with the following
>>>>>> contents, one per machine:
>>>>>> 
>>>>>> 
>>>>>> ===> atlas-node1
>>>>>> # Name of the HA cluster created.
>>>>>> HA_NAME="ATLAS_GANESHA_01"
>>>>>> # The server from which you intend to mount
>>>>>> # the shared volume.
>>>>>> HA_VOL_SERVER=“atlas-node1"
>>>>>> # The subset of nodes of the Gluster Trusted Pool
>>>>>> # that forms the ganesha HA cluster. IP/Hostname
>>>>>> # is specified.
>>>>>> HA_CLUSTER_NODES=“atlas-node1,atlas-node2"
>>>>>> # Virtual IPs of each of the nodes specified above.
>>>>>> VIP_atlas-node1=“x.x.x.1"
>>>>>> VIP_atlas-node2=“x.x.x.2"
>>>>>> 
>>>>>> ===> atlas-node2
>>>>>> # Name of the HA cluster created.
>>>>>> HA_NAME="ATLAS_GANESHA_01"
>>>>>> # The server from which you intend to mount
>>>>>> # the shared volume.
>>>>>> HA_VOL_SERVER=“atlas-node2"
>>>>>> # The subset of nodes of the Gluster Trusted Pool
>>>>>> # that forms the ganesha HA cluster. IP/Hostname
>>>>>> # is specified.
>>>>>> HA_CLUSTER_NODES=“atlas-node1,atlas-node2"
>>>>>> # Virtual IPs of each of the nodes specified above.
>>>>>> VIP_atlas-node1=“x.x.x.1"
>>>>>> VIP_atlas-node2=“x.x.x.2”
>>>>>> 
>>>>>> 11) issued gluster nfs-ganesha enable, but it fails with a cryptic
>>>>>> message:
>>>>>> 
>>>>>> # gluster nfs-ganesha enable
>>>>>> Enabling NFS-Ganesha requires Gluster-NFS to be disabled across the
>>>>>> trusted pool. Do you still want to continue? (y/n) y
>>>>>> nfs-ganesha: failed: Failed to set up HA config for NFS-Ganesha.
>>>>>> Please check the log file for details
>>>>>> 
>>>>>> Looking at the logs I found nothing really special but this:
>>>>>> 
>>>>>> ==> /var/log/glusterfs/etc-glusterfs-glusterd.vol.log <==
>>>>>> [2015-06-08 17:57:15.672844] I [MSGID: 106132]
>>>>>> [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: nfs
>>>>>> already stopped
>>>>>> [2015-06-08 17:57:15.675395] I
>>>>>> [glusterd-ganesha.c:386:check_host_list] 0-management: ganesha host
>>>>>> found Hostname is atlas-node2
>>>>>> [2015-06-08 17:57:15.720692] I
>>>>>> [glusterd-ganesha.c:386:check_host_list] 0-management: ganesha host
>>>>>> found Hostname is atlas-node2
>>>>>> [2015-06-08 17:57:15.721161] I
>>>>>> [glusterd-ganesha.c:335:is_ganesha_host] 0-management: ganesha host
>>>>>> found Hostname is atlas-node2
>>>>>> [2015-06-08 17:57:16.633048] E
>>>>>> [glusterd-ganesha.c:254:glusterd_op_set_ganesha] 0-management:
>>>>>> Initial NFS-Ganesha set up failed
>>>>>> [2015-06-08 17:57:16.641563] E
>>>>>> [glusterd-syncop.c:1396:gd_commit_op_phase] 0-management: Commit of
>>>>>> operation 'Volume (null)' failed on localhost : Failed to set up HA
>>>>>> config for NFS-Ganesha. Please check the log file for details
>>>>>> 
>>>>>> ==> /var/log/glusterfs/cmd_history.log <==
>>>>>> [2015-06-08 17:57:16.643615]  : nfs-ganesha enable : FAILED :
>>>>>> Failed to set up HA config for NFS-Ganesha. Please check the log
>>>>>> file for details
>>>>>> 
>>>>>> ==> /var/log/glusterfs/cli.log <==
>>>>>> [2015-06-08 17:57:16.643839] I [input.c:36:cli_batch] 0-: Exiting
>>>>>> with: -1
>>>>>> 
>>>>>> 
>>>>>> Also, pcs seems to be fine for the auth part, although it obviously
>>>>>> tells me the cluster is not running.
>>>>>> 
>>>>>> I, [2015-06-08T19:57:16.305323 #7223]  INFO -- : Running:
>>>>>> /usr/sbin/corosync-cmapctl totem.cluster_name
>>>>>> I, [2015-06-08T19:57:16.345457 #7223]  INFO -- : Running:
>>>>>> /usr/sbin/pcs cluster token-nodes
>>>>>> ::ffff:141.108.38.46 - - [08/Jun/2015 19:57:16] "GET
>>>>>> /remote/check_auth HTTP/1.1" 200 68 0.1919
>>>>>> ::ffff:141.108.38.46 - - [08/Jun/2015 19:57:16] "GET
>>>>>> /remote/check_auth HTTP/1.1" 200 68 0.1920
>>>>>> atlas-node1.mydomain - - [08/Jun/2015:19:57:16 CEST] "GET
>>>>>> /remote/check_auth HTTP/1.1" 200 68
>>>>>> - -> /remote/check_auth
>>>>>> 
>>>>>> 
>>>>>> What am I doing wrong?
>>>>>> Thanks,
>>>>>> 
>>>>>> Alessandro
>>>>>> 
>>>>>>> Il giorno 08/giu/2015, alle ore 19:30, Soumya Koduri
>>>>>>> <skoduri@xxxxxxxxxx <mailto:skoduri@xxxxxxxxxx>> ha scritto:
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On 06/08/2015 08:20 PM, Alessandro De Salvo wrote:
>>>>>>>> Sorry, just another question:
>>>>>>>> 
>>>>>>>> - in my installation of gluster 3.7.1 the command gluster
>>>>>>>> features.ganesha enable does not work:
>>>>>>>> 
>>>>>>>> # gluster features.ganesha enable
>>>>>>>> unrecognized word: features.ganesha (position 0)
>>>>>>>> 
>>>>>>>> Which version has full support for it?
>>>>>>> 
>>>>>>> Sorry. This option has recently been changed. It is now
>>>>>>> 
>>>>>>> $ gluster nfs-ganesha enable
>>>>>>> 
>>>>>>> 
>>>>>>>> 
>>>>>>>> - in the documentation the ccs and cman packages are required,
>>>>>>>> but they seems not to be available anymore on CentOS 7 and
>>>>>>>> similar, I guess they are not really required anymore, as pcs
>>>>>>>> should do the full job
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> 
>>>>>>>> Alessandro
>>>>>>> 
>>>>>>> Looks like so from http://clusterlabs.org/quickstart-redhat.html.
>>>>>>> Let us know if it doesn't work.
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Soumya
>>>>>>> 
>>>>>>>> 
>>>>>>>>> Il giorno 08/giu/2015, alle ore 15:09, Alessandro De Salvo
>>>>>>>>> <alessandro.desalvo@xxxxxxxxxxxxx
>>>>>>>>> <mailto:alessandro.desalvo@xxxxxxxxxxxxx>> ha scritto:
>>>>>>>>> 
>>>>>>>>> Great, many thanks Soumya!
>>>>>>>>> Cheers,
>>>>>>>>> 
>>>>>>>>> Alessandro
>>>>>>>>> 
>>>>>>>>>> Il giorno 08/giu/2015, alle ore 13:53, Soumya Koduri
>>>>>>>>>> <skoduri@xxxxxxxxxx <mailto:skoduri@xxxxxxxxxx>> ha scritto:
>>>>>>>>>> 
>>>>>>>>>> Hi,
>>>>>>>>>> 
>>>>>>>>>> Please find the slides of the demo video at [1]
>>>>>>>>>> 
>>>>>>>>>> We recommend to have a distributed replica volume as a shared
>>>>>>>>>> volume for better data-availability.
>>>>>>>>>> 
>>>>>>>>>> Size of the volume depends on the workload you may have. Since
>>>>>>>>>> it is used to maintain states of NLM/NFSv4 clients, you may
>>>>>>>>>> calculate the size of the volume to be minimum of aggregate of
>>>>>>>>>> (typical_size_of'/var/lib/nfs'_directory +
>>>>>>>>>> ~4k*no_of_clients_connected_to_each_of_the_nfs_servers_at_any_point)
>>>>>>>>>> 
>>>>>>>>>> We shall document about this feature sooner in the gluster docs
>>>>>>>>>> as well.
>>>>>>>>>> 
>>>>>>>>>> Thanks,
>>>>>>>>>> Soumya
>>>>>>>>>> 
>>>>>>>>>> [1] - http://www.slideshare.net/SoumyaKoduri/high-49117846
>>>>>>>>>> 
>>>>>>>>>> On 06/08/2015 04:34 PM, Alessandro De Salvo wrote:
>>>>>>>>>>> Hi,
>>>>>>>>>>> I have seen the demo video on ganesha HA,
>>>>>>>>>>> https://www.youtube.com/watch?v=Z4mvTQC-efM
>>>>>>>>>>> However there is no advice on the appropriate size of the
>>>>>>>>>>> shared volume. How is it really used, and what should be a
>>>>>>>>>>> reasonable size for it?
>>>>>>>>>>> Also, are the slides from the video available somewhere, as
>>>>>>>>>>> well as a documentation on all this? I did not manage to find
>>>>>>>>>>> them.
>>>>>>>>>>> Thanks,
>>>>>>>>>>> 
>>>>>>>>>>> Alessandro
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> Gluster-users mailing list
>>>>>>>>>>> Gluster-users@xxxxxxxxxxx <mailto:Gluster-users@xxxxxxxxxxx>
>>>>>>>>>>> http://www.gluster.org/mailman/listinfo/gluster-users
>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>>>> _______________________________________________
>>>>>> Gluster-users mailing list
>>>>>> Gluster-users@xxxxxxxxxxx <mailto:Gluster-users@xxxxxxxxxxx>
>>>>>> http://www.gluster.org/mailman/listinfo/gluster-users
>>>>> 
>>> 
>>> _______________________________________________
>>> Gluster-users mailing list
>>> Gluster-users@xxxxxxxxxxx <mailto:Gluster-users@xxxxxxxxxxx>
>>> http://www.gluster.org/mailman/listinfo/gluster-users
>> 

Attachment:
smime.p7s

Description: S/MIME cryptographic signature
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users