OK, I found at least one of the bugs. The /usr/libexec/ganesha/ganesha.sh script has the following lines:

if [ -e /etc/os-release ]; then
    RHEL6_PCS_CNAME_OPTION=""
fi

This is OK for RHEL < 7, but does not work for >= 7. I have changed it to the following, to make it work:

if [ -e /etc/os-release ]; then
    eval $(grep -F "REDHAT_SUPPORT_PRODUCT=" /etc/os-release)
    [ "$REDHAT_SUPPORT_PRODUCT" == "Fedora" ] && RHEL6_PCS_CNAME_OPTION=""
fi
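By the way, an equivalent but slightly tidier variant would be to source /etc/os-release and key off its ID field instead of the Red Hat specific REDHAT_SUPPORT_PRODUCT key. This is only a sketch with the same behaviour as my change above (it assumes the file is sourceable and defines ID, which is true on CentOS 7 and Fedora), not the upstream fix:

if [ -e /etc/os-release ]; then
    # /etc/os-release is shell-sourceable; this sets ID, VERSION_ID, etc.
    . /etc/os-release
    # Clear the RHEL6-style pcs option only on Fedora, as in the change above
    [ "$ID" = "fedora" ] && RHEL6_PCS_CNAME_OPTION=""
fi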
Apart from that, the VIP_<node> entries I was using were wrong: I should have converted all the "-" in the hostnames to underscores (e.g. VIP_atlas_node1 instead of VIP_atlas-node1). Maybe this could be mentioned in the documentation when you have it ready.

Now the cluster starts, but the VIPs apparently do not:

Online: [ atlas-node1 atlas-node2 ]

Full list of resources:

 Clone Set: nfs-mon-clone [nfs-mon]
     Started: [ atlas-node1 atlas-node2 ]
 Clone Set: nfs-grace-clone [nfs-grace]
     Started: [ atlas-node1 atlas-node2 ]
 atlas-node1-cluster_ip-1   (ocf::heartbeat:IPaddr):  Stopped
 atlas-node1-trigger_ip-1   (ocf::heartbeat:Dummy):   Started atlas-node1
 atlas-node2-cluster_ip-1   (ocf::heartbeat:IPaddr):  Stopped
 atlas-node2-trigger_ip-1   (ocf::heartbeat:Dummy):   Started atlas-node2
 atlas-node1-dead_ip-1      (ocf::heartbeat:Dummy):   Started atlas-node1
 atlas-node2-dead_ip-1      (ocf::heartbeat:Dummy):   Started atlas-node2

PCSD Status:
  atlas-node1: Online
  atlas-node2: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

But the issue that is puzzling me more is the following:

# showmount -e localhost
rpc mount export: RPC: Timed out

And when I try to enable the ganesha exports on a volume I get this error:

# gluster volume set atlas-home-01 ganesha.enable on
volume set: failed: Failed to create NFS-Ganesha export config file.

But I do see the file created in /etc/ganesha/exports/*.conf. Still, showmount hangs and times out.
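For completeness, this is the quick check I was going to run next, to see whether the RPC services showmount relies on are registered at all and whether ganesha is actually listening. It is only a diagnostic sketch (it assumes the systemd unit is named nfs-ganesha, as the 2.2.0 packages install it here), not a fix:

# Which RPC programs (nfs, mountd, nlockmgr, ...) are registered with rpcbind?
rpcinfo -p localhost

# Is the ganesha daemon actually up and listening on the NFS port?
systemctl status nfs-ganesha
ss -tlnp | grep 2049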
Any help?
Thanks,

Alessandro

> On 08 Jun 2015, at 20:00, Alessandro De Salvo <Alessandro.DeSalvo@xxxxxxxxxxxxx> wrote:
>
> Hi,
> indeed, it does not work :-)
> OK, this is what I did, with 2 machines running CentOS 7.1, GlusterFS 3.7.1 and nfs-ganesha 2.2.0:
>
> 1) ensured that the machines are able to resolve their IPs (this was already true, since they are in the DNS);
> 2) disabled NetworkManager and enabled network on both machines;
> 3) created a gluster shared volume 'gluster_shared_storage' and mounted it on '/run/gluster/shared_storage' on all the cluster nodes using the glusterfs native mount (on CentOS 7.1, /var/run is a symlink to ../run by default);
> 4) created an empty /etc/ganesha/ganesha.conf;
> 5) installed pacemaker, pcs, resource-agents and corosync on all cluster machines;
> 6) set the same password for the 'hacluster' user on all machines;
> 7) ran pcs cluster auth <hostname> -u hacluster -p <pass> on all the nodes (on both nodes I issued the command for both nodes);
> 8) IPv6 is configured by default on all nodes, although the infrastructure is not ready for IPv6;
> 9) enabled and started pcsd on all nodes;
> 10) populated /etc/ganesha/ganesha-ha.conf with the following contents, one per machine:
>
> ===> atlas-node1
> # Name of the HA cluster created.
> HA_NAME="ATLAS_GANESHA_01"
> # The server from which you intend to mount
> # the shared volume.
> HA_VOL_SERVER="atlas-node1"
> # The subset of nodes of the Gluster Trusted Pool
> # that forms the ganesha HA cluster. IP/Hostname
> # is specified.
> HA_CLUSTER_NODES="atlas-node1,atlas-node2"
> # Virtual IPs of each of the nodes specified above.
> VIP_atlas-node1="x.x.x.1"
> VIP_atlas-node2="x.x.x.2"
>
> ===> atlas-node2
> # Name of the HA cluster created.
> HA_NAME="ATLAS_GANESHA_01"
> # The server from which you intend to mount
> # the shared volume.
> HA_VOL_SERVER="atlas-node2"
> # The subset of nodes of the Gluster Trusted Pool
> # that forms the ganesha HA cluster. IP/Hostname
> # is specified.
> HA_CLUSTER_NODES="atlas-node1,atlas-node2"
> # Virtual IPs of each of the nodes specified above.
> VIP_atlas-node1="x.x.x.1"
> VIP_atlas-node2="x.x.x.2"
>
> 11) issued gluster nfs-ganesha enable, but it fails with a cryptic message:
>
> # gluster nfs-ganesha enable
> Enabling NFS-Ganesha requires Gluster-NFS to be disabled across the trusted pool. Do you still want to continue? (y/n) y
> nfs-ganesha: failed: Failed to set up HA config for NFS-Ganesha. Please check the log file for details
>
> Looking at the logs I found nothing really special but this:
>
> ==> /var/log/glusterfs/etc-glusterfs-glusterd.vol.log <==
> [2015-06-08 17:57:15.672844] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: nfs already stopped
> [2015-06-08 17:57:15.675395] I [glusterd-ganesha.c:386:check_host_list] 0-management: ganesha host found Hostname is atlas-node2
> [2015-06-08 17:57:15.720692] I [glusterd-ganesha.c:386:check_host_list] 0-management: ganesha host found Hostname is atlas-node2
> [2015-06-08 17:57:15.721161] I [glusterd-ganesha.c:335:is_ganesha_host] 0-management: ganesha host found Hostname is atlas-node2
> [2015-06-08 17:57:16.633048] E [glusterd-ganesha.c:254:glusterd_op_set_ganesha] 0-management: Initial NFS-Ganesha set up failed
> [2015-06-08 17:57:16.641563] E [glusterd-syncop.c:1396:gd_commit_op_phase] 0-management: Commit of operation 'Volume (null)' failed on localhost : Failed to set up HA config for NFS-Ganesha. Please check the log file for details
>
> ==> /var/log/glusterfs/cmd_history.log <==
> [2015-06-08 17:57:16.643615] : nfs-ganesha enable : FAILED : Failed to set up HA config for NFS-Ganesha. Please check the log file for details
>
> ==> /var/log/glusterfs/cli.log <==
> [2015-06-08 17:57:16.643839] I [input.c:36:cli_batch] 0-: Exiting with: -1
>
> Also, pcs seems to be fine for the auth part, although it obviously tells me the cluster is not running.
>
> I, [2015-06-08T19:57:16.305323 #7223]  INFO -- : Running: /usr/sbin/corosync-cmapctl totem.cluster_name
> I, [2015-06-08T19:57:16.345457 #7223]  INFO -- : Running: /usr/sbin/pcs cluster token-nodes
> ::ffff:141.108.38.46 - - [08/Jun/2015 19:57:16] "GET /remote/check_auth HTTP/1.1" 200 68 0.1919
> ::ffff:141.108.38.46 - - [08/Jun/2015 19:57:16] "GET /remote/check_auth HTTP/1.1" 200 68 0.1920
> atlas-node1.mydomain - - [08/Jun/2015:19:57:16 CEST] "GET /remote/check_auth HTTP/1.1" 200 68
> - -> /remote/check_auth
>
> What am I doing wrong?
> Thanks,
>
> Alessandro
>
>> On 08 Jun 2015, at 19:30, Soumya Koduri <skoduri@xxxxxxxxxx> wrote:
>>
>> On 06/08/2015 08:20 PM, Alessandro De Salvo wrote:
>>> Sorry, just another question:
>>>
>>> - in my installation of gluster 3.7.1 the command gluster features.ganesha enable does not work:
>>>
>>> # gluster features.ganesha enable
>>> unrecognized word: features.ganesha (position 0)
>>>
>>> Which version has full support for it?
>>
>> Sorry. This option has recently been changed. It is now
>>
>> $ gluster nfs-ganesha enable
>>
>>> - in the documentation the ccs and cman packages are required, but they seem not to be available anymore on CentOS 7 and similar. I guess they are not really required anymore, as pcs should do the full job.
>>>
>>> Thanks,
>>>
>>> Alessandro
>>
>> Looks like so, from http://clusterlabs.org/quickstart-redhat.html. Let us know if it doesn't work.
>>
>> Thanks,
>> Soumya
>>
>>>> On 08 Jun 2015, at 15:09, Alessandro De Salvo <alessandro.desalvo@xxxxxxxxxxxxx> wrote:
>>>>
>>>> Great, many thanks Soumya!
>>>> Cheers,
>>>>
>>>> Alessandro
>>>>
>>>>> On 08 Jun 2015, at 13:53, Soumya Koduri <skoduri@xxxxxxxxxx> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> Please find the slides of the demo video at [1].
>>>>>
>>>>> We recommend a distributed replicated volume as the shared volume, for better data availability.
>>>>>
>>>>> The size of the volume depends on the workload you may have. Since it is used to maintain the state of the NLM/NFSv4 clients, you may calculate the minimum size of the volume as the aggregate, over all NFS servers, of
>>>>> (typical_size_of_'/var/lib/nfs'_directory + ~4k * no_of_clients_connected_to_each_of_the_nfs_servers_at_any_point)
>>>>>
>>>>> We will document this feature soon in the gluster docs as well.
>>>>>
>>>>> Thanks,
>>>>> Soumya
>>>>>
>>>>> [1] - http://www.slideshare.net/SoumyaKoduri/high-49117846
>>>>>
>>>>> On 06/08/2015 04:34 PM, Alessandro De Salvo wrote:
>>>>>> Hi,
>>>>>> I have seen the demo video on ganesha HA, https://www.youtube.com/watch?v=Z4mvTQC-efM
>>>>>> However there is no advice on the appropriate size of the shared volume. How is it really used, and what would be a reasonable size for it?
>>>>>> Also, are the slides from the video available somewhere, as well as documentation on all this? I did not manage to find them.
>>>>>> Thanks,
>>>>>>
>>>>>> Alessandro
>>>>>>
>>>>>> _______________________________________________
>>>>>> Gluster-users mailing list
>>>>>> Gluster-users@xxxxxxxxxxx
>>>>>> http://www.gluster.org/mailman/listinfo/gluster-users
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users@xxxxxxxxxxx
> http://www.gluster.org/mailman/listinfo/gluster-users
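To put rough numbers on the sizing formula quoted above: with two NFS servers and, say, 500 clients connected to each at peak (made-up numbers, purely illustrative), the arithmetic would look like this:

# Illustrative only: plug made-up numbers into the quoted formula.
servers=2
clients_per_server=500            # hypothetical peak client count
var_lib_nfs_kb=$((10 * 1024))     # assume ~10 MB of /var/lib/nfs state per server
per_server_kb=$(( var_lib_nfs_kb + 4 * clients_per_server ))
echo "minimum shared volume size: $(( servers * per_server_kb )) KB"
# -> 24480 KB, i.e. roughly 24 MB, for these made-up numbers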
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users