On 09 Jun 2015, at 10:36, Alessandro De Salvo
<alessandro.desalvo@xxxxxxxxxxxxx
<mailto:alessandro.desalvo@xxxxxxxxxxxxx>> wrote:
Hi Soumya,
On 09 Jun 2015, at 08:06, Soumya Koduri
<skoduri@xxxxxxxxxx <mailto:skoduri@xxxxxxxxxx>> wrote:
On 06/09/2015 01:31 AM, Alessandro De Salvo wrote:
OK, I found at least one of the bugs.
The /usr/libexec/ganesha/ganesha.sh has the following lines:
if [ -e /etc/os-release ]; then
RHEL6_PCS_CNAME_OPTION=""
fi
This is OK for RHEL < 7, but does not work for >= 7. I have changed
it to the following to make it work:
if [ -e /etc/os-release ]; then
eval $(grep -F "REDHAT_SUPPORT_PRODUCT=" /etc/os-release)
[ "$REDHAT_SUPPORT_PRODUCT" == "Fedora" ] &&
RHEL6_PCS_CNAME_OPTION=""
fi
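The fix above uses `eval`, which executes whatever the matched line contains. A minimal sketch of the same check without `eval` could look like this, assuming os-release holds plain KEY=VALUE lines. The helper name `os_release_field` and its optional file argument are hypothetical and not part of ganesha.sh.

```shell
#!/bin/bash
# Hypothetical helper (not part of ganesha.sh): print the value of KEY
# from an os-release style file, stripping optional surrounding quotes.
os_release_field() {
    local key="$1" file="${2:-/etc/os-release}"
    [ -r "$file" ] || return 1
    # Keep only the first matching line, drop "KEY=", then trim quotes.
    sed -n "s/^${key}=//p" "$file" | head -n 1 | sed -e 's/^["'\'']//' -e 's/["'\'']$//'
}

# Same decision as the fix above: clear the option only on Fedora.
if [ "$(os_release_field REDHAT_SUPPORT_PRODUCT)" = "Fedora" ]; then
    RHEL6_PCS_CNAME_OPTION=""
fi
```

Reading the value as text keeps the script from running arbitrary content if the file ever contains something unexpected.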
Oh, thanks for the fix. Could you please file a bug for it (and
probably submit your fix as well)? We shall have it corrected.
Just did it: https://bugzilla.redhat.com/show_bug.cgi?id=1229601
Apart from that, the VIP_<node> names I was using were wrong: I should
have converted all the "-" to underscores. Maybe this could be
mentioned in the documentation once it is ready.
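As an illustration of the naming rule just described, the following sketch derives the VIP key from a node name by replacing every "-" with "_". The function name `vip_key_for_node` is hypothetical and only meant to show the conversion.

```shell
#!/bin/bash
# Hypothetical helper: build the VIP_<node> key for a node name,
# replacing every "-" with "_" as described above.
vip_key_for_node() {
    local node="$1"
    printf 'VIP_%s\n' "${node//-/_}"
}

vip_key_for_node atlas-node1   # prints VIP_atlas_node1
```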
Now the cluster starts, but apparently the VIPs do not:
Sure. Thanks again for pointing it out. We shall make a note of it.
Online: [ atlas-node1 atlas-node2 ]
Full list of resources:
Clone Set: nfs-mon-clone [nfs-mon]
Started: [ atlas-node1 atlas-node2 ]
Clone Set: nfs-grace-clone [nfs-grace]
Started: [ atlas-node1 atlas-node2 ]
atlas-node1-cluster_ip-1 (ocf::heartbeat:IPaddr): Stopped
atlas-node1-trigger_ip-1 (ocf::heartbeat:Dummy): Started atlas-node1
atlas-node2-cluster_ip-1 (ocf::heartbeat:IPaddr): Stopped
atlas-node2-trigger_ip-1 (ocf::heartbeat:Dummy): Started atlas-node2
atlas-node1-dead_ip-1 (ocf::heartbeat:Dummy): Started atlas-node1
atlas-node2-dead_ip-1 (ocf::heartbeat:Dummy): Started atlas-node2
PCSD Status:
atlas-node1: Online
atlas-node2: Online
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
Here corosync and pacemaker show a 'disabled' state. Can you check the
status of their services? They should be running prior to cluster
creation. We need to include that step in the documentation as well.
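One way to spot this condition is to filter the "Daemon Status" section of the `pcs status` output shown above. The sketch below assumes that output layout and flags every daemon not reported as active/enabled. The function name `daemons_not_ready` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: read `pcs status` text on stdin and print each
# "Daemon Status" entry that is not reported as active/enabled.
daemons_not_ready() {
    awk '
        /^Daemon Status:/ { in_section = 1; next }
        in_section && NF == 2 && $2 != "active/enabled" {
            sub(/:$/, "", $1); print $1, $2
        }
    '
}

# Intended use on a cluster node (not run here):
#   pcs status | daemons_not_ready
```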
Ah, OK, you’re right. I have added it to my puppet modules (we install
and configure ganesha via puppet; I’ll put the module on Puppet Forge
soon, in case anyone is interested).
But the issue that puzzles me more is the following:
# showmount -e localhost
rpc mount export: RPC: Timed out
And when I try to enable the ganesha exports on a volume I get this
error:
# gluster volume set atlas-home-01 ganesha.enable on
volume set: failed: Failed to create NFS-Ganesha export config file.
But I see the file created in /etc/ganesha/exports/*.conf
Still, showmount hangs and times out.
Any help?
Thanks,
Hmm, that's strange. We have sometimes seen such issues when there was
no proper cleanup before trying to re-create the cluster.
https://bugzilla.redhat.com/show_bug.cgi?id=1227709
http://review.gluster.org/#/c/11093/
Can you please unexport all the volumes, tear down the cluster using
'gluster vol set <volname> ganesha.enable off'
OK:
# gluster vol set atlas-home-01 ganesha.enable off
volume set: failed: ganesha.enable is already 'off'.
# gluster vol set atlas-data-01 ganesha.enable off
volume set: failed: ganesha.enable is already 'off'.
'gluster ganesha disable' command.
I’m assuming you wanted to write nfs-ganesha instead?
# gluster nfs-ganesha disable
ganesha enable : success
A side note (not really important): it’s strange that when I do a
disable the message is “ganesha enable” :-)
Verify that the following files have been deleted on all the nodes:
'/etc/cluster/cluster.conf'
this file is not present at all, I think it’s not needed in CentOS 7
'/etc/ganesha/ganesha.conf',
it’s still there, but empty, and I guess it should be OK, right?
'/etc/ganesha/exports/*'
no more files there
'/var/lib/pacemaker/cib'
it’s empty
Verify if the ganesha service is stopped on all the nodes.
nope, it’s still running, I will stop it.
start/restart the services - corosync, pcs.
On the node where I issued nfs-ganesha disable there is no longer
any /etc/corosync/corosync.conf, so corosync won’t start. Strangely,
the other node still has the file.
And re-try the HA cluster creation
'gluster ganesha enable'
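The file checks in the cleanup list above can be sketched as a small script to run on each node. This is only an illustration: the function name `ganesha_leftovers` and its optional root argument are hypothetical, and treating an empty ganesha.conf as clean is an assumption taken from the reply above, not a confirmed requirement.

```shell
#!/bin/bash
# Hypothetical helper: list leftovers from a previous HA setup.
# The optional ROOT argument defaults to "/" and exists only so the
# check can be tried against a scratch directory.
ganesha_leftovers() {
    local root="${1:-/}" path
    # Files that should be absent after teardown (an empty file is
    # treated as clean here, which is an assumption).
    for path in etc/cluster/cluster.conf etc/ganesha/ganesha.conf; do
        [ -s "$root/$path" ] && echo "non-empty file: /$path"
    done
    # Directories that should contain no entries after teardown.
    for path in etc/ganesha/exports var/lib/pacemaker/cib; do
        if [ -d "$root/$path" ] && [ -n "$(ls -A "$root/$path")" ]; then
            echo "non-empty directory: /$path"
        fi
    done
    return 0
}
```

Calling `ganesha_leftovers` with no argument on a node prints one line per leftover and prints nothing when the listed paths are clean.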
This time (repeated twice) it did not work at all:
# pcs status
Cluster name: ATLAS_GANESHA_01
Last updated: Tue Jun 9 10:13:43 2015
Last change: Tue Jun 9 10:13:22 2015
Stack: corosync
Current DC: atlas-node1 (1) - partition with quorum
Version: 1.1.12-a14efad
2 Nodes configured
6 Resources configured
Online: [ atlas-node1 atlas-node2 ]
Full list of resources:
Clone Set: nfs-mon-clone [nfs-mon]
Started: [ atlas-node1 atlas-node2 ]
Clone Set: nfs-grace-clone [nfs-grace]
Started: [ atlas-node1 atlas-node2 ]
atlas-node2-dead_ip-1 (ocf::heartbeat:Dummy): Started atlas-node1
atlas-node1-dead_ip-1 (ocf::heartbeat:Dummy): Started atlas-node2
PCSD Status:
atlas-node1: Online
atlas-node2: Online
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
I then tried "pcs cluster destroy" on both nodes, and then
nfs-ganesha enable again, but now I’m back to the old problem:
# pcs status
Cluster name: ATLAS_GANESHA_01
Last updated: Tue Jun 9 10:22:27 2015
Last change: Tue Jun 9 10:17:00 2015
Stack: corosync
Current DC: atlas-node2 (2) - partition with quorum
Version: 1.1.12-a14efad
2 Nodes configured
10 Resources configured
Online: [ atlas-node1 atlas-node2 ]
Full list of resources:
Clone Set: nfs-mon-clone [nfs-mon]
Started: [ atlas-node1 atlas-node2 ]
Clone Set: nfs-grace-clone [nfs-grace]
Started: [ atlas-node1 atlas-node2 ]
atlas-node1-cluster_ip-1 (ocf::heartbeat:IPaddr): Stopped
atlas-node1-trigger_ip-1 (ocf::heartbeat:Dummy): Started atlas-node1
atlas-node2-cluster_ip-1 (ocf::heartbeat:IPaddr): Stopped
atlas-node2-trigger_ip-1 (ocf::heartbeat:Dummy): Started atlas-node2
atlas-node1-dead_ip-1 (ocf::heartbeat:Dummy): Started atlas-node1
atlas-node2-dead_ip-1 (ocf::heartbeat:Dummy): Started atlas-node2
PCSD Status:
atlas-node1: Online
atlas-node2: Online
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
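To pick out the affected resources from a status dump like the one above, a filter along these lines may help. It assumes the resource line layout shown in this thread (name, agent in parentheses, state) and the function name `stopped_resources` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: read `pcs status` text on stdin and print the
# name of every resource line whose last field is "Stopped".
stopped_resources() {
    awk '$NF == "Stopped" && $2 ~ /^\(ocf::/ { print $1 }'
}

# Intended use on a cluster node (not run here):
#   pcs status | stopped_resources
```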
Cheers,
Alessandro
Thanks,
Soumya
Alessandro
On 08 Jun 2015, at 20:00, Alessandro De Salvo
<Alessandro.DeSalvo@xxxxxxxxxxxxx
<mailto:Alessandro.DeSalvo@xxxxxxxxxxxxx>> wrote:
Hi,
indeed, it does not work :-)
OK, this is what I did, with 2 machines, running CentOS 7.1,
Glusterfs 3.7.1 and nfs-ganesha 2.2.0:
1) ensured that the machines are able to resolve their IPs (but
this was already true since they were in the DNS);
2) disabled NetworkManager and enabled network on both machines;
3) created a gluster shared volume 'gluster_shared_storage' and
mounted it on '/run/gluster/shared_storage' on all the cluster
nodes using glusterfs native mount (on CentOS 7.1 there is a link
by default /var/run -> ../run)
4) created an empty /etc/ganesha/ganesha.conf;
5) installed pacemaker pcs resource-agents corosync on all cluster
machines;
6) set the same password for the 'hacluster' user on all machines;
7) ran pcs cluster auth <hostname> -u hacluster -p <pass> on all the
nodes (on each node I issued the command for both nodes);
8) IPv6 is configured by default on all nodes, although the
infrastructure is not ready for IPv6
9) enabled pcsd and started it on all nodes
10) populated /etc/ganesha/ganesha-ha.conf with the following
contents, one per machine:
===> atlas-node1
# Name of the HA cluster created.
HA_NAME="ATLAS_GANESHA_01"
# The server from which you intend to mount
# the shared volume.
HA_VOL_SERVER="atlas-node1"
# The subset of nodes of the Gluster Trusted Pool
# that forms the ganesha HA cluster. IP/Hostname
# is specified.
HA_CLUSTER_NODES="atlas-node1,atlas-node2"
# Virtual IPs of each of the nodes specified above.
VIP_atlas-node1="x.x.x.1"
VIP_atlas-node2="x.x.x.2"
===> atlas-node2
# Name of the HA cluster created.
HA_NAME="ATLAS_GANESHA_01"
# The server from which you intend to mount
# the shared volume.
HA_VOL_SERVER="atlas-node2"
# The subset of nodes of the Gluster Trusted Pool
# that forms the ganesha HA cluster. IP/Hostname
# is specified.
HA_CLUSTER_NODES="atlas-node1,atlas-node2"
# Virtual IPs of each of the nodes specified above.
VIP_atlas-node1="x.x.x.1"
VIP_atlas-node2="x.x.x.2"
11) issued gluster nfs-ganesha enable, but it fails with a cryptic
message:
# gluster nfs-ganesha enable
Enabling NFS-Ganesha requires Gluster-NFS to be disabled across the
trusted pool. Do you still want to continue? (y/n) y
nfs-ganesha: failed: Failed to set up HA config for NFS-Ganesha.
Please check the log file for details
Looking at the logs I found nothing really special but this:
==> /var/log/glusterfs/etc-glusterfs-glusterd.vol.log <==
[2015-06-08 17:57:15.672844] I [MSGID: 106132]
[glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: nfs
already stopped
[2015-06-08 17:57:15.675395] I
[glusterd-ganesha.c:386:check_host_list] 0-management: ganesha host
found Hostname is atlas-node2
[2015-06-08 17:57:15.720692] I
[glusterd-ganesha.c:386:check_host_list] 0-management: ganesha host
found Hostname is atlas-node2
[2015-06-08 17:57:15.721161] I
[glusterd-ganesha.c:335:is_ganesha_host] 0-management: ganesha host
found Hostname is atlas-node2
[2015-06-08 17:57:16.633048] E
[glusterd-ganesha.c:254:glusterd_op_set_ganesha] 0-management:
Initial NFS-Ganesha set up failed
[2015-06-08 17:57:16.641563] E
[glusterd-syncop.c:1396:gd_commit_op_phase] 0-management: Commit of
operation 'Volume (null)' failed on localhost : Failed to set up HA
config for NFS-Ganesha. Please check the log file for details
==> /var/log/glusterfs/cmd_history.log <==
[2015-06-08 17:57:16.643615] : nfs-ganesha enable : FAILED :
Failed to set up HA config for NFS-Ganesha. Please check the log
file for details
==> /var/log/glusterfs/cli.log <==
[2015-06-08 17:57:16.643839] I [input.c:36:cli_batch] 0-: Exiting
with: -1
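When scanning logs like the excerpts above, it can help to keep only the error-level entries. The sketch below assumes each entry sits on one line in the actual log file (the excerpts here are wrapped by the mail client) and starts with "[timestamp] E ". The function name `gluster_log_errors` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print only error-level entries from a glusterd
# style log, assuming the layout "[YYYY-MM-DD hh:mm:ss.us] E [source] ...".
gluster_log_errors() {
    grep -E '^\[[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9:.]+\] E ' "$@"
}

# Intended use on a node (not run here):
#   gluster_log_errors /var/log/glusterfs/etc-glusterfs-glusterd.vol.log
```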
Also, pcs seems to be fine for the auth part, although it obviously
tells me the cluster is not running.
I, [2015-06-08T19:57:16.305323 #7223] INFO -- : Running:
/usr/sbin/corosync-cmapctl totem.cluster_name
I, [2015-06-08T19:57:16.345457 #7223] INFO -- : Running:
/usr/sbin/pcs cluster token-nodes
::ffff:141.108.38.46 - - [08/Jun/2015 19:57:16] "GET
/remote/check_auth HTTP/1.1" 200 68 0.1919
::ffff:141.108.38.46 - - [08/Jun/2015 19:57:16] "GET
/remote/check_auth HTTP/1.1" 200 68 0.1920
atlas-node1.mydomain - - [08/Jun/2015:19:57:16 CEST] "GET
/remote/check_auth HTTP/1.1" 200 68
- -> /remote/check_auth
What am I doing wrong?
Thanks,
Alessandro
On 08 Jun 2015, at 19:30, Soumya Koduri
<skoduri@xxxxxxxxxx <mailto:skoduri@xxxxxxxxxx>> wrote:
On 06/08/2015 08:20 PM, Alessandro De Salvo wrote:
Sorry, just another question:
- in my installation of gluster 3.7.1 the command gluster
features.ganesha enable does not work:
# gluster features.ganesha enable
unrecognized word: features.ganesha (position 0)
Which version has full support for it?
Sorry. This option has recently been changed. It is now
$ gluster nfs-ganesha enable
- in the documentation the ccs and cman packages are required,
but they seem not to be available anymore on CentOS 7 and
similar. I guess they are not really required anymore, as pcs
should do the full job
Thanks,
Alessandro
It looks that way from http://clusterlabs.org/quickstart-redhat.html.
Let us know if it doesn't work.
Thanks,
Soumya
On 08 Jun 2015, at 15:09, Alessandro De Salvo
<alessandro.desalvo@xxxxxxxxxxxxx
<mailto:alessandro.desalvo@xxxxxxxxxxxxx>> wrote:
Great, many thanks Soumya!
Cheers,
Alessandro
On 08 Jun 2015, at 13:53, Soumya Koduri
<skoduri@xxxxxxxxxx <mailto:skoduri@xxxxxxxxxx>> wrote:
Hi,
Please find the slides of the demo video at [1].
We recommend using a distributed replicated volume as the shared
volume for better data availability.
The size of the volume depends on your workload. Since
it is used to maintain the state of NLM/NFSv4 clients, you can
estimate the minimum size of the volume as the aggregate of
(typical_size_of_'/var/lib/nfs'_directory +
~4k*no_of_clients_connected_to_each_of_the_nfs_servers_at_any_point)
We shall document this feature in the gluster docs soon
as well.
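The sizing rule above can be turned into a rough calculation. The sketch below rests on two assumptions that the message does not spell out: "aggregate" is read as a sum over all NFS servers with identical figures per server, and "4k" is read as 4 KiB per client. The function name `shared_volume_min_kib` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: lower bound for the shared volume size in KiB.
#   $1 = number of NFS-Ganesha servers
#   $2 = typical size of /var/lib/nfs on one server, in KiB
#   $3 = peak number of clients connected to one server
shared_volume_min_kib() {
    local servers="$1" nfs_dir_kib="$2" clients="$3"
    # ~4 KiB of state per client, summed over all servers (assumption).
    echo $(( servers * (nfs_dir_kib + 4 * clients) ))
}

shared_volume_min_kib 2 1024 500   # prints 6048
```

With two servers, 1 MiB in /var/lib/nfs and 500 clients each, the estimate is about 6 MiB, so in practice any reasonably sized volume leaves ample headroom.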
Thanks,
Soumya
[1] - http://www.slideshare.net/SoumyaKoduri/high-49117846
On 06/08/2015 04:34 PM, Alessandro De Salvo wrote:
Hi,
I have seen the demo video on ganesha HA,
https://www.youtube.com/watch?v=Z4mvTQC-efM
However there is no advice on the appropriate size of the
shared volume. How is it really used, and what should be a
reasonable size for it?
Also, are the slides from the video available somewhere, as
well as documentation on all this? I did not manage to find
them.
Thanks,
Alessandro
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx <mailto:Gluster-users@xxxxxxxxxxx>
http://www.gluster.org/mailman/listinfo/gluster-users