Re: Questions on ganesha HA and shared storage size

Soumya Koduri <skoduri@xxxxxxxxxx> · Wed, 10 Jun 2015 15:28:57 +0530

On 06/10/2015 05:49 AM, Alessandro De Salvo wrote:
Hi,
I have already enabled full debug, but I see nothing special. Before exporting any volume the log shows no error, even when I run showmount (the log is attached, ganesha.log.gz). If I do the same after exporting a volume, nfs-ganesha does not even start: it complains that it cannot bind the IPv6 RQUOTA socket, although nothing appears to be listening on that port over IPv6, so this should not happen:

tcp6       0      0 :::111                  :::*                    LISTEN      7433/rpcbind
tcp6       0      0 :::2224                 :::*                    LISTEN      9054/ruby
tcp6       0      0 :::22                   :::*                    LISTEN      1248/sshd
udp6       0      0 :::111                  :::*                                7433/rpcbind
udp6       0      0 fe80::8c2:27ff:fef2:123 :::*                                31238/ntpd
udp6       0      0 fe80::230:48ff:fed2:123 :::*                                31238/ntpd
udp6       0      0 fe80::230:48ff:fed2:123 :::*                                31238/ntpd
udp6       0      0 fe80::230:48ff:fed2:123 :::*                                31238/ntpd
udp6       0      0 ::1:123                 :::*                                31238/ntpd
udp6       0      0 fe80::5484:7aff:fef:123 :::*                                31238/ntpd
udp6       0      0 :::123                  :::*                                31238/ntpd
udp6       0      0 :::824                  :::*                                7433/rpcbind

The error, as shown in the attached ganesha-after-export.log.gz logfile, is the following:

10/06/2015 02:07:47 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] Bind_sockets_V6 :DISP :WARN :Cannot bind RQUOTA tcp6 socket, error 98 (Address already in use)
10/06/2015 02:07:47 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] Bind_sockets :DISP :FATAL :Error binding to V6 interface. Cannot continue.
10/06/2015 02:07:48 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] glusterfs_unload :FSAL :DEBUG :FSAL Gluster unloaded

We have seen such issues with rpcbind a few times. The NFS-Ganesha setup
first disables Gluster-NFS and then brings up the NFS-Ganesha service.
Sometimes Gluster-NFS is slow to unregister its services, or fails to do
so, and when NFS-Ganesha then tries to register on the same port it
throws this error. Please try registering Rquota on a different port
using the config option below in "/etc/ganesha/ganesha.conf":

NFS_Core_Param {
        #Use a non-privileged port for RQuota
        Rquota_Port = 4501;
}

and clean up the '/var/cache/rpcbind/' directory before the setup.
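
Before moving Rquota to another port, it can help to see what rpcbind still has registered for rquotad. The following is a minimal sketch, assuming `rpcinfo -p` prints the column layout shown further down in this thread; the function name `rquota_registrations` is only illustrative.

```shell
# Print "proto port" for each rquotad registration (RPC program 100011)
# found in `rpcinfo -p` output supplied on stdin.
rquota_registrations() {
    awk '$1 == 100011 { print $3, $4 }' | sort -u
}

# Typical use on a node, before starting nfs-ganesha:
#   rpcinfo -p | rquota_registrations
```

Any line printed while neither Gluster-NFS nor NFS-Ganesha is running would point at a stale registration of the kind described above.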

Thanks,
Soumya

Thanks,

	Alessandro

On 9 Jun 2015, at 18:37, Soumya Koduri <skoduri@xxxxxxxxxx> wrote:

On 06/09/2015 09:47 PM, Alessandro De Salvo wrote:
Another update: the fact that I was unable to use vol set ganesha.enable
was due to another bug in the ganesha scripts. In short, they are all
using the following line to get the location of the conf file:

CONF=$(cat /etc/sysconfig/ganesha | grep "CONFFILE" | cut -f 2 -d "=")

First of all, by default /etc/sysconfig/ganesha has no CONFFILE line.
Second, there is a bug in that command: it works if I add the following
to /etc/sysconfig/ganesha

CONFFILE=/etc/ganesha/ganesha.conf

but it fails if the same is quoted

CONFFILE="/etc/ganesha/ganesha.conf"

It would be much better to use the following, which has a default as
well:

eval $(grep -F CONFFILE= /etc/sysconfig/ganesha)
CONF=${CONFFILE:-/etc/ganesha/ganesha.conf}

I'll update the bug report.
Having said this... the last issue to tackle is the real problem with
the ganesha.nfsd :-(
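
The CONFFILE lookup discussed above can also be sketched without `eval`, so that nothing else in the sysconfig file gets executed. This is only an illustration, not what the ganesha scripts do: the function name `ganesha_conf_path` is hypothetical, and the default path is the one proposed in this message.

```shell
# Hypothetical helper: print the ganesha config path named by CONFFILE in
# the given sysconfig file, or a default when no such line is present.
ganesha_conf_path() {
    sysconfig=${1:-/etc/sysconfig/ganesha}
    default=/etc/ganesha/ganesha.conf
    value=
    if [ -r "$sysconfig" ]; then
        # Take the last CONFFILE= assignment and strip one pair of
        # surrounding double or single quotes, if any.
        value=$(sed -n 's/^[[:space:]]*CONFFILE=//p' "$sysconfig" | tail -n 1 |
                sed -e 's/^"\(.*\)"$/\1/' -e "s/^'\(.*\)'\$/\1/")
    fi
    printf '%s\n' "${value:-$default}"
}
```

With this, `CONF=$(ganesha_conf_path)` gives the same path whether the value is quoted, unquoted, or missing.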

Thanks. Could you try changing log level to NIV_FULL_DEBUG in '/etc/sysconfig/ganesha' and check if anything gets logged in '/var/log/ganesha.log' or '/ganesha.log'.

Thanks,
Soumya

Cheers,

	Alessandro

On Tue, 2015-06-09 at 14:25 +0200, Alessandro De Salvo wrote:
OK, I can confirm that the ganesha.nfsd process is actually not
answering the calls. Here is what I see:

# rpcinfo -p
    program vers proto   port  service
     100000    4   tcp    111  portmapper
     100000    3   tcp    111  portmapper
     100000    2   tcp    111  portmapper
     100000    4   udp    111  portmapper
     100000    3   udp    111  portmapper
     100000    2   udp    111  portmapper
     100024    1   udp  41594  status
     100024    1   tcp  53631  status
     100003    3   udp   2049  nfs
     100003    3   tcp   2049  nfs
     100003    4   udp   2049  nfs
     100003    4   tcp   2049  nfs
     100005    1   udp  58127  mountd
     100005    1   tcp  56301  mountd
     100005    3   udp  58127  mountd
     100005    3   tcp  56301  mountd
     100021    4   udp  46203  nlockmgr
     100021    4   tcp  41798  nlockmgr
     100011    1   udp    875  rquotad
     100011    1   tcp    875  rquotad
     100011    2   udp    875  rquotad
     100011    2   tcp    875  rquotad

# netstat -lpn | grep ganesha
tcp6      14      0 :::2049                 :::*                    LISTEN      11937/ganesha.nfsd
tcp6       0      0 :::41798                :::*                    LISTEN      11937/ganesha.nfsd
tcp6       0      0 :::875                  :::*                    LISTEN      11937/ganesha.nfsd
tcp6      10      0 :::56301                :::*                    LISTEN      11937/ganesha.nfsd
tcp6       0      0 :::564                  :::*                    LISTEN      11937/ganesha.nfsd
udp6       0      0 :::2049                 :::*                                11937/ganesha.nfsd
udp6       0      0 :::46203                :::*                                11937/ganesha.nfsd
udp6       0      0 :::58127                :::*                                11937/ganesha.nfsd
udp6       0      0 :::875                  :::*                                11937/ganesha.nfsd
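
A side observation on the netstat output above: the TCP listeners on ports 2049 and 56301 show a non-zero Recv-Q (14 and 10). According to the netstat man page, on Linux this column reports the current backlog of pending connections for listening sockets, so non-zero values there would be consistent with a daemon that is not picking up calls. The sketch below is only an illustration; the function name `backlogged_listeners` is hypothetical, and it expects one socket per line, as `netstat -ltn` prints them.

```shell
# Read netstat-style lines on stdin (one socket per line) and print the
# local address and Recv-Q of every listening TCP socket whose Recv-Q
# is non-zero.
backlogged_listeners() {
    awk '$1 ~ /^tcp/ && $6 == "LISTEN" && $2 > 0 { print $4, $2 }'
}

# Typical use:
#   netstat -ltn | backlogged_listeners
```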

I'm attaching the strace of a showmount from one node to the other.
This setup was working with nfs-ganesha 2.1.0, so it must be
something introduced with 2.2.0.
Cheers,

	Alessandro

On Tue, 2015-06-09 at 15:16 +0530, Soumya Koduri wrote:

On 06/09/2015 02:48 PM, Alessandro De Salvo wrote:
Hi,
OK, the problem with the VIPs not starting is due to the ganesha_mon
heartbeat script looking for a pid file called
/var/run/ganesha.nfsd.pid, while by default ganesha.nfsd v2.2.0
creates /var/run/ganesha.pid. This needs to be corrected. In my case the
script is in glusterfs-ganesha-3.7.1-1.el7.x86_64.
For the moment I have created a symlink as follows, and it works:

ln -s /var/run/ganesha.pid /var/run/ganesha.nfsd.pid
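
Since /var/run points to /run on CentOS 7 (as noted elsewhere in this thread) and /run is typically a tmpfs, the link would probably have to be recreated after a reboot. A minimal sketch of a repeatable version of the workaround follows; the function name `link_ganesha_pidfile` is hypothetical and the default paths are the ones quoted above.

```shell
# Create the pid-file symlink only when the real pid file exists and the
# expected name is not already present. Returns non-zero when the real
# pid file is missing.
link_ganesha_pidfile() {
    actual=${1:-/var/run/ganesha.pid}
    expected=${2:-/var/run/ganesha.nfsd.pid}
    [ -e "$actual" ] || return 1
    [ -e "$expected" ] || [ -L "$expected" ] || ln -s "$actual" "$expected"
}
```

Running it a second time is a no-op, so it could be called from whatever starts the service until the monitor script is fixed.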

Thanks. Please update this as well in the bug.

So far so good, the VIPs are up and pingable, but still there is the
problem of the hanging showmount (i.e. hanging RPC).
Still, I see a lot of errors like this in /var/log/messages:

Jun  9 11:15:20 atlas-node1 lrmd[31221]:   notice: operation_finished:
nfs-mon_monitor_10000:29292:stderr [ Error: Resource does not exist. ]

While ganesha.log shows the server is not in grace:

09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 :
ganesha.nfsd-29964[main] main :MAIN :EVENT :ganesha.nfsd Starting:
Ganesha Version /builddir/build/BUILD/nfs-ganesha-2.2.0/src, built at
May 18 2015 14:17:18 on buildhw-09.phx2.fedoraproject.org
09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 :
ganesha.nfsd-29965[main] nfs_set_param_from_conf :NFS STARTUP :EVENT
:Configuration file successfully parsed
09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 :
ganesha.nfsd-29965[main] init_server_pkgs :NFS STARTUP :EVENT
:Initializing ID Mapper.
09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 :
ganesha.nfsd-29965[main] init_server_pkgs :NFS STARTUP :EVENT :ID Mapper
successfully initialized.
09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 :
ganesha.nfsd-29965[main] main :NFS STARTUP :WARN :No export entries
found in configuration file !!!
09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 :
ganesha.nfsd-29965[main] config_errs_to_log :CONFIG :WARN :Config File
((null):0): Empty configuration file
09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 :
ganesha.nfsd-29965[main] lower_my_caps :NFS STARTUP :EVENT
:CAP_SYS_RESOURCE was successfully removed for proper quota management
in FSAL
09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 :
ganesha.nfsd-29965[main] lower_my_caps :NFS STARTUP :EVENT :currenty set
capabilities are: =
cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap+ep
09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 :
ganesha.nfsd-29965[main] nfs_Init_svc :DISP :CRIT :Cannot acquire
credentials for principal nfs
09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 :
ganesha.nfsd-29965[main] nfs_Init_admin_thread :NFS CB :EVENT :Admin
thread initialized
09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 :
ganesha.nfsd-29965[main] nfs4_start_grace :STATE :EVENT :NFS Server Now
IN GRACE, duration 60
09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 :
ganesha.nfsd-29965[main] nfs_rpc_cb_init_ccache :NFS STARTUP :EVENT
:Callback creds directory (/var/run/ganesha) already exists
09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 :
ganesha.nfsd-29965[main] nfs_rpc_cb_init_ccache :NFS STARTUP :WARN
:gssd_refresh_krb5_machine_credential failed (2:2)
09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 :
ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :Starting
delayed executor.
09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 :
ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :9P/TCP
dispatcher thread was started successfully
09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 :
ganesha.nfsd-29965[_9p_disp] _9p_dispatcher_thread :9P DISP :EVENT :9P
dispatcher started
09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 :
ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT
:gsh_dbusthread was started successfully
09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 :
ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :admin thread
was started successfully
09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 :
ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :reaper thread
was started successfully
09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 :
ganesha.nfsd-29965[reaper] nfs_in_grace :STATE :EVENT :NFS Server Now IN
GRACE
09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 :
ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :General
fridge was started successfully
09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 :
ganesha.nfsd-29965[main] nfs_start :NFS STARTUP :EVENT
:-------------------------------------------------
09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 :
ganesha.nfsd-29965[main] nfs_start :NFS STARTUP :EVENT :             NFS
SERVER INITIALIZED
09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 :
ganesha.nfsd-29965[main] nfs_start :NFS STARTUP :EVENT
:-------------------------------------------------
09/06/2015 11:17:22 : epoch 5576aee4 : atlas-node1 :
ganesha.nfsd-29965[reaper] nfs_in_grace :STATE :EVENT :NFS Server Now
NOT IN GRACE

Please check the status of nfs-ganesha:
$ service nfs-ganesha status

Could you also take a packet trace (during showmount or mount) and
check the server responses?

Thanks,
Soumya

Cheers,

Alessandro

On 9 Jun 2015, at 10:36, Alessandro De Salvo
<alessandro.desalvo@xxxxxxxxxxxxx> wrote:

Hi Soumya,

On 9 Jun 2015, at 08:06, Soumya Koduri
<skoduri@xxxxxxxxxx> wrote:

On 06/09/2015 01:31 AM, Alessandro De Salvo wrote:
OK, I found at least one of the bugs.
The /usr/libexec/ganesha/ganesha.sh has the following lines:

    if [ -e /etc/os-release ]; then
        RHEL6_PCS_CNAME_OPTION=""
    fi

This is OK for RHEL < 7, but does not work for >= 7. I have changed
it to the following to make it work:

    if [ -e /etc/os-release ]; then
        eval $(grep -F "REDHAT_SUPPORT_PRODUCT=" /etc/os-release)
        [ "$REDHAT_SUPPORT_PRODUCT" == "Fedora" ] && RHEL6_PCS_CNAME_OPTION=""
    fi

Oh, thanks for the fix. Could you please file a bug for it (and
probably submit your fix as well)? We shall have it corrected.

Just did it, https://bugzilla.redhat.com/show_bug.cgi?id=1229601

Apart from that, the VIP_<node> names I was using were wrong: I should
have converted all the "-" to underscores. Maybe this could be
mentioned in the documentation when it is ready.
Now the cluster starts, but apparently the VIPs do not:

Sure. Thanks again for pointing it out. We shall make a note of it.
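
The hyphen-to-underscore conversion mentioned above matters because a shell variable name cannot contain "-", and the VIP_<node> entries presumably end up being read as shell variables (an assumption, since the thread does not show that part of the scripts). A minimal sketch of deriving the key from a hostname, with a hypothetical function name `vip_key`:

```shell
# Print the VIP_<node> key for a hostname, replacing each "-" with "_"
# so that the result is a valid shell variable name.
vip_key() {
    printf 'VIP_%s\n' "$(printf '%s' "$1" | sed 's/-/_/g')"
}

# Example: vip_key atlas-node1 prints VIP_atlas_node1
```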

Online: [ atlas-node1 atlas-node2 ]

Full list of resources:

Clone Set: nfs-mon-clone [nfs-mon]
     Started: [ atlas-node1 atlas-node2 ]
Clone Set: nfs-grace-clone [nfs-grace]
     Started: [ atlas-node1 atlas-node2 ]
atlas-node1-cluster_ip-1  (ocf::heartbeat:IPaddr):        Stopped
atlas-node1-trigger_ip-1  (ocf::heartbeat:Dummy): Started atlas-node1
atlas-node2-cluster_ip-1  (ocf::heartbeat:IPaddr):        Stopped
atlas-node2-trigger_ip-1  (ocf::heartbeat:Dummy): Started atlas-node2
atlas-node1-dead_ip-1     (ocf::heartbeat:Dummy): Started atlas-node1
atlas-node2-dead_ip-1     (ocf::heartbeat:Dummy): Started atlas-node2

PCSD Status:
  atlas-node1: Online
  atlas-node2: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

Here corosync and pacemaker show the 'disabled' state. Can you check the
status of their services? They should be running prior to cluster
creation. We need to include that step in the documentation as well.

Ah, OK, you’re right, I have added it to my puppet modules (we install
and configure ganesha via puppet, I’ll put the module on puppetforge
soon, in case anyone is interested).

But the issue that is puzzling me more is the following:

# showmount -e localhost
rpc mount export: RPC: Timed out

And when I try to enable the ganesha exports on a volume I get this
error:

# gluster volume set atlas-home-01 ganesha.enable on
volume set: failed: Failed to create NFS-Ganesha export config file.

But I see the file created in /etc/ganesha/exports/*.conf
Still, showmount hangs and times out.
Any help?
Thanks,

Hmm, that's strange. We have sometimes seen such issues when there was
no proper cleanup before trying to re-create the cluster.

https://bugzilla.redhat.com/show_bug.cgi?id=1227709

http://review.gluster.org/#/c/11093/

Can you please unexport all the volumes and tear down the cluster using
'gluster vol set <volname> ganesha.enable off'

OK:

# gluster vol set atlas-home-01 ganesha.enable off
volume set: failed: ganesha.enable is already 'off'.

# gluster vol set atlas-data-01 ganesha.enable off
volume set: failed: ganesha.enable is already 'off'.

'gluster ganesha disable' command.

I’m assuming you wanted to write nfs-ganesha instead?

# gluster nfs-ganesha disable
ganesha enable : success

A side note (not really important): it’s strange that when I do a
disable the message is “ganesha enable” :-)

Verify that the following files have been deleted on all the nodes:
'/etc/cluster/cluster.conf'

this file is not present at all, I think it’s not needed in CentOS 7

'/etc/ganesha/ganesha.conf',

it’s still there, but empty, and I guess it should be OK, right?

'/etc/ganesha/exports/*'

no more files there

'/var/lib/pacemaker/cib'

it’s empty

Verify if the ganesha service is stopped on all the nodes.

nope, it’s still running, I will stop it.

start/restart the services - corosync, pcs.

On the node where I issued the nfs-ganesha disable there is no longer
any /etc/corosync/corosync.conf, so corosync won’t start. Strangely, the
other node still has the file.

And re-try the HA cluster creation
'gluster ganesha enable'
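
The file checks in the cleanup steps above can be sketched as a small script to run on each node before re-creating the cluster. This is an illustration only: the function name `ganesha_leftovers` is hypothetical, the optional root prefix exists just to make it testable, and treating an empty ganesha.conf as clean is an assumption based on the reply above.

```shell
# Print each piece of leftover HA state found under an optional root
# prefix (an empty prefix means the live filesystem).
ganesha_leftovers() {
    root=${1:-}
    # Files: reported only when present and non-empty.
    for f in /etc/cluster/cluster.conf /etc/ganesha/ganesha.conf; do
        if [ -s "$root$f" ]; then printf '%s\n' "$f"; fi
    done
    # Directories: reported only when they still contain entries.
    for d in /etc/ganesha/exports /var/lib/pacemaker/cib; do
        if [ -d "$root$d" ] && [ -n "$(ls -A "$root$d")" ]; then
            printf '%s/\n' "$d"
        fi
    done
}
```

No output means none of the listed paths holds leftover state on that node.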

This time (repeated twice) it did not work at all:

# pcs status
Cluster name: ATLAS_GANESHA_01
Last updated: Tue Jun  9 10:13:43 2015
Last change: Tue Jun  9 10:13:22 2015
Stack: corosync
Current DC: atlas-node1 (1) - partition with quorum
Version: 1.1.12-a14efad
2 Nodes configured
6 Resources configured

Online: [ atlas-node1 atlas-node2 ]

Full list of resources:

Clone Set: nfs-mon-clone [nfs-mon]
     Started: [ atlas-node1 atlas-node2 ]
Clone Set: nfs-grace-clone [nfs-grace]
     Started: [ atlas-node1 atlas-node2 ]
atlas-node2-dead_ip-1     (ocf::heartbeat:Dummy): Started atlas-node1
atlas-node1-dead_ip-1     (ocf::heartbeat:Dummy): Started atlas-node2

PCSD Status:
  atlas-node1: Online
  atlas-node2: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

I tried then "pcs cluster destroy" on both nodes, and then again
nfs-ganesha enable, but now I’m back to the old problem:

# pcs status
Cluster name: ATLAS_GANESHA_01
Last updated: Tue Jun  9 10:22:27 2015
Last change: Tue Jun  9 10:17:00 2015
Stack: corosync
Current DC: atlas-node2 (2) - partition with quorum
Version: 1.1.12-a14efad
2 Nodes configured
10 Resources configured

Online: [ atlas-node1 atlas-node2 ]

Full list of resources:

Clone Set: nfs-mon-clone [nfs-mon]
     Started: [ atlas-node1 atlas-node2 ]
Clone Set: nfs-grace-clone [nfs-grace]
     Started: [ atlas-node1 atlas-node2 ]
atlas-node1-cluster_ip-1       (ocf::heartbeat:IPaddr):        Stopped
atlas-node1-trigger_ip-1       (ocf::heartbeat:Dummy): Started atlas-node1
atlas-node2-cluster_ip-1       (ocf::heartbeat:IPaddr):        Stopped
atlas-node2-trigger_ip-1       (ocf::heartbeat:Dummy): Started atlas-node2
atlas-node1-dead_ip-1  (ocf::heartbeat:Dummy): Started atlas-node1
atlas-node2-dead_ip-1  (ocf::heartbeat:Dummy): Started atlas-node2

PCSD Status:
  atlas-node1: Online
  atlas-node2: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

Cheers,

Alessandro

Thanks,
Soumya

Alessandro

On 8 Jun 2015, at 20:00, Alessandro De Salvo
<Alessandro.DeSalvo@xxxxxxxxxxxxx> wrote:

Hi,
indeed, it does not work :-)
OK, this is what I did, with 2 machines, running CentOS 7.1,
Glusterfs 3.7.1 and nfs-ganesha 2.2.0:

1) ensured that the machines are able to resolve their IPs (but
this was already true since they were in the DNS);
2) disabled NetworkManager and enabled network on both machines;
3) created a gluster shared volume 'gluster_shared_storage' and
mounted it on '/run/gluster/shared_storage' on all the cluster
nodes using glusterfs native mount (on CentOS 7.1 there is a link
by default /var/run -> ../run)
4) created an empty /etc/ganesha/ganesha.conf;
5) installed pacemaker pcs resource-agents corosync on all cluster
machines;
6) set the same password for the 'hacluster' user on all machines;
7) pcs cluster auth <hostname> -u hacluster -p <pass> on all the
nodes (on both nodes I issued the commands for both nodes)
8) IPv6 is configured by default on all nodes, although the
infrastructure is not ready for IPv6
9) enabled pcsd and started it on all nodes
10) populated /etc/ganesha/ganesha-ha.conf with the following
contents, one per machine:

===> atlas-node1
# Name of the HA cluster created.
HA_NAME="ATLAS_GANESHA_01"
# The server from which you intend to mount
# the shared volume.
HA_VOL_SERVER="atlas-node1"
# The subset of nodes of the Gluster Trusted Pool
# that forms the ganesha HA cluster. IP/Hostname
# is specified.
HA_CLUSTER_NODES="atlas-node1,atlas-node2"
# Virtual IPs of each of the nodes specified above.
VIP_atlas-node1="x.x.x.1"
VIP_atlas-node2="x.x.x.2"

===> atlas-node2
# Name of the HA cluster created.
HA_NAME="ATLAS_GANESHA_01"
# The server from which you intend to mount
# the shared volume.
HA_VOL_SERVER="atlas-node2"
# The subset of nodes of the Gluster Trusted Pool
# that forms the ganesha HA cluster. IP/Hostname
# is specified.
HA_CLUSTER_NODES="atlas-node1,atlas-node2"
# Virtual IPs of each of the nodes specified above.
VIP_atlas-node1="x.x.x.1"
VIP_atlas-node2="x.x.x.2"

11) issued gluster nfs-ganesha enable, but it fails with a cryptic
message:

# gluster nfs-ganesha enable
Enabling NFS-Ganesha requires Gluster-NFS to be disabled across the
trusted pool. Do you still want to continue? (y/n) y
nfs-ganesha: failed: Failed to set up HA config for NFS-Ganesha.
Please check the log file for details

Looking at the logs I found nothing really special but this:

==> /var/log/glusterfs/etc-glusterfs-glusterd.vol.log <==
[2015-06-08 17:57:15.672844] I [MSGID: 106132]
[glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: nfs
already stopped
[2015-06-08 17:57:15.675395] I
[glusterd-ganesha.c:386:check_host_list] 0-management: ganesha host
found Hostname is atlas-node2
[2015-06-08 17:57:15.720692] I
[glusterd-ganesha.c:386:check_host_list] 0-management: ganesha host
found Hostname is atlas-node2
[2015-06-08 17:57:15.721161] I
[glusterd-ganesha.c:335:is_ganesha_host] 0-management: ganesha host
found Hostname is atlas-node2
[2015-06-08 17:57:16.633048] E
[glusterd-ganesha.c:254:glusterd_op_set_ganesha] 0-management:
Initial NFS-Ganesha set up failed
[2015-06-08 17:57:16.641563] E
[glusterd-syncop.c:1396:gd_commit_op_phase] 0-management: Commit of
operation 'Volume (null)' failed on localhost : Failed to set up HA
config for NFS-Ganesha. Please check the log file for details

==> /var/log/glusterfs/cmd_history.log <==
[2015-06-08 17:57:16.643615]  : nfs-ganesha enable : FAILED :
Failed to set up HA config for NFS-Ganesha. Please check the log
file for details

==> /var/log/glusterfs/cli.log <==
[2015-06-08 17:57:16.643839] I [input.c:36:cli_batch] 0-: Exiting
with: -1

Also, pcs seems to be fine for the auth part, although it obviously
tells me the cluster is not running.

I, [2015-06-08T19:57:16.305323 #7223]  INFO -- : Running:
/usr/sbin/corosync-cmapctl totem.cluster_name
I, [2015-06-08T19:57:16.345457 #7223]  INFO -- : Running:
/usr/sbin/pcs cluster token-nodes
::ffff:141.108.38.46 - - [08/Jun/2015 19:57:16] "GET
/remote/check_auth HTTP/1.1" 200 68 0.1919
::ffff:141.108.38.46 - - [08/Jun/2015 19:57:16] "GET
/remote/check_auth HTTP/1.1" 200 68 0.1920
atlas-node1.mydomain - - [08/Jun/2015:19:57:16 CEST] "GET
/remote/check_auth HTTP/1.1" 200 68
- -> /remote/check_auth

What am I doing wrong?
Thanks,

Alessandro

On 8 Jun 2015, at 19:30, Soumya Koduri
<skoduri@xxxxxxxxxx> wrote:

On 06/08/2015 08:20 PM, Alessandro De Salvo wrote:
Sorry, just another question:

- in my installation of gluster 3.7.1 the command gluster
features.ganesha enable does not work:

# gluster features.ganesha enable
unrecognized word: features.ganesha (position 0)

Which version has full support for it?

Sorry. This option has recently been changed. It is now

$ gluster nfs-ganesha enable

- in the documentation the ccs and cman packages are required,
but they seem not to be available anymore on CentOS 7 and
similar. I guess they are not really required anymore, as pcs
should do the full job

Thanks,

Alessandro

Looks like so from http://clusterlabs.org/quickstart-redhat.html.
Let us know if it doesn't work.

Thanks,
Soumya

On 8 Jun 2015, at 15:09, Alessandro De Salvo
<alessandro.desalvo@xxxxxxxxxxxxx> wrote:

Great, many thanks Soumya!
Cheers,

Alessandro

On 8 Jun 2015, at 13:53, Soumya Koduri
<skoduri@xxxxxxxxxx> wrote:

Hi,

Please find the slides of the demo video at [1]

We recommend using a distributed replicated volume as the shared
volume for better data availability.

The size of the volume depends on your workload. Since it is used to
maintain the state of NLM/NFSv4 clients, the minimum size can be
estimated as the aggregate, over all the NFS servers, of
(typical size of the '/var/lib/nfs' directory +
~4k * number of clients connected to that server at any point)
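
As a worked example of the estimate above, here is a minimal sketch that assumes every server has the same '/var/lib/nfs' size and the same number of clients, and reads "~4k" as roughly 4 KiB per client. The function name `shared_volume_min_kib` and the numbers in the usage comment are purely illustrative.

```shell
# Estimate the minimum shared-volume size in KiB:
#   servers * (size of /var/lib/nfs in KiB + 4 KiB per connected client)
shared_volume_min_kib() {
    nfs_dir_kib=$1
    clients_per_server=$2
    servers=$3
    echo $(( servers * (nfs_dir_kib + 4 * clients_per_server) ))
}

# Example with made-up numbers: 2 servers, 1024 KiB in /var/lib/nfs,
# 100 clients per server:
#   shared_volume_min_kib 1024 100 2    # 2 * (1024 + 400) = 2848 KiB
```

The directory size for the first argument can be taken from `du -sk /var/lib/nfs`. The result is a lower bound in the sense of the formula, so some headroom on top of it seems sensible.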

We shall document this feature in the gluster docs soon as well.

Thanks,
Soumya

[1] - http://www.slideshare.net/SoumyaKoduri/high-49117846

On 06/08/2015 04:34 PM, Alessandro De Salvo wrote:
Hi,
I have seen the demo video on ganesha HA,
https://www.youtube.com/watch?v=Z4mvTQC-efM
However there is no advice on the appropriate size of the
shared volume. How is it really used, and what should be a
reasonable size for it?
Also, are the slides from the video available somewhere, as
well as documentation on all this? I did not manage to find
them.
Thanks,

Alessandro

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users
