Re: Questions on ganesha HA and shared storage size

CCing ganesha-devel to get more input.

When IPv6 is enabled, NFS-Ganesha uses only the v6 interfaces.

Commit d7e8f255, which was added in v2.2, has more details (see 'git show d7e8f255').

> # netstat -ltaupn | grep 2049
> tcp6       4      0 :::2049                 :::*                    LISTEN      32080/ganesha.nfsd
> tcp6       1      0 x.x.x.2:2049            x.x.x.2:33285           CLOSE_WAIT  -
> tcp6       1      0 127.0.0.1:2049          127.0.0.1:39555         CLOSE_WAIT  -
> udp6       0      0 :::2049                 :::*                                32080/ganesha.nfsd
>

Judging from both the logs and the netstat output, it looks like a shutdown request arrived before the server had come out of the grace period.

10/06/2015 01:58:53 : epoch 55777da1 : node2 : ganesha.nfsd-20696[work-6] nfs_rpc_dequeue_req :DISP :F_DBG :dequeue_req try qpair REQ_Q_LOW_LATENCY 0x7fdf8dc67b00:0x7fdf8dc67b68
10/06/2015 01:58:53 : epoch 55777da1 : node2 : ganesha.nfsd-20696[reaper] nfs_in_grace :STATE :EVENT :NFS Server Now IN GRACE
......
10/06/2015 01:58:55 : epoch 55777da1 : node2 : ganesha.nfsd-20696[dbus_heartbeat] gsh_dbus_thread :DBUS :F_DBG :top of poll loop
10/06/2015 01:58:55 : epoch 55777da1 : node2 : ganesha.nfsd-20696[main] nfs_start :NFS STARTUP :EVENT : NFS SERVER INITIALIZED
10/06/2015 01:58:55 : epoch 55777da1 : node2 : ganesha.nfsd-20696[work-12] nfs_rpc_consume_req :DISP :F_DBG :try splice, qpair REQ_Q_LOW_LATENCY consumer qsize=0 producer qsize=0
......
10/06/2015 01:59:52 : epoch 55777da1 : node2 : ganesha.nfsd-20696[dbus_heartbeat] gsh_dbus_thread :DBUS :F_DBG :top of poll loop
10/06/2015 01:59:52 : epoch 55777da1 : node2 : ganesha.nfsd-20696[Admin] do_shutdown :MAIN :EVENT :NFS EXIT: stopping NFS service
.......
10/06/2015 02:00:00 : epoch 55777da1 : node2 : ganesha.nfsd-20696[reaper] nfs_in_grace :STATE :EVENT :NFS Server Now NOT IN GRACE
10/06/2015 02:00:00 : epoch 55777da1 : node2 : ganesha.nfsd-20696[dbus_heartbeat] gsh_dbus_thread :DBUS :F_DBG :top of poll loop
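
The ordering visible in this excerpt (grace start, shutdown request, grace end) can be pulled out of a full-debug log with a small filter. This is a minimal sketch: the function name grace_timeline is made up for illustration, the log path is whatever the caller passes in, and the patterns are simply the function names that appear in the ganesha logs quoted in this thread.

```shell
#!/usr/bin/env bash
# Print only the grace-period and shutdown events from a ganesha log,
# so their relative order is easy to read among the F_DBG noise.
grace_timeline() {
    # $1: path to a ganesha log file (for example /var/log/ganesha.log)
    grep -E 'nfs4_start_grace|nfs_in_grace|do_shutdown' "$1"
}
```

Calling 'grace_timeline /var/log/ganesha.log' would then show whether a do_shutdown line falls between the "IN GRACE" and "NOT IN GRACE" lines.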

When you observe the hang, please take 'gstack <ganesha_pid>' output and post it in the mail.

Thanks,
Soumya

On 06/11/2015 12:37 AM, Alessandro De Salvo wrote:
Hi,
by looking at the connections I also see a strange problem:

# netstat -ltaupn | grep 2049
tcp6       4      0 :::2049                 :::*                    LISTEN      32080/ganesha.nfsd
tcp6       1      0 x.x.x.2:2049            x.x.x.2:33285           CLOSE_WAIT  -
tcp6       1      0 127.0.0.1:2049          127.0.0.1:39555         CLOSE_WAIT  -
udp6       0      0 :::2049                 :::*                                32080/ganesha.nfsd


Why is tcp6 used with an IPv4 address?
On another machine where ganesha 2.1.0 is running I see that tcp is used, not
tcp6.
Could it be that the RPCs are always trying to use IPv6? That would be
wrong.
Thanks,

	Alessandro

On Wed, 2015-06-10 at 15:28 +0530, Soumya Koduri wrote:

On 06/10/2015 05:49 AM, Alessandro De Salvo wrote:
Hi,
I have enabled the full debug already, but I see nothing special. Before exporting any volume the log shows no error, even when I do a showmount (the log is attached, ganesha.log.gz). If I do the same after exporting a volume, nfs-ganesha does not even start, complaining that it cannot bind the IPv6 rquota socket. In fact nothing is listening on that port over IPv6, so this should not happen:

tcp6       0      0 :::111                  :::*                    LISTEN      7433/rpcbind
tcp6       0      0 :::2224                 :::*                    LISTEN      9054/ruby
tcp6       0      0 :::22                   :::*                    LISTEN      1248/sshd
udp6       0      0 :::111                  :::*                                7433/rpcbind
udp6       0      0 fe80::8c2:27ff:fef2:123 :::*                                31238/ntpd
udp6       0      0 fe80::230:48ff:fed2:123 :::*                                31238/ntpd
udp6       0      0 fe80::230:48ff:fed2:123 :::*                                31238/ntpd
udp6       0      0 fe80::230:48ff:fed2:123 :::*                                31238/ntpd
udp6       0      0 ::1:123                 :::*                                31238/ntpd
udp6       0      0 fe80::5484:7aff:fef:123 :::*                                31238/ntpd
udp6       0      0 :::123                  :::*                                31238/ntpd
udp6       0      0 :::824                  :::*                                7433/rpcbind

The error, as shown in the attached ganesha-after-export.log.gz logfile, is the following:


10/06/2015 02:07:47 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] Bind_sockets_V6 :DISP :WARN :Cannot bind RQUOTA tcp6 socket, error 98 (Address already in use)
10/06/2015 02:07:47 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] Bind_sockets :DISP :FATAL :Error binding to V6 interface. Cannot continue.
10/06/2015 02:07:48 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] glusterfs_unload :FSAL :DEBUG :FSAL Gluster unloaded


We have seen such issues with RPCBIND a few times. The NFS-Ganesha setup first
disables Gluster-NFS and then brings up the NFS-Ganesha service. Sometimes
Gluster-NFS is slow to un-register those services, or fails to do so, and
when NFS-Ganesha then tries to register on the same port, it throws this
error. Please try registering Rquota on any other port using the config
option below in "/etc/ganesha/ganesha.conf":

NFS_Core_Param {
          #Use a non-privileged port for RQuota
          Rquota_Port = 4501;
}

and clean up the '/var/cache/rpcbind/' directory before the setup.
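
Before retrying, it can help to see whether rquotad is still registered with rpcbind, and on which port. The following is a minimal sketch that filters 'rpcinfo -p' output read from stdin. The function name registered_ports is invented for illustration, and the column layout is assumed to be the usual program/vers/proto/port/service one.

```shell
#!/usr/bin/env bash
# Print the distinct ports on which a given RPC service is registered,
# reading `rpcinfo -p` output from stdin.
registered_ports() {
    # $1: service name as shown in the last column, e.g. rquotad
    # Column 4 is the port, column 5 the service name.
    awk -v svc="$1" '$5 == svc { print $4 }' | sort -un
}
```

For example, 'rpcinfo -p | registered_ports rquotad' prints the registered ports, or nothing when the service is not registered. It only reflects rpcbind registrations, not which process actually holds the socket.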

Thanks,
Soumya


Thanks,

	Alessandro




On 09/Jun/2015, at 18:37, Soumya Koduri <skoduri@xxxxxxxxxx> wrote:



On 06/09/2015 09:47 PM, Alessandro De Salvo wrote:
Another update: the fact that I was unable to use vol set ganesha.enable
was due to another bug in the ganesha scripts. In short, they are all
using the following line to get the location of the conf file:

CONF=$(cat /etc/sysconfig/ganesha | grep "CONFFILE" | cut -f 2 -d "=")

First of all, by default there is no CONFFILE line in
/etc/sysconfig/ganesha. Second, there is a bug in that line: it works if
I add the following to /etc/sysconfig/ganesha

CONFFILE=/etc/ganesha/ganesha.conf

but it fails if the same is quoted

CONFFILE="/etc/ganesha/ganesha.conf"

It would be much better to use the following, which has a default as
well:

eval $(grep -F CONFFILE= /etc/sysconfig/ganesha)
CONF=${CONFFILE:-/etc/ganesha/ganesha.conf}
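
A variant that avoids eval and also tolerates a quoted value could look like the sketch below. The function name ganesha_conf_path and its two parameters are hypothetical; the only things taken from this thread are the CONFFILE key and the two paths used in the usage line.

```shell
#!/usr/bin/env bash
# Read CONFFILE from a sysconfig-style file without eval.
# Falls back to a default when the file or the key is missing.
ganesha_conf_path() {
    # $1: sysconfig file to read; $2: default path to fall back to
    local sysconfig=$1 default=$2 value=""
    if [ -r "$sysconfig" ]; then
        # Take the last uncommented CONFFILE= assignment and drop the key.
        value=$(sed -n 's/^[[:space:]]*CONFFILE=//p' "$sysconfig" | tail -n 1)
        # Strip one pair of surrounding double or single quotes, if present.
        value=${value#\"}; value=${value%\"}
        value=${value#\'}; value=${value%\'}
    fi
    printf '%s\n' "${value:-$default}"
}
```

Usage would be 'CONF=$(ganesha_conf_path /etc/sysconfig/ganesha /etc/ganesha/ganesha.conf)'. Unlike the eval form, this never executes anything found in the file.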

I'll update the bug report.
Having said this... the last issue to tackle is the real problem with
the ganesha.nfsd :-(

Thanks. Could you try changing the log level to NIV_FULL_DEBUG in '/etc/sysconfig/ganesha' and check if anything gets logged in '/var/log/ganesha.log' or '/ganesha.log'?

Thanks,
Soumya

Cheers,

	Alessandro


On Tue, 2015-06-09 at 14:25 +0200, Alessandro De Salvo wrote:
OK, I can confirm that the ganesha.nfsd process is actually not
answering the calls. Here is what I see:

# rpcinfo -p
     program vers proto   port  service
      100000    4   tcp    111  portmapper
      100000    3   tcp    111  portmapper
      100000    2   tcp    111  portmapper
      100000    4   udp    111  portmapper
      100000    3   udp    111  portmapper
      100000    2   udp    111  portmapper
      100024    1   udp  41594  status
      100024    1   tcp  53631  status
      100003    3   udp   2049  nfs
      100003    3   tcp   2049  nfs
      100003    4   udp   2049  nfs
      100003    4   tcp   2049  nfs
      100005    1   udp  58127  mountd
      100005    1   tcp  56301  mountd
      100005    3   udp  58127  mountd
      100005    3   tcp  56301  mountd
      100021    4   udp  46203  nlockmgr
      100021    4   tcp  41798  nlockmgr
      100011    1   udp    875  rquotad
      100011    1   tcp    875  rquotad
      100011    2   udp    875  rquotad
      100011    2   tcp    875  rquotad

# netstat -lpn | grep ganesha
tcp6      14      0 :::2049                 :::*                    LISTEN      11937/ganesha.nfsd
tcp6       0      0 :::41798                :::*                    LISTEN      11937/ganesha.nfsd
tcp6       0      0 :::875                  :::*                    LISTEN      11937/ganesha.nfsd
tcp6      10      0 :::56301                :::*                    LISTEN      11937/ganesha.nfsd
tcp6       0      0 :::564                  :::*                    LISTEN      11937/ganesha.nfsd
udp6       0      0 :::2049                 :::*                                11937/ganesha.nfsd
udp6       0      0 :::46203                :::*                                11937/ganesha.nfsd
udp6       0      0 :::58127                :::*                                11937/ganesha.nfsd
udp6       0      0 :::875                  :::*                                11937/ganesha.nfsd
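
Two of the listening sockets above show a non-zero Recv-Q (14 on port 2049, 10 on port 56301). On Linux the netstat man page describes Recv-Q for a listening socket as the current backlog, so this would be consistent with connections arriving that the daemon never accepts. A minimal sketch to pick such sockets out of netstat output follows; the function name backlogged_listeners is made up, and the input is assumed to be one socket per line (not wrapped as in this mail) with the standard Proto/Recv-Q/Send-Q/Local/Foreign/State columns.

```shell
#!/usr/bin/env bash
# From `netstat -ltn` (or -ltnp) output on stdin, print listening sockets
# whose Recv-Q column is greater than zero.
backlogged_listeners() {
    # Column 2 is Recv-Q, column 4 the local address, column 6 the state.
    awk '$6 == "LISTEN" && $2 + 0 > 0 { print $4, "Recv-Q=" $2 }'
}
```

Usage would be 'netstat -ltnp | backlogged_listeners'. An empty result means no listener has pending connections at that moment.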

I'm attaching the strace of a showmount from a node to the other.
This machinery was working with nfs-ganesha 2.1.0, so it must be
something introduced with 2.2.0.
Cheers,

	Alessandro



On Tue, 2015-06-09 at 15:16 +0530, Soumya Koduri wrote:

On 06/09/2015 02:48 PM, Alessandro De Salvo wrote:
Hi,
OK, the problem with the VIPs not starting is due to the ganesha_mon
heartbeat script looking for a pid file called
/var/run/ganesha.nfsd.pid, while by default ganesha.nfsd v2.2.0
creates /var/run/ganesha.pid. This needs to be corrected. The file is
in glusterfs-ganesha-3.7.1-1.el7.x86_64, in my case.
For the moment I have created a symlink this way, and it works:

ln -s /var/run/ganesha.pid /var/run/ganesha.nfsd.pid

Thanks. Please update this as well in the bug.
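
If the symlink workaround has to be applied from a script that may run more than once, it can be made idempotent. This is only a sketch of the workaround described above, not a fix for the monitor script; the function name link_pid_alias and its parameters are hypothetical.

```shell
#!/usr/bin/env bash
# Create the pid-file alias only when it is needed.
link_pid_alias() {
    # $1: pid file the daemon actually writes
    # $2: path the monitor script expects
    local actual=$1 expected=$2
    # Nothing to do if the expected path already exists (file or symlink).
    if [ -e "$expected" ] || [ -L "$expected" ]; then
        return 0
    fi
    # Do not create a dangling link when the daemon pid file is missing.
    [ -e "$actual" ] || return 1
    ln -s "$actual" "$expected"
}
```

With the paths from this thread the call would be 'link_pid_alias /var/run/ganesha.pid /var/run/ganesha.nfsd.pid'.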

So far so good, the VIPs are up and pingable, but still there is the
problem of the hanging showmount (i.e. hanging RPC).
Still, I see a lot of errors like this in /var/log/messages:

Jun  9 11:15:20 atlas-node1 lrmd[31221]:   notice: operation_finished:
nfs-mon_monitor_10000:29292:stderr [ Error: Resource does not exist. ]

While ganesha.log shows the server is not in grace:

09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 :
ganesha.nfsd-29964[main] main :MAIN :EVENT :ganesha.nfsd Starting:
Ganesha Version /builddir/build/BUILD/nfs-ganesha-2.2.0/src, built at
May 18 2015 14:17:18 on buildhw-09.phx2.fedoraproject.org
09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 :
ganesha.nfsd-29965[main] nfs_set_param_from_conf :NFS STARTUP :EVENT
:Configuration file successfully parsed
09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 :
ganesha.nfsd-29965[main] init_server_pkgs :NFS STARTUP :EVENT
:Initializing ID Mapper.
09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 :
ganesha.nfsd-29965[main] init_server_pkgs :NFS STARTUP :EVENT :ID Mapper
successfully initialized.
09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 :
ganesha.nfsd-29965[main] main :NFS STARTUP :WARN :No export entries
found in configuration file !!!
09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 :
ganesha.nfsd-29965[main] config_errs_to_log :CONFIG :WARN :Config File
((null):0): Empty configuration file
09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 :
ganesha.nfsd-29965[main] lower_my_caps :NFS STARTUP :EVENT
:CAP_SYS_RESOURCE was successfully removed for proper quota management
in FSAL
09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 :
ganesha.nfsd-29965[main] lower_my_caps :NFS STARTUP :EVENT :currenty set
capabilities are: =
cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap+ep
09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 :
ganesha.nfsd-29965[main] nfs_Init_svc :DISP :CRIT :Cannot acquire
credentials for principal nfs
09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 :
ganesha.nfsd-29965[main] nfs_Init_admin_thread :NFS CB :EVENT :Admin
thread initialized
09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 :
ganesha.nfsd-29965[main] nfs4_start_grace :STATE :EVENT :NFS Server Now
IN GRACE, duration 60
09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 :
ganesha.nfsd-29965[main] nfs_rpc_cb_init_ccache :NFS STARTUP :EVENT
:Callback creds directory (/var/run/ganesha) already exists
09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 :
ganesha.nfsd-29965[main] nfs_rpc_cb_init_ccache :NFS STARTUP :WARN
:gssd_refresh_krb5_machine_credential failed (2:2)
09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 :
ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :Starting
delayed executor.
09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 :
ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :9P/TCP
dispatcher thread was started successfully
09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 :
ganesha.nfsd-29965[_9p_disp] _9p_dispatcher_thread :9P DISP :EVENT :9P
dispatcher started
09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 :
ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT
:gsh_dbusthread was started successfully
09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 :
ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :admin thread
was started successfully
09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 :
ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :reaper thread
was started successfully
09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 :
ganesha.nfsd-29965[reaper] nfs_in_grace :STATE :EVENT :NFS Server Now IN
GRACE
09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 :
ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :General
fridge was started successfully
09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 :
ganesha.nfsd-29965[main] nfs_start :NFS STARTUP :EVENT
:-------------------------------------------------
09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 :
ganesha.nfsd-29965[main] nfs_start :NFS STARTUP :EVENT :             NFS
SERVER INITIALIZED
09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 :
ganesha.nfsd-29965[main] nfs_start :NFS STARTUP :EVENT
:-------------------------------------------------
09/06/2015 11:17:22 : epoch 5576aee4 : atlas-node1 :
ganesha.nfsd-29965[reaper] nfs_in_grace :STATE :EVENT :NFS Server Now
NOT IN GRACE


Please check the status of nfs-ganesha:
$ service nfs-ganesha status

Could you try taking a packet trace (during showmount or mount) and
check the server responses?

Thanks,
Soumya

Cheers,

Alessandro


On 09/Jun/2015, at 10:36, Alessandro De Salvo
<alessandro.desalvo@xxxxxxxxxxxxx> wrote:

Hi Soumya,

On 09/Jun/2015, at 08:06, Soumya Koduri
<skoduri@xxxxxxxxxx> wrote:



On 06/09/2015 01:31 AM, Alessandro De Salvo wrote:
OK, I found at least one of the bugs.
The /usr/libexec/ganesha/ganesha.sh has the following lines:

     if [ -e /etc/os-release ]; then
         RHEL6_PCS_CNAME_OPTION=""
     fi

This is OK for RHEL < 7, but does not work for >= 7. I have changed
it to the following, to make it work:

     if [ -e /etc/os-release ]; then
         eval $(grep -F "REDHAT_SUPPORT_PRODUCT=" /etc/os-release)
         [ "$REDHAT_SUPPORT_PRODUCT" == "Fedora" ] && RHEL6_PCS_CNAME_OPTION=""
     fi

Oh, thanks for the fix. Could you please file a bug for this (and
possibly submit your fix as well)? We shall have it corrected.

Just did it: https://bugzilla.redhat.com/show_bug.cgi?id=1229601


Apart from that, the VIP_<node> names I was using were wrong: I should
have converted all the "-" to underscores. Maybe this could be
mentioned in the documentation when you have it ready.
Now the cluster starts, but the VIPs apparently do not:

Sure. Thanks again for pointing it out. We shall make a note of it.
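
A small helper can derive the key name from a node name so the conversion is not forgotten. This is a sketch: the function name vip_key is invented, and the only rule it encodes is the one reported above, that every hyphen in the node name becomes an underscore after the VIP_ prefix. Hyphens are also not valid in shell variable names, which would explain the requirement if the HA scripts read the file as shell assignments (an assumption, not confirmed in this thread).

```shell
#!/usr/bin/env bash
# Build the VIP_<node> key for ganesha-ha.conf from a node name,
# replacing every hyphen with an underscore.
vip_key() {
    # $1: node name as used in HA_CLUSTER_NODES, e.g. atlas-node1
    printf 'VIP_%s\n' "$(printf '%s' "$1" | sed 's/-/_/g')"
}
```

For the nodes in this thread, 'vip_key atlas-node1' gives VIP_atlas_node1.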

Online: [ atlas-node1 atlas-node2 ]

Full list of resources:

Clone Set: nfs-mon-clone [nfs-mon]
      Started: [ atlas-node1 atlas-node2 ]
Clone Set: nfs-grace-clone [nfs-grace]
      Started: [ atlas-node1 atlas-node2 ]
atlas-node1-cluster_ip-1  (ocf::heartbeat:IPaddr):        Stopped
atlas-node1-trigger_ip-1  (ocf::heartbeat:Dummy): Started atlas-node1
atlas-node2-cluster_ip-1  (ocf::heartbeat:IPaddr):        Stopped
atlas-node2-trigger_ip-1  (ocf::heartbeat:Dummy): Started atlas-node2
atlas-node1-dead_ip-1     (ocf::heartbeat:Dummy): Started atlas-node1
atlas-node2-dead_ip-1     (ocf::heartbeat:Dummy): Started atlas-node2

PCSD Status:
   atlas-node1: Online
   atlas-node2: Online

Daemon Status:
   corosync: active/disabled
   pacemaker: active/disabled
   pcsd: active/enabled


Here corosync and pacemaker show the 'disabled' state. Can you check the
status of their services? They should be running prior to cluster
creation. We need to include that step in the documentation as well.
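
The same check can be scripted against the "Daemon Status:" section of 'pcs status'. A minimal sketch, assuming the section layout shown in this thread ("name: state" lines, ended by a blank line or end of output); the function name daemons_needing_attention is made up. In that section the part before the slash presumably reflects whether the daemon is running now and the part after it whether it is enabled at boot.

```shell
#!/usr/bin/env bash
# List daemons from the "Daemon Status:" section of `pcs status` output
# (read from stdin) whose state is not exactly active/enabled.
daemons_needing_attention() {
    awk '
        /^Daemon Status:/      { in_section = 1; next }   # section starts
        in_section && NF == 0  { in_section = 0 }         # blank line ends it
        in_section && $2 != "active/enabled" {
            sub(/:$/, "", $1)                             # drop trailing colon
            print $1, $2
        }
    '
}
```

Usage would be 'pcs status | daemons_needing_attention'. No output means every listed daemon is active and enabled.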

Ah, OK, you’re right, I have added it to my puppet modules (we install
and configure ganesha via puppet, I’ll put the module on puppetforge
soon, in case anyone is interested).


But the issue that is puzzling me more is the following:

# showmount -e localhost
rpc mount export: RPC: Timed out

And when I try to enable the ganesha exports on a volume I get this
error:

# gluster volume set atlas-home-01 ganesha.enable on
volume set: failed: Failed to create NFS-Ganesha export config file.

But I see the file created in /etc/ganesha/exports/*.conf
Still, showmount hangs and times out.
Any help?
Thanks,

Hmm, that's strange. We have sometimes seen such issues when there was
no proper cleanup before re-creating the cluster.

https://bugzilla.redhat.com/show_bug.cgi?id=1227709

http://review.gluster.org/#/c/11093/

Can you please unexport all the volumes using
'gluster vol set <volname> ganesha.enable off' and tear down the cluster using the

OK:

# gluster vol set atlas-home-01 ganesha.enable off
volume set: failed: ganesha.enable is already 'off'.

# gluster vol set atlas-data-01 ganesha.enable off
volume set: failed: ganesha.enable is already 'off'.


'gluster ganesha disable' command.

I’m assuming you wanted to write nfs-ganesha instead?

# gluster nfs-ganesha disable
ganesha enable : success


A side note (not really important): it’s strange that when I do a
disable the message is “ganesha enable” :-)


Verify if the following files have been deleted on all the nodes-
'/etc/cluster/cluster.conf'

this file is not present at all, I think it’s not needed in CentOS 7

'/etc/ganesha/ganesha.conf',

it’s still there, but empty, and I guess it should be OK, right?

'/etc/ganesha/exports/*'

no more files there

'/var/lib/pacemaker/cib'

it’s empty


Verify if the ganesha service is stopped on all the nodes.

nope, it’s still running, I will stop it.


start/restart the services - corosync, pcs.

On the node where I issued the nfs-ganesha disable there is no longer
any /etc/corosync/corosync.conf, so corosync won’t start. The other
node still has the file, which is strange.


And re-try the HA cluster creation with
'gluster ganesha enable'
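
The file checks in the cleanup list above can be run on every node with a short function. This is a sketch under stated assumptions: the function name leftover_ha_state and its optional root-prefix parameter are hypothetical, and treating an empty file or empty directory as "clean" is a choice made here, not something confirmed in this thread.

```shell
#!/usr/bin/env bash
# Report which of the cleanup paths named above still hold content.
# An optional root prefix ($1) allows trying this on a scratch tree.
leftover_ha_state() {
    local root=${1:-} path
    for path in /etc/cluster/cluster.conf /etc/ganesha/ganesha.conf \
                /etc/ganesha/exports /var/lib/pacemaker/cib; do
        if [ -d "$root$path" ]; then
            # A directory counts as leftover only when it is not empty.
            if [ -n "$(ls -A "$root$path")" ]; then
                echo "$path"
            fi
        elif [ -s "$root$path" ]; then
            # A regular file counts only when it exists and is non-empty.
            echo "$path"
        fi
    done
    return 0
}
```

Running 'leftover_ha_state' with no argument checks the real paths; any line it prints is a path that still needs cleaning before the cluster is re-created.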

This time (repeated twice) it did not work at all:

# pcs status
Cluster name: ATLAS_GANESHA_01
Last updated: Tue Jun  9 10:13:43 2015
Last change: Tue Jun  9 10:13:22 2015
Stack: corosync
Current DC: atlas-node1 (1) - partition with quorum
Version: 1.1.12-a14efad
2 Nodes configured
6 Resources configured


Online: [ atlas-node1 atlas-node2 ]

Full list of resources:

Clone Set: nfs-mon-clone [nfs-mon]
      Started: [ atlas-node1 atlas-node2 ]
Clone Set: nfs-grace-clone [nfs-grace]
      Started: [ atlas-node1 atlas-node2 ]
atlas-node2-dead_ip-1     (ocf::heartbeat:Dummy): Started atlas-node1
atlas-node1-dead_ip-1     (ocf::heartbeat:Dummy): Started atlas-node2

PCSD Status:
   atlas-node1: Online
   atlas-node2: Online

Daemon Status:
   corosync: active/enabled
   pacemaker: active/enabled
   pcsd: active/enabled



I tried then "pcs cluster destroy" on both nodes, and then again
nfs-ganesha enable, but now I’m back to the old problem:

# pcs status
Cluster name: ATLAS_GANESHA_01
Last updated: Tue Jun  9 10:22:27 2015
Last change: Tue Jun  9 10:17:00 2015
Stack: corosync
Current DC: atlas-node2 (2) - partition with quorum
Version: 1.1.12-a14efad
2 Nodes configured
10 Resources configured


Online: [ atlas-node1 atlas-node2 ]

Full list of resources:

Clone Set: nfs-mon-clone [nfs-mon]
      Started: [ atlas-node1 atlas-node2 ]
Clone Set: nfs-grace-clone [nfs-grace]
      Started: [ atlas-node1 atlas-node2 ]
atlas-node1-cluster_ip-1       (ocf::heartbeat:IPaddr):        Stopped
atlas-node1-trigger_ip-1       (ocf::heartbeat:Dummy): Started atlas-node1
atlas-node2-cluster_ip-1       (ocf::heartbeat:IPaddr):        Stopped
atlas-node2-trigger_ip-1       (ocf::heartbeat:Dummy): Started atlas-node2
atlas-node1-dead_ip-1  (ocf::heartbeat:Dummy): Started atlas-node1
atlas-node2-dead_ip-1  (ocf::heartbeat:Dummy): Started atlas-node2

PCSD Status:
   atlas-node1: Online
   atlas-node2: Online

Daemon Status:
   corosync: active/enabled
   pacemaker: active/enabled
   pcsd: active/enabled


Cheers,

Alessandro



Thanks,
Soumya

Alessandro

On 08/Jun/2015, at 20:00, Alessandro De Salvo
<Alessandro.DeSalvo@xxxxxxxxxxxxx> wrote:

Hi,
indeed, it does not work :-)
OK, this is what I did, with 2 machines, running CentOS 7.1,
Glusterfs 3.7.1 and nfs-ganesha 2.2.0:

1) ensured that the machines are able to resolve their IPs (but
this was already true since they were in the DNS);
2) disabled NetworkManager and enabled network on both machines;
3) created a gluster shared volume 'gluster_shared_storage' and
mounted it on '/run/gluster/shared_storage' on all the cluster
nodes using glusterfs native mount (on CentOS 7.1 there is a link
by default /var/run -> ../run)
4) created an empty /etc/ganesha/ganesha.conf;
5) installed pacemaker pcs resource-agents corosync on all cluster
machines;
6) set the same password for the 'hacluster' user on all machines;
7) pcs cluster auth <hostname> -u hacluster -p <pass> on all the
nodes (on both nodes I issued the commands for both nodes)
8) IPv6 is configured by default on all nodes, although the
infrastructure is not ready for IPv6
9) enabled pcsd and started it on all nodes
10) populated /etc/ganesha/ganesha-ha.conf with the following
contents, one per machine:


===> atlas-node1
# Name of the HA cluster created.
HA_NAME="ATLAS_GANESHA_01"
# The server from which you intend to mount
# the shared volume.
HA_VOL_SERVER="atlas-node1"
# The subset of nodes of the Gluster Trusted Pool
# that forms the ganesha HA cluster. IP/Hostname
# is specified.
HA_CLUSTER_NODES="atlas-node1,atlas-node2"
# Virtual IPs of each of the nodes specified above.
VIP_atlas-node1="x.x.x.1"
VIP_atlas-node2="x.x.x.2"

===> atlas-node2
# Name of the HA cluster created.
HA_NAME="ATLAS_GANESHA_01"
# The server from which you intend to mount
# the shared volume.
HA_VOL_SERVER="atlas-node2"
# The subset of nodes of the Gluster Trusted Pool
# that forms the ganesha HA cluster. IP/Hostname
# is specified.
HA_CLUSTER_NODES="atlas-node1,atlas-node2"
# Virtual IPs of each of the nodes specified above.
VIP_atlas-node1="x.x.x.1"
VIP_atlas-node2="x.x.x.2"

11) issued gluster nfs-ganesha enable, but it fails with a cryptic
message:

# gluster nfs-ganesha enable
Enabling NFS-Ganesha requires Gluster-NFS to be disabled across the
trusted pool. Do you still want to continue? (y/n) y
nfs-ganesha: failed: Failed to set up HA config for NFS-Ganesha.
Please check the log file for details

Looking at the logs I found nothing really special but this:

==> /var/log/glusterfs/etc-glusterfs-glusterd.vol.log <==
[2015-06-08 17:57:15.672844] I [MSGID: 106132]
[glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: nfs
already stopped
[2015-06-08 17:57:15.675395] I
[glusterd-ganesha.c:386:check_host_list] 0-management: ganesha host
found Hostname is atlas-node2
[2015-06-08 17:57:15.720692] I
[glusterd-ganesha.c:386:check_host_list] 0-management: ganesha host
found Hostname is atlas-node2
[2015-06-08 17:57:15.721161] I
[glusterd-ganesha.c:335:is_ganesha_host] 0-management: ganesha host
found Hostname is atlas-node2
[2015-06-08 17:57:16.633048] E
[glusterd-ganesha.c:254:glusterd_op_set_ganesha] 0-management:
Initial NFS-Ganesha set up failed
[2015-06-08 17:57:16.641563] E
[glusterd-syncop.c:1396:gd_commit_op_phase] 0-management: Commit of
operation 'Volume (null)' failed on localhost : Failed to set up HA
config for NFS-Ganesha. Please check the log file for details

==> /var/log/glusterfs/cmd_history.log <==
[2015-06-08 17:57:16.643615]  : nfs-ganesha enable : FAILED :
Failed to set up HA config for NFS-Ganesha. Please check the log
file for details

==> /var/log/glusterfs/cli.log <==
[2015-06-08 17:57:16.643839] I [input.c:36:cli_batch] 0-: Exiting
with: -1


Also, pcs seems to be fine for the auth part, although it obviously
tells me the cluster is not running.

I, [2015-06-08T19:57:16.305323 #7223]  INFO -- : Running:
/usr/sbin/corosync-cmapctl totem.cluster_name
I, [2015-06-08T19:57:16.345457 #7223]  INFO -- : Running:
/usr/sbin/pcs cluster token-nodes
::ffff:141.108.38.46 - - [08/Jun/2015 19:57:16] "GET
/remote/check_auth HTTP/1.1" 200 68 0.1919
::ffff:141.108.38.46 - - [08/Jun/2015 19:57:16] "GET
/remote/check_auth HTTP/1.1" 200 68 0.1920
atlas-node1.mydomain - - [08/Jun/2015:19:57:16 CEST] "GET
/remote/check_auth HTTP/1.1" 200 68
- -> /remote/check_auth


What am I doing wrong?
Thanks,

Alessandro

On 08/Jun/2015, at 19:30, Soumya Koduri
<skoduri@xxxxxxxxxx> wrote:




On 06/08/2015 08:20 PM, Alessandro De Salvo wrote:
Sorry, just another question:

- in my installation of gluster 3.7.1 the command gluster
features.ganesha enable does not work:

# gluster features.ganesha enable
unrecognized word: features.ganesha (position 0)

Which version has full support for it?

Sorry. This option has recently been changed. It is now

$ gluster nfs-ganesha enable



- in the documentation the ccs and cman packages are required,
but they seem to be no longer available on CentOS 7 and
similar. I guess they are not really required anymore, as pcs
should do the full job

Thanks,

Alessandro

It looks that way from http://clusterlabs.org/quickstart-redhat.html.
Let us know if it doesn't work.

Thanks,
Soumya


On 08/Jun/2015, at 15:09, Alessandro De Salvo
<alessandro.desalvo@xxxxxxxxxxxxx> wrote:

Great, many thanks Soumya!
Cheers,

Alessandro

On 08/Jun/2015, at 13:53, Soumya Koduri
<skoduri@xxxxxxxxxx> wrote:

Hi,

Please find the slides of the demo video at [1]

We recommend using a distributed replica volume as the shared
volume for better data availability.

The size of the volume depends on your workload. Since
it is used to maintain the state of NLM/NFSv4 clients, you can
estimate the minimum size of the volume as the aggregate of
(typical_size_of_'/var/lib/nfs'_directory +
~4k*no_of_clients_connected_to_each_of_the_nfs_servers_at_any_point)
We shall document this feature in the gluster docs soon
as well.

Thanks,
Soumya

[1] - http://www.slideshare.net/SoumyaKoduri/high-49117846

On 06/08/2015 04:34 PM, Alessandro De Salvo wrote:
Hi,
I have seen the demo video on ganesha HA,
https://www.youtube.com/watch?v=Z4mvTQC-efM
However there is no advice on the appropriate size of the
shared volume. How is it really used, and what should be a
reasonable size for it?
Also, are the slides from the video available somewhere, as
well as documentation on all this? I did not manage to find
them.
Thanks,

Alessandro



_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users



