Re: Fwd: nfs-ganesha HA with arbiter volume

Soumya Koduri <skoduri@xxxxxxxxxx> · Tue, 22 Sep 2015 22:50:28 +0530

On 09/22/2015 02:35 PM, Tiemen Ruiten wrote:
I missed having passwordless SSH auth for the root user. However it did
not make a difference:

After verifying prerequisites, issued gluster nfs-ganesha enable on node
cobalt:

Sep 22 10:19:56 cobalt systemd: Starting Preprocess NFS configuration...
Sep 22 10:19:56 cobalt systemd: Starting RPC Port Mapper.
Sep 22 10:19:56 cobalt systemd: Reached target RPC Port Mapper.
Sep 22 10:19:56 cobalt systemd: Starting Host and Network Name Lookups.
Sep 22 10:19:56 cobalt systemd: Reached target Host and Network Name
Lookups.
Sep 22 10:19:56 cobalt systemd: Starting RPC bind service...
Sep 22 10:19:56 cobalt systemd: Started Preprocess NFS configuration.
Sep 22 10:19:56 cobalt systemd: Started RPC bind service.
Sep 22 10:19:56 cobalt systemd: Starting NFS status monitor for NFSv2/3
locking....
Sep 22 10:19:56 cobalt rpc.statd[2666]: Version 1.3.0 starting
Sep 22 10:19:56 cobalt rpc.statd[2666]: Flags: TI-RPC
Sep 22 10:19:56 cobalt systemd: Started NFS status monitor for NFSv2/3
locking..
Sep 22 10:19:56 cobalt systemd: Starting NFS-Ganesha file server...
Sep 22 10:19:56 cobalt systemd: Started NFS-Ganesha file server.
Sep 22 10:19:56 cobalt kernel: warning: `ganesha.nfsd' uses 32-bit
capabilities (legacy support in use)
Sep 22 10:19:56 cobalt rpc.statd[2666]: Received SM_UNMON_ALL request
from cobalt.int.rdmedia.com while not monitoring any hosts
Sep 22 10:19:56 cobalt logger: setting up rd-ganesha-ha
Sep 22 10:19:56 cobalt logger: setting up cluster rd-ganesha-ha with the
following cobalt iron
Sep 22 10:19:57 cobalt systemd: Stopped Pacemaker High Availability
Cluster Manager.
Sep 22 10:19:57 cobalt systemd: Stopped Corosync Cluster Engine.
Sep 22 10:19:57 cobalt systemd: Reloading.
Sep 22 10:19:57 cobalt systemd:
[/usr/lib/systemd/system/dm-event.socket:10] Unknown lvalue
'RemoveOnStop' in section 'Socket'
Sep 22 10:19:57 cobalt systemd:
[/usr/lib/systemd/system/lvm2-lvmetad.socket:9] Unknown lvalue
'RemoveOnStop' in section 'Socket'
Sep 22 10:19:57 cobalt systemd: Reloading.
Sep 22 10:19:57 cobalt systemd:
[/usr/lib/systemd/system/dm-event.socket:10] Unknown lvalue
'RemoveOnStop' in section 'Socket'
Sep 22 10:19:57 cobalt systemd:
[/usr/lib/systemd/system/lvm2-lvmetad.socket:9] Unknown lvalue
'RemoveOnStop' in section 'Socket'
Sep 22 10:19:57 cobalt systemd: Starting Corosync Cluster Engine...
Sep 22 10:19:57 cobalt corosync[2815]: [MAIN  ] Corosync Cluster Engine
('2.3.4'): started and ready to provide service.
Sep 22 10:19:57 cobalt corosync[2815]: [MAIN  ] Corosync built-in
features: dbus systemd xmlconf snmp pie relro bindnow
Sep 22 10:19:57 cobalt corosync[2816]: [TOTEM ] Initializing transport
(UDP/IP Unicast).
Sep 22 10:19:57 cobalt corosync[2816]: [TOTEM ] Initializing
transmit/receive security (NSS) crypto: none hash: none
Sep 22 10:19:58 cobalt corosync[2816]: [TOTEM ] The network interface
[10.100.30.37] is now up.
Sep 22 10:19:58 cobalt corosync[2816]: [SERV  ] Service engine loaded:
corosync configuration map access [0]
Sep 22 10:19:58 cobalt corosync[2816]: [QB    ] server name: cmap
Sep 22 10:19:58 cobalt corosync[2816]: [SERV  ] Service engine loaded:
corosync configuration service [1]
Sep 22 10:19:58 cobalt corosync[2816]: [QB    ] server name: cfg
Sep 22 10:19:58 cobalt corosync[2816]: [SERV  ] Service engine loaded:
corosync cluster closed process group service v1.01 [2]
Sep 22 10:19:58 cobalt corosync[2816]: [QB    ] server name: cpg
Sep 22 10:19:58 cobalt corosync[2816]: [SERV  ] Service engine loaded:
corosync profile loading service [4]
Sep 22 10:19:58 cobalt corosync[2816]: [QUORUM] Using quorum provider
corosync_votequorum
Sep 22 10:19:58 cobalt corosync[2816]: [VOTEQ ] Waiting for all cluster
members. Current votes: 1 expected_votes: 2
Sep 22 10:19:58 cobalt corosync[2816]: [SERV  ] Service engine loaded:
corosync vote quorum service v1.0 [5]
Sep 22 10:19:58 cobalt corosync[2816]: [QB    ] server name: votequorum
Sep 22 10:19:58 cobalt corosync[2816]: [SERV  ] Service engine loaded:
corosync cluster quorum service v0.1 [3]
Sep 22 10:19:58 cobalt corosync[2816]: [QB    ] server name: quorum
Sep 22 10:19:58 cobalt corosync[2816]: [TOTEM ] adding new UDPU member
{10.100.30.37}
Sep 22 10:19:58 cobalt corosync[2816]: [TOTEM ] adding new UDPU member
{10.100.30.38}
Sep 22 10:19:58 cobalt corosync[2816]: [TOTEM ] A new membership
(10.100.30.37:140) was formed. Members joined: 1
Sep 22 10:19:58 cobalt corosync[2816]: [TOTEM ] A new membership
(10.100.30.37:148) was formed. Members joined: 1
Sep 22 10:19:58 cobalt corosync[2816]: [VOTEQ ] Waiting for all cluster
members. Current votes: 1 expected_votes: 2
Sep 22 10:19:58 cobalt corosync[2816]: [VOTEQ ] Waiting for all cluster
members. Current votes: 1 expected_votes: 2
Sep 22 10:19:58 cobalt corosync[2816]: [QUORUM] Members[0]:
Sep 22 10:19:58 cobalt corosync[2816]: [MAIN  ] Completed service
synchronization, ready to provide service.
*Sep 22 10:21:27 cobalt systemd: corosync.service operation timed out.
Terminating.*
*Sep 22 10:21:27 cobalt corosync: Starting Corosync Cluster Engine
(corosync):*
*Sep 22 10:21:27 cobalt systemd: Failed to start Corosync Cluster Engine.*
*Sep 22 10:21:27 cobalt systemd: Unit corosync.service entered failed
state.*
Sep 22 10:21:32 cobalt logger: warning: pcs property set
no-quorum-policy=ignore failed
Sep 22 10:21:32 cobalt logger: warning: pcs property set
stonith-enabled=false failed
Sep 22 10:21:32 cobalt logger: warning: pcs resource create nfs_start
ganesha_nfsd ha_vol_mnt=/var/run/gluster/shared_storage --clone failed
Sep 22 10:21:33 cobalt logger: warning: pcs resource delete
nfs_start-clone failed
Sep 22 10:21:33 cobalt logger: warning: pcs resource create nfs-mon
ganesha_mon --clone failed
Sep 22 10:21:33 cobalt logger: warning: pcs resource create nfs-grace
ganesha_grace --clone failed
Sep 22 10:21:34 cobalt logger: warning pcs resource create
cobalt-cluster_ip-1 ocf:heartbeat:IPaddr ip=10.100.30.101
cidr_netmask=32 op monitor interval=15s failed
Sep 22 10:21:34 cobalt logger: warning: pcs resource create
cobalt-trigger_ip-1 ocf:heartbeat:Dummy failed
Sep 22 10:21:34 cobalt logger: warning: pcs constraint colocation add
cobalt-cluster_ip-1 with cobalt-trigger_ip-1 failed
Sep 22 10:21:34 cobalt logger: warning: pcs constraint order
cobalt-trigger_ip-1 then nfs-grace-clone failed
Sep 22 10:21:34 cobalt logger: warning: pcs constraint order
nfs-grace-clone then cobalt-cluster_ip-1 failed
Sep 22 10:21:34 cobalt logger: warning pcs resource create
iron-cluster_ip-1 ocf:heartbeat:IPaddr ip=10.100.30.102 cidr_netmask=32
op monitor interval=15s failed
Sep 22 10:21:34 cobalt logger: warning: pcs resource create
iron-trigger_ip-1 ocf:heartbeat:Dummy failed
Sep 22 10:21:34 cobalt logger: warning: pcs constraint colocation add
iron-cluster_ip-1 with iron-trigger_ip-1 failed
Sep 22 10:21:34 cobalt logger: warning: pcs constraint order
iron-trigger_ip-1 then nfs-grace-clone failed
Sep 22 10:21:35 cobalt logger: warning: pcs constraint order
nfs-grace-clone then iron-cluster_ip-1 failed
Sep 22 10:21:35 cobalt logger: warning: pcs constraint location
cobalt-cluster_ip-1 rule score=-INFINITY ganesha-active ne 1 failed
Sep 22 10:21:35 cobalt logger: warning: pcs constraint location
cobalt-cluster_ip-1 prefers iron=1000 failed
Sep 22 10:21:35 cobalt logger: warning: pcs constraint location
cobalt-cluster_ip-1 prefers cobalt=2000 failed
Sep 22 10:21:35 cobalt logger: warning: pcs constraint location
iron-cluster_ip-1 rule score=-INFINITY ganesha-active ne 1 failed
Sep 22 10:21:35 cobalt logger: warning: pcs constraint location
iron-cluster_ip-1 prefers cobalt=1000 failed
Sep 22 10:21:35 cobalt logger: warning: pcs constraint location
iron-cluster_ip-1 prefers iron=2000 failed
Sep 22 10:21:35 cobalt logger: warning pcs cluster cib-push
/tmp/tmp.yqLT4m75WG failed

Notice the failed corosync service in bold. I can't find any logs
pointing to a reason. Starting it manually is not a problem:

Sep 22 10:35:06 cobalt corosync: Starting Corosync Cluster Engine
(corosync): [  OK  ]
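A long log like the one above is easier to read once the failed steps are separated out. The following is a minimal sketch, not part of ganesha-ha.sh: the function name `failed_pcs_steps` is hypothetical, and it assumes the "warning: ... failed" message format seen in this thread. It joins wrapped continuation lines and prints only the commands that were reported as failed.

```shell
#!/bin/sh
# failed_pcs_steps: read syslog text on stdin and print, one per line,
# every command that a "logger: warning[:] <command> failed" entry names.
# (Hypothetical helper; the message format is assumed from this thread.)
failed_pcs_steps() {
    awk '
        /^[A-Z][a-z][a-z] +[0-9]+ [0-9:]+ / {   # a new syslog entry starts
            if (entry != "") print entry
            entry = $0
            next
        }
        { entry = entry " " $0 }                # continuation of a wrapped entry
        END { if (entry != "") print entry }
    ' | sed -n 's/^.* logger: warning:\{0,1\} \(.*\) failed$/\1/p'
}
```

Typical use would be `failed_pcs_steps < /var/log/messages`, which makes it clear that every pcs step fails after corosync does not start.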

Then I noticed pacemaker was not running on either node. I started it
manually and saw the following in /var/log/messages on the other node:

Sep 22 10:36:43 iron cibadmin[4654]: notice: Invoked: /usr/sbin/cibadmin
--replace -o configuration -V --xml-pipe
Sep 22 10:36:43 iron crmd[4617]: notice: State transition S_IDLE ->
S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL
origin=abort_transition_graph ]
Sep 22 10:36:44 iron pengine[4616]: notice: On loss of CCM Quorum: Ignore
Sep 22 10:36:44 iron pengine[4616]: error: Resource start-up disabled
since no STONITH resources have been defined
Sep 22 10:36:44 iron pengine[4616]: error: Either configure some or
disable STONITH with the stonith-enabled option
Sep 22 10:36:44 iron pengine[4616]: error: NOTE: Clusters with shared
data need STONITH to ensure data integrity
Sep 22 10:36:44 iron pengine[4616]: notice: Delaying fencing operations
until there are resources to manage
Sep 22 10:36:44 iron pengine[4616]: warning: Node iron is unclean!
Sep 22 10:36:44 iron pengine[4616]: notice: Cannot fence unclean nodes
until quorum is attained (or no-quorum-policy is set to ignore)
Sep 22 10:36:44 iron pengine[4616]: warning: Calculated Transition 2:
/var/lib/pacemaker/pengine/pe-warn-20.bz2
Sep 22 10:36:44 iron pengine[4616]: notice: Configuration ERRORs found
during PE processing.  Please run "crm_verify -L" to identify issues.
Sep 22 10:36:44 iron crmd[4617]: notice: Transition 2 (Complete=0,
Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-warn-20.bz2): Complete
Sep 22 10:36:44 iron crmd[4617]: notice: State transition
S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL
origin=notify_crmd ]

I'm starting to think there is some leftover config somewhere from all
these attempts. Is there a way to completely reset all config related to
NFS-Ganesha and start over?

Disabling nfs-ganesha should do the cleanup as well:
# gluster nfs-ganesha disable

If you are still in doubt, to be safe, run the following script after
disabling nfs-ganesha:
# /usr/libexec/ganesha/ganesha-ha.sh --cleanup /etc/ganesha

Thanks,
Soumya
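The two cleanup steps suggested above can be sketched as a small wrapper. This is only an illustration: the function names `run` and `reset_ganesha_ha` and the `DRY_RUN` switch are hypothetical, the paths are the ones mentioned in this thread, and the gluster command may prompt for confirmation when run for real.

```shell
#!/bin/sh
# Paths as mentioned in this thread; adjust if your packages differ.
HA_SCRIPT=/usr/libexec/ganesha/ganesha-ha.sh
HA_CONF_DIR=/etc/ganesha

# run CMD...: execute the command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Disable nfs-ganesha first, then run the explicit cleanup.
# Stops at the first step that fails.
reset_ganesha_ha() {
    run gluster nfs-ganesha disable &&
    run "$HA_SCRIPT" --cleanup "$HA_CONF_DIR"
}
```

Running `DRY_RUN=1 reset_ganesha_ha` prints the two commands without touching the cluster, which is a cheap way to check the paths before doing it for real.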

On 22 September 2015 at 09:04, Soumya Koduri <skoduri@xxxxxxxxxx> wrote:

    Hi Tiemen,

    I have added the steps to configure HA NFS to the document below.
    Please verify that you have met all the prerequisites and performed
    the steps correctly.

    https://github.com/soumyakoduri/glusterdocs/blob/ha_guide/Administrator%20Guide/Configuring%20HA%20NFS%20Server.md

    Thanks,
    Soumya

    On 09/21/2015 09:21 PM, Tiemen Ruiten wrote:

        Whoops, replied off-list.

        Additionally, I noticed that the generated corosync config is not
        valid, as there is no interface section:

        /etc/corosync/corosync.conf

        totem {
        version: 2
        secauth: off
        cluster_name: rd-ganesha-ha
        transport: udpu
        }

        nodelist {
          node {
                ring0_addr: cobalt
                nodeid: 1
               }
          node {
                ring0_addr: iron
                nodeid: 2
               }
        }

        quorum {
        provider: corosync_votequorum
        two_node: 1
        }

        logging {
        to_syslog: yes
        }
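Since this config addresses the nodes by name (`ring0_addr: cobalt`, `ring0_addr: iron`), one thing worth ruling out is name resolution on each node. The sketch below is an assumption about what might be useful to check, not a diagnosis; the function names `corosync_node_addrs` and `check_node_addrs` are hypothetical, and only standard tools (`awk`, `getent`) are used.

```shell
#!/bin/sh
# corosync_node_addrs: read a corosync.conf on stdin and print the value
# of every ring0_addr entry, one per line.
corosync_node_addrs() {
    awk '$1 == "ring0_addr:" { print $2 }'
}

# check_node_addrs [FILE]: report every ring0_addr that this host cannot
# resolve through its normal name service lookup (DNS or /etc/hosts).
check_node_addrs() {
    conf=${1:-/etc/corosync/corosync.conf}
    rc=0
    for addr in $(corosync_node_addrs < "$conf"); do
        if ! getent hosts "$addr" >/dev/null; then
            echo "unresolvable: $addr" >&2
            rc=1
        fi
    done
    return "$rc"
}
```

Running `check_node_addrs` on both cobalt and iron should return 0 silently if every node name resolves.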

        ---------- Forwarded message ----------
        From: Tiemen Ruiten <t.ruiten@xxxxxxxxxxx>
        Date: 21 September 2015 at 17:16
        Subject: Re: nfs-ganesha HA with arbiter volume
        To: Jiffin Tony Thottan <jthottan@xxxxxxxxxx>

        Could you point me to the latest documentation? I've been
        struggling to find something up-to-date. I believe I have all the
        prerequisites:

        - shared storage volume exists and is mounted
        - all nodes in hosts files
        - Gluster-NFS disabled
        - corosync, pacemaker and nfs-ganesha RPMs installed
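Most of the checklist above can be verified mechanically. The following is a rough sketch under stated assumptions: `missing_hosts` and `check_prereqs` are hypothetical names, the shared storage mount point `/var/run/gluster/shared_storage` is taken from the logs elsewhere in this thread, and the pcsd check is added because pcsd had to be started manually here. It does not check whether Gluster-NFS is disabled.

```shell
#!/bin/sh
# missing_hosts FILE NAME...: print each NAME that has no entry in FILE
# (hosts(5) format: address, then one or more names; "#" starts a comment).
missing_hosts() {
    hosts_file=$1
    shift
    for name in "$@"; do
        awk -v n="$name" '
            { sub(/#.*/, "") }                                   # drop comments
            { for (i = 2; i <= NF; i++) if ($i == n) found = 1 } # skip the address
            END { exit !found }
        ' "$hosts_file" || echo "$name"
    done
}

# check_prereqs NODE...: run the checks and return non-zero if any fails.
check_prereqs() {
    rc=0
    mountpoint -q /var/run/gluster/shared_storage ||
        { echo "shared storage not mounted"; rc=1; }
    [ -z "$(missing_hosts /etc/hosts "$@")" ] ||
        { echo "node names missing from /etc/hosts"; rc=1; }
    rpm -q corosync pacemaker nfs-ganesha >/dev/null ||
        { echo "required packages missing"; rc=1; }
    systemctl is-active --quiet pcsd ||
        { echo "pcsd is not running"; rc=1; }
    return "$rc"
}
```

For the two-node cluster in this thread, the call would be `check_prereqs cobalt iron`, run on each node.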

        Anything I missed?

        Everything has been installed by RPM so is in the default locations:
        /usr/libexec/ganesha/ganesha-ha.sh
        /etc/ganesha/ganesha.conf (empty)
        /etc/ganesha/ganesha-ha.conf

        After I started the pcsd service manually, nfs-ganesha could be
        enabled successfully, but there was no virtual IP present on the
        interfaces and, looking at the system log, I noticed corosync
        failed to start:

        - on the host where I issued the gluster nfs-ganesha enable command:

        Sep 21 17:07:18 iron systemd: Starting NFS-Ganesha file server...
        Sep 21 17:07:19 iron systemd: Started NFS-Ganesha file server.
        Sep 21 17:07:19 iron rpc.statd[2409]: Received SM_UNMON_ALL request
        from iron.int.rdmedia.com while not monitoring any hosts
        Sep 21 17:07:20 iron systemd: Starting Corosync Cluster Engine...
        Sep 21 17:07:20 iron corosync[3426]: [MAIN  ] Corosync Cluster Engine
        ('2.3.4'): started and ready to provide service.
        Sep 21 17:07:20 iron corosync[3426]: [MAIN  ] Corosync built-in
        features: dbus systemd xmlconf snmp pie relro bindnow
        Sep 21 17:07:20 iron corosync[3427]: [TOTEM ] Initializing transport
        (UDP/IP Unicast).
        Sep 21 17:07:20 iron corosync[3427]: [TOTEM ] Initializing
        transmit/receive security (NSS) crypto: none hash: none
        Sep 21 17:07:20 iron corosync[3427]: [TOTEM ] The network interface
        [10.100.30.38] is now up.
        Sep 21 17:07:20 iron corosync[3427]: [SERV  ] Service engine loaded:
        corosync configuration map access [0]
        Sep 21 17:07:20 iron corosync[3427]: [QB    ] server name: cmap
        Sep 21 17:07:20 iron corosync[3427]: [SERV  ] Service engine loaded:
        corosync configuration service [1]
        Sep 21 17:07:20 iron corosync[3427]: [QB    ] server name: cfg
        Sep 21 17:07:20 iron corosync[3427]: [SERV  ] Service engine loaded:
        corosync cluster closed process group service v1.01 [2]
        Sep 21 17:07:20 iron corosync[3427]: [QB    ] server name: cpg
        Sep 21 17:07:20 iron corosync[3427]: [SERV  ] Service engine loaded:
        corosync profile loading service [4]
        Sep 21 17:07:20 iron corosync[3427]: [QUORUM] Using quorum provider
        corosync_votequorum
        Sep 21 17:07:20 iron corosync[3427]: [VOTEQ ] Waiting for all
        cluster
        members. Current votes: 1 expected_votes: 2
        Sep 21 17:07:20 iron corosync[3427]: [SERV  ] Service engine loaded:
        corosync vote quorum service v1.0 [5]
        Sep 21 17:07:20 iron corosync[3427]: [QB    ] server name: votequorum
        Sep 21 17:07:20 iron corosync[3427]: [SERV  ] Service engine loaded:
        corosync cluster quorum service v0.1 [3]
        Sep 21 17:07:20 iron corosync[3427]: [QB    ] server name: quorum
        Sep 21 17:07:20 iron corosync[3427]: [TOTEM ] adding new UDPU member
        {10.100.30.38}
        Sep 21 17:07:20 iron corosync[3427]: [TOTEM ] adding new UDPU member
        {10.100.30.37}
        Sep 21 17:07:20 iron corosync[3427]: [TOTEM ] A new membership
        (10.100.30.38:104) was formed. Members joined: 1
        Sep 21 17:07:20 iron corosync[3427]: [VOTEQ ] Waiting for all
        cluster
        members. Current votes: 1 expected_votes: 2
        Sep 21 17:07:20 iron corosync[3427]: [VOTEQ ] Waiting for all
        cluster
        members. Current votes: 1 expected_votes: 2
        Sep 21 17:07:20 iron corosync[3427]: [VOTEQ ] Waiting for all
        cluster
        members. Current votes: 1 expected_votes: 2
        Sep 21 17:07:20 iron corosync[3427]: [QUORUM] Members[1]: 1
        Sep 21 17:07:20 iron corosync[3427]: [MAIN  ] Completed service
        synchronization, ready to provide service.
        Sep 21 17:07:20 iron corosync[3427]: [TOTEM ] A new membership
        (10.100.30.37:108) was formed. Members joined: 1

        Sep 21 17:08:21 iron corosync: Starting Corosync Cluster Engine
        (corosync): [FAILED]
        Sep 21 17:08:21 iron systemd: corosync.service: control process
        exited,
        code=exited status=1
        Sep 21 17:08:21 iron systemd: Failed to start Corosync Cluster
        Engine.
        Sep 21 17:08:21 iron systemd: Unit corosync.service entered
        failed state.

        - on the other host:

        Sep 21 17:07:19 cobalt systemd: Starting Preprocess NFS
        configuration...
        Sep 21 17:07:19 cobalt systemd: Starting RPC Port Mapper.
        Sep 21 17:07:19 cobalt systemd: Reached target RPC Port Mapper.
        Sep 21 17:07:19 cobalt systemd: Starting Host and Network Name
        Lookups.
        Sep 21 17:07:19 cobalt systemd: Reached target Host and Network Name
        Lookups.
        Sep 21 17:07:19 cobalt systemd: Starting RPC bind service...
        Sep 21 17:07:19 cobalt systemd: Started Preprocess NFS
        configuration.
        Sep 21 17:07:19 cobalt systemd: Started RPC bind service.
        Sep 21 17:07:19 cobalt systemd: Starting NFS status monitor for
        NFSv2/3
        locking....
        Sep 21 17:07:19 cobalt rpc.statd[2662]: Version 1.3.0 starting
        Sep 21 17:07:19 cobalt rpc.statd[2662]: Flags: TI-RPC
        Sep 21 17:07:19 cobalt systemd: Started NFS status monitor for
        NFSv2/3
        locking..
        Sep 21 17:07:19 cobalt systemd: Starting NFS-Ganesha file server...
        Sep 21 17:07:19 cobalt systemd: Started NFS-Ganesha file server.
        Sep 21 17:07:19 cobalt kernel: warning: `ganesha.nfsd' uses 32-bit
        capabilities (legacy support in use)
        Sep 21 17:07:19 cobalt logger: setting up rd-ganesha-ha
        Sep 21 17:07:19 cobalt rpc.statd[2662]: Received SM_UNMON_ALL request
        from cobalt.int.rdmedia.com while not monitoring any hosts
        Sep 21 17:07:19 cobalt logger: setting up cluster rd-ganesha-ha
        with the
        following cobalt iron
        Sep 21 17:07:20 cobalt systemd: Stopped Pacemaker High Availability
        Cluster Manager.
        Sep 21 17:07:20 cobalt systemd: Stopped Corosync Cluster Engine.
        Sep 21 17:07:20 cobalt systemd: Reloading.
        Sep 21 17:07:20 cobalt systemd:
        [/usr/lib/systemd/system/dm-event.socket:10] Unknown lvalue
        'RemoveOnStop' in section 'Socket'
        Sep 21 17:07:20 cobalt systemd:
        [/usr/lib/systemd/system/lvm2-lvmetad.socket:9] Unknown lvalue
        'RemoveOnStop' in section 'Socket'
        Sep 21 17:07:20 cobalt systemd: Reloading.
        Sep 21 17:07:20 cobalt systemd:
        [/usr/lib/systemd/system/dm-event.socket:10] Unknown lvalue
        'RemoveOnStop' in section 'Socket'
        Sep 21 17:07:20 cobalt systemd:
        [/usr/lib/systemd/system/lvm2-lvmetad.socket:9] Unknown lvalue
        'RemoveOnStop' in section 'Socket'
        Sep 21 17:07:20 cobalt systemd: Starting Corosync Cluster Engine...
        Sep 21 17:07:20 cobalt corosync[2816]: [MAIN  ] Corosync Cluster Engine
        ('2.3.4'): started and ready to provide service.
        Sep 21 17:07:20 cobalt corosync[2816]: [MAIN  ] Corosync built-in
        features: dbus systemd xmlconf snmp pie relro bindnow
        Sep 21 17:07:20 cobalt corosync[2817]: [TOTEM ] Initializing
        transport
        (UDP/IP Unicast).
        Sep 21 17:07:20 cobalt corosync[2817]: [TOTEM ] Initializing
        transmit/receive security (NSS) crypto: none hash: none
        Sep 21 17:07:21 cobalt corosync[2817]: [TOTEM ] The network
        interface
        [10.100.30.37] is now up.
        Sep 21 17:07:21 cobalt corosync[2817]: [SERV  ] Service engine loaded:
        corosync configuration map access [0]
        Sep 21 17:07:21 cobalt corosync[2817]: [QB    ] server name: cmap
        Sep 21 17:07:21 cobalt corosync[2817]: [SERV  ] Service engine loaded:
        corosync configuration service [1]
        Sep 21 17:07:21 cobalt corosync[2817]: [QB    ] server name: cfg
        Sep 21 17:07:21 cobalt corosync[2817]: [SERV  ] Service engine loaded:
        corosync cluster closed process group service v1.01 [2]
        Sep 21 17:07:21 cobalt corosync[2817]: [QB    ] server name: cpg
        Sep 21 17:07:21 cobalt corosync[2817]: [SERV  ] Service engine loaded:
        corosync profile loading service [4]
        Sep 21 17:07:21 cobalt corosync[2817]: [QUORUM] Using quorum
        provider
        corosync_votequorum
        Sep 21 17:07:21 cobalt corosync[2817]: [VOTEQ ] Waiting for all
        cluster
        members. Current votes: 1 expected_votes: 2
        Sep 21 17:07:21 cobalt corosync[2817]: [SERV  ] Service engine loaded:
        corosync vote quorum service v1.0 [5]
        Sep 21 17:07:21 cobalt corosync[2817]: [QB    ] server name: votequorum
        Sep 21 17:07:21 cobalt corosync[2817]: [SERV  ] Service engine loaded:
        corosync cluster quorum service v0.1 [3]
        Sep 21 17:07:21 cobalt corosync[2817]: [QB    ] server name: quorum
        Sep 21 17:07:21 cobalt corosync[2817]: [TOTEM ] adding new UDPU
        member
        {10.100.30.37}
        Sep 21 17:07:21 cobalt corosync[2817]: [TOTEM ] adding new UDPU
        member
        {10.100.30.38}
        Sep 21 17:07:21 cobalt corosync[2817]: [TOTEM ] A new membership
        (10.100.30.37:100) was formed. Members joined: 1
        Sep 21 17:07:21 cobalt corosync[2817]: [VOTEQ ] Waiting for all
        cluster
        members. Current votes: 1 expected_votes: 2
        Sep 21 17:07:21 cobalt corosync[2817]: [VOTEQ ] Waiting for all
        cluster
        members. Current votes: 1 expected_votes: 2
        Sep 21 17:07:21 cobalt corosync[2817]: [VOTEQ ] Waiting for all
        cluster
        members. Current votes: 1 expected_votes: 2
        Sep 21 17:07:21 cobalt corosync[2817]: [QUORUM] Members[1]: 1
        Sep 21 17:07:21 cobalt corosync[2817]: [MAIN  ] Completed service
        synchronization, ready to provide service.
        Sep 21 17:07:21 cobalt corosync[2817]: [TOTEM ] A new membership
        (10.100.30.37:108) was formed. Members joined: 1
        Sep 21 17:07:21 cobalt corosync[2817]: [VOTEQ ] Waiting for all
        cluster
        members. Current votes: 1 expected_votes: 2
        Sep 21 17:07:21 cobalt corosync[2817]: [QUORUM] Members[1]: 1
        Sep 21 17:07:21 cobalt corosync[2817]: [MAIN  ] Completed service
        synchronization, ready to provide service.
        Sep 21 17:08:50 cobalt systemd: corosync.service operation timed
        out.
        Terminating.
        Sep 21 17:08:50 cobalt corosync: Starting Corosync Cluster Engine
        (corosync):
        Sep 21 17:08:50 cobalt systemd: Failed to start Corosync Cluster
        Engine.
        Sep 21 17:08:50 cobalt systemd: Unit corosync.service entered
        failed state.
        Sep 21 17:08:55 cobalt logger: warning: pcs property set
        no-quorum-policy=ignore failed
        Sep 21 17:08:55 cobalt logger: warning: pcs property set
        stonith-enabled=false failed
        Sep 21 17:08:55 cobalt logger: warning: pcs resource create
        nfs_start
        ganesha_nfsd ha_vol_mnt=/var/run/gluster/shared_storage --clone
        failed
        Sep 21 17:08:56 cobalt logger: warning: pcs resource delete
        nfs_start-clone failed
        Sep 21 17:08:56 cobalt logger: warning: pcs resource create nfs-mon
        ganesha_mon --clone failed
        Sep 21 17:08:56 cobalt logger: warning: pcs resource create
        nfs-grace
        ganesha_grace --clone failed
        Sep 21 17:08:57 cobalt logger: warning pcs resource create
        cobalt-cluster_ip-1 ocf:heartbeat:IPaddr ip= cidr_netmask=32 op
        monitor
        interval=15s failed
        Sep 21 17:08:57 cobalt logger: warning: pcs resource create
        cobalt-trigger_ip-1 ocf:heartbeat:Dummy failed
        Sep 21 17:08:57 cobalt logger: warning: pcs constraint
        colocation add
        cobalt-cluster_ip-1 with cobalt-trigger_ip-1 failed
        Sep 21 17:08:57 cobalt logger: warning: pcs constraint order
        cobalt-trigger_ip-1 then nfs-grace-clone failed
        Sep 21 17:08:57 cobalt logger: warning: pcs constraint order
        nfs-grace-clone then cobalt-cluster_ip-1 failed
        Sep 21 17:08:57 cobalt logger: warning pcs resource create
        iron-cluster_ip-1 ocf:heartbeat:IPaddr ip= cidr_netmask=32 op
        monitor
        interval=15s failed
        Sep 21 17:08:57 cobalt logger: warning: pcs resource create
        iron-trigger_ip-1 ocf:heartbeat:Dummy failed
        Sep 21 17:08:57 cobalt logger: warning: pcs constraint
        colocation add
        iron-cluster_ip-1 with iron-trigger_ip-1 failed
        Sep 21 17:08:57 cobalt logger: warning: pcs constraint order
        iron-trigger_ip-1 then nfs-grace-clone failed
        Sep 21 17:08:58 cobalt logger: warning: pcs constraint order
        nfs-grace-clone then iron-cluster_ip-1 failed
        Sep 21 17:08:58 cobalt logger: warning: pcs constraint location
        cobalt-cluster_ip-1 rule score=-INFINITY ganesha-active ne 1 failed
        Sep 21 17:08:58 cobalt logger: warning: pcs constraint location
        cobalt-cluster_ip-1 prefers iron=1000 failed
        Sep 21 17:08:58 cobalt logger: warning: pcs constraint location
        cobalt-cluster_ip-1 prefers cobalt=2000 failed
        Sep 21 17:08:58 cobalt logger: warning: pcs constraint location
        iron-cluster_ip-1 rule score=-INFINITY ganesha-active ne 1 failed
        Sep 21 17:08:58 cobalt logger: warning: pcs constraint location
        iron-cluster_ip-1 prefers cobalt=1000 failed
        Sep 21 17:08:58 cobalt logger: warning: pcs constraint location
        iron-cluster_ip-1 prefers iron=2000 failed
        Sep 21 17:08:58 cobalt logger: warning pcs cluster cib-push
        /tmp/tmp.nXTfyA1GMR failed
        Sep 21 17:08:58 cobalt logger: warning: scp ganesha-ha.conf to
        cobalt failed

        BTW, I'm using CentOS 7. There are multiple network interfaces on
        the servers; could that be a problem?

        On 21 September 2015 at 11:48, Jiffin Tony Thottan
        <jthottan@xxxxxxxxxx> wrote:

             On 21/09/15 13:56, Tiemen Ruiten wrote:

                 Hello Soumya, Kaleb, list,

                 This Friday I created the gluster_shared_storage volume
                 manually. I just tried it with the command you supplied,
                 but both have the same result:

                 from etc-glusterfs-glusterd.vol.log on the node where I
                 issued the command:

                 [2015-09-21 07:59:47.756845] I [MSGID: 106474]
                 [glusterd-ganesha.c:403:check_host_list] 0-management:
                 ganesha host found Hostname is cobalt
                 [2015-09-21 07:59:48.071755] I [MSGID: 106474]
                 [glusterd-ganesha.c:349:is_ganesha_host] 0-management:
                 ganesha host found Hostname is cobalt
                 [2015-09-21 07:59:48.653879] E [MSGID: 106470]
                 [glusterd-ganesha.c:264:glusterd_op_set_ganesha]
                 0-management: Initial NFS-Ganesha set up failed

             As far as I understand from the logs, it called
             setup_cluster() [which calls the `ganesha-ha.sh` script], but
             the script failed. Can you please provide the following details:
             - Location of the ganesha-ha.sh file?
             - Location of the ganesha-ha.conf and ganesha.conf files?

             Also, can you cross-check whether all the prerequisites for
             the HA setup are satisfied?

             --
             With Regards,
             Jiffin

                 [2015-09-21 07:59:48.653912] E [MSGID: 106123]
                 [glusterd-syncop.c:1404:gd_commit_op_phase] 0-management:
                 Commit of operation 'Volume (null)' failed on localhost :
                 Failed to set up HA config for NFS-Ganesha. Please check
                 the log file for details
                 [2015-09-21 07:59:45.402458] I [MSGID: 106006]
                 [glusterd-svc-mgmt.c:323:glusterd_svc_common_rpc_notify]
                 0-management: nfs has disconnected from glusterd.
                 [2015-09-21 07:59:48.071578] I [MSGID: 106474]
                 [glusterd-ganesha.c:403:check_host_list] 0-management:
                 ganesha host found Hostname is cobalt

                 from etc-glusterfs-glusterd.vol.log on the other node:

                 [2015-09-21 08:12:50.111877] E [MSGID: 106062]
                 [glusterd-op-sm.c:3698:glusterd_op_ac_unlock] 0-management:
                 Unable to acquire volname
                 [2015-09-21 08:14:50.548087] E [MSGID: 106062]
                 [glusterd-op-sm.c:3635:glusterd_op_ac_lock] 0-management:
                 Unable to acquire volname
                 [2015-09-21 08:14:50.654746] I [MSGID: 106132]
                 [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management:
                 nfs already stopped
                 [2015-09-21 08:14:50.655095] I [MSGID: 106474]
                 [glusterd-ganesha.c:403:check_host_list] 0-management:
                 ganesha host found Hostname is cobalt
                 [2015-09-21 08:14:51.287156] E [MSGID: 106062]
                 [glusterd-op-sm.c:3698:glusterd_op_ac_unlock] 0-management:
                 Unable to acquire volname

                 from etc-glusterfs-glusterd.vol.log on the arbiter node:

                 [2015-09-21 08:18:50.934713] E [MSGID: 101075]
                 [common-utils.c:3127:gf_is_local_addr] 0-management:
                 error in getaddrinfo: Name or service not known
                 [2015-09-21 08:18:51.504694] E [MSGID: 106062]
                 [glusterd-op-sm.c:3698:glusterd_op_ac_unlock] 0-management:
                 Unable to acquire volname

                 I have put the hostnames of all servers in my /etc/hosts
                 file, including the arbiter node.

                 On 18 September 2015 at 16:52, Soumya Koduri
                 <skoduri@xxxxxxxxxx> wrote:

                     Hi Tiemen,

                     One of the prerequisites before setting up nfs-ganesha
                     HA is to create and mount the shared_storage volume.
                     Use the CLI command below for that:

                     "gluster volume set all cluster.enable-shared-storage enable"

                     It will create the volume and mount it on all the nodes
                     (including the arbiter node). Note that this volume is
                     mounted on all the nodes of the gluster storage pool
                     (though in this case it may not be part of the
                     nfs-ganesha cluster).

                     So instead of manually creating those directory paths,
                     please use the above CLI command and try re-configuring
                     the setup.

                     Thanks,
                     Soumya
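
The shared-storage prerequisite described above can be sketched as a small check to run on each node. This is a rough sketch, not the project's own tooling: the function names are hypothetical, the mount path is the /var/run/gluster/shared_storage path quoted elsewhere in this thread, and the only gluster command used is the one given in the message above.

```shell
# Enable the shared storage volume (the command quoted in the message above).
# Only needs to be run once, from any node of the trusted pool.
enable_shared_storage() {
    gluster volume set all cluster.enable-shared-storage enable
}

# Return 0 if the given path is listed as a mount point in a mounts table
# in /proc/mounts format (the mount point is the second field).
is_mounted() {
    path="$1"
    mounts="${2:-/proc/mounts}"
    awk -v p="$path" '$2 == p { found = 1 } END { exit !found }' "$mounts"
}

# Example, per node. /var/run is often a symlink to /run, so the path is
# resolved first because /proc/mounts lists the resolved location:
#   is_mounted "$(readlink -f /var/run/gluster/shared_storage)" \
#       || echo "shared storage is not mounted on $(hostname)"
```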

                     On 09/18/2015 07:29 PM, Tiemen Ruiten wrote:

                         Hello Kaleb,

                         I don't:

                         # Name of the HA cluster created.
                         # must be unique within the subnet
                         HA_NAME="rd-ganesha-ha"
                         #
                         # The gluster server from which to mount the shared data volume.
                         HA_VOL_SERVER="iron"
                         #
                         # N.B. you may use short names or long names; you may not use IP addrs.
                         # Once you select one, stay with it as it will be mildly unpleasant to
                         # clean up if you switch later on. Ensure that all names - short and/or
                         # long - are in DNS or /etc/hosts on all machines in the cluster.
                         #
                         # The subset of nodes of the Gluster Trusted Pool that form the ganesha
                         # HA cluster. Hostname is specified.
                         HA_CLUSTER_NODES="cobalt,iron"
                         #HA_CLUSTER_NODES="server1.lab.redhat.com,server2.lab.redhat.com,..."
                         #
                         # Virtual IPs for each of the nodes specified above.
                         VIP_server1="10.100.30.101"
                         VIP_server2="10.100.30.102"
                         #VIP_server1_lab_redhat_com="10.0.2.1"
                         #VIP_server2_lab_redhat_com="10.0.2.2"

                         Hosts cobalt and iron are the data nodes; the arbiter
                         IP/hostname (neon) isn't mentioned anywhere in this
                         config file.
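
The commented sample lines in the config above pair each node name with a VIP_ variable (server1.lab.redhat.com with VIP_server1_lab_redhat_com). A quick consistency check along those lines is sketched below. It rests on assumptions: the function names are hypothetical, replacing '-' as well as '.' with '_' is a guess extrapolated from the sample lines, and whether the HA setup script requires this pairing is not confirmed anywhere in this thread.

```shell
# Map a hostname to the VIP variable name suggested by the sample lines,
# e.g. server1.lab.redhat.com -> VIP_server1_lab_redhat_com.
# Replacing '-' in addition to '.' is an assumption.
vip_var_name() {
    printf 'VIP_%s\n' "$1" | tr '.-' '__'
}

# Print each node from the uncommented HA_CLUSTER_NODES line that has no
# uncommented VIP_<node>= line in the same file. Returns 1 if any is missing.
missing_vips() {
    conf="$1"
    nodes=$(sed -n 's/^HA_CLUSTER_NODES="\(.*\)"$/\1/p' "$conf" | tr ',' ' ')
    rc=0
    for node in $nodes; do
        if ! grep -q "^$(vip_var_name "$node")=" "$conf"; then
            echo "$node"
            rc=1
        fi
    done
    return $rc
}

# Example:
#   missing_vips /etc/ganesha/ganesha-ha.conf
```

Run against the config as posted, this would list both cobalt and iron, because the VIP entries are named VIP_server1 and VIP_server2 rather than after the nodes. Whether that mismatch is related to the failure discussed here is not established in this thread.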

                         On 18 September 2015 at 15:56, Kaleb S. KEITHLEY
                         <kkeithle@xxxxxxxxxx> wrote:

                             On 09/18/2015 09:46 AM, Tiemen Ruiten wrote:
                             > Hello,
                             >
                             > I have a Gluster cluster with a single replica 3, arbiter 1 volume (so
                             > two nodes with actual data, one arbiter node). I would like to set up
                             > NFS-Ganesha HA for this volume but I'm having some difficulties.
                             >
                             > - I needed to create a directory /var/run/gluster/shared_storage
                             > manually on all nodes, or the command 'gluster nfs-ganesha enable' would
                             > fail with the following error:
                             > [2015-09-18 13:13:34.690416] E [MSGID: 106032]
                             > [glusterd-ganesha.c:708:pre_setup] 0-THIS->name: mkdir() failed on path
                             > /var/run/gluster/shared_storage/nfs-ganesha, [No such file or directory]
                             >
                             > - Then I found out that the command connects to the arbiter node as
                             > well, but obviously I don't want to set up NFS-Ganesha there. Is it
                             > actually possible to set up NFS-Ganesha HA with an arbiter node? If it's
                             > possible, is there any documentation on how to do that?
                             >

                             Please send the /etc/ganesha/ganesha-ha.conf file you're using.

                             Probably you have included the arbiter in your HA config; that would be
                             a mistake.

                             --

                             Kaleb

                         --
                         Tiemen Ruiten
                         Systems Engineer
                         R&D Media

                         _______________________________________________
                         Gluster-users mailing list
                         Gluster-users@xxxxxxxxxxx
                         http://www.gluster.org/mailman/listinfo/gluster-users

--
Tiemen Ruiten
Systems Engineer
R&D Media
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users