On 09/23/2016 02:34 AM, Dung Le wrote:
Hello,

I have a pretty straightforward configuration: 3 storage nodes running GlusterFS 3.7.11 with replica 3, using the native gluster NFS server. Corosync is version 1.4.7 and pacemaker is version 1.1.12. I have DNS round-robin across 3 VIPs living on the 3 storage nodes.

Here is how I configured my corosync:

SN1 with x.x.x.001
SN2 with x.x.x.002
SN3 with x.x.x.003

******************************************************************************

Below is the pcs config output:

Cluster Name: dfs_cluster

Corosync Nodes:
 SN1 SN2 SN3

Pacemaker Nodes:
 SN1 SN2 SN3

Resources:
 Clone: Gluster-clone
  Meta Attrs: clone-max=3 clone-node-max=3 globally-unique=false
  Resource: Gluster (class=ocf provider=glusterfs type=glusterd)
   Operations: start interval=0s timeout=20 (Gluster-start-interval-0s)
               stop interval=0s timeout=20 (Gluster-stop-interval-0s)
               monitor interval=10s (Gluster-monitor-interval-10s)
 Resource: SN1-ClusterIP (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: ip=x.x.x.001 cidr_netmask=32
  Operations: start interval=0s timeout=20s (SN1-ClusterIP-start-interval-0s)
              stop interval=0s timeout=20s (SN1-ClusterIP-stop-interval-0s)
              monitor interval=10s (SN1-ClusterIP-monitor-interval-10s)
 Resource: SN2-ClusterIP (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: ip=x.x.x.002 cidr_netmask=32
  Operations: start interval=0s timeout=20s (SN2-ClusterIP-start-interval-0s)
              stop interval=0s timeout=20s (SN2-ClusterIP-stop-interval-0s)
              monitor interval=10s (SN2-ClusterIP-monitor-interval-10s)
 Resource: SN3-ClusterIP (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: ip=x.x.x.003 cidr_netmask=32
  Operations: start interval=0s timeout=20s (SN3-ClusterIP-start-interval-0s)
              stop interval=0s timeout=20s (SN3-ClusterIP-stop-interval-0s)
              monitor interval=10s (SN3-ClusterIP-monitor-interval-10s)

Stonith Devices:
Fencing Levels:

Location Constraints:
  Resource: SN1-ClusterIP
    Enabled on: SN1 (score:3000) (id:location-SN1-ClusterIP-SN1-3000)
    Enabled on: SN2 (score:2000) (id:location-SN1-ClusterIP-SN2-2000)
    Enabled on: SN3 (score:1000) (id:location-SN1-ClusterIP-SN3-1000)
  Resource: SN2-ClusterIP
    Enabled on: SN2 (score:3000) (id:location-SN2-ClusterIP-SN2-3000)
    Enabled on: SN3 (score:2000) (id:location-SN2-ClusterIP-SN3-2000)
    Enabled on: SN1 (score:1000) (id:location-SN2-ClusterIP-SN1-1000)
  Resource: SN3-ClusterIP
    Enabled on: SN3 (score:3000) (id:location-SN3-ClusterIP-SN3-3000)
    Enabled on: SN1 (score:2000) (id:location-SN3-ClusterIP-SN1-2000)
    Enabled on: SN2 (score:1000) (id:location-SN3-ClusterIP-SN2-1000)

Ordering Constraints:
  start Gluster-clone then start SN1-ClusterIP (kind:Mandatory) (id:order-Gluster-clone-SN1-ClusterIP-mandatory)
  start Gluster-clone then start SN2-ClusterIP (kind:Mandatory) (id:order-Gluster-clone-SN2-ClusterIP-mandatory)
  start Gluster-clone then start SN3-ClusterIP (kind:Mandatory) (id:order-Gluster-clone-SN3-ClusterIP-mandatory)

Colocation Constraints:

Resources Defaults:
 is-managed: true
 target-role: Started
 requires: nothing
 multiple-active: stop_start

Operations Defaults:
 No defaults set

Cluster Properties:
 cluster-infrastructure: cman
 dc-version: 1.1.11-97629de
 no-quorum-policy: ignore
 stonith-enabled: false

******************************************************************************
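For reference, a VIP resource of this shape, with its location and ordering constraints, would typically be created with pcs commands along these lines (shown for SN1 only; this is a reconstruction from the config above, not the exact commands that were run):

 pcs resource create SN1-ClusterIP ocf:heartbeat:IPaddr2 ip=x.x.x.001 cidr_netmask=32 op monitor interval=10s
 pcs constraint location SN1-ClusterIP prefers SN1=3000 SN2=2000 SN3=1000
 pcs constraint order start Gluster-clone then start SN1-ClusterIP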
pcs status output:

Cluster name: dfs_cluster
Last updated: Thu Sep 22 16:57:35 2016
Last change: Mon Aug 29 18:02:44 2016
Stack: cman
Current DC: SN1 - partition with quorum
Version: 1.1.11-97629de
3 Nodes configured
6 Resources configured

Online: [ SN1 SN2 SN3 ]

Full list of resources:

 Clone Set: Gluster-clone [Gluster]
     Started: [ SN1 SN2 SN3 ]
 SN1-ClusterIP (ocf::heartbeat:IPaddr2): Started SN1
 SN2-ClusterIP (ocf::heartbeat:IPaddr2): Started SN2
 SN3-ClusterIP (ocf::heartbeat:IPaddr2): Started SN3

******************************************************************************

When I mount the gluster volume, I use the VIP name, and it picks one of the storage nodes to establish the NFS connection (a sketch of the mount invocation is below).

My issue is:

After the gluster volume has been mounted for 1-2 hours, all the clients report that they get no df output because df hangs. I checked the dmesg log on the client side and see the following errors:

Sep 20 05:46:45 xxxxx kernel: nfs: server nfsserver001 not responding, still trying
Sep 20 05:49:45 xxxxx kernel: nfs: server nfsserver001 not responding, still trying

I tried to mount the gluster volume via the DNS round-robin name to a different mountpoint, but the mount did not succeed.
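The mount on the clients is along these lines — the volume name and mountpoint are placeholders, and vers=3 is used because gluster's built-in NFS server only speaks NFSv3:

 mount -t nfs -o vers=3,proto=tcp nfsserver001:/<volname> /mnt/<mountpoint>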
Did you check the 'pcs status' output at that time? The *-ClusterIP resources may have gone to Stopped state, making the VIPs unavailable.
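For example, while the hang is happening (run the second command on whichever node should currently own the VIP):

 # are all three *-ClusterIP resources still Started?
 pcs status resources
 # is the VIP address actually configured on an interface?
 ip addr show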
Thanks,
Soumya
Then I tried to mount the gluster volume using a storage node's own IP (not the VIP), and the mount succeeded. Afterward, I switched all the clients to mount the storage node IPs directly, and they have been up for more than 12 hours without any issue.

Any idea what might cause this issue?

Thanks a lot,
~ Vic Le
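A quick way to isolate whether the VIP layer or the NFS service itself is at fault, using the hostnames above (showmount queries the NFS server's export list):

 dig +short nfsserver001    # which addresses does the round-robin name resolve to?
 showmount -e nfsserver001  # does the NFS export answer via the VIP name?
 showmount -e SN1           # does it answer when asked via a node directly?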
_______________________________________________ Gluster-users mailing list Gluster-users@xxxxxxxxxxx http://www.gluster.org/mailman/listinfo/gluster-users