Hi Soumya,
Did you check 'pcs status' output that time? Maybe the *-ClusterIP* resources would have gone to Stopped state, making VIPs unavailable.
Yes, I did check ‘pcs status’ and everything looked good at the time.
I hit the issue again yesterday, with both the VIP mounts and the df output.
On client 1, the df output hung. I also could not mount the gluster volume via VIP x.x.x.001, but I could mount it via VIPs x.x.x.002 and x.x.x.003. On client 2, I could mount the gluster volume via all three VIPs (x.x.x.001, x.x.x.002, and x.x.x.003).
Since VIP x.x.x.001 is configured for SN1 in pacemaker, I went ahead and stopped the cluster service on SN1 with ‘pcs cluster stop’. VIP x.x.x.001 failed over to SN2 as configured, and afterward I could mount the gluster volume via VIP x.x.x.001 on client 1.
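For anyone following along, the failover test above can be sketched roughly like this (node names and the VIP are the placeholders from this thread, and the volume name is hypothetical; gluster's built-in NFS serves NFSv3 over TCP, hence the mount options):

```shell
# On SN1: leave the cluster so its resources fail over elsewhere
pcs cluster stop

# On SN2 (next-highest location score for this VIP): confirm the address
# actually moved; IPaddr2 adds it as a secondary address on the interface
ip addr show | grep 'x.x.x.001'

# From any remaining cluster node: confirm SN1-ClusterIP is now on SN2
pcs status resources | grep 'SN1-ClusterIP'

# From client 1: retry the mount through the failed-over VIP
# (<volname> is a placeholder for the actual gluster volume name)
mount -t nfs -o vers=3,proto=tcp x.x.x.001:/<volname> /mnt/test
```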
Any idea?
Thanks, ~ Vic Le
On 09/23/2016 02:34 AM, Dung Le wrote:
Hello,
I have a pretty straight forward configuration as below:
3 storage nodes running Gluster 3.7.11 with replica 3, using native gluster NFS; corosync version 1.4.7 and pacemaker version 1.1.12. I have DNS round-robin across the 3 VIPs living on the 3 storage nodes.
*_Here is how I configured my corosync:_*
SN1 with x.x.x.001
SN2 with x.x.x.002
SN3 with x.x.x.003
******************************************************************************************************************
*_Below is pcs config output:_*
Cluster Name: dfs_cluster
Corosync Nodes:
 SN1 SN2 SN3
Pacemaker Nodes:
 SN1 SN2 SN3
Resources:
 Clone: Gluster-clone
  Meta Attrs: clone-max=3 clone-node-max=3 globally-unique=false
  Resource: Gluster (class=ocf provider=glusterfs type=glusterd)
   Operations: start interval=0s timeout=20 (Gluster-start-interval-0s)
               stop interval=0s timeout=20 (Gluster-stop-interval-0s)
               monitor interval=10s (Gluster-monitor-interval-10s)
 Resource: SN1-ClusterIP (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: ip=x.x.x.001 cidr_netmask=32
  Operations: start interval=0s timeout=20s (SN1-ClusterIP-start-interval-0s)
              stop interval=0s timeout=20s (SN1-ClusterIP-stop-interval-0s)
              monitor interval=10s (SN1-ClusterIP-monitor-interval-10s)
 Resource: SN2-ClusterIP (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: ip=x.x.x.002 cidr_netmask=32
  Operations: start interval=0s timeout=20s (SN2-ClusterIP-start-interval-0s)
              stop interval=0s timeout=20s (SN2-ClusterIP-stop-interval-0s)
              monitor interval=10s (SN2-ClusterIP-monitor-interval-10s)
 Resource: SN3-ClusterIP (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: ip=x.x.x.003 cidr_netmask=32
  Operations: start interval=0s timeout=20s (SN3-ClusterIP-start-interval-0s)
              stop interval=0s timeout=20s (SN3-ClusterIP-stop-interval-0s)
              monitor interval=10s (SN3-ClusterIP-monitor-interval-10s)
Stonith Devices:
Fencing Levels:
Location Constraints:
  Resource: SN1-ClusterIP
    Enabled on: SN1 (score:3000) (id:location-SN1-ClusterIP-SN1-3000)
    Enabled on: SN2 (score:2000) (id:location-SN1-ClusterIP-SN2-2000)
    Enabled on: SN3 (score:1000) (id:location-SN1-ClusterIP-SN3-1000)
  Resource: SN2-ClusterIP
    Enabled on: SN2 (score:3000) (id:location-SN2-ClusterIP-SN2-3000)
    Enabled on: SN3 (score:2000) (id:location-SN2-ClusterIP-SN3-2000)
    Enabled on: SN1 (score:1000) (id:location-SN2-ClusterIP-SN1-1000)
  Resource: SN3-ClusterIP
    Enabled on: SN3 (score:3000) (id:location-SN3-ClusterIP-SN3-3000)
    Enabled on: SN1 (score:2000) (id:location-SN3-ClusterIP-SN1-2000)
    Enabled on: SN2 (score:1000) (id:location-SN3-ClusterIP-SN2-1000)
Ordering Constraints:
  start Gluster-clone then start SN1-ClusterIP (kind:Mandatory) (id:order-Gluster-clone-SN1-ClusterIP-mandatory)
  start Gluster-clone then start SN2-ClusterIP (kind:Mandatory) (id:order-Gluster-clone-SN2-ClusterIP-mandatory)
  start Gluster-clone then start SN3-ClusterIP (kind:Mandatory) (id:order-Gluster-clone-SN3-ClusterIP-mandatory)
Colocation Constraints:
Resources Defaults:
 is-managed: true
 target-role: Started
 requires: nothing
 multiple-active: stop_start
Operations Defaults:
 No defaults set
Cluster Properties:
 cluster-infrastructure: cman
 dc-version: 1.1.11-97629de
 no-quorum-policy: ignore
 stonith-enabled: false
******************************************************************************************************************
*_pcs status output:_*
Cluster name: dfs_cluster
Last updated: Thu Sep 22 16:57:35 2016
Last change: Mon Aug 29 18:02:44 2016
Stack: cman
Current DC: SN1 - partition with quorum
Version: 1.1.11-97629de
3 Nodes configured
6 Resources configured
Online: [ SN1 SN2 SN3 ]
Full list of resources:
Clone Set: Gluster-clone [Gluster]
     Started: [ SN1 SN2 SN3 ]
SN1-ClusterIP (ocf::heartbeat:IPaddr2): Started SN1
SN2-ClusterIP (ocf::heartbeat:IPaddr2): Started SN2
SN3-ClusterIP (ocf::heartbeat:IPaddr2): Started SN3
******************************************************************************************************************
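For reference, here is a minimal sketch of `pcs` commands that would produce a config like the one above (SN1's resources only; the clone options and scores mirror the quoted config, but this is an illustration, not the actual setup script):

```shell
# glusterd as a clone running on all three nodes
pcs resource create Gluster ocf:glusterfs:glusterd op monitor interval=10s \
    --clone clone-max=3 clone-node-max=3 globally-unique=false

# One VIP per node, preferring its home node via descending scores
pcs resource create SN1-ClusterIP ocf:heartbeat:IPaddr2 \
    ip=x.x.x.001 cidr_netmask=32 op monitor interval=10s
pcs constraint location SN1-ClusterIP prefers SN1=3000 SN2=2000 SN3=1000

# Only bring a VIP up after glusterd has started
pcs constraint order start Gluster-clone then start SN1-ClusterIP
```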
When I mount the gluster volume, I use the VIP name; DNS round-robin picks one of the storage nodes to establish the NFS connection.
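As a concrete example, such a mount would look something like this (‘nfsserver001’ is the round-robin name that appears in the client logs below; the volume name is a placeholder). Gluster's built-in NFS server speaks NFSv3 over TCP, so pinning those options avoids v4/UDP negotiation surprises:

```shell
mount -t nfs -o vers=3,proto=tcp nfsserver001:/<volname> /mnt/gluster
```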
*_My issue is:_* After the gluster volume has been mounted for 1-2 hours, all the clients report that df hangs and produces no output. I checked the dmesg log on the client side and saw the following errors:
Sep 20 05:46:45 xxxxx kernel: nfs: server nfsserver001 not responding, still trying
Sep 20 05:49:45 xxxxx kernel: nfs: server nfsserver001 not responding, still trying
I did try mounting the gluster volume via the DNS round-robin name to a different mountpoint, but the mount did not succeed.
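When a mount through the VIP hangs like this, it can help to check from the client whether the RPC services are still reachable through that address (standard NFSv3 client tools, not commands from my setup):

```shell
# Are the NFS and mountd programs still registered behind the VIP?
rpcinfo -p nfsserver001

# Can we reach mountd and list the exports through the VIP?
showmount -e nfsserver001

# Is the NFSv3 service itself answering an RPC call on that address?
rpcinfo -t nfsserver001 nfs 3
```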
Then I tried to mount the gluster volume using a storage node's own IP (not a VIP), and I was able to mount it. Afterward, I flipped all the clients to mount the storage node IPs directly, and they have been up for more than 12 hours without any issue.
Any idea what might cause this issue?
Thanks a lot,
~ Vic Le
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users