Maybe you don't think you have a problem with your fencing, but you can still try "group_tool dump" when the problem occurs; that way we can rule out fencing, and DLM waiting on fencing.

2016-01-06 11:08 GMT+01:00 B.Baransel BAĞCI <bagcib@xxxxxxxxxx>:
>
> Quoting Bob Peterson <rpeterso@xxxxxxxxxx>:
>
>> ----- Original Message -----
>>>
>>> Hi list,
>>>
>>> I have some problems with GFS2 when nodes fail. After one of the
>>> cluster nodes is fenced and rebooted, it cannot mount some of the GFS2
>>> file systems; it hangs on the mount operation with no output. I've
>>> waited nearly 10 minutes for a single disk to mount, but it didn't
>>> respond. The only solution is to shut down all nodes and do a clean
>>> start of the cluster. I suspect the journal size or file system quotas.
>>>
>>> I have an 8-node RHEL 6 cluster with GFS2-formatted disks which are all
>>> mounted by all nodes.
>>> There are two types of disk:
>>> Type A:
>>>   ~50 GB disk capacity
>>>   8 journals, each 512 MB
>>>   block size: 1024
>>>   very small files (avg: 50 bytes - symlinks)
>>>   ~500,000 files (inodes)
>>>   usage: 10%
>>>   nearly no write I/O (under 1,000 files per day)
>>>   no user quota (quota=off)
>>>   mount options: async,quota=off,nodiratime,noatime
>>>
>>> Type B:
>>>   ~1 TB disk capacity
>>>   8 journals, each 512 MB
>>>   block size: 4096
>>>   relatively small files (avg: 20 KB)
>>>   ~5,000,000 files (inodes)
>>>   usage: 20%
>>>   write I/O of ~50,000 files per day
>>>   user quota is on (some of the users have exceeded their quota)
>>>   mount options: async,quota=on,nodiratime,noatime
>>>
>>> To improve performance, I set the journal size to 512 MB instead of the
>>> 128 MB default. All disks are connected over fibre to SAN storage, and
>>> all of them are on clustered LVM. All nodes are connected to each other
>>> through a private Gb switch.
>>>
>>> For example, after "node5" failed and was fenced, it can re-enter the
>>> cluster. When I try "service gfs2 start", it mounts the "Type A" disks,
>>> but hangs on the first "Type B" disk. The log stops at the "Trying to
>>> join cluster lock_dlm" message:
>>>
>>> ...
>>> Jan 05 00:01:52 node5 lvm[4090]: Found volume group "VG_of_TYPE_A"
>>> Jan 05 00:01:52 node5 lvm[4119]: Activated 2 logical volumes in volume group VG_of_TYPE_A
>>> Jan 05 00:01:52 node5 lvm[4119]: 2 logical volume(s) in volume group "VG_of_TYPE_A" now active
>>> Jan 05 00:01:52 node5 lvm[4119]: Wiping internal VG cache
>>> Jan 05 00:02:26 node5 kernel: Slow work thread pool: Starting up
>>> Jan 05 00:02:26 node5 kernel: Slow work thread pool: Ready
>>> Jan 05 00:02:26 node5 kernel: GFS2 (built Dec 12 2014 16:06:57) installed
>>> Jan 05 00:02:26 node5 kernel: GFS2: fsid=: Trying to join cluster "lock_dlm", "TESTCLS:typeA1"
>>> Jan 05 00:02:26 node5 kernel: GFS2: fsid=TESTCLS:typeA1.5: Joined cluster. Now mounting FS...
>>> Jan 05 00:02:27 node5 kernel: GFS2: fsid=TESTCLS:typeA1.5: jid=5, already locked for use
>>> Jan 05 00:02:27 node5 kernel: GFS2: fsid=TESTCLS:typeA1.5: jid=5: Looking at journal...
>>> Jan 05 00:02:27 node5 kernel: GFS2: fsid=TESTCLS:typeA1.5: jid=5: Done
>>> Jan 05 00:02:27 node5 kernel: GFS2: fsid=: Trying to join cluster "lock_dlm", "TESTCLS:typeA2"
>>> Jan 05 00:02:27 node5 kernel: GFS2: fsid=TESTCLS:typeA2.5: Joined cluster. Now mounting FS...
>>> Jan 05 00:02:28 node5 kernel: GFS2: fsid=TESTCLS:typeA2.5: jid=5, already locked for use
>>> Jan 05 00:02:28 node5 kernel: GFS2: fsid=TESTCLS:typeA2.5: jid=5: Looking at journal...
>>> Jan 05 00:02:28 node5 kernel: GFS2: fsid=TESTCLS:typeA2.5: jid=5: Done
>>> Jan 05 00:02:28 node5 kernel: GFS2: fsid=: Trying to join cluster "lock_dlm", "TESTCLS:typeB1"
>>>
>>>
>>> I've waited nearly 10 minutes in this state without any response or log
>>> output. In this state, I cannot run `ls` on this file system from the
>>> other nodes. Any idea of the cause of the problem? How is the cluster
>>> affected by journal size or count?
>>> --
>>> B.Baransel BAĞCI
>>
>> Hi,
>>
>> If mount hangs, it's hard to say what it's doing. It could be waiting
>> for a dlm lock, which is waiting on a pending fencing operation.
>>
>> There have been occasional hangs discovered in journal replay, but not
>> for a long time, so that is less likely. What kernel version is this?
>> December 12, 2014 is more than a year old, so it might be something
>> we've already found and fixed. If this is RHEL 6 or CentOS 6 or similar,
>> you could try catting the /proc/<pid>/stack file of the mount helper
>> process, aka mount.gfs2, and see what it's doing.
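To see where it is stuck, the check Bob describes is roughly this (just a sketch -- it assumes the helper really is called mount.gfs2 on your build and that only one mount is in flight):

    # find the hung mount helper and dump its kernel stack
    pidof mount.gfs2
    cat /proc/$(pidof mount.gfs2)/stack

    # optionally, dump all blocked tasks to the kernel log as well
    echo w > /proc/sysrq-trigger
    dmesg | tail -n 100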
>
> The OS is RHEL 6 and the kernel is 2.6.32-504.3.3.el6.x86_64. This problem
> actually began as the amount of data increased; in the beginning, with very
> little data and I/O, it didn't exist. The cluster is isolated and doesn't get
> updates, so there has been no kernel or package change for a year. I also run
> fsck on every disk after each crash.
> At the next crash I will look at the stack file, but I cannot crash the
> system right now, because it's used by other services.
>
>> Normally, dlm recovery and gfs2 recovery take only a few seconds.
>> The size of the journals and the number of journals will likely have no
>> effect. If I was a betting man, I'd bet that GFS2 is waiting for DLM, and
>> DLM is waiting for a fence operation to be completed successfully
>> before continuing. If this is RHEL 6 or earlier, you could do
>> "group_tool dump" to find out if the cluster membership is sane or
>> if it's waiting for something like this.
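This is the same check I suggested at the top. Roughly something like the following (a sketch; the exact output and subcommands can differ a bit between cluster stack versions):

    # list the fence, dlm and gfs groups and their state;
    # anything sitting in a "wait" state points at fencing/recovery
    group_tool ls

    # full debug dump of groupd, to see what it is waiting for
    group_tool dump

    # the fence domain members as fenced sees them
    fence_tool ls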
>
> Fence devices are working correctly. The cluster master calls the fence
> device (IPMI) and the failed node restarts. Then I can start "cman" and
> "clvmd" successfully; up to this point there is no problem, and all of the
> other nodes keep working correctly. So I don't think there is a pending
> fence operation. And this problem started when the data on the disks
> increased.
>
> My cluster.conf looks like this:
>
> <?xml version="1.0"?>
> <cluster config_version="1" name="TESTCLS">
>   <totem consensus="4000" token="2000"/>
>   <cman cluster_id="1234" expected_votes="1">
>     <multicast addr="233.41.51.61"/>
>   </cman>
>   <clusternodes>
>     <clusternode name="node1.TESTCLS" nodeid="5">
>       <fence>
>         <method name="CUSTOM">
>           <device ipaddr="192.168.0.36" port="ilo2" name="custom_fence"/>
>         </method>
>       </fence>
>     </clusternode>
>     <clusternode name="node2.TESTCLS" nodeid="11">
>       <fence>
>         <method name="CUSTOM">
>           <device ipaddr="192.168.0.41" port="ipmi" name="custom_fence"/>
>         </method>
>       </fence>
>     </clusternode>
>     <clusternode name="node3.TESTCLS" nodeid="12">
>       <fence>
>         <method name="CUSTOM">
>           <device ipaddr="192.168.0.49" port="ipmi" name="custom_fence"/>
>         </method>
>       </fence>
>     </clusternode>
>     <clusternode name="node4.TESTCLS" nodeid="21">
>       <fence>
>         <method name="CUSTOM">
>           <device ipaddr="192.168.0.82" port="ipmi" name="custom_fence"/>
>         </method>
>       </fence>
>     </clusternode>
>     <clusternode name="node5.TESTCLS" nodeid="22">
>       <fence>
>         <method name="CUSTOM">
>           <device ipaddr="192.168.0.79" port="ipmi" name="custom_fence"/>
>         </method>
>       </fence>
>     </clusternode>
>     <clusternode name="node6.TESTCLS" nodeid="23">
>       <fence>
>         <method name="CUSTOM">
>           <device ipaddr="192.168.0.81" port="ipmi" name="custom_fence"/>
>         </method>
>       </fence>
>     </clusternode>
>     <clusternode name="node7.TESTCLS" nodeid="31">
>       <fence>
>         <method name="CUSTOM">
>           <device ipaddr="192.168.0.78" port="ipmi" name="custom_fence"/>
>         </method>
>       </fence>
>     </clusternode>
>     <clusternode name="node8.TESTCLS" nodeid="32">
>       <fence>
>         <method name="CUSTOM">
>           <device ipaddr="192.168.0.11" port="ilo" name="custom_fence"/>
>         </method>
>       </fence>
>     </clusternode>
>   </clusternodes>
>   <fence_daemon post_fail_delay="0" post_join_delay="10"/>
>   <fencedevices>
>     <fencedevice agent="fence_script" name="custom_fence"/>
>   </fencedevices>
>   <rm>
>     <failoverdomains>
>       <failoverdomain name="NFS" ordered="1" restricted="1">
>         <failoverdomainnode name="node7.TESTCLS" priority="10"/>
>         <failoverdomainnode name="node8.TESTCLS" priority="20"/>
>       </failoverdomain>
>     </failoverdomains>
>     <resources>
>       <ip address="192.168.1.34/24" sleeptime="10"/>
>       <nfsserver name="NFS_service_resource"/>
>     </resources>
>     <service autostart="1" domain="NFS" exclusive="1" name="NFS_service" recovery="relocate">
>       <ip ref="192.168.1.34/24">
>         <nfsserver ref="NFS_service_resource"/>
>       </ip>
>     </service>
>   </rm>
>   <dlm plock_ownership="1" plock_rate_limit="0"/>
>   <gfs_controld plock_rate_limit="0"/>
> </cluster>
>
> Note: the fence agent is a script and works correctly. It calls the ipmi or
> ilo fence agents and returns 0 on success.
>
> thanks
> --
> B.Baransel BAĞCI
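About the custom fence_script: I don't know what yours looks like, but a minimal wrapper of that kind is usually something like the sketch below (purely hypothetical -- the logins, passwords and exact agent options here are assumptions, not taken from your setup):

    #!/bin/sh
    # hypothetical fence_script: read key=value arguments from stdin
    # (the way fenced passes the device attributes from cluster.conf),
    # then hand off to a real agent based on the "port" attribute
    while read line; do
        case "$line" in
            ipaddr=*) ADDR=${line#ipaddr=} ;;
            port=*)   TYPE=${line#port=}   ;;
        esac
    done

    case "$TYPE" in
        ipmi)     fence_ipmilan -a "$ADDR" -l admin -p secret -o reboot ;;  # assumed credentials
        ilo|ilo2) fence_ilo     -a "$ADDR" -l admin -p secret -o reboot ;;
        *)        exit 1 ;;
    esac
    exit $?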