Maybe you don't think you have a problem with your fencing, but you can still try "group_tool dump" when the problem occurs; that way we can rule out fencing, and DLM waiting on fencing.

2016-01-06 11:08 GMT+01:00 B.Baransel BAĞCI <bagcib@xxxxxxxxxx>:
>
> Quoting Bob Peterson <rpeterso@xxxxxxxxxx>:
>
>> ----- Original Message -----
>>>
>>> Hi list,
>>>
>>> I have some problems with GFS2 when nodes fail. After one of the
>>> cluster nodes is fenced and rebooted, it cannot mount some of the GFS2
>>> file systems; it hangs on the mount operation with no output. I've
>>> waited nearly 10 minutes for a single disk to mount, but it didn't
>>> respond. The only solution is to shut down all nodes and do a clean
>>> start of the cluster. I suspect the journal size or file system quotas.
>>>
>>> I have an 8-node RHEL 6 cluster with GFS2-formatted disks which are all
>>> mounted by all nodes.
>>> There are two types of disk:
>>> Type A:
>>>   ~50 GB disk capacity
>>>   8 journals, each 512 MB
>>>   block size: 1024
>>>   very small files (avg: 50 bytes - symlinks)
>>>   ~500,000 files (inodes)
>>>   usage: 10%
>>>   nearly no write I/O (under 1,000 files per day)
>>>   no user quota (quota=off)
>>>   mount options: async,quota=off,nodiratime,noatime
>>>
>>> Type B:
>>>   ~1 TB disk capacity
>>>   8 journals, each 512 MB
>>>   block size: 4096
>>>   relatively small files (avg: 20 KB)
>>>   ~5,000,000 files (inodes)
>>>   usage: 20%
>>>   write I/O of ~50,000 files per day
>>>   user quota is on (some of the users have exceeded their quota)
>>>   mount options: async,quota=on,nodiratime,noatime
>>>
>>> To improve performance, I set the journal size to 512 MB instead of the
>>> 128 MB default. All disks are connected over fibre to SAN storage, and
>>> all of them are on clustered LVM. All nodes are connected to each other
>>> through a private Gb switch.
>>>
>>> For example, after "node5" failed and was fenced, it can re-enter the
>>> cluster. When I try "service gfs2 start", it mounts the "Type A" disks,
>>> but hangs on the first "Type B" disk. The log stops at the "Trying to
>>> join cluster lock_dlm" message:
>>>
>>> ...
>>> Jan 05 00:01:52 node5 lvm[4090]: Found volume group "VG_of_TYPE_A"
>>> Jan 05 00:01:52 node5 lvm[4119]: Activated 2 logical volumes in volume group VG_of_TYPE_A
>>> Jan 05 00:01:52 node5 lvm[4119]: 2 logical volume(s) in volume group "VG_of_TYPE_A" now active
>>> Jan 05 00:01:52 node5 lvm[4119]: Wiping internal VG cache
>>> Jan 05 00:02:26 node5 kernel: Slow work thread pool: Starting up
>>> Jan 05 00:02:26 node5 kernel: Slow work thread pool: Ready
>>> Jan 05 00:02:26 node5 kernel: GFS2 (built Dec 12 2014 16:06:57) installed
>>> Jan 05 00:02:26 node5 kernel: GFS2: fsid=: Trying to join cluster "lock_dlm", "TESTCLS:typeA1"
>>> Jan 05 00:02:26 node5 kernel: GFS2: fsid=TESTCLS:typeA1.5: Joined cluster. Now mounting FS...
>>> Jan 05 00:02:27 node5 kernel: GFS2: fsid=TESTCLS:typeA1.5: jid=5, already locked for use
>>> Jan 05 00:02:27 node5 kernel: GFS2: fsid=TESTCLS:typeA1.5: jid=5: Looking at journal...
>>> Jan 05 00:02:27 node5 kernel: GFS2: fsid=TESTCLS:typeA1.5: jid=5: Done
>>> Jan 05 00:02:27 node5 kernel: GFS2: fsid=: Trying to join cluster "lock_dlm", "TESTCLS:typeA2"
>>> Jan 05 00:02:27 node5 kernel: GFS2: fsid=TESTCLS:typeA2.5: Joined cluster. Now mounting FS...
>>> Jan 05 00:02:28 node5 kernel: GFS2: fsid=TESTCLS:typeA2.5: jid=5, already locked for use
>>> Jan 05 00:02:28 node5 kernel: GFS2: fsid=TESTCLS:typeA2.5: jid=5: Looking at journal...
>>> Jan 05 00:02:28 node5 kernel: GFS2: fsid=TESTCLS:typeA2.5: jid=5: Done
>>> Jan 05 00:02:28 node5 kernel: GFS2: fsid=: Trying to join cluster "lock_dlm", "TESTCLS:typeB1"
>>>
>>>
>>> I've waited nearly 10 minutes in this state without any response or log
>>> output. In this state, I cannot run `ls` on this file system from the
>>> other nodes. Any idea of the cause of the problem? How is the cluster
>>> affected by journal size or count?
>>> --
>>> B.Baransel BAĞCI
>>
>> Hi,
>>
>> If mount hangs, it's hard to say what it's doing. It could be waiting
>> for a dlm lock, which is waiting on a pending fencing operation.
>>
>> There have been occasional hangs discovered in journal replay, but not
>> for a long time, so that is less likely. What kernel version is this?
>> December 12, 2014 is more than a year old, so it might be something
>> we've already found and fixed. If this is RHEL 6 or CentOS 6 or similar,
>> you could try catting the /proc/<pid>/stack file of the mount helper
>> process, aka mount.gfs2, and see what it's doing.
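To see where it is stuck, the check Bob describes is roughly this (just a sketch -- it assumes the helper really is called mount.gfs2 on your build and that only one mount is in flight):

    # find the hung mount helper and dump its kernel stack
    pidof mount.gfs2
    cat /proc/$(pidof mount.gfs2)/stack

    # optionally, dump all blocked tasks to the kernel log as well
    echo w > /proc/sysrq-trigger
    dmesg | tail -n 100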
>
> The OS is RHEL 6 and the kernel is 2.6.32-504.3.3.el6.x86_64. This problem
> actually began as the amount of data increased; in the beginning, with very
> little data and I/O, it didn't exist. The cluster is isolated and doesn't get
> updates, so there has been no kernel or package change for a year. I also run
> fsck on every disk after each crash.
> At the next crash I will look at the stack file, but I cannot crash the
> system right now, because it's used by other services.
>
>> Normally, dlm recovery and gfs2 recovery take only a few seconds.
>> The size of the journals and the number of journals will likely have no
>> effect. If I was a betting man, I'd bet that GFS2 is waiting for DLM, and
>> DLM is waiting for a fence operation to be completed successfully
>> before continuing. If this is RHEL 6 or earlier, you could do
>> "group_tool dump" to find out if the cluster membership is sane or
>> if it's waiting for something like this.
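This is the same check I suggested at the top. Roughly something like the following (a sketch; the exact output and subcommands can differ a bit between cluster stack versions):

    # list the fence, dlm and gfs groups and their state;
    # anything sitting in a "wait" state points at fencing/recovery
    group_tool ls

    # full debug dump of groupd, to see what it is waiting for
    group_tool dump

    # the fence domain members as fenced sees them
    fence_tool ls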
>
> Fence devices are working correctly. The cluster master calls the fence
> device (IPMI) and the failed node restarts. Then I can start "cman" and
> "clvmd" successfully; up to this point there is no problem, and all of the
> other nodes keep working correctly. So I don't think there is a pending
> fence operation. And this problem started when the data on the disks
> increased.
>
> My cluster.conf looks like this:
>
> <?xml version="1.0"?>
> <cluster config_version="1" name="TESTCLS">
>   <totem consensus="4000" token="2000"/>
>   <cman cluster_id="1234" expected_votes="1">
>     <multicast addr="233.41.51.61"/>
>   </cman>
>   <clusternodes>
>     <clusternode name="node1.TESTCLS" nodeid="5">
>       <fence>
>         <method name="CUSTOM">
>           <device ipaddr="192.168.0.36" port="ilo2" name="custom_fence"/>
>         </method>
>       </fence>
>     </clusternode>
>     <clusternode name="node2.TESTCLS" nodeid="11">
>       <fence>
>         <method name="CUSTOM">
>           <device ipaddr="192.168.0.41" port="ipmi" name="custom_fence"/>
>         </method>
>       </fence>
>     </clusternode>
>     <clusternode name="node3.TESTCLS" nodeid="12">
>       <fence>
>         <method name="CUSTOM">
>           <device ipaddr="192.168.0.49" port="ipmi" name="custom_fence"/>
>         </method>
>       </fence>
>     </clusternode>
>     <clusternode name="node4.TESTCLS" nodeid="21">
>       <fence>
>         <method name="CUSTOM">
>           <device ipaddr="192.168.0.82" port="ipmi" name="custom_fence"/>
>         </method>
>       </fence>
>     </clusternode>
>     <clusternode name="node5.TESTCLS" nodeid="22">
>       <fence>
>         <method name="CUSTOM">
>           <device ipaddr="192.168.0.79" port="ipmi" name="custom_fence"/>
>         </method>
>       </fence>
>     </clusternode>
>     <clusternode name="node6.TESTCLS" nodeid="23">
>       <fence>
>         <method name="CUSTOM">
>           <device ipaddr="192.168.0.81" port="ipmi" name="custom_fence"/>
>         </method>
>       </fence>
>     </clusternode>
>     <clusternode name="node7.TESTCLS" nodeid="31">
>       <fence>
>         <method name="CUSTOM">
>           <device ipaddr="192.168.0.78" port="ipmi" name="custom_fence"/>
>         </method>
>       </fence>
>     </clusternode>
>     <clusternode name="node8.TESTCLS" nodeid="32">
>       <fence>
>         <method name="CUSTOM">
>           <device ipaddr="192.168.0.11" port="ilo" name="custom_fence"/>
>         </method>
>       </fence>
>     </clusternode>
>   </clusternodes>
>   <fence_daemon post_fail_delay="0" post_join_delay="10"/>
>   <fencedevices>
>     <fencedevice agent="fence_script" name="custom_fence"/>
>   </fencedevices>
>   <rm>
>     <failoverdomains>
>       <failoverdomain name="NFS" ordered="1" restricted="1">
>         <failoverdomainnode name="node7.TESTCLS" priority="10"/>
>         <failoverdomainnode name="node8.TESTCLS" priority="20"/>
>       </failoverdomain>
>     </failoverdomains>
>     <resources>
>       <ip address="192.168.1.34/24" sleeptime="10"/>
>       <nfsserver name="NFS_service_resource"/>
>     </resources>
>     <service autostart="1" domain="NFS" exclusive="1" name="NFS_service" recovery="relocate">
>       <ip ref="192.168.1.34/24">
>         <nfsserver ref="NFS_service_resource"/>
>       </ip>
>     </service>
>   </rm>
>   <dlm plock_ownership="1" plock_rate_limit="0"/>
>   <gfs_controld plock_rate_limit="0"/>
> </cluster>
>
> Note: the fence agent is a script and works correctly. It calls the ipmi or
> ilo fence agents and returns 0 on success.
>
> thanks
> --
> B.Baransel BAĞCI
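About the custom fence_script: I don't know what yours looks like, but a minimal wrapper of that kind is usually something like the sketch below (purely hypothetical -- the logins, passwords and exact agent options here are assumptions, not taken from your setup):

    #!/bin/sh
    # hypothetical fence_script: read key=value arguments from stdin
    # (the way fenced passes the device attributes from cluster.conf),
    # then hand off to a real agent based on the "port" attribute
    while read line; do
        case "$line" in
            ipaddr=*) ADDR=${line#ipaddr=} ;;
            port=*)   TYPE=${line#port=}   ;;
        esac
    done

    case "$TYPE" in
        ipmi)     fence_ipmilan -a "$ADDR" -l admin -p secret -o reboot ;;  # assumed credentials
        ilo|ilo2) fence_ilo     -a "$ADDR" -l admin -p secret -o reboot ;;
        *)        exit 1 ;;
    esac
    exit $?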