Re: GFS2 mount hangs for some disks

Quoting Bob Peterson <rpeterso@xxxxxxxxxx>:

----- Original Message -----
Hi list,

I have a problem with GFS2 after node failures. After one of the
cluster nodes is fenced and rebooted, it cannot mount some of the GFS2
file systems; it hangs on the mount operation with no output. I've
waited nearly 10 minutes for a single disk to mount, but it didn't
respond. The only solution is to shut down all nodes and do a clean
start of the cluster. I suspect the journal size or file system quotas.

I have an 8-node RHEL 6 cluster with GFS2-formatted disks, all of which
are mounted by all nodes.
There are two types of disk:
     Type A:
         ~50 GB disk capacity
         8 journals, 512 MB each
         block size: 1024
         very small files (avg: 50 bytes, symlinks)
         ~500,000 files (inodes)
         Usage: 10%
         Nearly no write IO (under 1,000 files per day)
         No user quota (quota=off)
         Mount options: async,quota=off,nodiratime,noatime

     Type B:
         ~1 TB disk capacity
         8 journals, 512 MB each
         block size: 4096
         relatively small files (avg: 20 KB)
         ~5,000,000 files (inodes)
         Usage: 20%
         write IO: ~50,000 files per day
         user quota is on (some users have exceeded their quota)
         Mount options: async,quota=on,nodiratime,noatime

To improve performance, I set the journal size to 512 MB instead of the
128 MB default. All disks are connected over fibre to SAN storage, and
all disks are on clustered LVM. All nodes are connected to each other
through a private Gb switch.
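
For reference, the "Type B" file systems were created and mounted roughly
like this (the LV path and mount point below are placeholders, not the
exact names used):

    # 8 journals of 512 MB each, 4K blocks, lock_dlm, cluster TESTCLS
    mkfs.gfs2 -p lock_dlm -t TESTCLS:typeB1 -b 4096 -j 8 -J 512 /dev/VG_of_TYPE_B/lv_typeB1
    # mounted with the options listed above
    mount -o async,quota=on,nodiratime,noatime /dev/VG_of_TYPE_B/lv_typeB1 /gfs2/typeB1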

For example, after "node5" failed and was fenced, it can re-enter the
cluster. When I try "service gfs2 start", it mounts the "Type A" disks
but hangs on the first "Type B" disk. The log stops at the "Trying to
join cluster lock_dlm" message:

     ...
     Jan 05 00:01:52 node5 lvm[4090]: Found volume group "VG_of_TYPE_A"
     Jan 05 00:01:52 node5 lvm[4119]: Activated 2 logical volumes in volume group VG_of_TYPE_A
     Jan 05 00:01:52 node5 lvm[4119]: 2 logical volume(s) in volume group "VG_of_TYPE_A" now active
     Jan 05 00:01:52 node5 lvm[4119]: Wiping internal VG cache
     Jan 05 00:02:26 node5 kernel: Slow work thread pool: Starting up
     Jan 05 00:02:26 node5 kernel: Slow work thread pool: Ready
     Jan 05 00:02:26 node5 kernel: GFS2 (built Dec 12 2014 16:06:57) installed
     Jan 05 00:02:26 node5 kernel: GFS2: fsid=: Trying to join cluster "lock_dlm", "TESTCLS:typeA1"
     Jan 05 00:02:26 node5 kernel: GFS2: fsid=TESTCLS:typeA1.5: Joined cluster. Now mounting FS...
     Jan 05 00:02:27 node5 kernel: GFS2: fsid=TESTCLS:typeA1.5: jid=5, already locked for use
     Jan 05 00:02:27 node5 kernel: GFS2: fsid=TESTCLS:typeA1.5: jid=5: Looking at journal...
     Jan 05 00:02:27 node5 kernel: GFS2: fsid=TESTCLS:typeA1.5: jid=5: Done
     Jan 05 00:02:27 node5 kernel: GFS2: fsid=: Trying to join cluster "lock_dlm", "TESTCLS:typeA2"
     Jan 05 00:02:27 node5 kernel: GFS2: fsid=TESTCLS:typeA2.5: Joined cluster. Now mounting FS...
     Jan 05 00:02:28 node5 kernel: GFS2: fsid=TESTCLS:typeA2.5: jid=5, already locked for use
     Jan 05 00:02:28 node5 kernel: GFS2: fsid=TESTCLS:typeA2.5: jid=5: Looking at journal...
     Jan 05 00:02:28 node5 kernel: GFS2: fsid=TESTCLS:typeA2.5: jid=5: Done
     Jan 05 00:02:28 node5 kernel: GFS2: fsid=: Trying to join cluster "lock_dlm", "TESTCLS:typeB1"


I've waited nearly 10 minutes in this state without any response or log
output. While it is in this state, I cannot do `ls` on this file system
from the other nodes either. Any idea what the cause of the problem
might be? How is the cluster affected by journal size or count?
--
B.Baransel BAĞCI


Hi,

If mount hangs, it's hard to say what it's doing. It could be waiting
for a dlm lock, which is waiting on a pending fencing operation.

There have been occasional hangs discovered in journal replay, but not
for a long time, so that's less likely. What kernel version is this?
December 12, 2014 is more than a year old, so it might be something
we've already found and fixed. If this is RHEL 6 or CentOS 6 or similar,
you could try catting the /proc/<pid>/stack file of the mount helper
process, aka mount.gfs2, and see what it's doing.
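
A minimal sketch of that check, assuming the helper shows up in the
process list as mount.gfs2:

    # find the hung mount helper and dump its kernel stack
    pid=$(pgrep -f mount.gfs2 | head -n 1)
    cat /proc/$pid/stack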


The OS is RHEL 6 and the kernel is 2.6.32-504.3.3.el6.x86_64. Actually,
this problem began as the amount of data increased; at the beginning,
with very little data and IO, the problem didn't exist. This cluster
system is isolated and doesn't get updates, so there have been no kernel
or package changes for a year. Also, I run fsck on every disk after each
crash. At the next crash I will look at the stack file, but I cannot
crash the system right now because it's used by other services.
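
(For reference, the post-crash check is something like the following;
the LV path is a placeholder, and the file system is unmounted on all
nodes first:)

    fsck.gfs2 -y /dev/VG_of_TYPE_B/lv_typeB1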

Normally, dlm recovery and gfs2 recovery take only a few seconds.
The size of the journals and the number of journals will likely have no
effect. If I were a betting man, I'd bet that GFS2 is waiting for DLM,
and DLM is waiting for a fence operation to be completed successfully
before continuing. If this is RHEL 6 or earlier, you could do
"group_tool dump" to find out whether the cluster membership is sane or
whether it's waiting for something like this.
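
A sketch of those checks on a cman-based cluster (the exact sub-commands
can vary a little between releases):

    group_tool ls      # list the fence/dlm/gfs groups and their state
    group_tool dump    # dump the group daemon's debug log
    fence_tool ls      # fence domain members and any pending victims
    dlm_tool ls        # dlm lockspaces and their recovery state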

The fence devices are working correctly. The cluster master calls the fence device (IPMI) and the failed node restarts. Then I can start "cman" and "clvmd" successfully; up to that point there is no problem, and all of the other nodes keep working correctly. Thus, I don't think there is a pending fence operation. Also, this problem started when the data on the disks increased.

My cluster.conf looks like this:

	<?xml version="1.0"?>
	<cluster config_version="1" name="TESTCLS">
			<totem consensus="4000" token="2000"/>
			<cman cluster_id="1234" expected_votes="1">
					<multicast addr="233.41.51.61"/>
			</cman>
			<clusternodes>
					<clusternode name="node1.TESTCLS" nodeid="5">
							<fence>
									<method name="CUSTOM">
											<device ipaddr="192.168.0.36" port="ilo2" name="custom_fence"/>
									</method>
							</fence>
					</clusternode>
					<clusternode name="node2.TESTCLS" nodeid="11">
							<fence>
									<method name="CUSTOM">
											<device ipaddr="192.168.0.41" port="ipmi" name="custom_fence"/>
									</method>
							</fence>
					</clusternode>
					<clusternode name="node3.TESTCLS" nodeid="12">
							<fence>
									<method name="CUSTOM">
											<device ipaddr="192.168.0.49" port="ipmi" name="custom_fence"/>
									</method>
							</fence>
					</clusternode>
					<clusternode name="node4.TESTCLS" nodeid="21">
							<fence>
									<method name="CUSTOM">
											<device ipaddr="192.168.0.82" port="ipmi" name="custom_fence"/>
									</method>
							</fence>
					</clusternode>
					<clusternode name="node5.TESTCLS" nodeid="22">
							<fence>
									<method name="CUSTOM">
											<device ipaddr="192.168.0.79" port="ipmi" name="custom_fence"/>
									</method>
							</fence>
					</clusternode>
					<clusternode name="node6.TESTCLS" nodeid="23">
							<fence>
									<method name="CUSTOM">
											<device ipaddr="192.168.0.81" port="ipmi" name="custom_fence"/>
									</method>
							</fence>
					</clusternode>
					<clusternode name="node7.TESTCLS" nodeid="31">
							<fence>
									<method name="CUSTOM">
											<device ipaddr="192.168.0.78" port="ipmi" name="custom_fence"/>
									</method>
							</fence>
					</clusternode>
					<clusternode name="node8.TESTCLS" nodeid="32">
							<fence>
									<method name="CUSTOM">
											<device ipaddr="192.168.0.11" port="ilo" name="custom_fence"/>
									</method>
							</fence>
					</clusternode>
			</clusternodes>
			<fence_daemon post_fail_delay="0" post_join_delay="10"/>
			<fencedevices>
					<fencedevice agent="fence_script" name="custom_fence"/>
			</fencedevices>
			<rm>
					<failoverdomains>
							<failoverdomain name="NFS" ordered="1" restricted="1">
									<failoverdomainnode name="node7.TESTCLS" priority="10"/>
									<failoverdomainnode name="node8.TESTCLS" priority="20"/>
							</failoverdomain>
					</failoverdomains>
					<resources>
							<ip address="192.168.1.34/24" sleeptime="10"/>
							<nfsserver name="NFS_service_resource"/>
					</resources>
					<service autostart="1" domain="NFS" exclusive="1" name="NFS_service" recovery="relocate">
							<ip ref="192.168.1.34/24">
									<nfsserver ref="NFS_service_resource"/>
							</ip>
					</service>
			</rm>
			<dlm plock_ownership="1" plock_rate_limit="0"/>
			<gfs_controld plock_rate_limit="0"/>
	</cluster>

Note: The fence agent is a script and it works correctly. It calls the ipmi or ilo fence agents and returns 0 on success.
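
Conceptually it is something like the sketch below; this is not the real
script, and the credentials and variable names are placeholders:

    #!/bin/bash
    # Sketch of a wrapper fence agent: fenced passes options such as
    # ipaddr=... and port=... as name=value pairs on stdin; the wrapper
    # calls the matching stock agent and exits 0 on success.
    while read -r line; do
        case "$line" in
            ipaddr=*) ipaddr=${line#ipaddr=} ;;
            port=*)   devtype=${line#port=} ;;
        esac
    done

    case "$devtype" in
        ipmi)     fence_ipmilan -a "$ipaddr" -l "$FENCE_USER" -p "$FENCE_PASS" -o reboot ;;
        ilo|ilo2) fence_ilo     -a "$ipaddr" -l "$FENCE_USER" -p "$FENCE_PASS" -o reboot ;;
    esac
    exit $?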


thanks
--
B.Baransel BAĞCI

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster



