Hi Dave,
Thanks for your reply!
I changed cluster.conf as you suggested on both nodes and re-formatted the
block device with 'mkfs.gfs2 -t testgfs2:1 -j 2 /dev/sdb'.
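As a sanity check, I believe the on-disk lock table can also be confirmed
with gfs2_tool from the same gfs2-utils package:

  gfs2_tool sb /dev/sdb table

It should report the table as "testgfs2:1".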
The following is the information you asked for.
I) userspace utils versions:
[root@cool ~]# rpm -qi cman
Name        : cman                          Relocations: (not relocatable)
Version     : 3.0.0                         Vendor: Fedora Project
Release     : 20.fc11                       Build Date: Wed 29 Jul 2009 01:27:54 AM CST
Install Date: Tue 04 Aug 2009 05:43:32 PM CST   Build Host: xenbuilder4.fedora.phx.redhat.com
Group       : System Environment/Base       Source RPM: cluster-3.0.0-20.fc11.src.rpm
Size        : 1146269                       License: GPLv2+ and LGPLv2+
Signature   : RSA/8, Wed 29 Jul 2009 08:35:33 PM CST, Key ID 1dc5c758d22e77f2
Packager    : Fedora Project
URL         : http://sources.redhat.com/cluster/wiki/
Summary     : Red Hat Cluster Manager
Description :
Red Hat Cluster Manager
[root@cool ~]# rpm -qi gfs2-utils
Name        : gfs2-utils                    Relocations: (not relocatable)
Version     : 3.0.0                         Vendor: Fedora Project
Release     : 20.fc11                       Build Date: Wed 29 Jul 2009 01:27:54 AM CST
Install Date: Tue 04 Aug 2009 05:46:17 PM CST   Build Host: xenbuilder4.fedora.phx.redhat.com
Group       : System Environment/Kernel     Source RPM: cluster-3.0.0-20.fc11.src.rpm
Size        : 682088                        License: GPLv2+ and LGPLv2+
Signature   : RSA/8, Wed 29 Jul 2009 08:36:42 PM CST, Key ID 1dc5c758d22e77f2
Packager    : Fedora Project
URL         : http://sources.redhat.com/cluster/wiki/
Summary     : Utilities for managing the global filesystem (GFS2)
Description :
The gfs2-utils package contains a number of utilities for creating,
checking, modifying, and correcting any inconsistencies in GFS2
filesystems.
Is there anything else you need?
II) status outputs before and after mounts
'cman_tool status' before the mounts:
[root@desk ~]# cman_tool status
Version: 6.2.0
Config Version: 1
Cluster Name: testgfs2
Cluster Id: 28360
Cluster Member: Yes
Cluster Generation: 88
Membership state: Cluster-Member
Nodes: 2
Expected votes: 1
Total votes: 2
Node votes: 1
Quorum: 1
Active subsystems: 7
Flags: 2node HaveState
Ports Bound: 0
Node name: desk
Node ID: 2
Multicast addresses: 239.192.110.55
Node addresses: 192.168.1.104
# I don't know where 239.192.110.55 comes from. Does it matter?
[root@cool ~]# cman_tool status
Version: 6.2.0
Config Version: 1
Cluster Name: testgfs2
Cluster Id: 28360
Cluster Member: Yes
Cluster Generation: 88
Membership state: Cluster-Member
Nodes: 2
Expected votes: 1
Total votes: 2
Node votes: 1
Quorum: 1
Active subsystems: 7
Flags: 2node HaveState
Ports Bound: 0
Node name: cool
Node ID: 1
Multicast addresses: 239.192.110.55
Node addresses: 192.168.1.101
'cman_tool nodes' before the mounts:
[root@desk ~]# cman_tool nodes
Node Sts Inc Joined Name
1 M 88 2009-08-14 17:44:00 cool
2 M 76 2009-08-14 17:43:52 desk
[root@cool ~]# cman_tool nodes
Node Sts Inc Joined Name
1 M 84 2009-08-14 09:46:06 cool
2 M 88 2009-08-14 09:46:06 desk
'group_tool' before the mounts:
[root@desk ~]# group_tool
groupd not running
fence domain
member count 2
victim count 0
victim now 0
master nodeid 2
wait state none
members 1 2
[root@cool ~]# group_tool
groupd not running
fence domain
member count 2
victim count 0
victim now 0
master nodeid 2
wait state none
members 1 2
----
'cman_tool status' after the first mount (on node desk):
[root@desk ~]# cman_tool status
Version: 6.2.0
Config Version: 1
Cluster Name: testgfs2
Cluster Id: 28360
Cluster Member: Yes
Cluster Generation: 88
Membership state: Cluster-Member
Nodes: 2
Expected votes: 1
Total votes: 2
Node votes: 1
Quorum: 1
Active subsystems: 7
Flags: 2node HaveState
Ports Bound: 0
Node name: desk
Node ID: 2
Multicast addresses: 239.192.110.55
Node addresses: 192.168.1.104
[root@cool ~]# cman_tool status
Version: 6.2.0
Config Version: 1
Cluster Name: testgfs2
Cluster Id: 28360
Cluster Member: Yes
Cluster Generation: 88
Membership state: Cluster-Member
Nodes: 2
Expected votes: 1
Total votes: 2
Node votes: 1
Quorum: 1
Active subsystems: 7
Flags: 2node HaveState
Ports Bound: 0
Node name: cool
Node ID: 1
Multicast addresses: 239.192.110.55
Node addresses: 192.168.1.101
'cman_tool nodes' after the first mount:
[root@desk ~]# cman_tool nodes
Node Sts Inc Joined Name
1 M 88 2009-08-14 17:44:00 cool
2 M 76 2009-08-14 17:43:52 desk
[root@cool ~]# cman_tool nodes
Node Sts Inc Joined Name
1 M 84 2009-08-14 09:46:06 cool
2 M 88 2009-08-14 09:46:06 desk
'group_tool' after the first mount:
[root@desk ~]# group_tool
groupd not running
fence domain
member count 2
victim count 0
victim now 0
master nodeid 2
wait state none
members 1 2
dlm lockspaces
name 1
id 0xe5ab8ad6
flags 0x00000000
change member 1 joined 1 remove 0 failed 0 seq 1,1
members 2
[root@cool ~]# group_tool
groupd not running
fence domain
member count 2
victim count 0
victim now 0
master nodeid 2
wait state none
members 1 2
---
'cman_tool status' after the second mount (on node cool, which hangs):
[root@desk ~]# cman_tool status
Version: 6.2.0
Config Version: 1
Cluster Name: testgfs2
Cluster Id: 28360
Cluster Member: Yes
Cluster Generation: 88
Membership state: Cluster-Member
Nodes: 2
Expected votes: 1
Total votes: 2
Node votes: 1
Quorum: 1
Active subsystems: 7
Flags: 2node HaveState
Ports Bound: 0
Node name: desk
Node ID: 2
Multicast addresses: 239.192.110.55
Node addresses: 192.168.1.104
[root@cool ~]# cman_tool status
Version: 6.2.0
Config Version: 1
Cluster Name: testgfs2
Cluster Id: 28360
Cluster Member: Yes
Cluster Generation: 88
Membership state: Cluster-Member
Nodes: 2
Expected votes: 1
Total votes: 2
Node votes: 1
Quorum: 1
Active subsystems: 7
Flags: 2node HaveState
Ports Bound: 0
Node name: cool
Node ID: 1
Multicast addresses: 239.192.110.55
Node addresses: 192.168.1.101
'cman_tool nodes' after the second mount:
[root@desk ~]# cman_tool nodes
Node Sts Inc Joined Name
1 M 88 2009-08-14 17:44:00 cool
2 M 76 2009-08-14 17:43:52 desk
[root@cool ~]# cman_tool nodes
Node Sts Inc Joined Name
1 M 84 2009-08-14 09:46:06 cool
2 M 88 2009-08-14 09:46:06 desk
'group_tool' after the second mount:
[root@desk ~]# group_tool
groupd not running
fence domain
member count 2
victim count 0
victim now 0
master nodeid 2
wait state none
members 1 2
dlm lockspaces
name 1
id 0xe5ab8ad6
flags 0x00000000
change member 2 joined 1 remove 0 failed 0 seq 2,2
members 1 2
[root@cool ~]# group_tool
groupd compatibility mode 0
fence domain
member count 2
victim count 0
victim now 0
master nodeid 2
wait state none
members 1 2
dlm lockspaces
name 1
id 0xe5ab8ad6
flags 0x00000008 fs_reg
change member 2 joined 1 remove 0 failed 0 seq 1,1
members 1 2
gfs mountgroups
name 1
id 0x791ee743
flags 0x00000020 need_first
change member 1 joined 1 remove 0 failed 0 seq 1,1
members 1
---
# Comparing the before/after outputs, it seems only group_tool differs
between the nodes. Is the problem in groupd? Does it start automatically? I
didn't start it by hand; all I do is "service cman start" on both nodes and
then mount the filesystem on both nodes (exact sequence sketched below).
The /var/log/messages files show:
node desk:
Aug 14 18:07:44 desk kernel: GFS2: fsid=: Trying to join cluster "lock_dlm", "testgfs2:1"
Aug 14 18:07:44 desk kernel: dlm: Using TCP for communications
Aug 14 18:07:44 desk kernel: GFS2: fsid=testgfs2:1.0: Joined cluster. Now mounting FS...
Aug 14 18:07:44 desk kernel: GFS2: fsid=testgfs2:1.0: jid=0, already locked for use
Aug 14 18:07:44 desk kernel: GFS2: fsid=testgfs2:1.0: jid=0: Looking at journal...
Aug 14 18:07:44 desk gfs_controld[2206]: recovery_uevent mg not found 1
Aug 14 18:07:44 desk kernel: GFS2: fsid=testgfs2:1.0: jid=0: Done
Aug 14 18:07:44 desk kernel: GFS2: fsid=testgfs2:1.0: jid=1: Trying to acquire journal lock...
Aug 14 18:07:44 desk kernel: GFS2: fsid=testgfs2:1.0: jid=1: Looking at journal...
Aug 14 18:07:44 desk gfs_controld[2206]: recovery_uevent mg not found 1
Aug 14 18:07:44 desk gfs_controld[2206]: recovery_uevent mg not found 1
Aug 14 18:07:44 desk kernel: GFS2: fsid=testgfs2:1.0: jid=1: Done
Aug 14 18:07:54 desk kernel: dlm: connecting to 1
node cool:
Aug 14 10:10:00 cool kernel: GFS2: fsid=: Trying to join cluster "lock_dlm", "testgfs2:1"
Aug 14 10:10:00 cool kernel: dlm: Using TCP for communications
Aug 14 10:10:00 cool kernel: dlm: got connection from 2
Aug 14 10:10:00 cool kernel: GFS2: fsid=testgfs2:1.0: Joined cluster. Now mounting FS...
Aug 14 10:14:00 cool kernel: INFO: task mount.gfs2:2458 blocked for more than 120 seconds.
Aug 14 10:14:00 cool kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 14 10:14:00 cool kernel: mount.gfs2 D 000001db 0 2458 2457 0x00000080
Aug 14 10:14:00 cool kernel: dbd13c6c 00000086 31dc2c4d 000001db f6575c44 c09a5260 c09a9b60 f6575c44
Aug 14 10:14:00 cool kernel: dbd13c3c c09a9b60 c09a9b60 f4c3b6c0 dbcf4000 dbd13c58 dbd13c44 00000001
Aug 14 10:14:00 cool kernel: 31dbe4d2 000001db f65759b0 dbd13c5c dbd13c5c 00000246 c1f98554 c1f98558
Aug 14 10:14:00 cool kernel: Call Trace:
Aug 14 10:14:00 cool kernel: [<f8d5a6e6>] gfs2_glock_holder_wait+0xd/0x11 [gfs2]
Aug 14 10:14:00 cool kernel: [<c0742897>] __wait_on_bit+0x39/0x60
Aug 14 10:14:00 cool kernel: [<f8d5a6d9>] ? gfs2_glock_holder_wait+0x0/0x11 [gfs2]
Aug 14 10:14:00 cool kernel: [<f8d5a6d9>] ? gfs2_glock_holder_wait+0x0/0x11 [gfs2]
Aug 14 10:14:00 cool kernel: [<c074295e>] out_of_line_wait_on_bit+0xa0/0xa8
Aug 14 10:14:00 cool kernel: [<c044779d>] ? wake_bit_function+0x0/0x3c
Aug 14 10:14:00 cool kernel: [<f8d5d487>] wait_on_bit.clone.1+0x1c/0x28 [gfs2]
Aug 14 10:14:00 cool kernel: [<f8d5d4fe>] gfs2_glock_wait+0x31/0x37 [gfs2]
Aug 14 10:14:00 cool kernel: [<f8d5d774>] gfs2_glock_nq+0x270/0x278 [gfs2]
Aug 14 10:14:00 cool kernel: [<f8d5d8a0>] gfs2_glock_nq_num+0x4c/0x6c [gfs2]
Aug 14 10:14:00 cool kernel: [<f8d66567>] init_journal+0x2b2/0x675 [gfs2]
Aug 14 10:14:00 cool kernel: [<c04b1e75>] ? __slab_alloc+0x40a/0x421
Aug 14 10:14:00 cool kernel: [<c04b21b8>] ? kmem_cache_alloc+0x6d/0x105
Aug 14 10:14:00 cool kernel: [<c04c6fe1>] ? d_alloc+0x23/0x15e
Aug 14 10:14:00 cool kernel: [<c04dd3a0>] ? inotify_d_instantiate+0x17/0x3a
Aug 14 10:14:00 cool kernel: [<f8d65e5f>] ? gfs2_glock_nq_init+0x13/0x31 [gfs2]
Aug 14 10:14:00 cool kernel: [<f8d6694f>] init_inodes+0x25/0x152 [gfs2]
Aug 14 10:14:00 cool kernel: [<f8d6750d>] fill_super+0xa91/0xc10 [gfs2]
Aug 14 10:14:00 cool kernel: [<f8d5d899>] ? gfs2_glock_nq_num+0x45/0x6c [gfs2]
Aug 14 10:14:00 cool kernel: [<c04bade0>] get_sb_bdev+0xdc/0x119
Aug 14 10:14:00 cool kernel: [<c04b4aea>] ? pcpu_alloc+0x352/0x38b
Aug 14 10:14:00 cool kernel: [<f8d65ad8>] gfs2_get_sb+0x18/0x1a [gfs2]
Aug 14 10:14:00 cool kernel: [<f8d66a7c>] ? fill_super+0x0/0xc10 [gfs2]
Aug 14 10:14:00 cool kernel: [<c04baacc>] vfs_kern_mount+0x82/0xf0
Aug 14 10:14:00 cool kernel: [<c04bab89>] do_kern_mount+0x38/0xc3
Aug 14 10:14:00 cool kernel: [<c04cc03f>] do_mount+0x68c/0x6e4
Aug 14 10:14:00 cool kernel: [<c0492dc1>] ? __get_free_pages+0x24/0x26
Aug 14 10:14:00 cool kernel: [<c04cc0fd>] sys_mount+0x66/0x98
Aug 14 10:14:00 cool kernel: [<c0402a28>] sysenter_do_call+0x12/0x27
More of these "INFO: task mount.gfs2:2458 blocked for more than 120
seconds." messages appear if I wait longer.
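If more state would help, I can also collect DLM/gfs_controld info while the
second mount is hung; I believe these client tools ship with the same
cluster 3.0 packages:

  dlm_tool ls
  dlm_tool dump
  gfs_control dump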
Regards,
Wengang
David Teigland wrote:
On Thu, Aug 13, 2009 at 02:22:11PM +0800, Wengang Wang wrote:
<cman two_node="1" expected_votes="2"/>
That's not a valid combination, two_node="1" requires expected_votes="1".
You didn't mention which userspace cluster version/release you're using, or
include any status about the cluster. Before trying to mount gfs on either
node, collect from both nodes:
cman_tool status
cman_tool nodes
group_tool
Then mount on the first node and collect the same information, then try
mounting on the second node, collect the same information, and look for any
errors in /var/log/messages.
Since you're using new kernels, you need to be using the cluster 3.0 userspace
code. You're using the old manual fencing config. There is no more
fence_manual; the new way to configure manual fencing is to not configure any
fencing at all. So, your cluster.conf should look like this:
<?xml version="1.0"?>
<cluster name="testgfs2" config_version="1">
  <cman two_node="1" expected_votes="1"/>
  <clusternodes>
    <clusternode name="cool" nodeid="1"/>
    <clusternode name="desk" nodeid="2"/>
  </clusternodes>
</cluster>
Dave
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster