Hi Dave,
Thanks for your reply!
I changed cluster.conf as you suggested on both nodes and re-formatted the
block device with 'mkfs.gfs2 -t testgfs2:1 -j 2 /dev/sdb'.
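As a sanity check, I believe the on-disk lock table can also be confirmed
with gfs2_tool from the same gfs2-utils package:

  gfs2_tool sb /dev/sdb table

It should report the table as "testgfs2:1".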
The following is the information you asked for.
I) userspace utils versions:
[root@cool ~]# rpm -qi cman
Name        : cman                          Relocations: (not relocatable)
Version     : 3.0.0                         Vendor: Fedora Project
Release     : 20.fc11                       Build Date: Wed 29 Jul 2009 01:27:54 AM CST
Install Date: Tue 04 Aug 2009 05:43:32 PM CST   Build Host: xenbuilder4.fedora.phx.redhat.com
Group       : System Environment/Base       Source RPM: cluster-3.0.0-20.fc11.src.rpm
Size        : 1146269                       License: GPLv2+ and LGPLv2+
Signature   : RSA/8, Wed 29 Jul 2009 08:35:33 PM CST, Key ID 1dc5c758d22e77f2
Packager    : Fedora Project
URL         : http://sources.redhat.com/cluster/wiki/
Summary     : Red Hat Cluster Manager
Description :
Red Hat Cluster Manager
[root@cool ~]# rpm -qi gfs2-utils
Name        : gfs2-utils                    Relocations: (not relocatable)
Version     : 3.0.0                         Vendor: Fedora Project
Release     : 20.fc11                       Build Date: Wed 29 Jul 2009 01:27:54 AM CST
Install Date: Tue 04 Aug 2009 05:46:17 PM CST   Build Host: xenbuilder4.fedora.phx.redhat.com
Group       : System Environment/Kernel     Source RPM: cluster-3.0.0-20.fc11.src.rpm
Size        : 682088                        License: GPLv2+ and LGPLv2+
Signature   : RSA/8, Wed 29 Jul 2009 08:36:42 PM CST, Key ID 1dc5c758d22e77f2
Packager    : Fedora Project
URL         : http://sources.redhat.com/cluster/wiki/
Summary     : Utilities for managing the global filesystem (GFS2)
Description :
The gfs2-utils package contains a number of utilities for creating,
checking, modifying, and correcting any inconsistencies in GFS2
filesystems.
Is there anything else you need?
II) status outputs before and after mounts
'cman_tool status' before the mounts:
[root@desk ~]# cman_tool status
Version: 6.2.0
Config Version: 1
Cluster Name: testgfs2
Cluster Id: 28360
Cluster Member: Yes
Cluster Generation: 88
Membership state: Cluster-Member
Nodes: 2
Expected votes: 1
Total votes: 2
Node votes: 1
Quorum: 1
Active subsystems: 7
Flags: 2node HaveState
Ports Bound: 0
Node name: desk
Node ID: 2
Multicast addresses: 239.192.110.55
Node addresses: 192.168.1.104
# I don't know where 239.192.110.55 comes from. Does it matter?
[root@cool ~]# cman_tool status
Version: 6.2.0
Config Version: 1
Cluster Name: testgfs2
Cluster Id: 28360
Cluster Member: Yes
Cluster Generation: 88
Membership state: Cluster-Member
Nodes: 2
Expected votes: 1
Total votes: 2
Node votes: 1
Quorum: 1
Active subsystems: 7
Flags: 2node HaveState
Ports Bound: 0
Node name: cool
Node ID: 1
Multicast addresses: 239.192.110.55
Node addresses: 192.168.1.101
'cman_tool nodes' before the mounts:
[root@desk ~]# cman_tool nodes
Node Sts Inc Joined Name
1 M 88 2009-08-14 17:44:00 cool
2 M 76 2009-08-14 17:43:52 desk
[root@cool ~]# cman_tool nodes
Node Sts Inc Joined Name
1 M 84 2009-08-14 09:46:06 cool
2 M 88 2009-08-14 09:46:06 desk
'group_tool' before the mounts:
[root@desk ~]# group_tool
groupd not running
fence domain
member count 2
victim count 0
victim now 0
master nodeid 2
wait state none
members 1 2
[root@cool ~]# group_tool
groupd not running
fence domain
member count 2
victim count 0
victim now 0
master nodeid 2
wait state none
members 1 2
----
'cman_tool status' after the first mount (on node desk):
[root@desk ~]# cman_tool status
Version: 6.2.0
Config Version: 1
Cluster Name: testgfs2
Cluster Id: 28360
Cluster Member: Yes
Cluster Generation: 88
Membership state: Cluster-Member
Nodes: 2
Expected votes: 1
Total votes: 2
Node votes: 1
Quorum: 1
Active subsystems: 7
Flags: 2node HaveState
Ports Bound: 0
Node name: desk
Node ID: 2
Multicast addresses: 239.192.110.55
Node addresses: 192.168.1.104
[root@cool ~]# cman_tool status
Version: 6.2.0
Config Version: 1
Cluster Name: testgfs2
Cluster Id: 28360
Cluster Member: Yes
Cluster Generation: 88
Membership state: Cluster-Member
Nodes: 2
Expected votes: 1
Total votes: 2
Node votes: 1
Quorum: 1
Active subsystems: 7
Flags: 2node HaveState
Ports Bound: 0
Node name: cool
Node ID: 1
Multicast addresses: 239.192.110.55
Node addresses: 192.168.1.101
'cman_tool nodes' after the first mount:
[root@desk ~]# cman_tool nodes
Node Sts Inc Joined Name
1 M 88 2009-08-14 17:44:00 cool
2 M 76 2009-08-14 17:43:52 desk
[root@cool ~]# cman_tool nodes
Node Sts Inc Joined Name
1 M 84 2009-08-14 09:46:06 cool
2 M 88 2009-08-14 09:46:06 desk
'group_tool' after the first mount:
[root@desk ~]# group_tool
groupd not running
fence domain
member count 2
victim count 0
victim now 0
master nodeid 2
wait state none
members 1 2
dlm lockspaces
name 1
id 0xe5ab8ad6
flags 0x00000000
change member 1 joined 1 remove 0 failed 0 seq 1,1
members 2
[root@cool ~]# group_tool
groupd not running
fence domain
member count 2
victim count 0
victim now 0
master nodeid 2
wait state none
members 1 2
---
'cman_tool status' after the second mount (on node cool, which hangs):
[root@desk ~]# cman_tool status
Version: 6.2.0
Config Version: 1
Cluster Name: testgfs2
Cluster Id: 28360
Cluster Member: Yes
Cluster Generation: 88
Membership state: Cluster-Member
Nodes: 2
Expected votes: 1
Total votes: 2
Node votes: 1
Quorum: 1
Active subsystems: 7
Flags: 2node HaveState
Ports Bound: 0
Node name: desk
Node ID: 2
Multicast addresses: 239.192.110.55
Node addresses: 192.168.1.104
[root@cool ~]# cman_tool status
Version: 6.2.0
Config Version: 1
Cluster Name: testgfs2
Cluster Id: 28360
Cluster Member: Yes
Cluster Generation: 88
Membership state: Cluster-Member
Nodes: 2
Expected votes: 1
Total votes: 2
Node votes: 1
Quorum: 1
Active subsystems: 7
Flags: 2node HaveState
Ports Bound: 0
Node name: cool
Node ID: 1
Multicast addresses: 239.192.110.55
Node addresses: 192.168.1.101
'cman_tool nodes' after the second mount:
[root@desk ~]# cman_tool nodes
Node Sts Inc Joined Name
1 M 88 2009-08-14 17:44:00 cool
2 M 76 2009-08-14 17:43:52 desk
[root@cool ~]# cman_tool nodes
Node Sts Inc Joined Name
1 M 84 2009-08-14 09:46:06 cool
2 M 88 2009-08-14 09:46:06 desk
'group_tool' after the second mount:
[root@desk ~]# group_tool
groupd not running
fence domain
member count 2
victim count 0
victim now 0
master nodeid 2
wait state none
members 1 2
dlm lockspaces
name 1
id 0xe5ab8ad6
flags 0x00000000
change member 2 joined 1 remove 0 failed 0 seq 2,2
members 1 2
[root@cool ~]# group_tool
groupd compatibility mode 0
fence domain
member count 2
victim count 0
victim now 0
master nodeid 2
wait state none
members 1 2
dlm lockspaces
name 1
id 0xe5ab8ad6
flags 0x00000008 fs_reg
change member 2 joined 1 remove 0 failed 0 seq 1,1
members 1 2
gfs mountgroups
name 1
id 0x791ee743
flags 0x00000020 need_first
change member 1 joined 1 remove 0 failed 0 seq 1,1
members 1
---
# Comparing the before/after outputs, it seems only group_tool differs
between the nodes. Is the problem in groupd? Does it start automatically? I
didn't start it by hand; all I do is "service cman start" on both nodes and
then mount the filesystem on both nodes (exact sequence sketched below).
The /var/log/messages files show:
node desk:
Aug 14 18:07:44 desk kernel: GFS2: fsid=: Trying to join cluster "lock_dlm", "testgfs2:1"
Aug 14 18:07:44 desk kernel: dlm: Using TCP for communications
Aug 14 18:07:44 desk kernel: GFS2: fsid=testgfs2:1.0: Joined cluster. Now mounting FS...
Aug 14 18:07:44 desk kernel: GFS2: fsid=testgfs2:1.0: jid=0, already locked for use
Aug 14 18:07:44 desk kernel: GFS2: fsid=testgfs2:1.0: jid=0: Looking at journal...
Aug 14 18:07:44 desk gfs_controld[2206]: recovery_uevent mg not found 1
Aug 14 18:07:44 desk kernel: GFS2: fsid=testgfs2:1.0: jid=0: Done
Aug 14 18:07:44 desk kernel: GFS2: fsid=testgfs2:1.0: jid=1: Trying to acquire journal lock...
Aug 14 18:07:44 desk kernel: GFS2: fsid=testgfs2:1.0: jid=1: Looking at journal...
Aug 14 18:07:44 desk gfs_controld[2206]: recovery_uevent mg not found 1
Aug 14 18:07:44 desk gfs_controld[2206]: recovery_uevent mg not found 1
Aug 14 18:07:44 desk kernel: GFS2: fsid=testgfs2:1.0: jid=1: Done
Aug 14 18:07:54 desk kernel: dlm: connecting to 1
node cool:
Aug 14 10:10:00 cool kernel: GFS2: fsid=: Trying to join cluster "lock_dlm", "testgfs2:1"
Aug 14 10:10:00 cool kernel: dlm: Using TCP for communications
Aug 14 10:10:00 cool kernel: dlm: got connection from 2
Aug 14 10:10:00 cool kernel: GFS2: fsid=testgfs2:1.0: Joined cluster. Now mounting FS...
Aug 14 10:14:00 cool kernel: INFO: task mount.gfs2:2458 blocked for more than 120 seconds.
Aug 14 10:14:00 cool kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 14 10:14:00 cool kernel: mount.gfs2 D 000001db 0 2458 2457 0x00000080
Aug 14 10:14:00 cool kernel: dbd13c6c 00000086 31dc2c4d 000001db f6575c44 c09a5260 c09a9b60 f6575c44
Aug 14 10:14:00 cool kernel: dbd13c3c c09a9b60 c09a9b60 f4c3b6c0 dbcf4000 dbd13c58 dbd13c44 00000001
Aug 14 10:14:00 cool kernel: 31dbe4d2 000001db f65759b0 dbd13c5c dbd13c5c 00000246 c1f98554 c1f98558
Aug 14 10:14:00 cool kernel: Call Trace:
Aug 14 10:14:00 cool kernel: [<f8d5a6e6>] gfs2_glock_holder_wait+0xd/0x11 [gfs2]
Aug 14 10:14:00 cool kernel: [<c0742897>] __wait_on_bit+0x39/0x60
Aug 14 10:14:00 cool kernel: [<f8d5a6d9>] ? gfs2_glock_holder_wait+0x0/0x11 [gfs2]
Aug 14 10:14:00 cool kernel: [<f8d5a6d9>] ? gfs2_glock_holder_wait+0x0/0x11 [gfs2]
Aug 14 10:14:00 cool kernel: [<c074295e>] out_of_line_wait_on_bit+0xa0/0xa8
Aug 14 10:14:00 cool kernel: [<c044779d>] ? wake_bit_function+0x0/0x3c
Aug 14 10:14:00 cool kernel: [<f8d5d487>] wait_on_bit.clone.1+0x1c/0x28 [gfs2]
Aug 14 10:14:00 cool kernel: [<f8d5d4fe>] gfs2_glock_wait+0x31/0x37 [gfs2]
Aug 14 10:14:00 cool kernel: [<f8d5d774>] gfs2_glock_nq+0x270/0x278 [gfs2]
Aug 14 10:14:00 cool kernel: [<f8d5d8a0>] gfs2_glock_nq_num+0x4c/0x6c [gfs2]
Aug 14 10:14:00 cool kernel: [<f8d66567>] init_journal+0x2b2/0x675 [gfs2]
Aug 14 10:14:00 cool kernel: [<c04b1e75>] ? __slab_alloc+0x40a/0x421
Aug 14 10:14:00 cool kernel: [<c04b21b8>] ? kmem_cache_alloc+0x6d/0x105
Aug 14 10:14:00 cool kernel: [<c04c6fe1>] ? d_alloc+0x23/0x15e
Aug 14 10:14:00 cool kernel: [<c04dd3a0>] ? inotify_d_instantiate+0x17/0x3a
Aug 14 10:14:00 cool kernel: [<f8d65e5f>] ? gfs2_glock_nq_init+0x13/0x31 [gfs2]
Aug 14 10:14:00 cool kernel: [<f8d6694f>] init_inodes+0x25/0x152 [gfs2]
Aug 14 10:14:00 cool kernel: [<f8d6750d>] fill_super+0xa91/0xc10 [gfs2]
Aug 14 10:14:00 cool kernel: [<f8d5d899>] ? gfs2_glock_nq_num+0x45/0x6c [gfs2]
Aug 14 10:14:00 cool kernel: [<c04bade0>] get_sb_bdev+0xdc/0x119
Aug 14 10:14:00 cool kernel: [<c04b4aea>] ? pcpu_alloc+0x352/0x38b
Aug 14 10:14:00 cool kernel: [<f8d65ad8>] gfs2_get_sb+0x18/0x1a [gfs2]
Aug 14 10:14:00 cool kernel: [<f8d66a7c>] ? fill_super+0x0/0xc10 [gfs2]
Aug 14 10:14:00 cool kernel: [<c04baacc>] vfs_kern_mount+0x82/0xf0
Aug 14 10:14:00 cool kernel: [<c04bab89>] do_kern_mount+0x38/0xc3
Aug 14 10:14:00 cool kernel: [<c04cc03f>] do_mount+0x68c/0x6e4
Aug 14 10:14:00 cool kernel: [<c0492dc1>] ? __get_free_pages+0x24/0x26
Aug 14 10:14:00 cool kernel: [<c04cc0fd>] sys_mount+0x66/0x98
Aug 14 10:14:00 cool kernel: [<c0402a28>] sysenter_do_call+0x12/0x27
More of these "INFO: task mount.gfs2:2458 blocked for more than 120
seconds." messages appear if I wait longer.
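If more state would help, I can also collect DLM/gfs_controld info while the
second mount is hung; I believe these client tools ship with the same
cluster 3.0 packages:

  dlm_tool ls
  dlm_tool dump
  gfs_control dump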
Regards,
Wengang
David Teigland wrote:
On Thu, Aug 13, 2009 at 02:22:11PM +0800, Wengang Wang wrote:
<cman two_node="1" expected_votes="2"/>
That's not a valid combination, two_node="1" requires expected_votes="1".
You didn't mention which userspace cluster version/release you're using, or
include any status about the cluster. Before trying to mount gfs on either
node, collect from both nodes:
cman_tool status
cman_tool nodes
group_tool
Then mount on the first node and collect the same information, then try
mounting on the second node, collect the same information, and look for any
errors in /var/log/messages.
Since you're using new kernels, you need to be using the cluster 3.0 userspace
code. You're using the old manual fencing config. There is no more
fence_manual; the new way to configure manual fencing is to not configure any
fencing at all. So, your cluster.conf should look like this:
<?xml version="1.0"?>
<cluster name="testgfs2" config_version="1">
  <cman two_node="1" expected_votes="1"/>
  <clusternodes>
    <clusternode name="cool" nodeid="1"/>
    <clusternode name="desk" nodeid="2"/>
  </clusternodes>
</cluster>
Dave
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster