----- Original Message -----
> Hi guys,
>
> I have a two-node GFS2 cluster built on a logical volume created on the
> DRBD block device /dev/drbd0. The two nodes' GFS2 mount points are
> exported as Samba shares, and two clients mount them and copy data into
> them, respectively. Hours later, one client (call it clientA) had
> finished all its tasks, while the other client (call it clientB) was
> still copying at a very slow write speed (2-3 MB/s; in the normal case,
> 40-100 MB/s).
>
> I suspected something was wrong with the GFS2 file system on the server
> node that clientB mounts, so I tried to write some data into it by
> executing the following command:
>
> [root@dcs-229 ~]# dd if=/dev/zero of=./data2 bs=128k count=1000
> 1000+0 records in
> 1000+0 records out
> 131072000 bytes (131 MB) copied, 183.152 s, 716 kB/s
>
> This shows the write speed is far too slow; it almost hangs. I ran it
> again and it hung. I then terminated it with Ctrl+C, and the kernel
> reported error messages as follows:
>
> Nov 12 11:50:11 dcs-229 kernel: GFS2: fsid=MyCluster:gfs.1: fatal: invalid metadata block
> Nov 12 11:50:11 dcs-229 kernel: GFS2: fsid=MyCluster:gfs.1: bh = 25 (magic number)
> Nov 12 11:50:11 dcs-229 kernel: GFS2: fsid=MyCluster:gfs.1: function = gfs2_meta_indirect_buffer, file = fs/gfs2/meta_io.c, line = 393
> Nov 12 11:50:11 dcs-229 kernel: GFS2: fsid=MyCluster:gfs.1: jid=0: Trying to acquire journal lock...
> Nov 12 11:50:11 dcs-229 kernel: Pid: 12044, comm: glock_workqueue Not tainted 2.6.32-358.el6.x86_64 #1
> Nov 12 11:50:11 dcs-229 kernel: Call Trace:
> Nov 12 11:50:11 dcs-229 kernel: [<ffffffffa044be22>] ? gfs2_lm_withdraw+0x102/0x130 [gfs2]
> Nov 12 11:50:11 dcs-229 kernel: [<ffffffff81096cc0>] ? wake_bit_function+0x0/0x50
> Nov 12 11:50:11 dcs-229 kernel: [<ffffffffa044bf75>] ? gfs2_meta_check_ii+0x45/0x50 [gfs2]
> Nov 12 11:50:11 dcs-229 kernel: [<ffffffffa04367d9>] ? gfs2_meta_indirect_buffer+0xf9/0x100 [gfs2]
> Nov 12 11:50:11 dcs-229 kernel: [<ffffffff8105e203>] ? perf_event_task_sched_out+0x33/0x80
> Nov 12 11:50:11 dcs-229 kernel: [<ffffffffa0431505>] ? gfs2_inode_refresh+0x25/0x2c0 [gfs2]
> Nov 12 11:50:11 dcs-229 kernel: [<ffffffffa0430b48>] ? inode_go_lock+0x88/0xf0 [gfs2]
> Nov 12 11:50:11 dcs-229 kernel: [<ffffffffa042f25b>] ? do_promote+0x1bb/0x330 [gfs2]
> Nov 12 11:50:11 dcs-229 kernel: [<ffffffffa042f548>] ? finish_xmote+0x178/0x410 [gfs2]
> Nov 12 11:50:11 dcs-229 kernel: [<ffffffffa04303e3>] ? glock_work_func+0x133/0x1d0 [gfs2]
> Nov 12 11:50:11 dcs-229 kernel: [<ffffffffa04302b0>] ? glock_work_func+0x0/0x1d0 [gfs2]
> Nov 12 11:50:11 dcs-229 kernel: [<ffffffff81090ac0>] ? worker_thread+0x170/0x2a0
> Nov 12 11:50:11 dcs-229 kernel: [<ffffffff81096c80>] ? autoremove_wake_function+0x0/0x40
> Nov 12 11:50:11 dcs-229 kernel: [<ffffffff81090950>] ? worker_thread+0x0/0x2a0
> Nov 12 11:50:11 dcs-229 kernel: [<ffffffff81096916>] ? kthread+0x96/0xa0
> Nov 12 11:50:11 dcs-229 kernel: [<ffffffff8100c0ca>] ? child_rip+0xa/0x20
> Nov 12 11:50:11 dcs-229 kernel: [<ffffffff81096880>] ? kthread+0x0/0xa0
> Nov 12 11:50:11 dcs-229 kernel: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
> Nov 12 11:50:11 dcs-229 kernel: GFS2: fsid=MyCluster:gfs.1: jid=0: Failed
>
> And the other node also reports error messages:
>
> Nov 12 11:48:50 dcs-226 kernel: Pid: 13784, comm: glock_workqueue Not tainted 2.6.32-358.el6.x86_64 #1
> Nov 12 11:48:50 dcs-226 kernel: Call Trace:
> Nov 12 11:48:50 dcs-226 kernel: [<ffffffffa0478e22>] ? gfs2_lm_withdraw+0x102/0x130 [gfs2]
> Nov 12 11:48:50 dcs-226 kernel: [<ffffffff81096cc0>] ? wake_bit_function+0x0/0x50
> Nov 12 11:48:50 dcs-226 kernel: [<ffffffffa0478f75>] ? gfs2_meta_check_ii+0x45/0x50 [gfs2]
> Nov 12 11:48:50 dcs-226 kernel: [<ffffffffa04637d9>] ? gfs2_meta_indirect_buffer+0xf9/0x100 [gfs2]
> Nov 12 11:48:50 dcs-226 kernel: [<ffffffff8105e203>] ? perf_event_task_sched_out+0x33/0x80
> Nov 12 11:48:50 dcs-226 kernel: [<ffffffffa045e505>] ? gfs2_inode_refresh+0x25/0x2c0 [gfs2]
> Nov 12 11:48:50 dcs-226 kernel: [<ffffffffa045db48>] ? inode_go_lock+0x88/0xf0 [gfs2]
> Nov 12 11:48:50 dcs-226 kernel: GFS2: fsid=MyCluster:gfs.0: fatal: invalid metadata block
> Nov 12 11:48:51 dcs-226 kernel: GFS2: fsid=MyCluster:gfs.0: bh = 66213 (magic number)
> Nov 12 11:48:51 dcs-226 kernel: GFS2: fsid=MyCluster:gfs.0: function = gfs2_meta_indirect_buffer, file = fs/gfs2/meta_io.c, line = 393
> Nov 12 11:48:51 dcs-226 kernel: GFS2: fsid=MyCluster:gfs.0: about to withdraw this file system
> Nov 12 11:48:51 dcs-226 kernel: GFS2: fsid=MyCluster:gfs.0: telling LM to unmount
> Nov 12 11:48:51 dcs-226 kernel: [<ffffffffa045c25b>] ? do_promote+0x1bb/0x330 [gfs2]
> Nov 12 11:48:51 dcs-226 kernel: [<ffffffffa045c548>] ? finish_xmote+0x178/0x410 [gfs2]
> Nov 12 11:48:51 dcs-226 kernel: [<ffffffffa045d3e3>] ? glock_work_func+0x133/0x1d0 [gfs2]
> Nov 12 11:48:51 dcs-226 kernel: [<ffffffffa045d2b0>] ? glock_work_func+0x0/0x1d0 [gfs2]
> Nov 12 11:48:51 dcs-226 kernel: [<ffffffff81090ac0>] ? worker_thread+0x170/0x2a0
> Nov 12 11:48:51 dcs-226 kernel: [<ffffffff81096c80>] ? autoremove_wake_function+0x0/0x40
> Nov 12 11:48:51 dcs-226 kernel: [<ffffffff81090950>] ? worker_thread+0x0/0x2a0
> Nov 12 11:48:51 dcs-226 kernel: [<ffffffff81096916>] ? kthread+0x96/0xa0
> Nov 12 11:48:51 dcs-226 kernel: [<ffffffff8100c0ca>] ? child_rip+0xa/0x20
> Nov 12 11:48:51 dcs-226 kernel: [<ffffffff81096880>] ? kthread+0x0/0xa0
> Nov 12 11:48:51 dcs-226 kernel: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
>
> After this, the mount points crashed. What should I do? Can anyone help me?

Hi,

I recommend you open a support case with Red Hat. If you're not a Red Hat
customer, you can open a bugzilla record, save off the metadata for that
file system (with gfs2_edit savemeta), and post a link to it in the
bugzilla. The hang and the assert should not happen.
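For reference, a savemeta invocation looks roughly like this; I'm using
/dev/drbd0 as a stand-in for whichever block device actually holds the GFS2
file system, and the output file name is just an example:

  # Save the file system's metadata (not file contents) for offline analysis.
  # Ideally run this while the file system is unmounted on both nodes.
  gfs2_edit savemeta /dev/drbd0 /tmp/mycluster-gfs.savemeta

The resulting file is what you would post a link to in the bugzilla.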
Regards,

Bob Peterson
Red Hat File Systems

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster