----- Original Message -----
> Hi guys,
>
> I have a two-node GFS2 cluster built on a logical volume created on the
> DRBD block device /dev/drbd0. The two nodes' GFS2 mount points are
> exported as Samba shares, and two clients mount them and copy data into
> them, respectively. Hours later, one client (call it clientA) had
> finished all its tasks, while the other client (call it clientB) was
> still copying at a very slow write speed (2-3 MB/s; in the normal case,
> 40-100 MB/s).
>
> I suspected something was wrong with the GFS2 file system on the server
> node that clientB mounts, so I tried to write some data into it by
> executing the following command:
>
> [root@dcs-229 ~]# dd if=/dev/zero of=./data2 bs=128k count=1000
> 1000+0 records in
> 1000+0 records out
> 131072000 bytes (131 MB) copied, 183.152 s, 716 kB/s
>
> This shows the write speed is far too slow; it almost hangs. I ran it
> again and it hung. I then terminated it with Ctrl+C, and the kernel
> reported error messages as follows:
>
> Nov 12 11:50:11 dcs-229 kernel: GFS2: fsid=MyCluster:gfs.1: fatal: invalid metadata block
> Nov 12 11:50:11 dcs-229 kernel: GFS2: fsid=MyCluster:gfs.1: bh = 25 (magic number)
> Nov 12 11:50:11 dcs-229 kernel: GFS2: fsid=MyCluster:gfs.1: function = gfs2_meta_indirect_buffer, file = fs/gfs2/meta_io.c, line = 393
> Nov 12 11:50:11 dcs-229 kernel: GFS2: fsid=MyCluster:gfs.1: jid=0: Trying to acquire journal lock...
> Nov 12 11:50:11 dcs-229 kernel: Pid: 12044, comm: glock_workqueue Not tainted 2.6.32-358.el6.x86_64 #1
> Nov 12 11:50:11 dcs-229 kernel: Call Trace:
> Nov 12 11:50:11 dcs-229 kernel: [<ffffffffa044be22>] ? gfs2_lm_withdraw+0x102/0x130 [gfs2]
> Nov 12 11:50:11 dcs-229 kernel: [<ffffffff81096cc0>] ? wake_bit_function+0x0/0x50
> Nov 12 11:50:11 dcs-229 kernel: [<ffffffffa044bf75>] ? gfs2_meta_check_ii+0x45/0x50 [gfs2]
> Nov 12 11:50:11 dcs-229 kernel: [<ffffffffa04367d9>] ? gfs2_meta_indirect_buffer+0xf9/0x100 [gfs2]
> Nov 12 11:50:11 dcs-229 kernel: [<ffffffff8105e203>] ? perf_event_task_sched_out+0x33/0x80
> Nov 12 11:50:11 dcs-229 kernel: [<ffffffffa0431505>] ? gfs2_inode_refresh+0x25/0x2c0 [gfs2]
> Nov 12 11:50:11 dcs-229 kernel: [<ffffffffa0430b48>] ? inode_go_lock+0x88/0xf0 [gfs2]
> Nov 12 11:50:11 dcs-229 kernel: [<ffffffffa042f25b>] ? do_promote+0x1bb/0x330 [gfs2]
> Nov 12 11:50:11 dcs-229 kernel: [<ffffffffa042f548>] ? finish_xmote+0x178/0x410 [gfs2]
> Nov 12 11:50:11 dcs-229 kernel: [<ffffffffa04303e3>] ? glock_work_func+0x133/0x1d0 [gfs2]
> Nov 12 11:50:11 dcs-229 kernel: [<ffffffffa04302b0>] ? glock_work_func+0x0/0x1d0 [gfs2]
> Nov 12 11:50:11 dcs-229 kernel: [<ffffffff81090ac0>] ? worker_thread+0x170/0x2a0
> Nov 12 11:50:11 dcs-229 kernel: [<ffffffff81096c80>] ? autoremove_wake_function+0x0/0x40
> Nov 12 11:50:11 dcs-229 kernel: [<ffffffff81090950>] ? worker_thread+0x0/0x2a0
> Nov 12 11:50:11 dcs-229 kernel: [<ffffffff81096916>] ? kthread+0x96/0xa0
> Nov 12 11:50:11 dcs-229 kernel: [<ffffffff8100c0ca>] ? child_rip+0xa/0x20
> Nov 12 11:50:11 dcs-229 kernel: [<ffffffff81096880>] ? kthread+0x0/0xa0
> Nov 12 11:50:11 dcs-229 kernel: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
> Nov 12 11:50:11 dcs-229 kernel: GFS2: fsid=MyCluster:gfs.1: jid=0: Failed
>
> And the other node also reports error messages:
>
> Nov 12 11:48:50 dcs-226 kernel: Pid: 13784, comm: glock_workqueue Not tainted 2.6.32-358.el6.x86_64 #1
> Nov 12 11:48:50 dcs-226 kernel: Call Trace:
> Nov 12 11:48:50 dcs-226 kernel: [<ffffffffa0478e22>] ? gfs2_lm_withdraw+0x102/0x130 [gfs2]
> Nov 12 11:48:50 dcs-226 kernel: [<ffffffff81096cc0>] ? wake_bit_function+0x0/0x50
> Nov 12 11:48:50 dcs-226 kernel: [<ffffffffa0478f75>] ? gfs2_meta_check_ii+0x45/0x50 [gfs2]
> Nov 12 11:48:50 dcs-226 kernel: [<ffffffffa04637d9>] ? gfs2_meta_indirect_buffer+0xf9/0x100 [gfs2]
> Nov 12 11:48:50 dcs-226 kernel: [<ffffffff8105e203>] ? perf_event_task_sched_out+0x33/0x80
> Nov 12 11:48:50 dcs-226 kernel: [<ffffffffa045e505>] ? gfs2_inode_refresh+0x25/0x2c0 [gfs2]
> Nov 12 11:48:50 dcs-226 kernel: [<ffffffffa045db48>] ? inode_go_lock+0x88/0xf0 [gfs2]
> Nov 12 11:48:50 dcs-226 kernel: GFS2: fsid=MyCluster:gfs.0: fatal: invalid metadata block
> Nov 12 11:48:51 dcs-226 kernel: GFS2: fsid=MyCluster:gfs.0: bh = 66213 (magic number)
> Nov 12 11:48:51 dcs-226 kernel: GFS2: fsid=MyCluster:gfs.0: function = gfs2_meta_indirect_buffer, file = fs/gfs2/meta_io.c, line = 393
> Nov 12 11:48:51 dcs-226 kernel: GFS2: fsid=MyCluster:gfs.0: about to withdraw this file system
> Nov 12 11:48:51 dcs-226 kernel: GFS2: fsid=MyCluster:gfs.0: telling LM to unmount
> Nov 12 11:48:51 dcs-226 kernel: [<ffffffffa045c25b>] ? do_promote+0x1bb/0x330 [gfs2]
> Nov 12 11:48:51 dcs-226 kernel: [<ffffffffa045c548>] ? finish_xmote+0x178/0x410 [gfs2]
> Nov 12 11:48:51 dcs-226 kernel: [<ffffffffa045d3e3>] ? glock_work_func+0x133/0x1d0 [gfs2]
> Nov 12 11:48:51 dcs-226 kernel: [<ffffffffa045d2b0>] ? glock_work_func+0x0/0x1d0 [gfs2]
> Nov 12 11:48:51 dcs-226 kernel: [<ffffffff81090ac0>] ? worker_thread+0x170/0x2a0
> Nov 12 11:48:51 dcs-226 kernel: [<ffffffff81096c80>] ? autoremove_wake_function+0x0/0x40
> Nov 12 11:48:51 dcs-226 kernel: [<ffffffff81090950>] ? worker_thread+0x0/0x2a0
> Nov 12 11:48:51 dcs-226 kernel: [<ffffffff81096916>] ? kthread+0x96/0xa0
> Nov 12 11:48:51 dcs-226 kernel: [<ffffffff8100c0ca>] ? child_rip+0xa/0x20
> Nov 12 11:48:51 dcs-226 kernel: [<ffffffff81096880>] ? kthread+0x0/0xa0
> Nov 12 11:48:51 dcs-226 kernel: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
>
> After this, the mount points crashed. What should I do? Can anyone help me?

Hi,

I recommend you open a support case with Red Hat. If you're not a Red Hat
customer, you can open a bugzilla record, save off the metadata for that
file system (with gfs2_edit savemeta), and post a link to it in the
bugzilla. The hang and the assert should not happen.
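For reference, a savemeta invocation looks roughly like this; I'm using
/dev/drbd0 as a stand-in for whichever block device actually holds the GFS2
file system, and the output file name is just an example:

  # Save the file system's metadata (not file contents) for offline analysis.
  # Ideally run this while the file system is unmounted on both nodes.
  gfs2_edit savemeta /dev/drbd0 /tmp/mycluster-gfs.savemeta

The resulting file is what you would post a link to in the bugzilla.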
Regards,

Bob Peterson
Red Hat File Systems

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster