Hi,

On Sat, 2009-09-19 at 05:16 -0600, Kai Meyer wrote:
> I have a 5 node cluster running kernel 2.6.18-128.1.6.el5xen and
> gfs2-utils-0.1.53-1.el5_3.3. Twice in 10 days, each node in my cluster
> has failed with the same message in /var/log/messages. dmesg reports
> the same errors, and on some nodes there are no other entries prior to
> the invalid metadata block error.
>
> I would like to know what issues can trigger such an event. If it is
> more helpful for me to provide more information, I will be happy to;
> I'm just not sure what other information you would consider relevant.
>
> Thank you for your time,
> -Kai Meyer

It means that the kernel was looking for an indirect block, but instead found something that was not an indirect block. The only way to fix this is with fsck (after unmounting the filesystem on all nodes); otherwise the problem is likely to recur each time the affected inode is accessed.
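As a rough sketch of that procedure (the mount point and device path below are placeholders for your own setup, and on some gfs2-utils releases the tool may be named gfs2_fsck rather than fsck.gfs2):

    # On every node in the cluster, unmount the filesystem first
    # (example mount point):
    umount /mnt/xenclusterfs1

    # Then, from one node only, check and repair the block device
    # (example device path); -y answers yes to all repair prompts:
    fsck.gfs2 -y /dev/vg0/xenclusterfs1

    # Remount on each node only after fsck has completed cleanly.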
There have been a couple of reports of this (or a very similar) issue recently. The problem in each case is that the original corruption probably happened some time before it triggered the message which you've seen, which makes it very tricky to figure out exactly what the cause was.

I'd be very interested to know whether this filesystem was a newly created gfs2 filesystem or one upgraded from gfs1. Have there been any other issues, however minor, which might have caused a node to be rebooted or fenced since the filesystem was created? Any other background information about the type of workload being run on the filesystem would be helpful too.

Steve.

> Sep 19 02:02:06 192.168.100.104 kernel: GFS2: fsid=xencluster1:xenclusterfs1.1: fatal: invalid metadata block
> Sep 19 02:02:06 192.168.100.104 kernel: GFS2: fsid=xencluster1:xenclusterfs1.1: bh = 567447963 (magic number)
> Sep 19 02:02:06 192.168.100.104 kernel: GFS2: fsid=xencluster1:xenclusterfs1.1: function = gfs2_meta_indirect_buffer, file = fs/gfs2/meta_io.c, line = 334
> Sep 19 02:02:06 192.168.100.104 kernel: GFS2: fsid=xencluster1:xenclusterfs1.1: about to withdraw this file system
> Sep 19 02:02:06 192.168.100.104 kernel: GFS2: fsid=xencluster1:xenclusterfs1.1: telling LM to withdraw
> Sep 19 02:02:07 192.168.100.104 kernel: GFS2: fsid=xencluster1:xenclusterfs1.1: withdrawn
> Sep 19 02:02:07 192.168.100.104 kernel:
> Sep 19 02:02:07 192.168.100.104 kernel: Call Trace:
> Sep 19 02:02:07 192.168.100.104 kernel:  [<ffffffff885154ce>] :gfs2:gfs2_lm_withdraw+0xc1/0xd0
> Sep 19 02:02:07 192.168.100.104 kernel:  [<ffffffff80262907>] __wait_on_bit+0x60/0x6e
> Sep 19 02:02:07 192.168.100.104 kernel:  [<ffffffff80215788>] sync_buffer+0x0/0x3f
> Sep 19 02:02:07 192.168.100.104 kernel:  [<ffffffff80262981>] out_of_line_wait_on_bit+0x6c/0x78
> Sep 19 02:02:07 192.168.100.104 kernel:  [<ffffffff8029a01a>] wake_bit_function+0x0/0x23
> Sep 19 02:02:07 192.168.100.104 kernel:  [<ffffffff8021a7f1>] submit_bh+0x10a/0x111
> Sep 19 02:02:07 192.168.100.104 kernel:  [<ffffffff885284a7>] :gfs2:gfs2_meta_check_ii+0x2c/0x38
> Sep 19 02:02:07 192.168.100.104 kernel:  [<ffffffff88518d30>] :gfs2:gfs2_meta_indirect_buffer+0x104/0x160
> Sep 19 02:02:07 192.168.100.104 kernel:  [<ffffffff88509fc3>] :gfs2:gfs2_block_map+0x1dc/0x33e
> Sep 19 02:02:07 192.168.100.104 kernel:  [<ffffffff8021a821>] poll_freewait+0x29/0x6a
> Sep 19 02:02:07 192.168.100.104 kernel:  [<ffffffff8850a199>] :gfs2:gfs2_extent_map+0x74/0xac
> Sep 19 02:02:07 192.168.100.104 kernel:  [<ffffffff8850a2ce>] :gfs2:gfs2_write_alloc_required+0xfd/0x122
> Sep 19 02:02:07 192.168.100.104 kernel:  [<ffffffff885128d5>] :gfs2:gfs2_glock_nq+0x248/0x273
> Sep 19 02:02:07 192.168.100.104 kernel:  [<ffffffff8851a27c>] :gfs2:gfs2_write_begin+0x99/0x36a
> Sep 19 02:02:07 192.168.100.104 kernel:  [<ffffffff8851bd1b>] :gfs2:gfs2_file_buffered_write+0x14b/0x2e5
> Sep 19 02:02:07 192.168.100.104 kernel:  [<ffffffff8020d3a5>] file_read_actor+0x0/0xfc
> Sep 19 02:02:07 192.168.100.104 kernel:  [<ffffffff8851c151>] :gfs2:__gfs2_file_aio_write_nolock+0x29c/0x2d4
> Sep 19 02:02:07 192.168.100.104 kernel:  [<ffffffff8851c2f4>] :gfs2:gfs2_file_write_nolock+0xaa/0x10f
> Sep 19 02:02:07 192.168.100.104 kernel:  [<ffffffff8022eca0>] __wake_up+0x38/0x4f
> Sep 19 02:02:07 192.168.100.104 kernel:  [<ffffffff80299fec>] autoremove_wake_function+0x0/0x2e
> Sep 19 02:02:07 192.168.100.104 kernel:  [<ffffffff8022fbe4>] pipe_readv+0x38e/0x3a2
> Sep 19 02:02:07 192.168.100.104 kernel:  [<ffffffff80263bce>] lock_kernel+0x1b/0x32
> Sep 19 02:02:07 192.168.100.104 kernel:  [<ffffffff8851c444>] :gfs2:gfs2_file_write+0x49/0xa7
> Sep 19 02:02:07 192.168.100.104 kernel:  [<ffffffff80216da9>] vfs_write+0xce/0x174
> Sep 19 02:02:07 192.168.100.104 kernel:  [<ffffffff802175e1>] sys_write+0x45/0x6e
> Sep 19 02:02:07 192.168.100.104 kernel:  [<ffffffff8025f2f9>] tracesys+0xab/0xb6
>
> --
> Linux-cluster mailing list
> Linux-cluster@xxxxxxxxxx
> https://www.redhat.com/mailman/listinfo/linux-cluster

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster