Hi,

On Thu, 2010-03-18 at 09:18 -0400, Douglas O'Neal wrote:
> On 03/15/2010 09:55 AM, Douglas O'Neal wrote:
> > I have a problem with a gfs2 filesystem that is (was) being mounted
> > from a single host. The system appeared to have hung over the weekend
> > so I unmounted and remounted the disk. After a couple of minutes I
> > received this in the kernel logs:
> >
> > Mar 15 08:28:50 localhost kernel: GFS2: fsid=: Trying to join cluster "lock_nolock", "sde1"
> > Mar 15 08:28:50 localhost kernel: GFS2: fsid=sde1.0: Now mounting FS...
> > Mar 15 08:28:50 localhost kernel: GFS2: fsid=sde1.0: jid=0, already locked for use
> > Mar 15 08:28:50 localhost kernel: GFS2: fsid=sde1.0: jid=0: Looking at journal...
> > Mar 15 08:28:50 localhost kernel: GFS2: fsid=sde1.0: jid=0: Done
> > Mar 15 08:43:37 localhost kernel: GFS2: fsid=sde1.0: fatal: invalid metadata block
> > Mar 15 08:43:37 localhost kernel: GFS2: fsid=sde1.0: bh = 4294972166 (type: exp=3, found=2)
> > Mar 15 08:43:37 localhost kernel: GFS2: fsid=sde1.0: function = gfs2_rgrp_bh_get, file = fs/gfs2/rgrp.c, line = 759
> > Mar 15 08:43:37 localhost kernel: GFS2: fsid=sde1.0: about to withdraw this file system
> > Mar 15 08:43:37 localhost kernel: GFS2: fsid=sde1.0: withdrawn
> > Mar 15 08:43:37 localhost kernel: Pid: 3687, comm: cp Not tainted 2.6.32-gentoo-r7 #2
> > Mar 15 08:43:37 localhost kernel: Call Trace:
> > Mar 15 08:43:37 localhost kernel: [<ffffffffa03b285d>] ? gfs2_lm_withdraw+0x12d/0x160 [gfs2]
> > Mar 15 08:43:37 localhost kernel: [<ffffffff813bf22b>] ? io_schedule+0x4b/0x70
> > Mar 15 08:43:37 localhost kernel: [<ffffffff810cc560>] ? sync_buffer+0x0/0x50
> > Mar 15 08:43:37 localhost kernel: [<ffffffff813bf7a9>] ? out_of_line_wait_on_bit+0x79/0xa0
> > Mar 15 08:43:37 localhost kernel: [<ffffffff8104e740>] ? wake_bit_function+0x0/0x30
> > Mar 15 08:43:37 localhost kernel: [<ffffffff810cb162>] ? submit_bh+0x112/0x140
> > Mar 15 08:43:37 localhost kernel: [<ffffffffa03b2947>] ? gfs2_metatype_check_ii+0x47/0x60 [gfs2]
> > Mar 15 08:43:37 localhost kernel: [<ffffffffa03ae40b>] ? gfs2_rgrp_bh_get+0x1db/0x300 [gfs2]
> > Mar 15 08:43:37 localhost kernel: [<ffffffffa0397d86>] ? do_promote+0x116/0x200 [gfs2]
> > Mar 15 08:43:37 localhost kernel: [<ffffffffa03992a5>] ? finish_xmote+0x1a5/0x3a0 [gfs2]
> > Mar 15 08:43:37 localhost kernel: [<ffffffffa0398fcd>] ? do_xmote+0xfd/0x230 [gfs2]
> > Mar 15 08:43:37 localhost kernel: [<ffffffffa039986d>] ? gfs2_glock_nq+0x13d/0x320 [gfs2]
> > Mar 15 08:43:37 localhost kernel: [<ffffffffa03aea2d>] ? gfs2_inplace_reserve_i+0x1ed/0x7b0 [gfs2]
> > Mar 15 08:43:37 localhost kernel: [<ffffffffa0399581>] ? run_queue+0xe1/0x210 [gfs2]
> > Mar 15 08:43:37 localhost kernel: [<ffffffffa039986d>] ? gfs2_glock_nq+0x13d/0x320 [gfs2]
> > Mar 15 08:43:37 localhost kernel: [<ffffffffa03a1f92>] ? gfs2_write_begin+0x272/0x480 [gfs2]
> > Mar 15 08:43:37 localhost kernel: [<ffffffff8106df04>] ? generic_file_buffered_write+0x114/0x290
> > Mar 15 08:43:37 localhost kernel: [<ffffffff8106e4a8>] ? __generic_file_aio_write+0x278/0x450
> > Mar 15 08:43:37 localhost kernel: [<ffffffff8106e6d5>] ? generic_file_aio_write+0x55/0xb0
> > Mar 15 08:43:37 localhost kernel: [<ffffffff810a6a1b>] ? do_sync_write+0xdb/0x120
> > Mar 15 08:43:37 localhost kernel: [<ffffffff8104e710>] ? autoremove_wake_function+0x0/0x30
> > Mar 15 08:43:37 localhost kernel: [<ffffffff8108511f>] ? handle_mm_fault+0x1bf/0x850
> > Mar 15 08:43:37 localhost kernel: [<ffffffff8108b5cc>] ? mmap_region+0x23c/0x5d0
> > Mar 15 08:43:37 localhost kernel: [<ffffffff810a752b>] ? vfs_write+0xcb/0x160
> > Mar 15 08:43:37 localhost kernel: [<ffffffff810a76c3>] ? sys_write+0x53/0xa0
> > Mar 15 08:43:37 localhost kernel: [<ffffffff8100b2ab>] ? system_call_fastpath+0x16/0x1b
> >
> > I again unmounted the disk, but now when I try to fsck the filesystem I
> > get:
> > urania# fsck.gfs2 -v /dev/sde1
> > Initializing fsck
> > Initializing lists...
> > Either the super block is corrupted, or this is not a GFS2 filesystem
> >
> > The server is running kernel 2.6.32, 64-bit. The array is a
> > Jetstore 516iS with a single 28TB iSCSI volume defined. The relevant
> > line from the fstab is
> > /dev/sde1 /illumina gfs2 _netdev,rw,lockproto=lock_nolock
> >
> > gfs2_tool isn't much help, nor is gfs2_edit:
> > urania# gfs2_tool sb /dev/sde1 all
> > /usr/src/cluster-3.0.7/gfs2/tool/../libgfs2/libgfs2.h: there isn't a GFS2 filesystem on /dev/sde1
> > urania# gfs2_edit -p sb /dev/sde1
> > bad seek: Invalid argument from gfs2_load_inode:416: block 3747350044811107074 (0x34014302ee029b02)
> >
> > Is there an alternate superblock that I can use to mount the disk, so
> > that I can at least get the last couple of days of data off of it?
> >
> Anybody?
>
What version of the userland tools are you using? There has recently
been an update to fsck designed to solve a number of problems.

I've never before seen a filesystem so badly corrupted that the super
block is unrecognisable. The super block is never altered during normal
fs usage. Are you 100% certain that this volume was not being accessed
by another node on the network?

If you can save off the metadata, then we can take a look at it. That
might not be possible with a corrupt superblock, though, so an
alternative is to make the volume available to us somehow so that we
can look at it.

Steve.

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
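For anyone hitting the same "Either the super block is corrupted, or this is not a GFS2 filesystem" message, one read-only first check is to look at the raw bytes where the superblock ought to be. The sketch below assumes the on-disk constants from the kernel's gfs2_ondisk.h: the superblock sits at GFS2_SB_ADDR = 128 basic blocks of 512 bytes (byte offset 65536), and every metadata block starts with a big-endian mh_magic of 0x01161970 followed by mh_type, where 1 means superblock. The names classify_header, check_device and METATYPES are hypothetical illustrations, not part of gfs2-utils, and this is a diagnostic only, not a repair tool. Per the same type numbers, the kernel's "exp=3, found=2" means it expected a resource group bitmap block (RB) and found a resource group header (RG).

```python
import struct
import sys

# Assumed constants from the GFS2 on-disk format (gfs2_ondisk.h).
GFS2_MAGIC = 0x01161970
GFS2_BASIC_BLOCK = 512
GFS2_SB_ADDR = 128                                # in 512-byte basic blocks
GFS2_SB_OFFSET = GFS2_SB_ADDR * GFS2_BASIC_BLOCK  # 65536 bytes (64 KiB)

# Hypothetical lookup table of metadata type numbers (GFS2_METATYPE_*),
# the same numbers printed in "type: exp=3, found=2" kernel messages.
METATYPES = {0: "NONE", 1: "SB", 2: "RG", 3: "RB", 4: "DI",
             5: "IN", 6: "LF", 7: "JD", 8: "LH", 9: "LD"}


def classify_header(raw: bytes) -> str:
    """Describe the first 8 bytes of a GFS2 metadata header (big-endian)."""
    if len(raw) < 8:
        return "short read: no GFS2 superblock at 64 KiB"
    magic, mh_type = struct.unpack(">II", raw[:8])
    if magic != GFS2_MAGIC:
        return "bad magic 0x%08x" % magic
    if mh_type != 1:
        name = METATYPES.get(mh_type, str(mh_type))
        return "magic ok, but block type is %s, not SB" % name
    return "superblock header looks valid"


def check_device(path: str) -> str:
    """Read the would-be superblock header from a device or image, read-only."""
    with open(path, "rb") as f:
        f.seek(GFS2_SB_OFFSET)
        return classify_header(f.read(8))


if __name__ == "__main__" and len(sys.argv) > 1:
    print(check_device(sys.argv[1]))
```

Run as, for example, `python3 check_gfs2_sb.py /dev/sde1` (the script name is hypothetical; reading the raw device usually needs root). A "bad magic" result would be consistent with the superblock area itself having been overwritten, which fits the gfs2_edit output above, rather than with a damaged field deeper inside an otherwise intact superblock.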