On 03/18/2010 10:04 AM, Steven Whitehouse wrote:
Hi,
On Thu, 2010-03-18 at 09:18 -0400, Douglas O'Neal wrote:
On 03/15/2010 09:55 AM, Douglas O'Neal wrote:
I have a problem with a gfs2 filesystem that is (was) being mounted
from a single host. The system appeared to have hung over the weekend
so I unmounted and remounted the disk. After a couple of minutes I
received this in the kernel logs:
Mar 15 08:28:50 localhost kernel: GFS2: fsid=: Trying to join cluster "lock_nolock", "sde1"
Mar 15 08:28:50 localhost kernel: GFS2: fsid=sde1.0: Now mounting FS...
Mar 15 08:28:50 localhost kernel: GFS2: fsid=sde1.0: jid=0, already locked for use
Mar 15 08:28:50 localhost kernel: GFS2: fsid=sde1.0: jid=0: Looking at journal...
Mar 15 08:28:50 localhost kernel: GFS2: fsid=sde1.0: jid=0: Done
Mar 15 08:43:37 localhost kernel: GFS2: fsid=sde1.0: fatal: invalid metadata block
Mar 15 08:43:37 localhost kernel: GFS2: fsid=sde1.0: bh = 4294972166 (type: exp=3, found=2)
Mar 15 08:43:37 localhost kernel: GFS2: fsid=sde1.0: function = gfs2_rgrp_bh_get, file = fs/gfs2/rgrp.c, line = 759
Mar 15 08:43:37 localhost kernel: GFS2: fsid=sde1.0: about to withdraw this file system
Mar 15 08:43:37 localhost kernel: GFS2: fsid=sde1.0: withdrawn
Mar 15 08:43:37 localhost kernel: Pid: 3687, comm: cp Not tainted 2.6.32-gentoo-r7 #2
Mar 15 08:43:37 localhost kernel: Call Trace:
Mar 15 08:43:37 localhost kernel: [<ffffffffa03b285d>] ? gfs2_lm_withdraw+0x12d/0x160 [gfs2]
Mar 15 08:43:37 localhost kernel: [<ffffffff813bf22b>] ? io_schedule+0x4b/0x70
Mar 15 08:43:37 localhost kernel: [<ffffffff810cc560>] ? sync_buffer+0x0/0x50
Mar 15 08:43:37 localhost kernel: [<ffffffff813bf7a9>] ? out_of_line_wait_on_bit+0x79/0xa0
Mar 15 08:43:37 localhost kernel: [<ffffffff8104e740>] ? wake_bit_function+0x0/0x30
Mar 15 08:43:37 localhost kernel: [<ffffffff810cb162>] ? submit_bh+0x112/0x140
Mar 15 08:43:37 localhost kernel: [<ffffffffa03b2947>] ? gfs2_metatype_check_ii+0x47/0x60 [gfs2]
Mar 15 08:43:37 localhost kernel: [<ffffffffa03ae40b>] ? gfs2_rgrp_bh_get+0x1db/0x300 [gfs2]
Mar 15 08:43:37 localhost kernel: [<ffffffffa0397d86>] ? do_promote+0x116/0x200 [gfs2]
Mar 15 08:43:37 localhost kernel: [<ffffffffa03992a5>] ? finish_xmote+0x1a5/0x3a0 [gfs2]
Mar 15 08:43:37 localhost kernel: [<ffffffffa0398fcd>] ? do_xmote+0xfd/0x230 [gfs2]
Mar 15 08:43:37 localhost kernel: [<ffffffffa039986d>] ? gfs2_glock_nq+0x13d/0x320 [gfs2]
Mar 15 08:43:37 localhost kernel: [<ffffffffa03aea2d>] ? gfs2_inplace_reserve_i+0x1ed/0x7b0 [gfs2]
Mar 15 08:43:37 localhost kernel: [<ffffffffa0399581>] ? run_queue+0xe1/0x210 [gfs2]
Mar 15 08:43:37 localhost kernel: [<ffffffffa039986d>] ? gfs2_glock_nq+0x13d/0x320 [gfs2]
Mar 15 08:43:37 localhost kernel: [<ffffffffa03a1f92>] ? gfs2_write_begin+0x272/0x480 [gfs2]
Mar 15 08:43:37 localhost kernel: [<ffffffff8106df04>] ? generic_file_buffered_write+0x114/0x290
Mar 15 08:43:37 localhost kernel: [<ffffffff8106e4a8>] ? __generic_file_aio_write+0x278/0x450
Mar 15 08:43:37 localhost kernel: [<ffffffff8106e6d5>] ? generic_file_aio_write+0x55/0xb0
Mar 15 08:43:37 localhost kernel: [<ffffffff810a6a1b>] ? do_sync_write+0xdb/0x120
Mar 15 08:43:37 localhost kernel: [<ffffffff8104e710>] ? autoremove_wake_function+0x0/0x30
Mar 15 08:43:37 localhost kernel: [<ffffffff8108511f>] ? handle_mm_fault+0x1bf/0x850
Mar 15 08:43:37 localhost kernel: [<ffffffff8108b5cc>] ? mmap_region+0x23c/0x5d0
Mar 15 08:43:37 localhost kernel: [<ffffffff810a752b>] ? vfs_write+0xcb/0x160
Mar 15 08:43:37 localhost kernel: [<ffffffff810a76c3>] ? sys_write+0x53/0xa0
Mar 15 08:43:37 localhost kernel: [<ffffffff8100b2ab>] ? system_call_fastpath+0x16/0x1b
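The "type: exp=3, found=2" message in the log above reports the metadata type GFS2 expected in that block and the type it actually read. A minimal sketch for turning those numbers into names follows. The type table is an assumption taken from the GFS2_METATYPE_* constants as I recall them from the kernel's gfs2_ondisk.h and should be checked against the headers for the running kernel; the function name is hypothetical.

```python
import re

# Assumed values of the GFS2_METATYPE_* constants (verify against
# include/linux/gfs2_ondisk.h for your kernel before relying on them).
GFS2_METATYPES = {
    0: "none",
    1: "superblock",
    2: "resource group header",
    3: "resource group bitmap",
    4: "dinode",
    5: "indirect block",
    6: "directory leaf",
    7: "journaled data",
    8: "log header",
    9: "log descriptor",
}

def decode_type_mismatch(log_line):
    """Return (expected, found) type names from a GFS2 'type: exp=N, found=M' line.

    Returns None if the line does not contain such a message.
    """
    match = re.search(r"exp=(\d+), found=(\d+)", log_line)
    if match is None:
        return None
    expected, found = (int(g) for g in match.groups())
    return (
        GFS2_METATYPES.get(expected, f"unknown ({expected})"),
        GFS2_METATYPES.get(found, f"unknown ({found})"),
    )
```

Under those assumed values, the logged message would mean that a resource group bitmap block was expected and a resource group header was found instead. This only names the two types; it says nothing about how the block came to hold the wrong contents.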
I again unmounted the disk but now when I try to fsck the filesystem I
get:
urania# fsck.gfs2 -v /dev/sde1
Initializing fsck
Initializing lists...
Either the super block is corrupted, or this is not a GFS2 filesystem
The server is running kernel 2.6.32, 64-bit. The array is a
Jetstore 516iS with a single 28TB iSCSI volume defined. The relevant
line from the fstab is
/dev/sde1 /illumina gfs2 _netdev,rw,lockproto=lock_nolock
gfs2_tool isn't much help, nor is gfs2_edit:
urania# gfs2_tool sb /dev/sde1 all
/usr/src/cluster-3.0.7/gfs2/tool/../libgfs2/libgfs2.h: there isn't a GFS2 filesystem on /dev/sde1
urania# gfs2_edit -p sb /dev/sde1
bad seek: Invalid argument from gfs2_load_inode:416: block 3747350044811107074 (0x34014302ee029b02)
Is there an alternate superblock that I can use to mount the disk to
at least get the last couple of days of data off of it?
Anybody?
What version of the userland tools are you using? There has been an
update recently to fsck designed to solve a number of problems. I've
never seen a filesystem which is so badly corrupted that the super block
is unrecognisable before. The super block is not ever altered during
normal fs usage.
Are you 100% certain that this volume was not being accessed by another
node on the network?
If you can save off the metadata then we can take a look at it. That
might not be possible with a corrupt superblock though, so an
alternative is to make it available somehow for us to look at,
Steve.
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
Userland tools 3.0.7. The iSCSI array is on a closed network and is
protected by a CHAP login. No other system has been configured to access
the array. I have the first 1MB of the disk available at
http://urania.dbi.udel.edu/sde.block.bz2 if you want to see the actual
data. gfs2_edit will not pull the metadata off:
urania ~ # gfs2_edit savemeta /dev/sde /tmp/metasave
Segmentation fault
Doug
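For anyone examining a raw dump such as the first 1MB of the disk mentioned above, a rough sketch for locating GFS2 metadata headers in the image might look like the following. It assumes the GFS2 magic number is 0x01161970, that every metadata block starts with a big-endian magic followed by a big-endian type, and that the superblock sits 64 KiB (128 sectors of 512 bytes) into the filesystem. These are my recollection of gfs2_ondisk.h, not something confirmed in this thread. The function names are hypothetical. Note that a dump of the whole disk (/dev/sde) begins with the partition table, so the superblock would appear at the partition's start offset plus 64 KiB, which may lie beyond the first 1MB.

```python
import bz2
import struct

GFS2_MAGIC = 0x01161970        # assumed value of GFS2_MAGIC
GFS2_SB_OFFSET = 128 * 512     # assumed superblock offset: 64 KiB into the filesystem

def load_image(path):
    """Read a raw image, transparently decompressing it if it ends in .bz2."""
    opener = bz2.open if path.endswith(".bz2") else open
    with opener(path, "rb") as handle:
        return handle.read()

def find_gfs2_headers(data, step=512):
    """Scan sector-aligned offsets for the GFS2 magic.

    Returns a list of (offset, metadata_type) tuples, in ascending offset order.
    """
    hits = []
    for offset in range(0, len(data) - 7, step):
        magic, mtype = struct.unpack_from(">II", data, offset)
        if magic == GFS2_MAGIC:
            hits.append((offset, mtype))
    return hits

def superblock_candidates(data):
    """Return offsets whose header claims metadata type 1 (assumed: superblock)."""
    return [offset for offset, mtype in find_gfs2_headers(data) if mtype == 1]
```

A call such as `superblock_candidates(load_image("sde.block"))` returns the offsets in the image where something that looks like a superblock header begins; an empty list means no such header was found in the portion of the disk that was dumped. Subtracting 64 KiB from a hit gives a candidate start of the filesystem under the assumptions above.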