Re: large filesystem corruptions

On 03/12/2010 08:58 PM, Michael Evans wrote:
On Fri, Mar 12, 2010 at 4:55 PM, Kapetanakis Giannis
<bilias@xxxxxxxxxxxxxxxxxx> wrote:
On 13/03/10 02:29, Kapetanakis Giannis wrote:
I ran a new test, this time using the whole physical/logical
drives instead of GPT partitions:

sdb -\
      +--->  md0 --->  LVM --->  ext4 filesystems
sdc -/

All of sdb, sdc, and md0 are GPT-labelled, with no GPT partitions
inside. No crash so far, but no data has been written yet.
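
For reference, a minimal sketch of how a stack like this is typically
assembled (device and volume names follow the thread; the RAID level
and LV size are assumptions, the thread doesn't state them):

    # Build the array directly on the unpartitioned disks
    # (RAID level here is an assumption):
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb /dev/sdc
    pvcreate /dev/md0                 # whole array as a PV, no partition
    vgcreate vgshare /dev/md0
    lvcreate -L 7T -n share vgshare   # LV name/size taken from later in the thread
    mkfs.ext4 /dev/vgshare/share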

Maybe the GPT partitions were the problem?
Can md0 be built on large GPT-labelled drives with no partitions?
Can LVM2 use a large RAID device directly as a PV, without a partition?
It crashed and burned as well:

Mar 13 02:40:28 server kernel: EXT4-fs error (device dm-4): ext4_mb_generate_buddy: EXT4-fs: group 48: 24544 blocks in bitmap, 2016 in gd
Mar 13 02:40:28 server kernel: EXT4-fs error (device dm-4): mb_free_blocks: double-free of inode 12's block 1583104(bit 10240 in group 48)
Mar 13 02:40:28 server kernel: EXT4-fs error (device dm-4): mb_free_blocks: double-free of inode 12's block 1583105(bit 10241 in group 48)
--snip

So the GPT partitions were not the problem.

Next on the list: XFS

   682  2:47    mkfs.xfs -f /dev/vgshare/share
   684  2:47    mount /dev/vgshare/share /share/
   686  2:47    mkfs.xfs -f /dev/vgshare/test
   687  2:47    mount /dev/vgshare/test /test/
   689  2:47    cd /share/
   691  2:48    dd if=/dev/zero of=papaki bs=4096

Mar 13 02:47:23 server kernel: Filesystem "dm-4": Disabling barriers, not supported by the underlying device
Mar 13 02:47:23 server kernel: XFS mounting filesystem dm-4
Mar 13 02:47:48 server kernel: Filesystem "dm-5": Disabling barriers, not supported by the underlying device
Mar 13 02:47:48 server kernel: XFS mounting filesystem dm-5
Mar 13 02:48:05 server kernel: Filesystem "dm-4": XFS internal error xfs_trans_cancel at line 1138 of file /home/buildsvn/rpmbuild/BUILD/xfs-kmod-0.4/_kmod_build_PAE/xfs_trans.c.  Caller 0xf90e0bbc
Mar 13 02:48:05 server kernel:  [<f90d85fe>] xfs_trans_cancel+0x4d/0xd6 [xfs]
Mar 13 02:48:05 server kernel:  [<f90e0bbc>] xfs_create+0x4ec/0x525 [xfs]
Mar 13 02:48:05 server kernel:  [<f90e0bbc>] xfs_create+0x4ec/0x525 [xfs]
Mar 13 02:48:05 server kernel:  [<f90e88f4>] xfs_vn_mknod+0x19c/0x380 [xfs]
Mar 13 02:48:05 server kernel:  [<c04760e9>] __getblk+0x30/0x27a
Mar 13 02:48:05 server kernel:  [<f8852ac7>] do_get_write_access+0x441/0x46e [jbd]
Mar 13 02:48:05 server kernel:  [<f8889502>] __ext3_get_inode_loc+0x109/0x2d5 [ext3]
Mar 13 02:48:05 server kernel:  [<c045a7aa>] get_page_from_freelist+0x96/0x370
Mar 13 02:48:05 server kernel:  [<f90b6827>] xfs_dir_lookup+0x91/0xff [xfs]
Mar 13 02:48:05 server kernel:  [<f90c3c51>] xfs_iunlock+0x51/0x6d [xfs]
Mar 13 02:48:05 server kernel:  [<c04824f0>] __link_path_walk+0xc62/0xd33
Mar 13 02:48:05 server kernel:  [<c0480b43>] vfs_create+0xc8/0x12f
Mar 13 02:48:05 server kernel:  [<c04834ef>] open_namei+0x16a/0x5fb
Mar 13 02:48:05 server kernel:  [<c0472a92>] __dentry_open+0xea/0x1ab
Mar 13 02:48:05 server kernel:  [<c0472be2>] do_filp_open+0x1c/0x31
Mar 13 02:48:05 server kernel:  [<c0472c35>] do_sys_open+0x3e/0xae
Mar 13 02:48:05 server kernel:  [<c0472cd2>] sys_open+0x16/0x18
Mar 13 02:48:05 server kernel:  [<c0404f17>] syscall_call+0x7/0xb
Mar 13 02:48:05 server kernel:  =======================
Mar 13 02:48:05 server kernel: xfs_force_shutdown(dm-4,0x8) called from line 1139 of file /home/buildsvn/rpmbuild/BUILD/xfs-kmod-0.4/_kmod_build_PAE/xfs_trans.c.  Return address = 0xf90eb6c4
Mar 13 02:48:05 server kernel: Filesystem "dm-4": Corruption of in-memory data detected.  Shutting down filesystem: dm-4
Mar 13 02:48:05 server kernel: Please umount the filesystem, and rectify the problem(s)
Mar 13 02:48:45 server kernel: xfs_force_shutdown(dm-4,0x1) called from line 424 of file /home/buildsvn/rpmbuild/BUILD/xfs-kmod-0.4/_kmod_build_PAE/xfs_rw.c. Return address = 0xf90eb6c4
Mar 13 02:48:45 server kernel: xfs_force_shutdown(dm-4,0x1) called from line 424 of file /home/buildsvn/rpmbuild/BUILD/xfs-kmod-0.4/_kmod_build_PAE/xfs_rw.c. Return address = 0xf90eb6c4

xfs_check /dev/vgshare/share
XFS: Log inconsistent (didn't find previous header)
XFS: failed to find log head
ERROR: cannot find log head/tail, run xfs_repair

xfs_repair /dev/vgshare/share
Phase 1 - find and verify superblock...
bad primary superblock - filesystem mkfs-in-progress bit set !!!

attempting to find secondary superblock...
...................................

I stopped it; I can't wait for it to scan 7TB looking for a secondary
superblock, and it probably won't find anything anyway.
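
For what it's worth: had only the log been unreadable, the usual
(lossy) escape hatch is xfs_repair's force-zero-log option. It would
not have helped here, where the primary superblock itself is flagged
bad:

    xfs_repair -L /dev/vgshare/share   # -L zeroes the log, discarding in-flight transactions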

The /test filesystem still works, though.

So are we sure it's the filesystem?
Something else is fishy...

regards,

Giannis

This is a really basic thing, but do you have x86 support for very
large block devices enabled in the kernel config as well? (I can't
remember what the option is called, since I've been running 64-bit on
any system that even remotely came close to needing it.)

Here's a hit from Google, CONFIG_LBD: http://cateee.net/lkddb/web-lkddb/LBD.html

Enable block devices of size 2TB and larger.
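
A quick way to check whether it is set, assuming your vendor ships the
build config alongside the kernel (paths vary by distro):

    grep CONFIG_LBD /boot/config-$(uname -r)
    # or, if the kernel exposes its own config:
    zgrep CONFIG_LBD /proc/config.gz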

Since you're using a device >2TB in size, I will assume you are using
one of the three 'version 1' superblock types: 1.0 (at the end of the
device), 1.1 (at the very beginning), or 1.2 (4KB in from the beginning).

Please provide the full output of mdadm -Dvvs
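
Spelled out, that is:

    mdadm --detail --scan --verbose --verbose

-D/--detail with -s/--scan reports on every array mdadm can find, and
the doubled -v raises the verbosity.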

You can use any block device as a member of an md array. However, if
you are going 'whole drive', it would be a very good idea to erase the
existing partition-table structures before putting a RAID superblock
on the device, so that there is no confusion about whether the device
has partitions or is in fact a RAID member. Similarly, when
transitioning back the other way, erasing the old metadata for the
array is also a good idea.
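
Something along these lines, assuming sgdisk is available (both steps
are destructive, so triple-check the device names first):

    # Before creating the array: wipe the GPT structures, including
    # the backup GPT header at the end of the disk:
    sgdisk --zap-all /dev/sdb
    sgdisk --zap-all /dev/sdc
    # When transitioning back the other way: erase the old md metadata:
    mdadm --zero-superblock /dev/sdb
    mdadm --zero-superblock /dev/sdc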

The kernel you're running seems to be ... exceptionally old and
heavily patched. I have no way of knowing whether the many, many
patches that fixed numerous issues over the /years/ since its release
have been included. Please make sure you have the most recent release
from your vendor, and ask them for support in parallel.
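
For example, to see exactly what you are running (the rpm query
assumes a Red Hat-style vendor, which the xfs-kmod build path in your
logs suggests):

    uname -rm                  # running kernel version and architecture
    rpm -q kernel kernel-PAE   # installed vendor kernels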

I would agree that it would be key to try this on a newer kernel and
on a 64-bit box. If you have an issue with a specific vendor release,
you should open a ticket/bugzilla with that vendor so they can help
you figure this out.

ric
