On Tue, Nov 22, 2011 at 10:52 AM, Eric Sandeen <sandeen@xxxxxxxxxxx> wrote:
> Ok, error 5 is EIO:
>
>   8 include/asm-generic/errno-base.h   8 #define EIO 5
>
> So the very first error you saw was "xfs_do_force_shutdown(0x1) called
> from line..."? Or the "xfs_log_force: error 5 returned"? I'm wondering
> if there was more before this.
>
> It's worth looking carefully to see the very first problem reported by
> xfs, and possibly from storage before that. (i.e. did your storage go
> wonky?)

Oh, we have a few more logs than I'd thought to look for. The xfs-related
messages from bootup after the kernel upgrade:

Nov 17 16:01:01 cephstore6358 kernel: [ 1.924668] SGI XFS with security attributes, large block/inode numbers, no debug enabled
...
Nov 17 16:01:01 cephstore6358 kernel: [ 190.047204] XFS (sdc1): Mounting Filesystem
Nov 17 16:01:01 cephstore6358 kernel: [ 190.198126] XFS (sdc1): Starting recovery (logdev: internal)
Nov 17 16:01:01 cephstore6358 kernel: [ 190.281929] XFS (sdc1): Ending recovery (logdev: internal)
Nov 17 16:01:01 cephstore6358 kernel: [ 190.296303] XFS (sde1): Mounting Filesystem
Nov 17 16:01:01 cephstore6358 kernel: [ 190.430809] XFS (sde1): Starting recovery (logdev: internal)
Nov 17 16:01:01 cephstore6358 kernel: [ 197.486417] XFS (sde1): Ending recovery (logdev: internal)
Nov 17 16:01:01 cephstore6358 kernel: [ 197.492596] XFS (sdg1): Mounting Filesystem
Nov 17 16:01:01 cephstore6358 kernel: [ 197.652085] XFS (sdg1): Starting recovery (logdev: internal)
Nov 17 16:01:01 cephstore6358 kernel: [ 197.724493] XFS (sdg1): Ending recovery (logdev: internal)
Nov 17 16:01:01 cephstore6358 kernel: [ 197.730526] XFS (sdi1): Mounting Filesystem
Nov 17 16:01:01 cephstore6358 kernel: [ 197.871074] XFS (sdi1): Starting recovery (logdev: internal)
Nov 17 16:01:01 cephstore6358 kernel: [ 206.570177] XFS (sdi1): Ending recovery (logdev: internal)
Nov 17 16:01:01 cephstore6358 kernel: [ 206.576329] XFS (sdk1): Mounting Filesystem
Nov 17 16:01:01 cephstore6358 kernel: [ 206.738760] XFS (sdk1): Starting recovery (logdev: internal)
Nov 17 16:01:01 cephstore6358 kernel: [ 206.823346] XFS (sdk1): Ending recovery (logdev: internal)
Nov 17 16:01:01 cephstore6358 kernel: [ 206.837938] XFS (sdm1): Mounting Filesystem
Nov 17 16:01:01 cephstore6358 kernel: [ 206.962455] XFS (sdm1): Starting recovery (logdev: internal)
Nov 17 16:01:01 cephstore6358 kernel: [ 207.062120] XFS (sdm1): Ending recovery (logdev: internal)
Nov 17 16:01:01 cephstore6358 kernel: [ 207.078134] XFS (sdo1): Mounting Filesystem
Nov 17 16:01:01 cephstore6358 kernel: [ 207.240052] XFS (sdo1): Starting recovery (logdev: internal)
Nov 17 16:01:01 cephstore6358 kernel: [ 207.321602] XFS (sdo1): Ending recovery (logdev: internal)
...
Nov 17 16:01:01 cephstore6358 kernel: [ 214.214688] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1664 of file fs/xfs/xfs_alloc.c.  Caller 0xffffffff811d6b71
Nov 17 16:01:01 cephstore6358 kernel: [ 214.214692]
Nov 17 16:01:01 cephstore6358 kernel: [ 214.227313] Pid: 11196, comm: ceph-osd Not tainted 3.1.0-dho-00004-g1ffcb5c-dirty #1
Nov 17 16:01:01 cephstore6358 kernel: [ 214.235056] Call Trace:
Nov 17 16:01:01 cephstore6358 kernel: [ 214.237530]  [<ffffffff811d606e>] ? xfs_free_ag_extent+0x4e3/0x698
Nov 17 16:01:01 cephstore6358 kernel: [ 214.243717]  [<ffffffff811d6b71>] ? xfs_free_extent+0xb6/0xf9
Nov 17 16:01:01 cephstore6358 kernel: [ 214.249468]  [<ffffffff811d3034>] ? kmem_zone_alloc+0x58/0x9e
Nov 17 16:01:01 cephstore6358 kernel: [ 214.255220]  [<ffffffff812095f9>] ? xfs_trans_get_efd+0x21/0x2a
Nov 17 16:01:01 cephstore6358 kernel: [ 214.261159]  [<ffffffff811e2011>] ? xfs_bmap_finish+0xeb/0x160
Nov 17 16:01:01 cephstore6358 kernel: [ 214.266993]  [<ffffffff811f8634>] ? xfs_itruncate_extents+0xe8/0x1d0
Nov 17 16:01:01 cephstore6358 kernel: [ 214.273361]  [<ffffffff811f879f>] ? xfs_itruncate_data+0x83/0xee
Nov 17 16:01:01 cephstore6358 kernel: [ 214.279362]  [<ffffffff811cb0a2>] ? xfs_setattr_size+0x246/0x36c
Nov 17 16:01:01 cephstore6358 kernel: [ 214.285363]  [<ffffffff811cb1e3>] ? xfs_vn_setattr+0x1b/0x2f
Nov 17 16:01:01 cephstore6358 kernel: [ 214.291031]  [<ffffffff810e7875>] ? notify_change+0x16d/0x23e
Nov 17 16:01:01 cephstore6358 kernel: [ 214.296776]  [<ffffffff810d2982>] ? do_truncate+0x68/0x86
Nov 17 16:01:01 cephstore6358 kernel: [ 214.302172]  [<ffffffff810d2b11>] ? sys_truncate+0x171/0x173
Nov 17 16:01:01 cephstore6358 kernel: [ 214.307846]  [<ffffffff8166c07b>] ? system_call_fastpath+0x16/0x1b
Nov 17 16:01:01 cephstore6358 kernel: [ 214.314031] XFS (sdg1): xfs_do_force_shutdown(0x8) called from line 3864 of file fs/xfs/xfs_bmap.c.  Return address = 0xffffffff811e2046
Nov 17 16:01:01 cephstore6358 kernel: [ 214.340451] XFS (sdg1): Corruption of in-memory data detected.  Shutting down filesystem
Nov 17 16:01:01 cephstore6358 kernel: [ 214.348518] XFS (sdg1): Please umount the filesystem and rectify the problem(s)
Nov 17 16:01:01 cephstore6358 kernel: [ 227.789285] XFS (sdg1): xfs_log_force: error 5 returned.
Nov 17 16:01:01 cephstore6358 kernel: [ 229.820255] XFS (sdg1): xfs_log_force: error 5 returned.
Nov 17 16:01:01 cephstore6358 kernel: [ 229.825550] XFS (sdg1): xfs_do_force_shutdown(0x1) called from line 1037 of file fs/xfs/xfs_buf.c.  Return address = 0xffffffff811c2aa8
Nov 17 16:01:01 cephstore6358 kernel: [ 229.845089] XFS (sdg1): xfs_log_force: error 5 returned.
Nov 17 16:01:01 cephstore6358 kernel: [ 229.850388] XFS (sdg1): xfs_do_force_shutdown(0x1) called from line 1037 of file fs/xfs/xfs_buf.c.  Return address = 0xffffffff811c2aa8
(etc)

I don't know the xfs code at all, but that looks like a bug to me -- either
the system got itself into a broken state from valid on-disk structures, or
else the journal (which as best I can tell is properly ordered, barriered,
etc.) didn't properly protect against brokenness elsewhere.

Also note that the initial post-reboot mount succeeded (it didn't break
until after a series of truncates), and it's the subsequent mounts that
are failing.

-Greg
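
P.S. A minimal sanity check of Eric's point about the error number -- this
is plain C against a standard libc, nothing XFS-specific, and only meant to
show that the "error 5" in the xfs_log_force messages is the generic EIO
errno rather than an XFS-private code:

#include <errno.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
	/* EIO comes from include/asm-generic/errno-base.h, where it is
	 * defined as 5 -- the value xfs_log_force is reporting above. */
	printf("EIO = %d\n", EIO);              /* prints: EIO = 5 */
	printf("errno 5 -> %s\n", strerror(5)); /* "Input/output error" */
	return 0;
}

Consistent with that, the error-5 messages only start after the forced
shutdown at 214.31; the XFS_WANT_CORRUPTED_GOTO report is the first xfs
problem in this log.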