Re: RAID 6 recovery (it's not looking good) *More problems*

I finally got the array to a state where it has 24/24 drives up.
Unfortunately, after copying some data onto it, it now comes up with I/O
errors.
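
(For completeness: the md-level state can be re-checked at any point with
the usual tools; sde1 below is just an example member device.)

  cat /proc/mdstat
  mdadm --detail /dev/md0
  mdadm --examine /dev/sde1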


Please help,

Iain.


Here's what I've done so far:

root@skinner:/# umount /mnt/md0raid
umount: /mnt/md0raid: device is busy
umount: /mnt/md0raid: device is busy
root@skinner:/# fuser -m /mnt/md0raid
Cannot stat /mnt/md0raid: Input/output error
Cannot stat /mnt/md0raid: Input/output error
Cannot stat /mnt/md0raid: Input/output error
Cannot stat file /proc/9651/fd/4: Input/output error
root@skinner:/# fuser -m /dev/md0
Cannot stat file /proc/9651/fd/4: Input/output error
root@skinner:/# umount -l /mnt/md0raid
root@skinner:/# xfs_check /dev/md0
xfs_check: /dev/md0 contains a mounted and writable filesystem

fatal error -- couldn't initialize XFS library
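
(Presumably the XFS tools still see the filesystem as mounted because the
lazy unmount leaves it half-attached while afpd, pid 9651 per the fuser
output above, holds an open fd on it. If that's right, something along
these lines should let xfs_check/xfs_repair run without -f, once nothing
else is using the volume:)

  kill 9651                 # the afpd process holding /proc/9651/fd/4
  umount /mnt/md0raid       # should now complete as a normal unmount
  xfs_repair -n /dev/md0    # read-only check, makes no changes
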
root@skinner:/# dmesg | grep -i xfs
[196225.294919] XFS mounting filesystem md0
[196226.008338] Ending clean XFS mount for filesystem: md0
[204347.455334] XFS internal error XFS_WANT_CORRUPTED_GOTO at line 1563 of
file fs/xfs/xfs_alloc.c.  Caller 0xf8c21e90
[204347.455374]  [<f8c215eb>] xfs_free_ag_extent+0x53b/0x730 [xfs]
[204347.455400]  [<f8c21e90>] xfs_free_extent+0xe0/0x110 [xfs]
[204347.455441]  [<f8c21e90>] xfs_free_extent+0xe0/0x110 [xfs]
[204347.455503]  [<f8c2d360>] xfs_bmap_finish+0x140/0x190 [xfs]
[204347.455535]  [<f8c37900>] xfs_bunmapi+0x0/0xfb0 [xfs]
[204347.455555]  [<f8c55fcf>] xfs_itruncate_finish+0x24f/0x3b0 [xfs]
[204347.455618]  [<f8c77289>] xfs_inactive+0x469/0x500 [xfs]
[204347.455660]  [<f8c825e2>] xfs_fs_clear_inode+0x32/0x70 [xfs]
[204347.455779] xfs_force_shutdown(md0,0x8) called from line 4261 of file
fs/xfs/xfs_bmap.c.  Return address = 0xf8c82fec

root@skinner:/# xfs_repair -n /dev/md0
xfs_repair: /dev/md0 contains a mounted and writable filesystem

fatal error -- couldn't initialize XFS library
root@skinner:/# xfs_repair -fn /dev/md0
        - creating 2 worker thread(s)
Phase 1 - find and verify superblock...
        - reporting progress in intervals of 15 minutes
Phase 2 - using internal log
        - scan filesystem freespace and inode maps...
        - 01:52:55: scanning filesystem freespace - 118 of 118 allocation
groups done
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - 01:52:55: scanning agi unlinked lists - 118 of 118 allocation
groups done
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
data fork in ino 1075468390 claims free block 838867328
<snip>
data fork in ino 1075468390 claims free block 838867811
        - agno = 4
bad nblocks 1863041 for inode 2147484215, would reset to 1898317
        - agno = 5
        - agno = 6
        - agno = 7
data fork in ino 3221610517 claims free block 3623910585
imap claims in-use inode 3221610517 is free, would correct imap
        - agno = 8
<snip>
        - agno = 117
data fork in ino 3758128252 claims free block 2952790138
data fork in ino 3758128252 claims free block 2952790139
imap claims in-use inode 3758128252 is free, would correct imap
        - 02:02:39: process known inodes and inode discovery - 55360 of
55360 inodes done
        - process newly discovered inodes...
        - 02:02:39: process newly discovered inodes - 118 of 118 allocation
groups done
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - 02:06:26: setting up duplicate extent list - 24 of 118 allocation
groups done
    - 02:06:26: Phase 4: elapsed time 3 minutes, 47 seconds - processed 6
allocation groups per minute
    - 02:06:26: Phase 4: 20% done - estimated remaining time 14 minutes, 49
seconds
        - 02:21:26: setting up duplicate extent list - 75 of 118 allocation
groups done
    - 02:21:26: Phase 4: elapsed time 18 minutes, 47 seconds - processed 3
allocation groups per minute
    - 02:21:26: Phase 4: 63% done - estimated remaining time 10 minutes, 46
seconds
        - 02:36:26: setting up duplicate extent list - 104 of 118 allocation
groups done
    - 02:36:26: Phase 4: elapsed time 33 minutes, 47 seconds - processed 3
allocation groups per minute
    - 02:36:26: Phase 4: 88% done - estimated remaining time 4 minutes, 32
seconds
        - 02:42:59: setting up duplicate extent list - 118 of 118 allocation
groups done
        - 02:51:26: setting up duplicate extent list - 118 of 118 allocation
groups done
    - 02:51:26: Phase 4: elapsed time 48 minutes, 47 seconds - processed 2
allocation groups per minute
    - 02:51:26: Phase 4: 100% done - estimated remaining time
        - 03:06:26: setting up duplicate extent list - 118 of 118 allocation
groups done
    - 03:06:26: Phase 4: elapsed time 1 hour, 3 minutes, 47 seconds -
processed 1 allocation groups per minute
    - 03:06:26: Phase 4: 100% done - estimated remaining time
        - 03:21:27: setting up duplicate extent list - 118 of 118 allocation
groups done
    - 03:21:27: Phase 4: elapsed time 1 hour, 18 minutes, 48 seconds -
processed 1 allocation groups per minute
    - 03:21:27: Phase 4: 100% done - estimated remaining time
        - 03:36:26: setting up duplicate extent list - 118 of 118 allocation
groups done
    - 03:36:26: Phase 4: elapsed time 1 hour, 33 minutes, 47 seconds -
processed 1 allocation groups per minute
    - 03:36:26: Phase 4: 100% done - estimated remaining time
        - 03:51:26: setting up duplicate extent list - 118 of 118 allocation
groups done
    - 03:51:26: Phase 4: elapsed time 1 hour, 48 minutes, 47 seconds -
processed 1 allocation groups per minute
    - 03:51:26: Phase 4: 100% done - estimated remaining time
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - 04:06:26: check for inodes claiming duplicate blocks - 12480 of
55360 inodes done
    - 04:06:26: Phase 4: elapsed time 2 hours, 3 minutes, 47 seconds -
processed 100 inodes per minute
    - 04:06:26: Phase 4: 22% done - estimated remaining time 7 hours, 5
minutes, 18 seconds
        - agno = 3
entry ":2eDS_Store" at block 0 offset 72 in directory inode 1073752707
references free inode 1073752709
    would clear inode number in entry at offset 72...
entry ":2eDS_Store" at block 0 offset 72 in directory inode 1073752708
references free inode 1073753109
    would clear inode number in entry at offset 72...
entry ":2epar_done" at block 0 offset 72 in directory inode 1073753094
references free inode 1073753095
    would clear inode number in entry at offset 72...
        - agno = 4
bad nblocks 1863041 for inode 2147484215, would reset to 1898317
        - 04:21:26: check for inodes claiming duplicate blocks - 23744 of
55360 inodes done
    - 04:21:26: Phase 4: elapsed time 2 hours, 18 minutes, 47 seconds -
processed 171 inodes per minute
    - 04:21:26: Phase 4: 42% done - estimated remaining time 3 hours, 4
minutes, 47 seconds
        - agno = 5
        - agno = 6
entry ":2eDS_Store" at block 0 offset 72 in directory inode 3221234251
references free inode 3221234252
    would clear inode number in entry at offset 72...
entry ":2epar_done" at block 0 offset 96 in directory inode 3221234251
references free inode 3221234253
    would clear inode number in entry at offset 96...
        - 04:36:26: check for inodes claiming duplicate blocks - 39360 of
55360 inodes done
    - 04:36:26: Phase 4: elapsed time 2 hours, 33 minutes, 47 seconds -
processed 255 inodes per minute
    - 04:36:26: Phase 4: 71% done - estimated remaining time 1 hour, 2
minutes, 30 seconds
        - agno = 7
        - agno = 117
        - 04:51:26: check for inodes claiming duplicate blocks - 49664 of
55360 inodes done
    - 04:51:26: Phase 4: elapsed time 2 hours, 48 minutes, 47 seconds -
processed 294 inodes per minute
    - 04:51:26: Phase 4: 89% done - estimated remaining time 19 minutes, 21
seconds
        - 04:59:54: check for inodes claiming duplicate blocks - 55360 of
55360 inodes done
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem starting at / ...
entry ":2eDS_Store" in directory inode 1073752707 points to free inode
1073752709, would junk entry
entry ":2eDS_Store" in directory inode 1073752708 points to free inode
1073753109, would junk entry
entry ":2epar_done" in directory inode 1073753094 points to free inode
1073753095, would junk entry
entry ":2eDS_Store" in directory inode 3221234251 points to free inode
3221234252, would junk entry
entry ":2epar_done" in directory inode 3221234251 points to free inode
3221234253, would junk entry
        - 05:02:01: traversing filesystem - 118 of 118 allocation groups
done
        - traversal finished ...
        - traversing all unattached subtrees ...
        - traversals finished ...
        - moving disconnected inodes to lost+found ...
disconnected inode 3758128252, would move to lost+found
Phase 7 - verify link counts...
        - 05:02:05: verify link counts - 55360 of 55360 inodes done
No modify flag set, skipping filesystem flush and exiting.
root@skinner:/# 
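
(If I've understood the xfs_repair docs right, the next step, once nothing
holds the mount, would be roughly the following; the -L variant is only an
absolute last resort since it throws the log away:)

  umount /mnt/md0raid
  mount /dev/md0 /mnt/md0raid    # mount once so the log gets replayed
  umount /mnt/md0raid
  xfs_repair /dev/md0            # real repair, no -n this time
  # xfs_repair -L /dev/md0       # last resort only: zeroes the log and can
                                 # lose recently modified metadata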

Syslog has lots of entries like these:

Jan 15 00:01:45 skinner kernel: [203766.124587] SCSI device sdd: 976773168
512-byte hdwr sectors (500108 MB)
Jan 15 00:01:45 skinner kernel: [203766.132971] sdd: Write Protect is off
Jan 15 00:01:45 skinner kernel: [203766.132976] sdd: Mode Sense: 00 3a 00 00
Jan 15 00:01:45 skinner kernel: [203766.134301] SCSI device sdd: write
cache: enabled, read cache: enabled, doesn't support DPO or FUA
Jan 15 00:01:58 skinner kernel: [203780.291734] ata17.00: exception Emask
0x10 SAct 0x0 SErr 0x90000 action 0x2 frozen
Jan 15 00:01:58 skinner kernel: [203780.291763] ata17.00: cmd
c8/00:20:5f:24:7b/00:00:00:00:00/e2 tag 0 cdb 0x0 data 16384 in
Jan 15 00:01:58 skinner kernel: [203780.291765]          res
ff/ff:ff:ff:ff:ff/d0:d0:d0:d0:d0/ff Emask 0x12 (ATA bus error)
Jan 15 00:01:58 skinner kernel: [203780.292896] ata17: hard resetting port
Jan 15 00:01:59 skinner kernel: [203781.551363] ata17: COMRESET failed
(device not ready)
Jan 15 00:01:59 skinner kernel: [203781.551404] ata17: hardreset failed,
retrying in 5 secs
Jan 15 00:02:04 skinner kernel: [203786.548252] ata17: hard resetting port
Jan 15 00:02:05 skinner kernel: [203787.427001] ata17: SATA link up 1.5 Gbps
(SStatus 113 SControl 310)
Jan 15 00:02:06 skinner kernel: [203787.443227] ata17.00: configured for
UDMA/33
Jan 15 00:02:06 skinner kernel: [203787.443238] ata17: EH complete
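
(If I'm reading the ata17 exception right, Emask 0x10 with SErr 0x90000 and
the failed COMRESET point at the link/cabling rather than the media. The CRC
counter on the drive behind that port should show it; sdd here is just my
guess for which device sits on ata17:)

  smartctl -A /dev/sdd | grep -i crc   # rising UDMA_CRC_Error_Count suggests cable/backplane
  smartctl -l error /dev/sdd           # recent entries in the drive's ATA error log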

Syslog from around the time it happened:

Jan 15 00:11:15 skinner kernel: [204335.803986] SCSI device sde: write
cache: enabled, read cache: enabled, doesn't support DPO or FUA
Jan 15 00:11:15 skinner kernel: [204336.338284] raid5:md0: read error
corrected (8 sectors at 58272000 on sde1)
Jan 15 00:11:15 skinner kernel: [204336.338290] raid5:md0: read error
corrected (8 sectors at 58272008 on sde1)
Jan 15 00:11:15 skinner afpd[9470]: 3214789.36KB read, 6901.94KB written
Jan 15 00:11:15 skinner afpd[9470]: dsi_stream_write: Broken pipe
Jan 15 00:11:15 skinner afpd[9470]: Connection terminated
Jan 15 00:11:15 skinner afpd[9651]: Warning: No CNID scheme for volume
/mnt/md0raid/. Using default.
Jan 15 00:11:15 skinner afpd[9651]: Setting uid/gid to 0/0
Jan 15 00:11:15 skinner afpd[9651]: CNID DB initialized using Sleepycat
Software: Berkeley DB 4.2.52: (December  3, 2003)
Jan 15 00:11:17 skinner afpd[5506]: server_child[1] 9470 exited 1
Jan 15 00:11:17 skinner afpd[9651]: ipc_write: command: 2, pid: 9651,
msglen: 24
Jan 15 00:11:17 skinner afpd[5506]: ipc_read: command: 2, pid: 9651, len: 24
Jan 15 00:11:17 skinner afpd[5506]: Setting clientid (len 16) for 9651,
boottime 496E72CC
Jan 15 00:11:17 skinner afpd[5506]: ipc_get_session: len: 24, idlen 16, time
496e72cc
Jan 15 00:11:17 skinner afpd[9651]: ipc_write: command: 2, pid: 9651,
msglen: 24
Jan 15 00:11:17 skinner afpd[5506]: ipc_read: command: 2, pid: 9651, len: 24
Jan 15 00:11:17 skinner afpd[5506]: Setting clientid (len 16) for 9651,
boottime 496E72CC
Jan 15 00:11:17 skinner afpd[5506]: ipc_get_session: len: 24, idlen 16, time
496e72cc
Jan 15 00:11:17 skinner afpd[9653]: ASIP session:548(5) from
192.168.0.2:49345(8)
Jan 15 00:11:17 skinner afpd[5506]: server_child[1] 9653 done
Jan 15 00:11:26 skinner kernel: [204347.455334] XFS internal error
XFS_WANT_CORRUPTED_GOTO at line 1563 of file fs/xfs/xfs_alloc.c.  Caller
0xf8c21e90
Jan 15 00:11:26 skinner kernel: [204347.455374]  [pg0+947631595/1069122560]
xfs_free_ag_extent+0x53b/0x730 [xfs]
Jan 15 00:11:26 skinner kernel: [204347.455400]  [pg0+947633808/1069122560]
xfs_free_extent+0xe0/0x110 [xfs]
Jan 15 00:11:26 skinner kernel: [204347.455441]  [pg0+947633808/1069122560]
xfs_free_extent+0xe0/0x110 [xfs]
Jan 15 00:11:26 skinner kernel: [204347.455503]  [pg0+947680096/1069122560]
xfs_bmap_finish+0x140/0x190 [xfs]
Jan 15 00:11:26 skinner kernel: [204347.455535]  [pg0+947722496/1069122560]
xfs_bunmapi+0x0/0xfb0 [xfs]
Jan 15 00:11:26 skinner kernel: [204347.455555]  [pg0+947847119/1069122560]
xfs_itruncate_finish+0x24f/0x3b0 [xfs]
Jan 15 00:11:26 skinner kernel: [204347.455618]  [pg0+947982985/1069122560]
xfs_inactive+0x469/0x500 [xfs]
Jan 15 00:11:26 skinner kernel: [204347.455645]  [mutex_lock+8/32]
mutex_lock+0x8/0x20
Jan 15 00:11:26 skinner kernel: [204347.455660]  [pg0+948028898/1069122560]
xfs_fs_clear_inode+0x32/0x70 [xfs]
Jan 15 00:11:26 skinner kernel: [204347.455679]  [dentry_iput+132/144]
dentry_iput+0x84/0x90
Jan 15 00:11:26 skinner kernel: [204347.455688]  [clear_inode+159/336]
clear_inode+0x9f/0x150
Jan 15 00:11:26 skinner kernel: [204347.455691]
[truncate_inode_pages+23/32] truncate_inode_pages+0x17/0x20
Jan 15 00:11:26 skinner kernel: [204347.455698]
[generic_delete_inode+234/256] generic_delete_inode+0xea/0x100
Jan 15 00:11:26 skinner kernel: [204347.455704]  [iput+86/112]
iput+0x56/0x70
Jan 15 00:11:26 skinner kernel: [204347.455709]  [do_unlinkat+238/336]
do_unlinkat+0xee/0x150
Jan 15 00:11:26 skinner kernel: [204347.455747]  [syscall_call+7/11]
syscall_call+0x7/0xb
Jan 15 00:11:26 skinner kernel: [204347.455775]  =======================
Jan 15 00:11:26 skinner kernel: [204347.455779] xfs_force_shutdown(md0,0x8)
called from line 4261 of file fs/xfs/xfs_bmap.c.  Return address =
0xf8c82fec
Jan 15 00:11:26 skinner kernel: [204347.520962] Filesystem "md0": Corruption
of in-memory data detected.  Shutting down filesystem: md0
Jan 15 00:11:26 skinner kernel: [204347.520989] Please umount the
filesystem, and rectify the problem(s)

sdc seems to have had a few errors roughly hourly before this happened
  (ATA Error Count: 104).
sdd doesn't have any SMART errors.
sde shows Spin_Up_Time: 8320, with its last error at a disk power-on
lifetime of 56 hours.
In fact quite a few disks seem to have had non-fatal errors recently.
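
(The per-drive SMART checks are just the standard smartctl calls, e.g.:)

  smartctl -a /dev/sdc        # full attribute dump plus the ATA error log
  smartctl -l error /dev/sdc  # just the error log (the 104 errors on sdc)
  smartctl -t long /dev/sdc   # start a long offline self-test; best run while the array is idle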


> On Fri, Dec 19, 2008 at 12:29:30PM +0000, Iain Rauch wrote:
>>> I'm still tired (now even more ;-) ). Just check again if /dev/sdu really
>>> was the latest to fail and if so, clone this one.
>>> I also suggest reassembling it without an immediate raid rebuild.
>>> First check your data and only then add new drives to the raid.
>>> Once you start a raid rebuild, there is no way to go back. We recently
>>> also had the problem of three failed disks, but we could only get the
>>> data back by not assembling the array with the latest failed disk, but
>>> with the 2nd latest (don't ask why).
>>> 
>>> So in short
>>> 
>>> 1) clone disk
>>> 
>>> 2) mdadm --assemble --force /dev/mdX /dev/sda1 /dev/sdb1 ... /dev/sdx1
>>> 
>>> ===> Use only **22** devices here.
>>> 
>>> 3) Mount and check data, maybe even a read-only fsck
>>> 
>>> 4) Add two new disks.
>>> 
>>> 
>>> Hope it helps,
>>> Bernd
>> 
>> Well I cloned the disk and force started the array with 22 drives. I mounted
>> the file system read-only and it did appear to be intact :)
> 
> I'm glad to hear that.
> 
>> 
>> The problem is I cloned the failed drive to a 1.5TB Seagate, and it has the
>> freezing issue. After 12h of rebuilding (out of 50) that drive got kicked.
>> I'm gonna see if updating the FW on the drive helps, but otherwise I'll just
>> have to get another decent drive.
>> 
>> Is there any way to have mdadm be more patient and not kick the drive, or
>> let me put it back in and continue the rebuild of another drive? I don't
>> believe the drive will operate for 50h straight.
> 
> I think this would continue the rebuild if you used bitmaps. You can add
> a bitmap with "mdadm --grow --bitmap=internal /dev/mdX", but I'm not sure
> if it will work on a degraded md device. At least it won't work during the
> rebuild phase.
> 
> Cheers,
> Bernd


--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
