Hi Richard,

[restored CC list... please use reply-to-all on kernel.org lists]

On 02/06/2012 09:40 PM, Richard Herd wrote:
> Hi Phil,
>
> Thanks for the swift response :-) Also I'm in (what I'd like to say
> but can't - sunny) Sydney...
>
> OK, without slathering this thread in SMART reports I can quite
> definitely say you have hit the nail on the head with regard to the
> read errors escalating into link timeouts. This is exactly what is
> happening. I had thought this was actually a pretty common setup for
> home users (e.g. mdadm and drives such as WD20EARS/ST2000s) - I have the
> luxury of budgets for NetApp kit at work - unfortunately my personal
> finances only stretch to an ITX case and a bunch of cheap HDs!

I understand the constraints, as I pinch pennies at home and at the
office (I own my engineering firm). I've made do with cheap desktop
drives that do support ERC. I got burned when Seagate dropped ERC on
their latest desktop drives. The Hitachi Deskstar is the only
affordable model on the market that still supports ERC.

> I understand it's the ERC causing disks to get kicked, and fully
> understand if you can't help further.

Not that I won't help, as there's no risk to me :-)

> Assembling without sdg I'm not sure will do it, as what we have is 4
> disks with the same events counter (3 active sync (sda/sdb/sdf), 1
> spare rebuilding (sdd)), and 2 (sdg/sdc) removed with older event
> counters. Leaving out sdg leaves us with sdc, which has an event
> counter of 1848333. As the 3 active sync (sda/sdb/sdf) + 1 spare
> (sdd) have an event counter of 1848341, mdadm doesn't want to let me
> use sdc in the array even with --force.

This surprises me. The purpose of "--force" with assemble is to ignore
the event count. Have you tried this with the newer mdadm you compiled?

> As you say, as it's in the middle of a reshape, a recreate is out.
>
> I'm treating data loss as a given at this point, but even being able
> to bring the array online degraded and pull out whatever is still
> intact would help.
>
> If you have any further suggestions that would be great, but I do
> understand your position on ERC and thank you for your input :-)

Please do retry the --assemble --force with /dev/sdg left out?

I'll leave the balance of your response untrimmed for the list to see.
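For reference, roughly the sequence I have in mind - just a sketch,
assuming your device names haven't shifted and your smartctl is recent
enough to speak SCT ERC (drives without ERC support will simply reject
the second command):

  # Check whether each member drive supports SCT ERC, and if so cap
  # error recovery at 7 seconds for reads and writes (units are 0.1 s)
  smartctl -l scterc /dev/sda
  smartctl -l scterc,70,70 /dev/sda
  # ...repeat for sdb, sdc, sdd, sdf

  # Then, with /dev/sdg left out (or disconnected entirely), retry the
  # forced assembly, still allowing the stale backup file from the
  # interrupted reshape
  export MDADM_GROW_ALLOW_OLD=1
  mdadm --assemble --force --verbose --backup-file=/usb/md0.backup /dev/md0 \
      /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sdf1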
Phil

> Feb 7 01:07:16 raven kernel: [18891.989330] ata8: hard resetting link
> Feb 7 01:07:22 raven kernel: [18897.356104] ata8: link is slow to
> respond, please be patient (ready=0)
> Feb 7 01:07:26 raven kernel: [18902.004280] ata8: hard resetting link
> Feb 7 01:07:32 raven kernel: [18907.372104] ata8: link is slow to
> respond, please be patient (ready=0)
> Feb 7 01:07:36 raven kernel: [18912.020097] ata8: SATA link up 6.0
> Gbps (SStatus 133 SControl 300)
> Feb 7 01:07:41 raven kernel: [18917.020093] ata8.00: qc timeout (cmd 0xec)
> Feb 7 01:07:41 raven kernel: [18917.028074] ata8.00: failed to
> IDENTIFY (I/O error, err_mask=0x4)
> Feb 7 01:07:41 raven kernel: [18917.028310] ata8: hard resetting link
> Feb 7 01:07:47 raven kernel: [18922.396089] ata8: link is slow to
> respond, please be patient (ready=0)
> Feb 7 01:07:51 raven kernel: [18927.044313] ata8: hard resetting link
> Feb 7 01:07:56 raven kernel: [18932.020099] ata8: SATA link up 6.0
> Gbps (SStatus 133 SControl 300)
> Feb 7 01:08:06 raven kernel: [18942.020048] ata8.00: qc timeout (cmd 0xec)
> Feb 7 01:08:06 raven kernel: [18942.028075] ata8.00: failed to
> IDENTIFY (I/O error, err_mask=0x4)
> Feb 7 01:08:06 raven kernel: [18942.028307] ata8: limiting SATA link
> speed to 3.0 Gbps
> Feb 7 01:08:06 raven kernel: [18942.028321] ata8: hard resetting link
> Feb 7 01:08:12 raven kernel: [18947.396108] ata8: link is slow to
> respond, please be patient (ready=0)
> Feb 7 01:08:16 raven kernel: [18951.988069] ata8: SATA link up 6.0
> Gbps (SStatus 133 SControl 320)
> Feb 7 01:08:46 raven kernel: [18981.988104] ata8.00: qc timeout (cmd 0xec)
> Feb 7 01:08:46 raven kernel: [18981.996070] ata8.00: failed to
> IDENTIFY (I/O error, err_mask=0x4)
> Feb 7 01:08:46 raven kernel: [18981.996302] ata8.00: disabled
> Feb 7 01:08:46 raven kernel: [18981.996324] ata8.00: device reported
> invalid CHS sector 0
> Feb 7 01:08:46 raven kernel: [18981.996348] ata8: hard resetting link
> Feb 7 01:08:52 raven kernel: [18987.364104] ata8: link is slow to
> respond, please be patient (ready=0)
> Feb 7 01:08:56 raven kernel: [18992.012050] ata8: SATA link up 6.0
> Gbps (SStatus 133 SControl 320)
> Feb 7 01:08:56 raven kernel: [18992.012114] ata8: EH complete
> Feb 7 01:08:56 raven kernel: [18992.012158] sd 8:0:0:0: [sdg]
> Unhandled error code
> Feb 7 01:08:56 raven kernel: [18992.012165] sd 8:0:0:0: [sdg] Result:
> hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> Feb 7 01:08:56 raven kernel: [18992.012176] sd 8:0:0:0: [sdg] CDB:
> Write(10): 2a 00 e8 e0 74 3f 00 00 08 00
> Feb 7 01:08:56 raven kernel: [18992.012696] md: super_written gets
> error=-5, uptodate=0
> Feb 7 01:08:56 raven kernel: [18992.013169] sd 8:0:0:0: [sdg]
> Unhandled error code
> Feb 7 01:08:56 raven kernel: [18992.013176] sd 8:0:0:0: [sdg] Result:
> hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> Feb 7 01:08:56 raven kernel: [18992.013186] sd 8:0:0:0: [sdg] CDB:
> Read(10): 28 00 04 9d bd bf 00 00 80 00
> Feb 7 01:08:56 raven kernel: [18992.276986] sd 8:0:0:0: [sdg]
> Unhandled error code
> Feb 7 01:08:56 raven kernel: [18992.276999] sd 8:0:0:0: [sdg] Result:
> hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> Feb 7 01:08:56 raven kernel: [18992.277012] sd 8:0:0:0: [sdg] CDB:
> Read(10): 28 00 04 9d be 3f 00 00 80 00
> Feb 7 01:08:56 raven kernel: [18992.316919] sd 8:0:0:0: [sdg]
> Unhandled error code
> Feb 7 01:08:56 raven kernel: [18992.316930] sd 8:0:0:0: [sdg] Result:
> hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> Feb 7 01:08:56 raven kernel: [18992.316942] sd 8:0:0:0: [sdg] CDB:
> Read(10): 28 00 04 9d be bf 00 00 80 00
> Feb 7 01:08:56 raven kernel: [18992.326906] sd 8:0:0:0: [sdg]
> Unhandled error code
> Feb 7 01:08:56 raven kernel: [18992.326920] sd 8:0:0:0: [sdg] Result:
> hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> Feb 7 01:08:56 raven kernel: [18992.326932] sd 8:0:0:0: [sdg] CDB:
> Read(10): 28 00 04 9d bf 3f 00 00 80 00
> Feb 7 01:08:56 raven kernel: [18992.327944] sd 8:0:0:0: [sdg]
> Unhandled error code
> Feb 7 01:08:56 raven kernel: [18992.327956] sd 8:0:0:0: [sdg] Result:
> hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> Feb 7 01:08:56 raven kernel: [18992.327968] sd 8:0:0:0: [sdg] CDB:
> Read(10): 28 00 04 9d bf bf 00 00 80 00
> Feb 7 01:08:57 raven kernel: [18992.555093] md: md0: reshape done.
> Feb 7 01:08:57 raven kernel: [18992.607595] md: reshape of RAID array md0
> Feb 7 01:08:57 raven kernel: [18992.607606] md: minimum _guaranteed_
> speed: 200000 KB/sec/disk.
> Feb 7 01:08:57 raven kernel: [18992.607614] md: using maximum
> available idle IO bandwidth (but not more than 200000 KB/sec) for
> reshape.
> Feb 7 01:08:57 raven kernel: [18992.607628] md: using 128k window,
> over a total of 1953511936 blocks.
> Feb 7 06:41:02 raven rsyslogd: [origin software="rsyslogd"
> swVersion="4.2.0" x-pid="911" x-info="http://www.rsyslog.com"]
> rsyslogd was HUPed, type 'lightweight'.
> Feb 7 07:12:32 raven kernel: [40807.989092] ata5: hard resetting link
> Feb 7 07:12:38 raven kernel: [40813.524074] ata5: SATA link up 6.0
> Gbps (SStatus 133 SControl 300)
> Feb 7 07:12:43 raven kernel: [40818.524106] ata5.00: qc timeout (cmd 0xec)
> Feb 7 07:12:43 raven kernel: [40818.524126] ata5.00: failed to
> IDENTIFY (I/O error, err_mask=0x4)
> Feb 7 07:12:43 raven kernel: [40818.532788] ata5: hard resetting link
> Feb 7 07:12:48 raven kernel: [40824.058039] ata5: SATA link up 6.0
> Gbps (SStatus 133 SControl 300)
> Feb 7 07:12:58 raven kernel: [40834.056101] ata5.00: qc timeout (cmd 0xec)
> Feb 7 07:12:58 raven kernel: [40834.056121] ata5.00: failed to
> IDENTIFY (I/O error, err_mask=0x4)
> Feb 7 07:12:58 raven kernel: [40834.064203] ata5: limiting SATA link
> speed to 3.0 Gbps
> Feb 7 07:12:58 raven kernel: [40834.064217] ata5: hard resetting link
> Feb 7 07:13:04 raven kernel: [40839.592095] ata5: SATA link up 3.0
> Gbps (SStatus 123 SControl 320)
> Feb 7 07:13:34 raven kernel: [40869.592088] ata5.00: qc timeout (cmd 0xec)
> Feb 7 07:13:34 raven kernel: [40869.592110] ata5.00: failed to
> IDENTIFY (I/O error, err_mask=0x4)
> Feb 7 07:13:34 raven kernel: [40869.599676] ata5.00: disabled
> Feb 7 07:13:34 raven kernel: [40869.599700] ata5.00: device reported
> invalid CHS sector 0
> Feb 7 07:13:34 raven kernel: [40869.599724] ata5: hard resetting link
> Feb 7 07:13:39 raven kernel: [40875.124128] ata5: SATA link up 3.0
> Gbps (SStatus 123 SControl 320)
> Feb 7 07:13:39 raven kernel: [40875.124201] ata5: EH complete
> Feb 7 07:13:39 raven kernel: [40875.124243] sd 4:0:0:0: [sdd]
> Unhandled error code
> Feb 7 07:13:39 raven kernel: [40875.124251] sd 4:0:0:0: [sdd] Result:
> hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> Feb 7 07:13:39 raven kernel: [40875.124262] sd 4:0:0:0: [sdd] CDB:
> Write(10): 2a 00 e8 e0 74 3f 00 00 08 00
> Feb 7 07:13:39 raven kernel: [40875.135544] md: super_written gets
> error=-5, uptodate=0
> Feb 7 07:13:39 raven kernel: [40875.152171] sd 4:0:0:0: [sdd]
> Unhandled error code
> Feb 7 07:13:39 raven kernel: [40875.152179] sd 4:0:0:0: [sdd] Result:
> hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> Feb 7 07:13:39 raven kernel: [40875.152189] sd 4:0:0:0: [sdd] CDB:
> Read(10): 28 00 09 2b f2 3f 00 00 80 00
> Feb 7 07:13:41 raven kernel: [40876.734504] md: md0: reshape done.
> Feb 7 07:13:41 raven kernel: [40876.736298] lost page write due to
> I/O error on md0
> Feb 7 07:13:41 raven kernel: [40876.743529] lost page write due to
> I/O error on md0
> Feb 7 07:13:41 raven kernel: [40876.750009] lost page write due to
> I/O error on md0
> Feb 7 07:13:41 raven kernel: [40876.755143] lost page write due to
> I/O error on md0
> Feb 7 07:13:41 raven kernel: [40876.760126] lost page write due to
> I/O error on md0
> Feb 7 07:13:41 raven kernel: [40876.765070] lost page write due to
> I/O error on md0
> Feb 7 07:13:41 raven kernel: [40876.769890] lost page write due to
> I/O error on md0
> Feb 7 07:13:41 raven kernel: [40876.774759] lost page write due to
> I/O error on md0
> Feb 7 07:13:41 raven kernel: [40876.779456] lost page write due to
> I/O error on md0
> Feb 7 07:13:41 raven kernel: [40876.784166] lost page write due to
> I/O error on md0
> Feb 7 07:13:41 raven kernel: [40876.788773] JBD: Detected IO errors
> while flushing file data on md0
> Feb 7 07:13:41 raven kernel: [40876.796386] JBD: Detected IO errors
> while flushing file data on md0
>
> On Tue, Feb 7, 2012 at 1:15 PM, Phil Turmel <philip@xxxxxxxxxx> wrote:
>> Hi Richard,
>>
>> On 02/06/2012 08:34 PM, Richard Herd wrote:
>>> Hey guys,
>>>
>>> I'm in a bit of a pickle here and if any mdadm kings could step in and
>>> throw some advice my way I'd be very grateful :-)
>>>
>>> Quick bit of background - little NAS based on an AMD E350 running
>>> Ubuntu 10.04. Running a software RAID 5 from 5x2TB disks. Every few
>>> months one of the drives would fail a request and get kicked from the
>>> array (as is becoming common for these larger multi TB drives they
>>> tolerate the occasional bad sector by reallocating from a pool of
>>> spares (but that's a whole other story)). This happened across a
>>> variety of brands and two different controllers. I'd simply add the
>>> disk that got popped back in and let it re-sync. SMART tests always
>>> in good health.
>>
>> Some more detail on the actual devices would help, especially the
>> output of lsdrv [1] to document what device serial numbers are which,
>> for future reference.
>>
>> I also suspect you have problems with your drive's error recovery
>> control, also known as time-limited error recovery. Simple sector
>> errors should *not* be kicking out your drives. Mdadm knows to
>> reconstruct from parity and rewrite when a read error is encountered.
>> That either succeeds directly, or causes the drive to remap.
>>
>> You say that the SMART tests are good, so read errors are probably
>> escalating into link timeouts, and the drive ignores the attempt to
>> reconstruct. *That* kicks the drive out.
>>
>> "smartctl -x" reports for all of your drives would help identify if
>> you have this problem. You *cannot* safely run raid arrays with drives
>> that don't (or won't) report errors in a timely fashion (a few seconds).
>>
>>> It did make me nervous though. So I decided I'd add a second disk for
>>> a bit of extra redundancy, making the array a RAID 6 - the thinking
>>> was the occasional disk getting kicked and re-added from a RAID 6
>>> array wouldn't present as much risk as a single disk getting kicked
>>> from a RAID 5.
>>>
>>> So first off, I added the 6th disk as a hotspare to the RAID5 array.
>>> So I now had my 5 disk RAID 5 + hotspare.
>>>
>>> I then found that mdadm 2.6.7 (in the repositories) isn't actually
>>> capable of a 5->6 reshape. So I pulled the latest 3.2.3 sources and
>>> compiled myself a new version of mdadm.
>>>
>>> With the newer version of mdadm, it was happy to do the reshape - so I
>>> set it off on its merry way, using an esata HD (mounted at /usb :-P)
>>> for the backup file:
>>>
>>> root@raven:/# mdadm --grow /dev/md0 --level=6 --raid-devices=6
>>> --backup-file=/usb/md0.backup
>>>
>>> It would take a week to reshape, but it was on a UPS & happily ticking
>>> along. The array would be online the whole time so I was in no rush.
>>> Content, I went to get some shut-eye.
>>>
>>> I got up this morning and took a quick look in /proc/mdstat to see how
>>> things were going and saw things had failed spectacularly. At least
>>> two disks had been kicked from the array and the whole thing had
>>> crumbled.
>>
>> Do you still have the dmesg for this?
>>
>>> Ouch.
>>>
>>> I tried to assemble the array, to see if it would continue the reshape:
>>>
>>> root@raven:/# mdadm -Avv --backup-file=/usb/md0.backup /dev/md0
>>> /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sdf1 /dev/sdg1
>>>
>>> Unfortunately mdadm had decided that the backup-file was out of date
>>> (timestamps didn't match) and was erroring with: Failed to restore
>>> critical section for reshape, sorry..
>>>
>>> Chances are things were in such a mess that the backup file wasn't going
>>> to be used anyway, so I blocked the timestamp check with: export
>>> MDADM_GROW_ALLOW_OLD=1
>>>
>>> That allowed me to assemble the array, but not run it as there were
>>> not enough disks to start it.
>>>
>>> This is the current state of the array:
>>>
>>> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
>>> [raid4] [raid10]
>>> md0 : inactive sdb1[1] sdd1[5] sdf1[4] sda1[2]
>>>       7814047744 blocks super 0.91
>>>
>>> unused devices: <none>
>>>
>>> root@raven:/# mdadm --detail /dev/md0
>>> /dev/md0:
>>>         Version : 0.91
>>>   Creation Time : Tue Jul 12 23:05:01 2011
>>>      Raid Level : raid6
>>>   Used Dev Size : 1953511936 (1863.01 GiB 2000.40 GB)
>>>    Raid Devices : 6
>>>   Total Devices : 4
>>> Preferred Minor : 0
>>>     Persistence : Superblock is persistent
>>>
>>>     Update Time : Tue Feb 7 09:32:29 2012
>>>           State : active, FAILED, Not Started
>>>  Active Devices : 3
>>> Working Devices : 4
>>>  Failed Devices : 0
>>>   Spare Devices : 1
>>>
>>>          Layout : left-symmetric-6
>>>      Chunk Size : 64K
>>>
>>>      New Layout : left-symmetric
>>>
>>>            UUID : 9a76d1bd:2aabd685:1fc5fe0e:7751cfd7 (local to host raven)
>>>          Events : 0.1848341
>>>
>>>     Number   Major   Minor   RaidDevice   State
>>>        0       0       0        0         removed
>>>        1       8      17        1         active sync   /dev/sdb1
>>>        2       8       1        2         active sync   /dev/sda1
>>>        3       0       0        3         removed
>>>        4       8      81        4         active sync   /dev/sdf1
>>>        5       8      49        5         spare rebuilding   /dev/sdd1
>>>
>>> The two removed disks:
>>> [ 3020.998529] md: kicking non-fresh sdc1 from array!
>>> [ 3021.012672] md: kicking non-fresh sdg1 from array!
>>>
>>> Attempted to re-add the disks (same for both):
>>> root@raven:/# mdadm /dev/md0 --add /dev/sdg1
>>> mdadm: /dev/sdg1 reports being an active member for /dev/md0, but a
>>> --re-add fails.
>>> mdadm: not performing --add as that would convert /dev/sdg1 in to a spare.
>>> mdadm: To make this a spare, use "mdadm --zero-superblock /dev/sdg1" first.
>>>
>>> With a failed array the last thing we want to do is add spares and
>>> trigger a resync so obviously I haven't zeroed the superblocks and
>>> added yet.
>>
>> That would be catastrophic.
>>
>>> Checked and two disks really are out of sync:
>>> root@raven:/# mdadm --examine /dev/sd[a-h]1 | grep Event
>>>          Events : 1848341
>>>          Events : 1848341
>>>          Events : 1848333
>>>          Events : 1848341
>>>          Events : 1848341
>>>          Events : 1772921
>>
>> So /dev/sdg1 dropped out first, and /dev/sdc1 followed and killed the
>> array.
>>
>>> I'll post the output of --examine on all the disks below - if anyone
>>> has any advice I'd really appreciate it (Neil Brown doesn't read these
>>> forums does he?!?). I would usually move next to recreating the array
>>> and using assume-clean but since it's right in the middle of a reshape
>>> I'm not inclined to try.
>>
>> Neil absolutely reads this mailing list, and is likely to pitch in if
>> I don't offer precisely correct advice :-)
>>
>> He's in an Australian time zone though, so latency might vary. I'm on the
>> U.S. east coast, fwiw.
>>
>> In any case, with a re-shape in progress, "--create --assume-clean" is
>> not an option.
>>
>>> Critical stuff is of course backed up, but there is some user data not
>>> covered by backups that I'd like to try and restore if at all
>>> possible.
>>
>> Hope is not all lost. If we can get your ERC adjusted, the next step
>> would be to disconnect /dev/sdg from the system, and assemble with
>> --force and MDADM_GROW_ALLOW_OLD=1
>>
>> That'll let the reshape finish, leaving you with a single-degraded
>> raid6. Then you fsck and make critical backups. Then you --zero- and
>> --add /dev/sdg.
>>
>> If your drives don't support ERC, I can't recommend you continue until
>> you've ddrescue'd your drives onto new ones that do support ERC.
>>
>> HTH,
>>
>> Phil
>>
>> [1] http://github.com/pturmel/lsdrv