Re: 3-way mirrors

On Wed, 08 Sep 2010 06:16:16 +0000
"Michael Sallaway" <michael@xxxxxxxxxxxx> wrote:

> 
> >  -------Original Message-------
> >  From: Neil Brown <neilb@xxxxxxx>
> >  To: Michael Sallaway <michael@xxxxxxxxxxxx>
> >  Cc: linux-raid@xxxxxxxxxxxxxxx
> >  Subject: Re: 3-way mirrors
> >  Sent: 08 Sep '10 06:02
> >  
> >  Hmm.... Drive B shouldn't be ejected from the array for a read error.  md
> >  should calculate the data for both A and B from the other devices and then
> >  write that to A and B.
> >  If the write fails, only then should it kick B from the array.  Is that what
> >  is happening?
> >  
> >  i.e. do you see messages like:
> >     read error corrected
> >     read error not correctable
> >     read error NOT corrected
> >  
> >  in the kernel logs??
> 
> 
> The logs for the relevant section are below, at the bottom -- it's a "read error not correctable". So I'm guessing it's also failing a write, although I can't see the ATA error handling mentioning any writes -- it all looks like reads??

Yes, it is just reads.
It looks like you have an ancient kernel - older than April 2010 :-)
A patch went into 2.6.35, and I think some 2.6.34.y releases, which fixed a
bug that caused md to drop devices in a degraded RAID6 when it could have
fixed the read error.  Commit 7b0bb5368a719

So a newer kernel might fix your problem for you.
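
To see which of the three outcomes md actually logged, something like the
following can tally them per device. This is only a sketch: the outcome
phrases come from the list quoted earlier in this thread, the "on <dev>)"
pattern is modeled on the "read error not correctable (sector ... on sdm)."
lines in the syslog below, and the exact wording may differ between kernel
versions. The name tally_read_errors is made up for illustration.

```python
import re
from collections import Counter

# Outcome phrases as listed in this thread. Case matters here:
# "not correctable" and "NOT corrected" are different outcomes.
OUTCOME_RE = re.compile(r"read error (corrected|not correctable|NOT corrected)")
# Assumption: the device name appears as "on <dev>" right before a ")".
DEVICE_RE = re.compile(r"\bon (\w+)\)")

def tally_read_errors(lines):
    """Count md read-error outcomes per (device, outcome) pair."""
    counts = Counter()
    for line in lines:
        outcome = OUTCOME_RE.search(line)
        if outcome is None:
            continue  # not an md read-error line
        device = DEVICE_RE.search(line)
        name = device.group(1) if device else "unknown"
        counts[(name, outcome.group(1))] += 1
    return counts

sample = [
    "raid5:md10: read error not correctable (sector 1574510752 on sdm).",
    "raid5:md10: read error not correctable (sector 1574510760 on sdm).",
    "ata13.00: failed command: READ FPDMA QUEUED",
]
print(tally_read_errors(sample))
```

The function accepts any iterable of lines, so an open log file can be passed
in directly instead of the sample list.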

> 
> 
> >  If the write is failing, then you want my bad-block-log patches - only they
> >  aren't really finished yet and certainly aren't tested very well.  I really
> >  should get back to those.
> 
> Interesting -- I'm not familiar with them, where would I find these patches? And what would they do -- just allow the bad blocks (even on writes), and keep the drive in the array? That's all I'm really after, in this case, I think.

I posted them to the list for review a few months ago and haven't got back to
them.

http://www.spinics.net/lists/raid/msg28813.html

I wouldn't recommend using them until they've seen more review and testing.

NeilBrown



> 
> Thanks!
> Michael
> 
> 
> 
> Syslog from the failure of the first drive:
> 
> Sep  7 09:31:24 lechuck kernel: [51912.039892] ata13.00: exception Emask 0x0 SAct 0x1ff SErr 0x0 action 0x0
> Sep  7 09:31:24 lechuck kernel: [51912.048227] ata13.00: irq_stat 0x40000008
> Sep  7 09:31:24 lechuck kernel: [51912.056685] ata13.00: failed command: READ FPDMA QUEUED
> Sep  7 09:31:24 lechuck kernel: [51912.065055] ata13.00: cmd 60/d8:08:00:20:d9/00:00:5d:00:00/40 tag 1 ncq 110592 in
> Sep  7 09:31:24 lechuck kernel: [51912.065061]          res 51/40:35:a3:20:d9/00:00:5d:00:00/40 Emask 0x409 (media error) <F>
> Sep  7 09:31:25 lechuck kernel: [51912.098113] ata13.00: status: { DRDY ERR }
> Sep  7 09:31:25 lechuck kernel: [51912.106705] ata13.00: error: { UNC }
> Sep  7 09:31:25 lechuck kernel: [51912.128027] ata13.00: configured for UDMA/133
> Sep  7 09:31:25 lechuck kernel: [51912.128054] ata13: EH complete
> Sep  7 09:31:28 lechuck kernel: [51915.216232] ata13.00: exception Emask 0x0 SAct 0x1ff SErr 0x0 action 0x0
> Sep  7 09:31:28 lechuck kernel: [51915.224757] ata13.00: irq_stat 0x40000008
> Sep  7 09:31:28 lechuck kernel: [51915.233283] ata13.00: failed command: READ FPDMA QUEUED
> Sep  7 09:31:28 lechuck kernel: [51915.241660] ata13.00: cmd 60/d8:38:00:20:d9/00:00:5d:00:00/40 tag 7 ncq 110592 in
> Sep  7 09:31:28 lechuck kernel: [51915.241662]          res 41/40:35:a3:20:d9/00:00:5d:00:00/40 Emask 0x409 (media error) <F>
> Sep  7 09:31:28 lechuck kernel: [51915.275603] ata13.00: status: { DRDY ERR }
> Sep  7 09:31:28 lechuck kernel: [51915.284267] ata13.00: error: { UNC }
> Sep  7 09:31:28 lechuck kernel: [51915.305722] ata13.00: configured for UDMA/133
> Sep  7 09:31:28 lechuck kernel: [51915.305746] ata13: EH complete
> Sep  7 09:31:30 lechuck kernel: [51917.992164] ata13.00: exception Emask 0x0 SAct 0x1ff SErr 0x0 action 0x0
> Sep  7 09:31:30 lechuck kernel: [51918.000791] ata13.00: irq_stat 0x40000008
> Sep  7 09:31:30 lechuck kernel: [51918.009631] ata13.00: failed command: READ FPDMA QUEUED
> Sep  7 09:31:30 lechuck kernel: [51918.018303] ata13.00: cmd 60/d8:08:00:20:d9/00:00:5d:00:00/40 tag 1 ncq 110592 in
> Sep  7 09:31:30 lechuck kernel: [51918.018305]          res 41/40:35:a3:20:d9/00:00:5d:00:00/40 Emask 0x409 (media error) <F>
> Sep  7 09:31:30 lechuck kernel: [51918.054117] ata13.00: status: { DRDY ERR }
> Sep  7 09:31:30 lechuck kernel: [51918.062808] ata13.00: error: { UNC }
> Sep  7 09:31:30 lechuck kernel: [51918.084521] ata13.00: configured for UDMA/133
> Sep  7 09:31:30 lechuck kernel: [51918.084547] ata13: EH complete
> Sep  7 09:31:33 lechuck kernel: [51920.956122] ata13.00: exception Emask 0x0 SAct 0x1ff SErr 0x0 action 0x0
> Sep  7 09:31:33 lechuck kernel: [51920.964858] ata13.00: irq_stat 0x40000008
> Sep  7 09:31:33 lechuck kernel: [51920.973829] ata13.00: failed command: READ FPDMA QUEUED
> Sep  7 09:31:33 lechuck kernel: [51920.982587] ata13.00: cmd 60/d8:38:00:20:d9/00:00:5d:00:00/40 tag 7 ncq 110592 in
> Sep  7 09:31:33 lechuck kernel: [51920.982589]          res 41/40:35:a3:20:d9/00:00:5d:00:00/40 Emask 0x409 (media error) <F>
> Sep  7 09:31:33 lechuck kernel: [51921.017401] ata13.00: status: { DRDY ERR }
> Sep  7 09:31:33 lechuck kernel: [51921.026134] ata13.00: error: { UNC }
> Sep  7 09:31:33 lechuck kernel: [51921.048656] ata13.00: configured for UDMA/133
> Sep  7 09:31:33 lechuck kernel: [51921.048680] ata13: EH complete
> Sep  7 09:31:37 lechuck kernel: [51924.153414] ata13.00: exception Emask 0x0 SAct 0x1ff SErr 0x0 action 0x0
> Sep  7 09:31:37 lechuck kernel: [51924.162178] ata13.00: irq_stat 0x40000008
> Sep  7 09:31:37 lechuck kernel: [51924.162182] ata13.00: failed command: READ FPDMA QUEUED
> Sep  7 09:31:37 lechuck kernel: [51924.162189] ata13.00: cmd 60/d8:08:00:20:d9/00:00:5d:00:00/40 tag 1 ncq 110592 in
> Sep  7 09:31:37 lechuck kernel: [51924.162190]          res 41/40:35:a3:20:d9/00:00:5d:00:00/40 Emask 0x409 (media error) <F>
> Sep  7 09:31:37 lechuck kernel: [51924.162193] ata13.00: status: { DRDY ERR }
> Sep  7 09:31:37 lechuck kernel: [51924.162195] ata13.00: error: { UNC }
> Sep  7 09:31:37 lechuck kernel: [51924.175348] ata13.00: configured for UDMA/133
> Sep  7 09:31:37 lechuck kernel: [51924.175374] ata13: EH complete
> Sep  7 09:31:39 lechuck kernel: [51927.005666] ata13.00: exception Emask 0x0 SAct 0x1ff SErr 0x0 action 0x0
> Sep  7 09:31:39 lechuck kernel: [51927.014384] ata13.00: irq_stat 0x40000008
> Sep  7 09:31:39 lechuck kernel: [51927.023299] ata13.00: failed command: READ FPDMA QUEUED
> Sep  7 09:31:39 lechuck kernel: [51927.031949] ata13.00: cmd 60/d8:38:00:20:d9/00:00:5d:00:00/40 tag 7 ncq 110592 in
> Sep  7 09:31:39 lechuck kernel: [51927.031951]          res 41/40:35:a3:20:d9/00:00:5d:00:00/40 Emask 0x409 (media error) <F>
> Sep  7 09:31:39 lechuck kernel: [51927.066322] ata13.00: status: { DRDY ERR }
> Sep  7 09:31:39 lechuck kernel: [51927.074946] ata13.00: error: { UNC }
> Sep  7 09:31:40 lechuck kernel: [51927.096349] ata13.00: configured for UDMA/133
> Sep  7 09:31:40 lechuck kernel: [51927.096393] sd 12:0:0:0: [sdm] Unhandled sense code
> Sep  7 09:31:40 lechuck kernel: [51927.096396] sd 12:0:0:0: [sdm] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> Sep  7 09:31:40 lechuck kernel: [51927.096401] sd 12:0:0:0: [sdm] Sense Key : Medium Error [current] [descriptor]
> Sep  7 09:31:40 lechuck kernel: [51927.096406] Descriptor sense data with sense descriptors (in hex):
> Sep  7 09:31:40 lechuck kernel: [51927.096409]         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
> Sep  7 09:31:40 lechuck kernel: [51927.096420]         5d d9 20 a3
> Sep  7 09:31:40 lechuck kernel: [51927.096425] sd 12:0:0:0: [sdm] Add. Sense: Unrecovered read error - auto reallocate failed
> Sep  7 09:31:40 lechuck kernel: [51927.096431] sd 12:0:0:0: [sdm] CDB: Read(10): 28 00 5d d9 20 00 00 00 d8 00
> Sep  7 09:31:40 lechuck kernel: [51927.096442] end_request: I/O error, dev sdm, sector 1574510755
> Sep  7 09:31:40 lechuck kernel: [51927.104975] raid5:md10: read error not correctable (sector 1574510752 on sdm).
> Sep  7 09:31:40 lechuck kernel: [51927.104985] raid5: Disk failure on sdm, disabling device.
> Sep  7 09:31:40 lechuck kernel: [51927.104989] raid5: Operation continuing on 10 devices.
> Sep  7 09:31:40 lechuck kernel: [51927.122210] raid5:md10: read error not correctable (sector 1574510760 on sdm).
> Sep  7 09:31:40 lechuck kernel: [51927.122214] raid5:md10: read error not correctable (sector 1574510768 on sdm).
> Sep  7 09:31:40 lechuck kernel: [51927.122218] raid5:md10: read error not correctable (sector 1574510776 on sdm).
> Sep  7 09:31:40 lechuck kernel: [51927.122222] raid5:md10: read error not correctable (sector 1574510784 on sdm).
> Sep  7 09:31:40 lechuck kernel: [51927.122225] raid5:md10: read error not correctable (sector 1574510792 on sdm).
> Sep  7 09:31:40 lechuck kernel: [51927.122229] raid5:md10: read error not correctable (sector 1574510800 on sdm).
> Sep  7 09:31:40 lechuck kernel: [51927.122242] ata13: EH complete
> Sep  7 09:31:40 lechuck kernel: [51927.142926] md: md10: recovery done.
> Sep  7 09:31:40 lechuck mdadm[3840]: Fail event detected on md device /dev/md10, component device /dev/sdm
> Sep  7 09:31:40 lechuck kernel: [51927.344026] RAID5 conf printout:
> Sep  7 09:31:40 lechuck kernel: [51927.344031]  --- rd:12 wd:10
> Sep  7 09:31:40 lechuck kernel: [51927.344034]  disk 0, o:1, dev:sdf
> Sep  7 09:31:40 lechuck kernel: [51927.344037]  disk 1, o:1, dev:sdb
> Sep  7 09:31:40 lechuck kernel: [51927.344039]  disk 2, o:1, dev:sda
> Sep  7 09:31:40 lechuck kernel: [51927.344042]  disk 3, o:1, dev:sdc
> Sep  7 09:31:40 lechuck kernel: [51927.344044]  disk 4, o:1, dev:sdj
> Sep  7 09:31:40 lechuck kernel: [51927.344047]  disk 5, o:1, dev:sdi
> Sep  7 09:31:40 lechuck kernel: [51927.344049]  disk 6, o:1, dev:sdp
> Sep  7 09:31:40 lechuck kernel: [51927.344052]  disk 7, o:1, dev:sdn
> Sep  7 09:31:40 lechuck kernel: [51927.344054]  disk 8, o:1, dev:sdo
> Sep  7 09:31:40 lechuck kernel: [51927.344057]  disk 9, o:0, dev:sdm
> Sep  7 09:31:40 lechuck kernel: [51927.344059]  disk 10, o:1, dev:sdk
> Sep  7 09:31:40 lechuck kernel: [51927.344062]  disk 11, o:1, dev:sdl
> Sep  7 09:31:40 lechuck kernel: [51927.344064] RAID5 conf printout:
> Sep  7 09:31:40 lechuck kernel: [51927.344066]  --- rd:12 wd:10
> Sep  7 09:31:40 lechuck kernel: [51927.344068]  disk 0, o:1, dev:sdf
> Sep  7 09:31:40 lechuck kernel: [51927.344070]  disk 1, o:1, dev:sdb
> Sep  7 09:31:40 lechuck kernel: [51927.344073]  disk 2, o:1, dev:sda
> Sep  7 09:31:40 lechuck kernel: [51927.344075]  disk 3, o:1, dev:sdc
> Sep  7 09:31:40 lechuck kernel: [51927.344077]  disk 4, o:1, dev:sdj
> Sep  7 09:31:40 lechuck kernel: [51927.344080]  disk 5, o:1, dev:sdi
> Sep  7 09:31:40 lechuck kernel: [51927.344082]  disk 6, o:1, dev:sdp
> Sep  7 09:31:40 lechuck kernel: [51927.344084]  disk 7, o:1, dev:sdn
> Sep  7 09:31:40 lechuck kernel: [51927.344087]  disk 8, o:1, dev:sdo
> Sep  7 09:31:40 lechuck kernel: [51927.344089]  disk 9, o:0, dev:sdm
> Sep  7 09:31:40 lechuck kernel: [51927.344091]  disk 10, o:1, dev:sdk
> Sep  7 09:31:40 lechuck kernel: [51927.344093]  disk 11, o:1, dev:sdl
> Sep  7 09:31:40 lechuck kernel: [51927.344095] RAID5 conf printout:
> Sep  7 09:31:40 lechuck kernel: [51927.344097]  --- rd:12 wd:10
> Sep  7 09:31:40 lechuck kernel: [51927.344100]  disk 0, o:1, dev:sdf
> Sep  7 09:31:40 lechuck kernel: [51927.344102]  disk 1, o:1, dev:sdb
> Sep  7 09:31:40 lechuck kernel: [51927.344104]  disk 2, o:1, dev:sda
> Sep  7 09:31:40 lechuck kernel: [51927.344106]  disk 3, o:1, dev:sdc
> Sep  7 09:31:40 lechuck kernel: [51927.344109]  disk 4, o:1, dev:sdj
> Sep  7 09:31:40 lechuck kernel: [51927.344111]  disk 5, o:1, dev:sdi
> Sep  7 09:31:40 lechuck kernel: [51927.344113]  disk 6, o:1, dev:sdp
> Sep  7 09:31:40 lechuck kernel: [51927.344116]  disk 7, o:1, dev:sdn
> Sep  7 09:31:40 lechuck kernel: [51927.344118]  disk 8, o:1, dev:sdo
> Sep  7 09:31:40 lechuck kernel: [51927.344120]  disk 9, o:0, dev:sdm
> Sep  7 09:31:40 lechuck kernel: [51927.344122]  disk 10, o:1, dev:sdk
> Sep  7 09:31:40 lechuck kernel: [51927.344125]  disk 11, o:1, dev:sdl
> Sep  7 09:31:40 lechuck kernel: [51927.400014] RAID5 conf printout:
> Sep  7 09:31:40 lechuck kernel: [51927.400017]  --- rd:12 wd:10
> Sep  7 09:31:40 lechuck kernel: [51927.400020]  disk 0, o:1, dev:sdf
> Sep  7 09:31:40 lechuck kernel: [51927.400022]  disk 1, o:1, dev:sdb
> Sep  7 09:31:40 lechuck kernel: [51927.400025]  disk 2, o:1, dev:sda
> Sep  7 09:31:40 lechuck kernel: [51927.400027]  disk 3, o:1, dev:sdc
> Sep  7 09:31:40 lechuck kernel: [51927.400029]  disk 4, o:1, dev:sdj
> Sep  7 09:31:40 lechuck kernel: [51927.400032]  disk 5, o:1, dev:sdi
> Sep  7 09:31:40 lechuck kernel: [51927.400034]  disk 6, o:1, dev:sdp
> Sep  7 09:31:40 lechuck kernel: [51927.400036]  disk 7, o:1, dev:sdn
> Sep  7 09:31:40 lechuck kernel: [51927.400039]  disk 8, o:1, dev:sdo
> Sep  7 09:31:40 lechuck kernel: [51927.400041]  disk 10, o:1, dev:sdk
> Sep  7 09:31:40 lechuck kernel: [51927.400043]  disk 11, o:1, dev:sdl
> Sep  7 09:31:40 lechuck kernel: [51927.400138] md: recovery of RAID array md10
> Sep  7 09:31:40 lechuck kernel: [51927.400141] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
> Sep  7 09:31:40 lechuck kernel: [51927.400145] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
> Sep  7 09:31:40 lechuck kernel: [51927.400155] md: using 128k window, over a total of 1465138496 blocks.
> Sep  7 09:31:40 lechuck kernel: [51927.400159] md: resuming recovery of md10 from checkpoint.
> Sep  7 09:31:40 lechuck mdadm[3840]: RebuildFinished event detected on md device /dev/md10, component device  mismatches found: 477544
> Sep  7 09:31:40 lechuck mdadm[3840]: RebuildStarted event detected on md device /dev/md10
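
The sector numbers in the log above can be cross-checked with a little
arithmetic. A sketch, assuming 512-byte sectors and that md reports errors in
4 KiB (8-sector) units; the latter is inferred from the 8-sector spacing of
the reported sectors and is not stated anywhere in this thread:

```python
SECTORS_PER_PAGE = 8  # assumption: 4 KiB pages / 512-byte sectors

# Last four bytes of the sense data in the log: "5d d9 20 a3"
failed_lba = int("5dd920a3", 16)

# From the Read(10) CDB "28 00 5d d9 20 00 00 00 d8 00":
# bytes 2-5 are the start LBA, bytes 7-8 the transfer length in sectors.
start_lba = int("5dd92000", 16)
length = int("00d8", 16)

# Round the failed sector down to its page, then step to the end of the request.
first_bad_page = failed_lba - failed_lba % SECTORS_PER_PAGE
pages = list(range(first_bad_page, start_lba + length, SECTORS_PER_PAGE))

print(failed_lba)    # compare with the end_request "sector" value
print(length * 512)  # compare with the "ncq ... in" byte count
print(pages)         # compare with the md "read error not correctable" sectors
```

Under those assumptions the numbers line up with the log: the failed LBA is
sector 1574510755, the request is 110592 bytes, and the page-aligned sectors
run from 1574510752 to 1574510800, which are the seven sectors md reported.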
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
