Re: force remapping a pending sector in sw raid5 array

On 09/02/18 at 21:29, Marc MERLIN wrote:
> On Fri, Feb 09, 2018 at 03:13:26PM -0500, Phil Turmel wrote:
>>> The pending sectors should have been re-written and become
>>> Reallocated_Event_Count, no?
>>
>> Yes, and not necessarily.  Pending sectors can be non-permanent errors
>> -- the drive firmware will test a pending sector immediately after write
>> to see if the write is readable.  If not, it will re-allocate while it
>> still has the write data in its buffers.  Otherwise, it'll clear the
>> pending sector.
> 
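One way to follow the counters discussed here while experimenting is to filter `smartctl -A` before and after each rewrite. This is a minimal sketch: `smart_counters` is a hypothetical helper name, and it assumes the usual ATA attribute table layout, where the attribute name is the second column and RAW_VALUE is the tenth.

```shell
#!/bin/sh
# Print the reallocation-related SMART attributes from `smartctl -A` output
# read on stdin, as NAME=RAW_VALUE lines.
# Assumes the common ATA table layout: column 2 = attribute name,
# column 10 = first token of RAW_VALUE.
smart_counters() {
    awk '
        $2 == "Reallocated_Sector_Ct"   ||
        $2 == "Reallocated_Event_Count" ||
        $2 == "Current_Pending_Sector"  ||
        $2 == "Offline_Uncorrectable" { print $2 "=" $10 }
    '
}
```

For example: `smartctl -A /dev/sdh | smart_counters`, once before and once after touching a sector, then compare the two outputs.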
> This shows the sector is still bad though, right? 
> 
> myth:~# hdparm --read-sector 1287409520 /dev/sdh
> /dev/sdh:
> reading sector 1287409520: SG_IO: bad/missing sense data, sb[]:  70 00 03 00 00 00 00 0a 40 51 e0 01 11 04 00 00 a0 70 00 00 00 00 00 00 00 00 00 00 00 00 00 00 succeeded
> 7000 0b54 92c4 ffff 0000 0000 01fe 0000
> (...)
> 
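The sb[] bytes in the hdparm output above look like fixed-format SCSI sense data (response code 0x70): byte 2 holds the sense key and bytes 12 and 13 the ASC/ASCQ pair. A minimal decoder sketch, where `decode_sense` is a hypothetical helper: for the bytes above it yields sense key 0x03 (MEDIUM ERROR) with ASC/ASCQ 0x11/0x04, i.e. "unrecovered read error, auto reallocate failed", which is the same condition the kernel reports for this sector. So hdparm's trailing "succeeded" probably should not be taken at face value.

```shell
#!/bin/sh
# Decode fixed-format SCSI sense data (response code 0x70 or 0x71).
# Arguments: the sense bytes as two-digit hex strings, byte 0 first.
decode_sense() {
    if [ "$#" -lt 14 ]; then
        echo "need at least 14 sense bytes" >&2
        return 1
    fi
    resp=$1                  # byte 0: response code
    key=$(( 0x$3 & 0x0f ))   # byte 2, low nibble: sense key
    shift 12                 # now $1 is byte 12
    asc=$1                   # byte 12: additional sense code
    ascq=$2                  # byte 13: additional sense code qualifier
    case $key in
        3) name="MEDIUM ERROR" ;;
        4) name="HARDWARE ERROR" ;;
        5) name="ILLEGAL REQUEST" ;;
        *) name="other" ;;
    esac
    printf 'response=0x%s key=0x%02x (%s) asc=0x%s ascq=0x%s\n' \
        "$resp" "$key" "$name" "$asc" "$ascq"
}
```

Usage: `decode_sense 70 00 03 00 00 00 00 0a 40 51 e0 01 11 04 00 00` prints `response=0x70 key=0x03 (MEDIUM ERROR) asc=0x11 ascq=0x04`.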
> [ 2572.139404] ata5.04: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
> [ 2572.139419] ata5.04: failed command: READ SECTOR(S) EXT
> [ 2572.139427] ata5.04: cmd 24/00:01:70:4f:bc/00:00:4c:00:00/e0 tag 28 pio 512 in
> [ 2572.139427]          res 51/40:01:70:4f:bc/00:00:4c:00:00/e0 Emask 0x9 (media error)
> [ 2572.139431] ata5.04: status: { DRDY ERR }
> [ 2572.139435] ata5.04: error: { UNC }
> [ 2572.162369] ata5.04: configured for UDMA/133
> [ 2572.162414] ata5: EH complete
> 
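The "cmd" line in the libata error above encodes the failing LBA. A minimal sketch, assuming the field order libata uses when printing a taskfile (command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device); `taskfile_lba` is a hypothetical helper. For `24/00:01:70:4f:bc/00:00:4c:00:00/e0` it gives 0x4cbc4f70 = 1287409520, the sector that was passed to hdparm.

```shell
#!/bin/sh
# Recover the 48-bit LBA from a libata taskfile dump such as
#   24/00:01:70:4f:bc/00:00:4c:00:00/e0
# LBA = hob_lbah hob_lbam hob_lbal lbah lbam lbal (most significant first).
taskfile_lba() {
    old_ifs=$IFS
    IFS='/:'
    set -- $1                 # split into the 12 taskfile fields
    IFS=$old_ifs
    # $4..$6 = lbal lbam lbah, $9..$11 = hob_lbal hob_lbam hob_lbah
    echo $(( (0x${11} << 40) | (0x${10} << 32) | (0x$9 << 24) \
           | (0x$6 << 16) | (0x$5 << 8) | 0x$4 ))
}
```

The same function also works on the later READ FPDMA QUEUED lines (`60/08:30:70:4f:bc/...`), which decode to the same LBA. For that NCQ command the feature field carries the sector count (0x08, i.e. 4096 bytes) and nsect carries the queue tag shifted left by three (0x30 for tag 6).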
> mdadm also said it found 6 bad sectors and rewrote them (or something like that)
> and it's happy. So allegedly it did something, but SMART does not agree (yet?)
> 
> I'm now running a long smart test on all drives, will see if numbers change.
> 
> Mmmh, and I just ran
> myth:~# badblocks -fsvnb512 /dev/sdh 1287409599 1287409400
> (output below), and I don't quite understand what's going on.
> 
>>> So, mdadm is happy allegedly, but my drives still have the same bad
>>> sectors they had (more or less).
>>
>> If you have bad block lists enabled in your array, MD will *never* try
>> to fix the underlying sectors.  Please show your mdadm -E reports for
>> these devices.  If necessary, stop the array and re-assemble with the
>> options to disable bad block lists.  { How this misfeature got into the
>> kernel and enabled by default baffles me. }
> 
> Does this mean I don't have bad block lists?
> myth:~# mdadm -E /dev/sdd   (sde, sdf, sdg and sdh all return the same)
> /dev/sdd:
>    MBR Magic : aa55
> Partition[0] :   4294967295 sectors at            1 (type ee)
> 
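The output above is a protective MBR (partition type ee) of a GPT-partitioned disk, which suggests the md superblocks live on partitions rather than on the whole disks, so `mdadm -E` probably needs to be run on those partitions instead. Below is a minimal sketch that reports the "Bad Block Log" line which, as far as I know, `mdadm --examine` prints for v1.x superblocks; `bbl_status` is a hypothetical helper, and the partition names in the usage line are a guess.

```shell
#!/bin/sh
# Report whether `mdadm --examine` output (read on stdin) mentions a
# bad block log, and if so echo the rest of that line.
bbl_status() {
    line=$(grep 'Bad Block Log' || true)
    if [ -n "$line" ]; then
        echo "bad block log present:${line#*:}"
    else
        echo "no bad block log reported"
    fi
}
```

For example: `for p in /dev/sd[d-h]1; do echo "$p"; mdadm -E "$p" | bbl_status; done`. The recorded entries can be listed with `mdadm --examine-badblocks`, and mdadm(8) documents `--update=no-bbl` for assembly; check the man page of the installed version for the conditions under which it removes the log.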
>> Also, pending sectors that are in dead zones between metadata and array
>> data will not be accessed by a check scrub, and will therefore persist.
>  
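Whether a given pending sector falls inside the area a check scrub reads can be worked out from the partition start plus the member's data offset and size. A minimal sketch, where `sector_region` is a hypothetical helper and all values are 512-byte sectors: the partition start can be read from /sys/class/block/<partition>/start, and "Data Offset" and "Used Dev Size" from `mdadm -E` on the member partition (in sectors for v1.x metadata, as far as I know). The per-partition `mdadm -E` output isn't shown in this thread, so those values are unknown here.

```shell
#!/bin/sh
# Classify an absolute disk LBA relative to an md member partition.
# All arguments are in 512-byte sectors:
#   $1 lba          absolute sector on the whole disk (e.g. from dmesg)
#   $2 part_start   partition start, from /sys/class/block/<part>/start
#   $3 data_offset  "Data Offset" from mdadm -E on the partition
#   $4 data_size    "Used Dev Size" from mdadm -E on the partition
sector_region() {
    rel=$(( $1 - $2 ))                     # offset inside the partition
    if [ "$rel" -lt 0 ]; then
        echo "before partition"
    elif [ "$rel" -lt "$3" ]; then
        echo "before array data (not read by a check scrub)"
    elif [ "$rel" -lt $(( $3 + $4 )) ]; then
        echo "array data, partition sector $rel"
    else
        echo "after array data (not read by a check scrub)"
    fi
}
```

With made-up values for illustration only, `sector_region 1287409520 2048 262144 5860270080` prints `array data, partition sector 1287407472`.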
> That's a good point, but then I would never have discovered those blocks
> while initializing the array.
> 
>>> Yes, I know I should trash (return) those drives,
>>
>> Well, non-permanent read errors are not considered warranty failures.
>> They are in the drive specs.  When pending is zero and actual
>> re-allocations are climbing (my threshold is double digits), *then* it's
>> time to replace.
> 
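Phil's rule of thumb can be written down as a tiny helper. This is only a sketch: `should_replace` is a hypothetical name, the default threshold of 10 is one reading of "double digits", and it looks at a single snapshot, whereas "climbing" really means comparing readings taken over time. The inputs would be the raw Current_Pending_Sector and Reallocated_Sector_Ct values.

```shell
#!/bin/sh
# Rule of thumb: replace once nothing is pending any more but actual
# reallocations have reached the threshold (default: double digits).
#   $1 pending      Current_Pending_Sector raw value
#   $2 reallocated  Reallocated_Sector_Ct raw value
#   $3 threshold    optional, defaults to 10
should_replace() {
    pending=$1
    reallocated=$2
    threshold=${3:-10}
    if [ "$pending" -eq 0 ] && [ "$reallocated" -ge "$threshold" ]; then
        echo replace
    else
        echo keep
    fi
}
```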
> I think it's worse here. Read errors are not being cleared by block rewrites?
> Those are brand "new" (but really remanufactured) drives. 
> So far I'm not liking what I'm seeing and I'm very close to just
> returning them all and getting some less dodgy ones.
> 
> Sad, because the last set of 5 I got from a similar source has worked
> beautifully.
> 
> Let's see what a full smart scan does.
> I may also use hdparm --write-sector to just fill those bad blocks with 0's
> now that it seems that mdadm isn't caring about/using them anymore?
> 
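If you go the hdparm route, `--write-sector` overwrites a single 512-byte sector with zeros and refuses to run without `--yes-i-know-what-i-am-doing`. A minimal sketch that loops over a range and, by default, only prints the commands; `rewrite_sectors` and the RUN variable are hypothetical. Writing behind md's back discards whatever md had stored in those sectors, and if the drive has 4 KiB physical sectors, it may take all eight 512-byte sectors of that physical sector (here 1287409520 to 1287409527) before the drive can remap it.

```shell
#!/bin/sh
# Zero a range of 512-byte sectors with hdparm --write-sector.
# DESTRUCTIVE: the previous contents of these sectors are lost.
# Only prints the commands unless RUN=1 is set in the environment.
#   $1 device   $2 first sector   $3 last sector (inclusive)
rewrite_sectors() {
    dev=$1 first=$2 last=$3
    s=$first
    while [ "$s" -le "$last" ]; do
        if [ "${RUN:-0}" = 1 ]; then
            hdparm --yes-i-know-what-i-am-doing --write-sector "$s" "$dev" || return 1
        else
            echo "hdparm --yes-i-know-what-i-am-doing --write-sector $s $dev"
        fi
        s=$((s + 1))
    done
}
```

For example, `rewrite_sectors /dev/sdh 1287409520 1287409527` prints the eight commands for review; `RUN=1 rewrite_sectors ...` would actually execute them.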
> Now, badblocks perplexes me even more. Shouldn't -n re-write blocks?
> 
> myth:~# badblocks -fsvnb512 /dev/sdh 1287409599 1287409400
> /dev/sdh is apparently in use by the system; badblocks forced anyway.
> Checking for bad blocks in non-destructive read-write mode
> From block 1287409400 to 1287409599
> Checking for bad blocks (non-destructive read-write test)
> Testing with random pattern: 1287409520ne, 0:14 elapsed. (0/0/0 errors)
> 1287409521ne, 0:18 elapsed. (1/0/0 errors)
> 1287409522ne, 0:23 elapsed. (2/0/0 errors)
> 1287409523ne, 0:27 elapsed. (3/0/0 errors)
> 1287409524ne, 0:31 elapsed. (4/0/0 errors)
> 1287409525ne, 0:36 elapsed. (5/0/0 errors)
> 1287409526ne, 0:40 elapsed. (6/0/0 errors)
> 1287409527ne, 0:44 elapsed. (7/0/0 errors)
> done                                                 
> Pass completed, 8 bad blocks found. (8/0/0 errors)
> 
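The "(8/0/0 errors)" triple in the summary is, as far as I know from the badblocks source, read errors / write errors / corruption (compare) errors. A small sketch that splits it, with `badblocks_errors` as a hypothetical helper: here it gives 8 read errors and no write errors, which is consistent with the unreadable blocks never having been written.

```shell
#!/bin/sh
# Split the "(R/W/C errors)" triple from a badblocks status line into
# read errors, write errors and corruption (compare) errors.
#   $1 a badblocks line, e.g. "Pass completed, 8 bad blocks found. (8/0/0 errors)"
badblocks_errors() {
    triple=${1##*\(}          # drop everything up to the last "("
    triple=${triple%% *}      # keep only "R/W/C"
    old_ifs=$IFS
    IFS=/
    set -- $triple
    IFS=$old_ifs
    echo "read=$1 write=$2 corruption=$3"
}
```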
> Badblocks found 8 bad blocks, but didn't rewrite them, or failed to, or
> succeeded but that did nothing anyway?
> 
> Do I understand correctly that
> 1) badblocks got read errors
> 2) it's supposed to rewrite the blocks with new data (or not?)
> 3) auto reallocate failed
> 
> 
> [ 3171.717001] ata5.04: exception Emask 0x0 SAct 0x40 SErr 0x0 action 0x0
> [ 3171.717012] ata5.04: failed command: READ FPDMA QUEUED 
> [ 3171.717019] ata5.04: cmd 60/08:30:70:4f:bc/00:00:4c:00:00/40 tag 6 ncq dma 4096 in
> [ 3171.717019]          res 41/40:00:70:4f:bc/00:00:4c:00:00/00 Emask 0x409 (media error) <F>
> [ 3171.717031] ata5.04: status: { DRDY ERR } 
> [ 3171.717034] ata5.04: error: { UNC }
> [ 3171.718293] ata5.04: configured for UDMA/133
> [ 3171.718342] sd 4:4:0:0: [sdh] tag#6 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> [ 3171.718349] sd 4:4:0:0: [sdh] tag#6 Sense Key : Medium Error [current] 
> [ 3171.718354] sd 4:4:0:0: [sdh] tag#6 Add. Sense: Unrecovered read error - auto reallocate failed 
> [ 3171.718360] sd 4:4:0:0: [sdh] tag#6 CDB: Read(16) 88 00 00 00 00 00 4c bc 4f 70 00 00 00 08 00 00
> [ 3171.718364] print_req_error: I/O error, dev sdh, sector 1287409520 
> [ 3171.718369] Buffer I/O error on dev sdh, logical block 160926190, async page read
> [ 3171.718393] ata5: EH complete
> [ 3176.092946] ata5.04: exception Emask 0x0 SAct 0x400000 SErr 0x0 action 0x0
> [ 3176.092958] ata5.04: failed command: READ FPDMA QUEUED
> [ 3176.092973] ata5.04: cmd 60/08:b0:70:4f:bc/00:00:4c:00:00/40 tag 22 ncq dma 4096 in 
> [ 3176.092973]          res 41/40:00:70:4f:bc/00:00:4c:00:00/00 Emask 0x409 (media error) <F>
> [ 3176.092978] ata5.04: status: { DRDY ERR }
> [ 3176.092981] ata5.04: error: { UNC } 
> [ 3176.094237] ata5.04: configured for UDMA/133
> [ 3176.094285] sd 4:4:0:0: [sdh] tag#22 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE 
> [ 3176.094291] sd 4:4:0:0: [sdh] tag#22 Sense Key : Medium Error [current] 
> [ 3176.094296] sd 4:4:0:0: [sdh] tag#22 Add. Sense: Unrecovered read error - auto reallocate failed
> [ 3176.094302] sd 4:4:0:0: [sdh] tag#22 CDB: Read(16) 88 00 00 00 00 00 4c bc 4f 70 00 00 00 08 00 00 
> [ 3176.094306] print_req_error: I/O error, dev sdh, sector 1287409520
> [ 3176.094310] Buffer I/O error on dev sdh, logical block 160926190, async page read
> [ 3176.094324] ata5: EH complete
> [ 3180.488899] ata5.04: exception Emask 0x0 SAct 0x100 SErr 0x0 action 0x0
> [ 3180.488909] ata5.04: failed command: READ FPDMA QUEUED 
> [ 3180.488916] ata5.04: cmd 60/08:40:70:4f:bc/00:00:4c:00:00/40 tag 8 ncq dma 4096 in
> [ 3180.488916]          res 41/40:00:70:4f:bc/00:00:4c:00:00/00 Emask 0x409 (media error) <F> 
> [ 3180.488928] ata5.04: status: { DRDY ERR }
> [ 3180.488931] ata5.04: error: { UNC }
> [ 3180.490193] ata5.04: configured for UDMA/133
> [ 3180.490243] sd 4:4:0:0: [sdh] tag#8 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> [ 3180.490249] sd 4:4:0:0: [sdh] tag#8 Sense Key : Medium Error [current]  
> [ 3180.490254] sd 4:4:0:0: [sdh] tag#8 Add. Sense: Unrecovered read error - auto reallocate failed
> [ 3180.490259] sd 4:4:0:0: [sdh] tag#8 CDB: Read(16) 88 00 00 00 00 00 4c bc 4f 70 00 00 00 08 00 00
> [ 3180.490263] print_req_error: I/O error, dev sdh, sector 1287409520 
> [ 3180.490268] Buffer I/O error on dev sdh, logical block 160926190, async page read
> [ 3180.490290] ata5: EH complete 
> [ 3184.873146] ata5.04: exception Emask 0x0 SAct 0x1000000 SErr 0x0 action 0x0
> [ 3184.873161] ata5.04: failed command: READ FPDMA QUEUED
> [ 3184.873175] ata5.04: cmd 60/08:c0:70:4f:bc/00:00:4c:00:00/40 tag 24 ncq dma 4096 in 
> [ 3184.873175]          res 41/40:00:70:4f:bc/00:00:4c:00:00/00 Emask 0x409 (media error) <F>
> [ 3184.873181] ata5.04: status: { DRDY ERR }
> [ 3184.873184] ata5.04: error: { UNC }
> [ 3184.874437] ata5.04: configured for UDMA/133
> [ 3184.874488] sd 4:4:0:0: [sdh] tag#24 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> [ 3184.874495] sd 4:4:0:0: [sdh] tag#24 Sense Key : Medium Error [current] 
> [ 3184.874500] sd 4:4:0:0: [sdh] tag#24 Add. Sense: Unrecovered read error - auto reallocate failed
> [ 3184.874506] sd 4:4:0:0: [sdh] tag#24 CDB: Read(16) 88 00 00 00 00 00 4c bc 4f 70 00 00 00 08 00 00
> [ 3184.874510] print_req_error: I/O error, dev sdh, sector 1287409520
> [ 3184.874515] Buffer I/O error on dev sdh, logical block 160926190, async page read
> [ 3184.874555] ata5: EH complete
> 
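Two numbers in this log can be cross-checked against the badblocks run. The READ(16) CDB carries the LBA in bytes 2 to 9 and the transfer length in bytes 10 to 13, and the "logical block" in the Buffer I/O error is counted in 4096-byte units, so block 160926190 covers 512-byte sectors 160926190 * 8 = 1287409520 through 1287409527, the same eight sectors badblocks flagged. A minimal sketch of both conversions; `read16_decode` and `block4k_to_lbas` are hypothetical helpers.

```shell
#!/bin/sh
# Decode LBA and transfer length from a SCSI READ(16) CDB (opcode 0x88).
# Arguments: the 16 CDB bytes in hex, as printed after "CDB: Read(16)".
read16_decode() {
    [ "$1" = 88 ] || { echo "not a READ(16) CDB" >&2; return 1; }
    lba=0 len=0 i=0
    for b in "$@"; do
        if [ "$i" -ge 2 ] && [ "$i" -le 9 ]; then
            lba=$(( (lba << 8) | 0x$b ))    # bytes 2..9: big-endian LBA
        elif [ "$i" -ge 10 ] && [ "$i" -le 13 ]; then
            len=$(( (len << 8) | 0x$b ))    # bytes 10..13: block count
        fi
        i=$((i + 1))
    done
    echo "lba=$lba blocks=$len"
}

# Map a 4096-byte "logical block" number to the 512-byte LBAs it covers.
block4k_to_lbas() {
    first=$(( $1 * 8 ))
    echo "$first-$(( first + 7 ))"
}
```

Eight failing LBAs starting at a multiple of 8 are consistent with a single bad 4 KiB physical sector on a drive that emulates 512-byte sectors; `smartctl -i` reports the logical and physical sector sizes if you want to confirm that.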

What you write about the result of
badblocks -fsvnb512 /dev/sdh 1287409599 1287409400
is the expected behavior. -n means that it will _not_ write sectors that
it cannot read (because that would remove the possibility that data from
these sectors could be recovered by more tries).

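As I wrote, you have to use the -w option instead of -n, and pass
1287409527 1287409520 as the last-block and first-block arguments
(badblocks takes the last block before the first). Keep in mind that -w
is destructive: it overwrites the contents of those blocks.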
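A sketch of that invocation with the argument order spelled out. `badblocks_wtest` and the RUN variable are hypothetical; by default it only prints the command, because -w overwrites every block in the range. -f is kept from the earlier run, since the disk is in use by md.

```shell
#!/bin/sh
# Build (and optionally run) a destructive badblocks write test over
# 512-byte blocks. badblocks expects: device last_block first_block.
# DESTRUCTIVE: -w overwrites every block in the range with test patterns.
# Only prints the command unless RUN=1 is set in the environment.
#   $1 device   $2 first block   $3 last block
badblocks_wtest() {
    dev=$1 first=$2 last=$3
    set -- badblocks -fsvw -b 512 "$dev" "$last" "$first"
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        echo "$*"
    fi
}
```

For example, `badblocks_wtest /dev/sdh 1287409520 1287409527` prints `badblocks -fsvw -b 512 /dev/sdh 1287409527 1287409520`.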

HTH
Kay

