Re: Does a "check" of a RAID6 actually read all disks in a stripe?

On 28/4/20 11:40 pm, Piergiorgio Sartor wrote:

> I suspect, but Neil or some expert should confirm
> or deny, that a check on a RAID-6 uses only the
> P parity to verify stripe consistency.
> If there are errors in the Q parity chunk, these
> will not be found.


I have been able to experimentally verify that a check does indeed read both P and Q and validates them both, flagging any mismatch.
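
For anyone wanting to see this without sacrificing a drive, something along these lines should show Q being compared. This is only a rough sketch: /dev/sdX and the 1 MiB offset are placeholders and need to be worked out from mdadm --examine and the array layout so that the write actually lands in a Q chunk.

# With the array stopped, silently corrupt 4K of the suspected Q chunk
# on the raw member device (placeholder device and offset!)
dd if=/dev/urandom of=/dev/sdX bs=4096 seek=256 count=1 oflag=direct

# Reassemble, run a full check and look at the mismatch counter afterwards;
# a non-zero value means the recomputed Q was compared against what is on disk
mdadm --assemble /dev/md3
echo check > /sys/block/md3/md/sync_action
cat /sys/block/md3/md/mismatch_cnt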

I did find a couple of cases where a bad sector was not picked up during a check; however, these appeared to be inconsistent and not reproducible. I suspect it is related to checking only a small part of the array and caching at the block layer. Once I removed the sync_max restriction and let the check run until I knew more data had been read than the machine has RAM (i.e. it was all out of cache), I could not reproduce the issue. Incidentally, drop_caches did not help here; I had to just let it run.
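
Roughly what the bounded and unbounded runs looked like (the same sysfs knobs appear in the transcript further down; the 262144-sector window is just an example value):

MD=/sys/block/md3/md
echo 0 > $MD/sync_min
echo 262144 > $MD/sync_max     # stop the check after the first 262144 sectors
echo check > $MD/sync_action
cat $MD/sync_completed         # progress within that window

echo idle > $MD/sync_action    # then remove the bound and let it run past RAM size
echo max > $MD/sync_max
echo check > $MD/sync_action
cat $MD/mismatch_cnt           # non-zero after completion means inconsistent stripes were found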

I performed these tests by using hdparm to create a bad sector on one of the array members and then forcing a check. In all the well-behaved cases, the check hit the read error, reconstructed the data from the remaining devices and re-wrote the bad sectors.

Oddly enough, nine times out of ten a bad sector in the P block was simply corrected in place (one bad "sector" on the disk causes 8 sectors to return errors due to the 4K formatting), while a bad sector in the Q block caused the entire 128-sector chunk to be re-written every time.

[248861.618254] ata3: EH complete
[248861.627940] raid5_end_read_request: 7 callbacks suppressed
[248861.627964] md/raid:md3: read error corrected (8 sectors at 262144 on sdc)
[248877.770056] md: md3: data-check interrupted.

There were instances, however, where performing exactly the same test on the P block would force a write of the entire chunk. The behaviour is apparently inconsistent, but it always resulted in the correct on-disk data.

[249390.287825] ata3: EH complete
[249390.570722] raid5_end_read_request: 6 callbacks suppressed
[249390.570752] md/raid:md3: read error corrected (8 sectors at 262272 on sdc)
[249390.570776] md/raid:md3: read error corrected (8 sectors at 262280 on sdc)
[249390.570800] md/raid:md3: read error corrected (8 sectors at 262288 on sdc)
[249390.570822] md/raid:md3: read error corrected (8 sectors at 262296 on sdc)
[249390.570845] md/raid:md3: read error corrected (8 sectors at 262304 on sdc)
[249390.570871] md/raid:md3: read error corrected (8 sectors at 262312 on sdc)
[249390.570893] md/raid:md3: read error corrected (8 sectors at 262320 on sdc)
[249390.570916] md/raid:md3: read error corrected (8 sectors at 262328 on sdc)
[249390.570940] md/raid:md3: read error corrected (8 sectors at 262336 on sdc)
[249390.570962] md/raid:md3: read error corrected (8 sectors at 262344 on sdc)
[249397.549844] md: md3: data-check interrupted.

This test did validate my theory that using ddrescue to clone a dying drive and then using hdparm to mark the unrecoverable sectors as bad on the clone would prevent md from reading corrupt data from the clone and would instead allow those stripes to be reconstructed.
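
The recovery path would be something along these lines. This is only a sketch: the device names and map file name are placeholders, it assumes GNU ddrescue and GNU awk (for strtonum), and hdparm --make-bad-sector is destructive, so treat it accordingly.

# 1. Clone the dying member, keeping a ddrescue map of what could not be read
ddrescue -d /dev/sd_dying /dev/sd_clone clone.map

# 2. Mark every unreadable region as bad on the clone so md gets a read error
#    there instead of stale data. Map data lines are "pos size status" with
#    byte offsets in hex; status '-' marks bad sectors.
awk '$3 == "-" { p = strtonum($1); n = strtonum($2);
                 for (s = p/512; s < (p+n)/512; s++) printf "%d\n", s }' clone.map |
while read sector; do
    hdparm --yes-i-know-what-i-am-doing --make-bad-sector "$sector" /dev/sd_clone
done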

For completeness, the test was performed with many variants along these lines:

test:~# mdadm --assemble /dev/md3
mdadm: /dev/md3 has been started with 9 drives.
test:~# hdparm --yes-i-know-what-i-am-doing --make-bad-sector f262144 /dev/sdm

/dev/sdm:
Corrupting sector 262144 (WRITE_UNC_EXT as flagged): succeeded
test:~# export MD=/sys/block/md3/md/ ; echo 256 > $MD/sync_max ; echo check > $MD/sync_action
test:~# export MD=/sys/block/md3/md/ ; echo idle > $MD/sync_action ; echo 0 > $MD/sync_min
test:~# mdadm --stop /dev/md3

[249036.634236] md: md3 stopped.
[249036.700205] md/raid:md3: device sde operational as raid disk 0
[249036.700227] md/raid:md3: device sdm operational as raid disk 8
[249036.700246] md/raid:md3: device sdc operational as raid disk 7
[249036.700264] md/raid:md3: device sdf operational as raid disk 6
[249036.700283] md/raid:md3: device sdk operational as raid disk 5
[249036.700301] md/raid:md3: device sdj operational as raid disk 4
[249036.700319] md/raid:md3: device sdl operational as raid disk 3
[249036.700338] md/raid:md3: device sdi operational as raid disk 2
[249036.700356] md/raid:md3: device sdg operational as raid disk 1
[249036.700832] md/raid:md3: raid level 6 active with 9 out of 9 devices, algorithm 2
[249036.720763] md3: detected capacity change from 0 to 14001852841984
[249056.033360] md: data-check of RAID array md3
[249056.154104] sd 2:0:7:0: [sdm] tag#246 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
[249056.154136] sd 2:0:7:0: [sdm] tag#246 Sense Key : 0x3 [current]
[249056.154157] sd 2:0:7:0: [sdm] tag#246 ASC=0x11 ASCQ=0x0
[249056.154178] sd 2:0:7:0: [sdm] tag#246 CDB: opcode=0x28 28 00 00 04 00 00 00 00 80 00
[249056.154206] blk_update_request: critical medium error, dev sdm, sector 262144 op 0x0:(READ) flags 0x0 phys_seg 16 prio class 0
[249056.155056] md/raid:md3: read error corrected (8 sectors at 262144 on sdm)
[249056.155081] md/raid:md3: read error corrected (8 sectors at 262152 on sdm)
[249056.155107] md/raid:md3: read error corrected (8 sectors at 262160 on sdm)
[249056.155131] md/raid:md3: read error corrected (8 sectors at 262168 on sdm)
[249056.155157] md/raid:md3: read error corrected (8 sectors at 262176 on sdm)
[249056.155181] md/raid:md3: read error corrected (8 sectors at 262184 on sdm)
[249056.155207] md/raid:md3: read error corrected (8 sectors at 262192 on sdm)
[249056.155232] md/raid:md3: read error corrected (8 sectors at 262200 on sdm)
[249056.155258] md/raid:md3: read error corrected (8 sectors at 262208 on sdm)
[249056.155282] md/raid:md3: read error corrected (8 sectors at 262216 on sdm)
[249064.025753] md: md3: data-check interrupted.
[249064.190722] md3: detected capacity change from 14001852841984 to 0
[249064.190756] md: md3 stopped.

With these test results I'm at a loss to explain how repeated full-disk checks previously missed the bad sectors on that drive, but it certainly happened.

Regards,
Brad


