Does a "check" of a RAID6 actually read all disks in a stripe?

Brad Campbell <lists2009@xxxxxxxxxxxxxxx> · Tue, 28 Apr 2020 14:47:04 +0800

G'day all,

I have a test server with some old disks that I use for beating up on. Bear in mind the disks are old and dicey, which is *why* they live in a test server. I'm not after reliability; I'm more interested in finding corner cases.

One disk has a persistent read error (pending sector). It is easily reproduced with dd, reading either just the affected sectors or the whole disk.

The array is a RAID6 of nine 2TB drives :

md3 : active raid6 sdh[12] sdm[8] sdc[10] sde[6] sdj[9] sdk[4] sdl[11] sdg[13]
      13673684416 blocks super 1.2 level 6, 64k chunk, algorithm 2 [9/8] [UU_UUUUUU]
      bitmap: 0/15 pages [0KB], 65536KB chunk

Ignore the missing disk; it's out right now being secure erased, but it was in the array for the tests.

The read error is on sdj, about 26G into the disk (sector 54126096, roughly 25.8 GiB) :

[Sun Apr 26 15:05:30 2020] sd 4:0:4:0: [sdj] tag#229 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
[Sun Apr 26 15:05:30 2020] sd 4:0:4:0: [sdj] tag#229 Sense Key : 0x3 [current]
[Sun Apr 26 15:05:30 2020] sd 4:0:4:0: [sdj] tag#229 ASC=0x11 ASCQ=0x0
[Sun Apr 26 15:05:30 2020] sd 4:0:4:0: [sdj] tag#229 CDB: opcode=0x28 28 00 03 39 d8 08 00 20 00 00
[Sun Apr 26 15:05:30 2020] blk_update_request: critical medium error, dev sdj, sector 54126096 op 0x0:(READ) flags 0x80700 phys_seg 37 prio class 0
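
As a cross-check, the CDB in the log can be decoded by hand to confirm that the failing sector really lies inside the range that was being read. This is only a minimal sketch: the helper name decode_read10 is made up for illustration, and it assumes 512-byte logical sectors so that the SCSI LBA equals the kernel sector number.

```python
def decode_read10(cdb_hex):
    """Return (start_lba, block_count) from a READ(10) CDB given as hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")    # bytes 2-5: starting logical block address
    count = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length in blocks
    return lba, count


# CDB and failing sector copied from the log lines above.
lba, count = decode_read10("28 00 03 39 d8 08 00 20 00 00")
bad_sector = 54126096
print(lba, count, lba <= bad_sector < lba + count)
```

The read started at LBA 54122504 and covered 8192 blocks, so sector 54126096 is within it.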

Trigger a "check" :
[Mon Apr 27 18:51:15 2020] md: data-check of RAID array md3
[Tue Apr 28 03:42:21 2020] md: md3: data-check done.

Just to be sure it's still there :
[Tue Apr 28 14:13:33 2020] sd 4:0:4:0: [sdj] tag#100 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
[Tue Apr 28 14:13:33 2020] sd 4:0:4:0: [sdj] tag#100 Sense Key : 0x3 [current]
[Tue Apr 28 14:13:33 2020] sd 4:0:4:0: [sdj] tag#100 ASC=0x11 ASCQ=0x0
[Tue Apr 28 14:13:33 2020] sd 4:0:4:0: [sdj] tag#100 CDB: opcode=0x28 28 00 03 39 e6 10 00 00 08 00
[Tue Apr 28 14:13:33 2020] blk_update_request: critical medium error, dev sdj, sector 54126096 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0

So I can read from the disk with dd and trigger a read error each and every time, but a RAID6 "check" appears to skip over it without triggering the read error.
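
For anyone wanting to repeat the targeted read without dd, something along these lines should behave the same way. It is a sketch under assumptions, not the exact command used above: read_sectors is a hypothetical helper, it assumes 512-byte sectors, and it reads through the page cache rather than with O_DIRECT.

```python
import os


def read_sectors(path, first_sector, count=8, sector_size=512):
    """Read `count` sectors starting at `first_sector` from `path`.

    On a block device with an unreadable sector in the range, the
    underlying read is expected to fail and raise OSError (EIO).
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.pread(fd, count * sector_size, first_sector * sector_size)
    finally:
        os.close(fd)


# Intended use on the test box (needs root):
#   read_sectors("/dev/sdj", 54126096)
```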

For completeness, the full log :
[Sun Apr 26 15:05:30 2020] sd 4:0:4:0: [sdj] tag#229 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
[Sun Apr 26 15:05:30 2020] sd 4:0:4:0: [sdj] tag#229 Sense Key : 0x3 [current]
[Sun Apr 26 15:05:30 2020] sd 4:0:4:0: [sdj] tag#229 ASC=0x11 ASCQ=0x0
[Sun Apr 26 15:05:30 2020] sd 4:0:4:0: [sdj] tag#229 CDB: opcode=0x28 28 00 03 39 d8 08 00 20 00 00
[Sun Apr 26 15:05:30 2020] blk_update_request: critical medium error, dev sdj, sector 54126096 op 0x0:(READ) flags 0x80700 phys_seg 37 prio class 0
[Sun Apr 26 21:15:47 2020]  sdd: sdd1 sdd2
[Mon Apr 27 18:51:15 2020] md: data-check of RAID array md3
[Tue Apr 28 03:42:21 2020] md: md3: data-check done.
[Tue Apr 28 09:39:18 2020] md/raid:md3: Disk failure on sdi, disabling device.
                           md/raid:md3: Operation continuing on 8 devices.
[Tue Apr 28 14:13:33 2020] sd 4:0:4:0: [sdj] tag#100 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
[Tue Apr 28 14:13:33 2020] sd 4:0:4:0: [sdj] tag#100 Sense Key : 0x3 [current]
[Tue Apr 28 14:13:33 2020] sd 4:0:4:0: [sdj] tag#100 ASC=0x11 ASCQ=0x0
[Tue Apr 28 14:13:33 2020] sd 4:0:4:0: [sdj] tag#100 CDB: opcode=0x28 28 00 03 39 e6 10 00 00 08 00
[Tue Apr 28 14:13:33 2020] blk_update_request: critical medium error, dev sdj, sector 54126096 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
[Tue Apr 28 14:13:35 2020] sd 4:0:4:0: [sdj] tag#112 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
[Tue Apr 28 14:13:35 2020] sd 4:0:4:0: [sdj] tag#112 Sense Key : 0x3 [current]
[Tue Apr 28 14:13:35 2020] sd 4:0:4:0: [sdj] tag#112 ASC=0x11 ASCQ=0x0
[Tue Apr 28 14:13:35 2020] sd 4:0:4:0: [sdj] tag#112 CDB: opcode=0x28 28 00 03 39 e6 10 00 00 08 00
[Tue Apr 28 14:13:35 2020] blk_update_request: critical medium error, dev sdj, sector 54126096 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[Tue Apr 28 14:13:35 2020] Buffer I/O error on dev sdj, logical block 6765762, async page read
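
The logical block in the last line agrees with the failing sector, provided the "Buffer I/O error" message counts in 4096-byte blocks. That block size is an assumption (the log does not state it), and block_to_sectors is just an illustrative helper.

```python
SECTOR_SIZE = 512


def block_to_sectors(block, block_size=4096):
    """Return the range of 512-byte sectors covered by one logical block."""
    per_block = block_size // SECTOR_SIZE
    first = block * per_block
    return range(first, first + per_block)


sectors = block_to_sectors(6765762)
print(sectors.start, sectors.stop - 1, 54126096 in sectors)
```

Block 6765762 starts exactly at sector 54126096, the sector named in the medium error.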

mdadm --examine output for the suspect disk :

test:/home/brad# mdadm --examine /dev/sdj
/dev/sdj:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : dbbca7b5:327751b1:895f8f11:443f6ecb
           Name : test:3  (local to host test)
  Creation Time : Wed Nov 29 10:46:21 2017
     Raid Level : raid6
   Raid Devices : 9

 Avail Dev Size : 3906767024 (1862.89 GiB 2000.26 GB)
     Array Size : 13673684416 (13040.24 GiB 14001.85 GB)
  Used Dev Size : 3906766976 (1862.89 GiB 2000.26 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262056 sectors, after=48 sectors
          State : clean
    Device UUID : f1a39d9b:fe217c62:26b065e3:0f859afd

Internal Bitmap : 8 sectors from superblock
    Update Time : Tue Apr 28 09:39:23 2020
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : cb44256b - correct
         Events : 177156

         Layout : left-symmetric
     Chunk Size : 64K

   Device Role : Active device 5
   Array State : AA.AAAAAA ('A' == active, '.' == missing, 'R' == replacing)

test:/home/brad# mdadm --detail /dev/md3
/dev/md3:
        Version : 1.2
  Creation Time : Wed Nov 29 10:46:21 2017
     Raid Level : raid6
     Array Size : 13673684416 (13040.24 GiB 14001.85 GB)
  Used Dev Size : 1953383488 (1862.89 GiB 2000.26 GB)
   Raid Devices : 9
  Total Devices : 8
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Tue Apr 28 09:39:23 2020
          State : clean, degraded
 Active Devices : 8
Working Devices : 8
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           Name : test:3  (local to host test)
           UUID : dbbca7b5:327751b1:895f8f11:443f6ecb
         Events : 177156

    Number   Major   Minor   RaidDevice State
      12       8      112        0      active sync   /dev/sdh
      13       8       96        1      active sync   /dev/sdg
       -       0        0        2      removed
      11       8      176        3      active sync   /dev/sdl
       4       8      160        4      active sync   /dev/sdk
       9       8      144        5      active sync   /dev/sdj
       6       8       64        6      active sync   /dev/sde
      10       8       32        7      active sync   /dev/sdc
       8       8      192        8      active sync   /dev/sdm

test:/home/brad# uname -a
Linux test 5.4.11 #49 SMP Wed Jan 15 11:23:38 AWST 2020 x86_64 GNU/Linux
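
Using the Data Offset and Chunk Size from the --examine output above, the failing sector can be placed relative to the start of the md data area on sdj. This is a rough sketch: locate_in_member is a name invented here, 512-byte sectors are assumed, and it deliberately does not try to work out which chunk in the stripe holds parity.

```python
SECTOR_SIZE = 512


def locate_in_member(disk_sector, data_offset, chunk_bytes):
    """Return (sector within the md data area, chunk row, sector offset in that chunk)."""
    if disk_sector < data_offset:
        raise ValueError("sector lies before the md data area")
    chunk_sectors = chunk_bytes // SECTOR_SIZE
    data_sector = disk_sector - data_offset
    row, within = divmod(data_sector, chunk_sectors)
    return data_sector, row, within


# Failing sector from the kernel log; Data Offset 262144 sectors, Chunk Size 64K.
print(locate_in_member(54126096, 262144, 64 * 1024))
```

That gives sector 53863952 of the data area, which is chunk row 420812 on this member, 16 sectors into the chunk.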

So the read error is well into the array member, yet a "check" doesn't hit it. Does that sound right?
These disks grow bad sectors fairly often, so a check regularly forces a repair of an 8-sector block, yet it has persistently missed this one.

Regards,
Brad