mdraid: raid1 and iscsi-multipath devices - never faults but should!

Hello,

I'm trying to do something interesting; the storage stack looks like this:

xfs
- mdraid
  - multipath (with no_path_retry = fail)
    - iscsi path 1
    - iscsi path 2
  - multipath (with no_path_retry = fail)
    - iscsi path 1
    - iscsi path 2
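For reference, a stack like this can be set up roughly as follows (a sketch only — the device names /dev/dm-2 and /dev/dm-3 and the mount point /mnt are placeholders, not the actual names on my system):

```shell
# Assumption: /dev/dm-2 and /dev/dm-3 are the two multipath maps, each
# backed by two iSCSI paths, with "no_path_retry fail" in multipath.conf
# so that I/O errors out instead of queueing when all paths are down.

# Create the RAID1 array on top of the two multipath devices
mdadm --create /dev/md127 --level=1 --raid-devices=2 /dev/dm-2 /dev/dm-3

# Put XFS on top and mount it
mkfs.xfs /dev/md127
mount /dev/md127 /mnt
```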

During normal operation everything looks good. Once a path fails (i.e. the iSCSI target is removed), the array goes degraded; if the path comes back, nothing happens.

Q1) Can I enable auto recovery for failed devices?

If the device is re-added manually (or by software), everything resyncs and works again, as it should.
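The manual re-add amounts to something like the following (sketch only — /dev/dm-3 stands in for whichever multipath device came back):

```shell
# Assumption: /dev/dm-3 is the multipath device whose paths recovered.
# A faulty member must be removed from the array before it can be
# re-added; md then resyncs it against the surviving member.
mdadm /dev/md127 --remove /dev/dm-3
mdadm /dev/md127 --re-add /dev/dm-3

# Watch the resync progress
cat /proc/mdstat
```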

If BOTH devices fail at the same time (worst case scenario), it gets wonky. I would expect a total hang (as with iSCSI and multipath queue_if_no_path), but instead:

1) XFS reports Input/Output error
2) dmesg has logs like:

[Thu Oct 22 09:25:28 2020] Buffer I/O error on dev md127, logical block 41472, async page read
[Thu Oct 22 09:25:28 2020] Buffer I/O error on dev md127, logical block 41473, async page read
[Thu Oct 22 09:25:28 2020] Buffer I/O error on dev md127, logical block 41474, async page read
[Thu Oct 22 09:25:28 2020] Buffer I/O error on dev md127, logical block 41475, async page read
[Thu Oct 22 09:25:28 2020] Buffer I/O error on dev md127, logical block 41476, async page read
[Thu Oct 22 09:25:28 2020] Buffer I/O error on dev md127, logical block 41477, async page read
[Thu Oct 22 09:25:28 2020] Buffer I/O error on dev md127, logical block 41478, async page read

3) mdadm --detail /dev/md127 shows:

/dev/md127:
           Version : 1.2
     Creation Time : Wed Oct 21 17:25:22 2020
        Raid Level : raid1
        Array Size : 96640 (94.38 MiB 98.96 MB)
     Used Dev Size : 96640 (94.38 MiB 98.96 MB)
      Raid Devices : 2
     Total Devices : 2
       Persistence : Superblock is persistent

       Update Time : Thu Oct 22 09:23:35 2020
             State : clean, degraded
    Active Devices : 1
   Working Devices : 1
    Failed Devices : 1
     Spare Devices : 0

Consistency Policy : resync

              Name : v-b08c6663-7296-4c66-9faf-ac687
              UUID : cc282a5c:59a499b3:682f5e6f:36f9c490
            Events : 122

    Number   Major   Minor   RaidDevice State
       0     253        2        0      active sync   /dev/dm-2
       -       0        0        1      removed

       1     253        3        -      faulty   /dev/dm-

4) I can still read from /dev/md127, but only whatever is already in the page cache (see the dmesg logs above)


In my opinion the following should happen instead, or at least be configurable.
I expect:
1) XFS hangs indefinitely (like multipath queue_if_no_path)
2) mdadm shows FAULTED as the State

Q2) Can this be configured in any way?

After BOTH paths are recovered, nothing works anymore, and the array doesn't recover automatically. Only a complete unmount and stop, followed by an assemble and mount, makes the array function again.
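The full recovery sequence that gets things working again looks roughly like this (sketch only — /mnt, /dev/dm-2 and /dev/dm-3 are placeholder names):

```shell
# Assumption: the filesystem is mounted on /mnt and the array is /dev/md127.
umount /mnt
mdadm --stop /dev/md127

# Reassemble the array from the two multipath devices and remount
mdadm --assemble /dev/md127 /dev/dm-2 /dev/dm-3
mount /dev/md127 /mnt
```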

Q3) Is that expected behavior?

Thanks
Thomas Rosenstein
