Hello,
I'm trying to do something interesting; the structure looks like this:
xfs
- mdraid
  - multipath (with no_path_retry = fail)
    - iscsi path 1
    - iscsi path 2
  - multipath (with no_path_retry = fail)
    - iscsi path 1
    - iscsi path 2
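For reference, the relevant part of my multipath.conf is roughly the
following (WWIDs, aliases and device sections elided, so treat this as a
sketch rather than the full config):

```
defaults {
    # fail I/O immediately when all paths are gone,
    # instead of queueing it (queue_if_no_path)
    no_path_retry fail
}
```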
During normal operation everything looks good. Once a path fails (i.e.
the iSCSI target is removed), the array goes degraded; if the path comes
back, nothing happens.
Q1) Can I enable auto recovery for failed devices?
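Reading mdadm.conf(5), a POLICY line with action=re-add looks like it
might do this via udev/incremental assembly, but I'm not sure it applies
to dm devices at all; the domain and path below are made up for
illustration:

```
# hypothetical: re-add returning devices automatically (path glob is a guess)
POLICY domain=iscsi path=dm-* action=re-add
```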
If the device is re-added manually (or by a script), everything resyncs
and it works again, as it should.
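Concretely, the manual recovery that works is along these lines (dm-X is
a placeholder for the returned leg; since the array has no write-intent
bitmap, this means a full resync):

```
mdadm /dev/md127 --remove /dev/dm-X   # clear the faulty slot
mdadm /dev/md127 --re-add /dev/dm-X   # re-add the leg; md resyncs it
```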
If BOTH multipath devices fail at the same time (worst case scenario) it
gets wonky. I would expect a total hang (as with iSCSI and multipath
queue_if_no_path), but instead:
1) XFS reports Input/Output error
2) dmesg has logs like:
[Thu Oct 22 09:25:28 2020] Buffer I/O error on dev md127, logical block 41472, async page read
[Thu Oct 22 09:25:28 2020] Buffer I/O error on dev md127, logical block 41473, async page read
[Thu Oct 22 09:25:28 2020] Buffer I/O error on dev md127, logical block 41474, async page read
[Thu Oct 22 09:25:28 2020] Buffer I/O error on dev md127, logical block 41475, async page read
[Thu Oct 22 09:25:28 2020] Buffer I/O error on dev md127, logical block 41476, async page read
[Thu Oct 22 09:25:28 2020] Buffer I/O error on dev md127, logical block 41477, async page read
[Thu Oct 22 09:25:28 2020] Buffer I/O error on dev md127, logical block 41478, async page read
3) mdadm --detail /dev/md127 shows:
/dev/md127:
           Version : 1.2
     Creation Time : Wed Oct 21 17:25:22 2020
        Raid Level : raid1
        Array Size : 96640 (94.38 MiB 98.96 MB)
     Used Dev Size : 96640 (94.38 MiB 98.96 MB)
      Raid Devices : 2
     Total Devices : 2
       Persistence : Superblock is persistent

       Update Time : Thu Oct 22 09:23:35 2020
             State : clean, degraded
    Active Devices : 1
   Working Devices : 1
    Failed Devices : 1
     Spare Devices : 0

Consistency Policy : resync

              Name : v-b08c6663-7296-4c66-9faf-ac687
              UUID : cc282a5c:59a499b3:682f5e6f:36f9c490
            Events : 122

    Number   Major   Minor   RaidDevice State
       0     253        2        0      active sync   /dev/dm-2
       -       0        0        1      removed

       1     253        3        -      faulty   /dev/dm-
4) I can read from /dev/md127, but only whatever is still in the page
cache (see the dmesg logs above)
In my opinion this should not happen, or at least should be
configurable. I expect:
1) XFS hangs indefinitely (like multipath queue_if_no_path)
2) mdadm shows the array State as failed
Q2) Can this be configured in any way?
After BOTH paths are recovered, nothing works anymore, and the raid
doesn't recover automatically.
Only a complete unmount and stop followed by an assemble and mount makes
the raid function again.
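To be explicit, the only sequence that brings it back is roughly the
following (mount point and the dm-X leg name are placeholders from my
setup):

```
umount /mnt/data                                  # placeholder mount point
mdadm --stop /dev/md127
mdadm --assemble /dev/md127 /dev/dm-2 /dev/dm-X   # dm-X = second leg
mount /dev/md127 /mnt/data
```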
Q3) Is that expected behavior?
Thanks
Thomas Rosenstein