On Thu, Oct 22, 2020 at 12:28 PM, Thomas Rosenstein <thomas.rosenstein@xxxxxxxxxxxxxxxx> wrote:
>
> Hello,
>
> I'm trying to do something interesting. The structure looks like this:
>
> xfs
>   - mdraid
>     - multipath (with no_path_queue = fail)
>       - iscsi path 1
>       - iscsi path 2
>     - multipath (with no_path_queue = fail)
>       - iscsi path 1
>       - iscsi path 2
>
> During normal operation everything looks good. Once a path fails (i.e.
> the iscsi target is removed), the array goes degraded; if the path
> comes back, nothing happens.
>
> Q1) Can I enable auto recovery for failed devices?
>
> If the device is re-added manually (or by software), everything
> resyncs and it works again, as it all should.
>
> If BOTH devices fail at the same time (worst-case scenario), it gets
> wonky. I would expect a total hang (as with iscsi and multipath
> queue_no_path). Instead:
>
> 1) XFS reports Input/Output error
>
> 2) dmesg has logs like:
>
> [Thu Oct 22 09:25:28 2020] Buffer I/O error on dev md127, logical block 41472, async page read
> [Thu Oct 22 09:25:28 2020] Buffer I/O error on dev md127, logical block 41473, async page read
> [Thu Oct 22 09:25:28 2020] Buffer I/O error on dev md127, logical block 41474, async page read
> [Thu Oct 22 09:25:28 2020] Buffer I/O error on dev md127, logical block 41475, async page read
> [Thu Oct 22 09:25:28 2020] Buffer I/O error on dev md127, logical block 41476, async page read
> [Thu Oct 22 09:25:28 2020] Buffer I/O error on dev md127, logical block 41477, async page read
> [Thu Oct 22 09:25:28 2020] Buffer I/O error on dev md127, logical block 41478, async page read
>
> 3) mdadm --detail /dev/md127 shows:
>
> /dev/md127:
>            Version : 1.2
>      Creation Time : Wed Oct 21 17:25:22 2020
>         Raid Level : raid1
>         Array Size : 96640 (94.38 MiB 98.96 MB)
>      Used Dev Size : 96640 (94.38 MiB 98.96 MB)
>       Raid Devices : 2
>      Total Devices : 2
>        Persistence : Superblock is persistent
>
>        Update Time : Thu Oct 22 09:23:35 2020
>              State : clean, degraded
>     Active Devices : 1
>    Working Devices : 1
>     Failed Devices : 1
>      Spare Devices : 0
>
> Consistency Policy : resync
>
>               Name : v-b08c6663-7296-4c66-9faf-ac687
>               UUID : cc282a5c:59a499b3:682f5e6f:36f9c490
>             Events : 122
>
>     Number   Major   Minor   RaidDevice State
>        0      253       2        0      active sync   /dev/dm-2
>        -        0       0        1      removed
>
>        1      253       3        -      faulty   /dev/dm-
>
> 4) I can read from /dev/md127, but only whatever is already in the
> page cache (see the dmesg logs above).
>
> In my opinion this is what should happen, or it should at least be
> configurable. I expect:
>
> 1) XFS hangs indefinitely (like multipath queue_no_path)
> 2) mdadm shows FAULTED as the State
>
> Q2) Can this be configured in any way?

You can allow the last device to fail: see commit 9a567843f7ce ("md:
allow last device to be forcibly removed from RAID1/RAID10."). There is
a sketch of how to use it at the end of this mail.

> After BOTH paths are recovered, nothing works anymore, and the raid
> doesn't recover automatically. Only a complete unmount and stop,
> followed by an assemble and mount, makes the raid function again.
>
> Q3) Is that expected behavior?
>
> Thanks
> Thomas Rosenstein
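
On Q1: md itself never pulls a failed member back in, but mdadm can do
it for you when the device reappears, via udev incremental assembly
plus a POLICY line. A minimal sketch, assuming your distro ships the
usual udev rule that runs "mdadm --incremental" on block-device add
events:

  # /etc/mdadm.conf (may be /etc/mdadm/mdadm.conf on Debian-based systems)
  # When a device that was recently an array member shows up again,
  # re-add it to the slot it last occupied.
  POLICY domain=default path=* action=re-add

A write-intent bitmap makes the subsequent resync cheap, since only
blocks written while the path was down get copied:

  mdadm --grow --bitmap=internal /dev/md127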
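
On Q2, to expand on the commit reference above: if I remember right,
that change added a fail_last_dev sysfs attribute (v5.4 or so; check
your kernel). With it set, md will mark even the last working member
Faulty instead of limping along and returning errors. A sketch,
assuming the attribute sits in the usual per-array sysfs directory:

  # Allow md to fail the last remaining member of md127.
  echo 1 > /sys/block/md127/md/fail_last_dev

That gives you the second half of your expectation (the array shows up
as failed); it does not make XFS hang the way queue_if_no_path does,
since as far as I know md has no equivalent of multipath's
queue-if-no-path mode.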
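
On Q3: a full unmount/stop/assemble cycle should not be needed once the
paths are back; manually re-adding the failed member is normally
enough, at least while the array is still degraded rather than fully
failed. Roughly, with /dev/dm-X standing in for whichever multipath
device went faulty in your --detail output:

  # Clear the faulty slot, then put the member back; with a bitmap
  # this is a quick catch-up rather than a full resync.
  mdadm /dev/md127 --remove /dev/dm-X
  mdadm /dev/md127 --re-add /dev/dm-X

The double-failure case is uglier: once md has given up on the array
entirely, stop-and-reassemble is, as far as I can tell, the only way
back, which matches what you are seeing.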