Hi Mariusz,
On 2/22/22 10:18 PM, Mariusz Tkaczyk wrote:
-static int has_failed(struct r5conf *conf)
+static bool has_failed(struct r5conf *conf)
 {
-	int degraded;
+	int degraded = conf->mddev->degraded;
-	if (conf->mddev->reshape_position == MaxSector)
-		return conf->mddev->degraded > conf->max_degraded;
+	if (test_bit(MD_BROKEN, &conf->mddev->flags))
+		return true;
If one member disk was set Faulty, which caused MD_BROKEN to be set, is it
possible to re-add the same member disk again?
Is it possible to re-add a drive to a failed raid5 array now? From my
understanding of raid5_add_disk(), it is not possible.
I mean the steps below; they work, as you can see.
[root@vm ~]# echo faulty > /sys/block/md0/md/dev-loop1/state
[root@vm ~]# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 loop2[2] loop1[0](F)
      1046528 blocks super 1.2 level 5, 512k chunk, algorithm 2 [2/1] [_U]
      bitmap: 0/1 pages [0KB], 65536KB chunk
unused devices: <none>
[root@vm ~]# echo re-add > /sys/block/md0/md/dev-loop1/state
[root@vm ~]# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 loop2[2] loop1[0]
      1046528 blocks super 1.2 level 5, 512k chunk, algorithm 2 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk
unused devices: <none>
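The gate I have in mind is the check at the top of raid5_add_disk(). The
sketch below is paraphrased from my reading of mainline raid5.c, not from
this series, so please treat the exact code as an assumption:

/* Paraphrased sketch of the entry checks in raid5_add_disk(): a brand-new
 * device is refused once the array has failed, but a re-added device
 * carries a valid saved_raid_disk and is allowed through.
 */
static int raid5_add_disk(struct mddev *mddev, struct md_rdev *rdev)
{
	struct r5conf *conf = mddev->private;

	if (mddev->recovery_disabled == conf->recovery_disabled)
		return -EBUSY;

	if (rdev->saved_raid_disk < 0 && has_failed(conf))
		/* no point adding a device */
		return -EINVAL;

	/* ... slot selection and the actual hot-add elided ... */
	return -EEXIST;
}

That is also why the re-add above goes through: the re-added device keeps
its saved_raid_disk, while adding a completely new device would be refused
once has_failed() reports the array as failed.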
And have you run the mdadm test suite against the series?
I ran the imsm test suite and our internal IMSM scope. I will take the
challenge and verify with native arrays as well. Thanks for the suggestion.
Cool, thank you.
BTW, I know the mdadm test suite is somewhat broken, at least the one case
I am aware of:
https://lore.kernel.org/all/20220119055501.GD27703@xsang-OptiPlex-9020/
And given the complexity of md, the more we test, the more bugs we can
avoid.
-	degraded = raid5_calc_degraded(conf);
-	if (degraded > conf->max_degraded)
-		return 1;
-	return 0;
+	if (conf->mddev->reshape_position != MaxSector)
+		degraded = raid5_calc_degraded(conf);
+
+	if (degraded > conf->max_degraded) {
+		set_bit(MD_BROKEN, &conf->mddev->flags);
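Putting the two quoted hunks together, my understanding of the resulting
function is roughly the following; the closing braces and the final return
are my reconstruction, since they are not part of the quoted context:

/* has_failed() as I read it with this patch applied; the tail of the
 * function is assumed, not quoted above.
 */
static bool has_failed(struct r5conf *conf)
{
	int degraded = conf->mddev->degraded;

	if (test_bit(MD_BROKEN, &conf->mddev->flags))
		return true;

	if (conf->mddev->reshape_position != MaxSector)
		degraded = raid5_calc_degraded(conf);

	if (degraded > conf->max_degraded) {
		set_bit(MD_BROKEN, &conf->mddev->flags);
		return true;
	}

	return false;
}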
Why not set the MD_BROKEN flag in the error handler to align with other
levels? Or do it in md_error() only.
https://lore.kernel.org/linux-raid/3da9324e-01e7-2a07-4bcd-14245db56693@xxxxxxxxx/
You suggested that.
Other levels don't have dedicated has_failed() routines. For raid5 it is
reasonable to set it in has_failed().
When has_failed() returns true, it means MD_BROKEN should be set; if so,
then it makes sense to set it in raid5_error().
I can't do that in md_error() because I don't have that information in
all cases; the result of !test_bit(Faulty, &rdev->flags) varies.
Fair enough.
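To make the raid5_error() suggestion concrete, here is a minimal sketch of
what I have in mind; the surrounding error-handler code is abridged and
partly assumed, it is not something from this series:

/* Sketch only: mark the array broken from the raid5 error handler once the
 * failure can no longer be tolerated, instead of from has_failed().
 */
static void raid5_error(struct mddev *mddev, struct md_rdev *rdev)
{
	struct r5conf *conf = mddev->private;
	unsigned long flags;

	spin_lock_irqsave(&conf->device_lock, flags);
	clear_bit(In_sync, &rdev->flags);
	mddev->degraded = raid5_calc_degraded(conf);

	/* with has_failed() reduced to a pure check, set the flag here */
	if (has_failed(conf))
		set_bit(MD_BROKEN, &mddev->flags);

	spin_unlock_irqrestore(&conf->device_lock, flags);

	/* ... Faulty/Blocked marking and superblock update elided ... */
}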
Thanks,
Guoqing