> On Aug 21, 2019, at 12:10 PM, Guilherme G. Piccoli <gpiccoli@xxxxxxxxxxxxx> wrote:
>
> On 21/08/2019 13:14, Song Liu wrote:
>> [...]
>>
>> What do you mean by "not clear MD_BROKEN"? Do you mean we need to restart
>> the array?
>>
>> IOW, the following won't work:
>>
>>   mdadm --fail /dev/md0 /dev/sdx
>>   mdadm --remove /dev/md0 /dev/sdx
>>   mdadm --add /dev/md0 /dev/sdx
>>
>> And we need the following instead:
>>
>>   mdadm --fail /dev/md0 /dev/sdx
>>   mdadm --remove /dev/md0 /dev/sdx
>>   mdadm --stop /dev/md0 /dev/sdx
>>   mdadm --add /dev/md0 /dev/sdx
>>   mdadm --run /dev/md0 /dev/sdx
>>
>> Thanks,
>> Song
>>
>
> Song, I've tried the first procedure (without the --stop) and failed to
> make it work on linear/raid0 arrays, even on a vanilla kernel.
> What I could do is:
>
> 1) Mount an array and, while writing, remove a member (nvme1n1 in my
> case); "mdadm --detail md0" will show either the 'clean' state or, with
> my patch applied, 'broken';
>
> 2) Unmount the array and run:
> "mdadm -If nvme1n1 --path pci-0000:00:08.0-nvme-1"
> This results in: "mdadm: set device faulty failed for nvme1n1: Device
> or resource busy"
> Despite the error, the md0 device is gone.
>
> 3) echo 1 > /sys/bus/pci/rescan [the nvme1 device is back]
>
> 4) mdadm -A --scan [md0 is back, with both devices and in 'clean' state]
>
> So, whether we "--stop" the array or incrementally fail a member of it,
> when the array comes back its state will be 'clean', not 'broken'.
> Hence, I don't see a point in clearing the MD_BROKEN flag for
> raid0/linear arrays, nor do I see where we could do it.

I think this makes sense. Please send the patch and we can discuss
further while looking at the code.

Thanks,
Song