On Mon, Dec 20, 2021 at 4:45 PM Mariusz Tkaczyk <mariusz.tkaczyk@xxxxxxxxxxxxxxx> wrote:
>
> Hi Xiao,
>
> On Sun, 19 Dec 2021 11:20:59 +0800
> Xiao Ni <xni@xxxxxxxxxx> wrote:
>
> > > Usage of error_handler causes that disk failure can be requested
> > > from userspace. User can fail the array via #mdadm --set-faulty
> > > command. This is not safe and will be fixed in mdadm. It is
> > > correctable because failed state is not recorded in the metadata.
> > > After next assembly array will be read-write again. For safety
> > > reason is better to keep MD_BROKEN in runtime only.
> >
> > Hi Mariusz
> >
> > Let me call them chapter[1-4]
> >
> > Could you explain more about 'mdadm --set-faulty' part? I've read this
> > patch. But I don't know the relationship between the patch and chapter4.
> >
> > In patch2, you write "As in previous commit, it causes that #mdadm
> > --set-faulty is able to mark array as failed." I tried to run command
> > `mdadm /dev/md0 -f /dev/sda`. md0 is a raid0. It can't remove sda from md0.
>
> Did you test kernel with my patchset applied?
>
> I've added chapter 4 because I'm aware of behavior change.
> Now for r0, nothing happens when we are trying to write failure to
> md/<disk>/state.
>
> After the change, drive is not removed either, but MD_BROKEN is set and
> any new write will be rejected. The drive will be still visible
> in array (I didn't change that). Should I add it to description?

Thanks for the explanation. I understand now. But I still have one question.

Currently, a member disk can't be removed from a raid0: the request returns
EBUSY and the raid0 keeps working. That makes sense. All member disks are
busy, so the admin can't remove one of them and mdadm reports a proper error.

This patch changes that situation. raid0_error() sets MD_BROKEN, but the
array isn't actually broken. So is it really a good idea to set MD_BROKEN here?
Patch 62f7b1989c0 ("md raid0/linear: Mark array as 'broken'...") introduced
MD_BROKEN for the case where a member disk disappears, i.e. the disk really
is broken. In that case the raid0/linear device can't work anymore.

Best Regards
Xiao