On Wed, 22 Mar 2023 10:24:41 +0800
Wu Guanghao <wuguanghao3@xxxxxxxxxx> wrote:

> On 2023/3/21 18:18, Mariusz Tkaczyk wrote:
> > On Tue, 21 Mar 2023 16:56:37 +0800
> > Wu Guanghao <wuguanghao3@xxxxxxxxxx> wrote:
> >
> >> The latest kernel version will not report an error through mdadm
> >> set_disk_faulty.
> >>
> >> $ lsblk
> >> sdb       8:16   0    10G  0 disk
> >> └─md0     9:0    0  19.9G  0 raid0
> >> sdc       8:32   0    10G  0 disk
> >> └─md0     9:0    0  19.9G  0 raid0
> >>
> >> old kernel:
> >> ...
> >> $ mdadm /dev/md0 -f /dev/sdb
> >> mdadm: set device faulty failed for /dev/sdb: Device or resource busy
> >> ...
> >>
> >> latest kernel:
> >> ...
> >> $ mdadm /dev/md0 -f /dev/sdb
> >> mdadm: set /dev/sdb faulty in /dev/md0
> >> ...
> >>
> >> The old kernel judges whether the Faulty flag is set in rdev->flags,
> >> and returns -EBUSY if not. The latest kernel only returns -EBUSY
> >> if the MD_BROKEN flag is set in mddev->flags. raid0 doesn't set
> >> error_handler, so MD_BROKEN will not be set, and it will return 0.
> >>
> >> So if error_handler isn't set for a raid type, also return -EBUSY.
> >
> > Hi,
> > Please test with:
> > https://lore.kernel.org/linux-raid/20230306130317.3418-1-mariusz.tkaczyk@xxxxxxxxxxxxxxx/
> >
> > Thanks,
> > Mariusz
> >
>
> Hi, Mariusz
>
> Are there other patches? There are other problems with this patch.
> https://lore.kernel.org/linux-raid/20230306130317.3418-1-mariusz.tkaczyk@xxxxxxxxxxxxxxx/
>
> md_submit_bio()
> ...
> // raid0 set disk faulty failed, but MD_BROKEN flag is set,
> // write IO will fail.
> if (unlikely(test_bit(MD_BROKEN, &mddev->flags)) && (rw == WRITE)) {
>         bio_io_error(bio);
>         return;
> }
> ...
>
> old kernel:
> ...
> $ mdadm /dev/md0 -f /dev/sdb
> mdadm: set device faulty failed for /dev/sdb: Device or resource busy
>
> $ mkfs.xfs /dev/md0
> log stripe unit (524288 bytes) is too large (maximum is 256KiB)
> log stripe unit adjusted to 32KiB
> meta-data=/dev/md0               isize=512    agcount=16, agsize=1800064 blks
>          =                       sectsz=512   attr=2, projid32bit=1
>          =                       crc=1        finobt=1, sparse=1, rmapbt=0
>          =                       reflink=1    bigtime=0 inobtcount=0
> data     =                       bsize=4096   blocks=28801024, imaxpct=25
>          =                       sunit=128    swidth=256 blks
> naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
> log      =internal log           bsize=4096   blocks=14064, version=2
>          =                       sectsz=512   sunit=8 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0
> Discarding blocks...Done.
> ...
>
>
> merged patch kernel:
> ...
> # mdadm /dev/md0 -f /dev/sdb
> mdadm: set device faulty failed for /dev/sdb: Device or resource busy
>
> mkfs.xfs /dev/md0
> log stripe unit (524288 bytes) is too large (maximum is 256KiB)
> log stripe unit adjusted to 32KiB
> meta-data=/dev/md0               isize=512    agcount=8, agsize=65408 blks
>          =                       sectsz=512   attr=2, projid32bit=1
>          =                       crc=1        finobt=1, sparse=1, rmapbt=0
>          =                       reflink=1    bigtime=0 inobtcount=0
> data     =                       bsize=4096   blocks=523264, imaxpct=25
>          =                       sunit=128    swidth=256 blks
> naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
> log      =internal log           bsize=4096   blocks=2560, version=2
>          =                       sectsz=512   sunit=8 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0
> mkfs.xfs: pwrite failed: Input/output error
> ...
>
>

Hi Wu,

Besides the kernel, there are also patches in mdadm. Please check if you
have them all.

https://git.kernel.org/pub/scm/utils/mdadm/mdadm.git/commit/?id=b3e7b7eb1dfedd7cbd9a3800e884941f67d94c96
https://git.kernel.org/pub/scm/utils/mdadm/mdadm.git/commit/?id=461fae7e7809670d286cc19aac5bfa861c29f93a
https://git.kernel.org/pub/scm/utils/mdadm/mdadm.git/commit/?id=fc6fd4063769f4194c3fb8f77b32b2819e140fb9

Some background:
--faulty (-f) is intended to be used by administrators.
We cannot rely on the kernel's answer: if mdadm tries to set the device
faulty, the result is MD_BROKEN and every new IO fails (this is an intended
change). Put simply, mdadm must first check whether it can remove the drive,
and that check was added by the patches mentioned above. The first patch
(the last one listed) added the verification but introduced a regression;
the next two patches are fixes for the omitted scenarios.

Thanks,
Mariusz
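
For illustration only, the "check first, then fail the drive" idea described
above can be modeled in plain user-space C. This is a sketch under stated
assumptions, not code from mdadm or the kernel: struct array_model and the
functions can_lose_member(), kernel_set_faulty(), submit_write() and
tool_set_faulty() are hypothetical names, and the redundancy rule is
simplified to raid0 (no member may be lost) and raid1 (one member must
survive).

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified model of an array; not the kernel's struct mddev. */
struct array_model {
    int level;          /* RAID level; only 0 and 1 are modeled here */
    int active_disks;   /* members that are currently working */
    bool broken;        /* stands in for the MD_BROKEN flag */
};

/* Assumption: raid0 tolerates no member loss, raid1 needs one survivor. */
bool can_lose_member(const struct array_model *a)
{
    if (a->level == 0)
        return false;
    return a->active_disks > 1;
}

/*
 * Kernel side as described in the thread: failing a member the array
 * cannot lose marks the whole array broken instead of dropping the member.
 */
void kernel_set_faulty(struct array_model *a)
{
    if (!can_lose_member(a)) {
        a->broken = true;
        return;
    }
    a->active_disks--;
}

/* Write path: once the array is broken, writes fail with an I/O error. */
int submit_write(const struct array_model *a)
{
    return a->broken ? -EIO : 0;
}

/*
 * Tool side: verify first and refuse with -EBUSY, so the kernel is never
 * asked to fail a member that would break the array.
 */
int tool_set_faulty(struct array_model *a)
{
    if (!can_lose_member(a))
        return -EBUSY;
    kernel_set_faulty(a);
    return 0;
}
```

In this model, calling tool_set_faulty() on a two-disk raid0 returns -EBUSY
and leaves writes working, while calling kernel_set_faulty() directly on the
same array sets the broken flag and makes submit_write() return -EIO, which
corresponds to the mkfs.xfs pwrite failure quoted above.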