On 6.07.2024 at 16:30, Mateusz Jończyk wrote:
> Hello,
>
> Linux 6.9+ cannot start a degraded RAID1 array when the only remaining
> device has the write-mostly flag set. Linux 6.8.0 works fine, as does
> 6.1.96.
[snip]
> After some investigation, I have determined that the bug is most likely in
> choose_slow_rdev() in drivers/md/raid1.c, which doesn't set max_sectors
> before returning early. A test patch (below) seems to fix this issue (Linux
> boots and appears to be working correctly with it, but I didn't do any more
> advanced experiments yet).
>
> This points to
> commit dfa8ecd167c1 ("md/raid1: factor out choose_slow_rdev() from read_balance()")
> as the most likely culprit. However, I was running into other bugs in mdadm when
> trying to test this commit directly.
>
> Distribution: Ubuntu 20.04, hardware: a HP 17-by0001nw laptop.

I have been testing this patch carefully:

1. I have been reliably getting deadlocks when adding / removing devices
   on an array that contains a component with the write-mostly flag set,
   while the array was loaded with fsstress. When the array was idle, no
   such deadlocks happened. This also occurred on Linux 6.8.0, though,
   but not on 6.1.97-rc1, so it is likely an independent regression.

2. When adding a device to the array (/dev/sda1), I once got the following
   warnings in dmesg on patched 6.10-rc6:

        [ 8253.337816] md: could not open device unknown-block(8,1).
        [ 8253.337832] md: md_import_device returned -16
        [ 8253.338152] md: could not open device unknown-block(8,1).
        [ 8253.338169] md: md_import_device returned -16
        [ 8253.674751] md: recovery of RAID array md2

   (/dev/sda1 has device major/minor numbers = 8,1). This may be caused by
   some interaction with udev, though. I have also seen this on Linux 6.8.

Additionally, on an unpatched 6.1.97-rc1 (which was handy for testing), I
got a deadlock when removing a bitmap from such an array while it was
loaded with fsstress.

I'll file independent reports, but wanted to give a heads-up.

Greetings,

Mateusz