On Thu, 18 Jul 2024 16:57:03 +0200
Mateusz Kusiak <mateusz.kusiak@xxxxxxxxxxxxxxx> wrote:

> Hello,
> recently we discovered an issue regarding drive removal during I/O.
>
> Description:
> A drive removed during I/O from an IMSM R1D2 array is set to faulty
> but is not removed from the volume. I/O on the array hangs.
>
> The scenario is as follows:
> 1. Create an R1D2 IMSM array.
> 2. Create a single partition, format it as ext4 and mount it somewhere.
> 3. Start multiple checksum test processes (more on that below) and
>    wait a while.
> 4. Unplug one RAID member.
>

Thanks Mateusz, can you confirm whether this happens only with IMSM
metadata? In other words, is this also an issue with native metadata?

-Paul

> About "Checksum test":
> The checksum test creates a ~3GB file and calculates its checksum
> twice. It basically does the following:
> # dd if=/proc/kcore bs=1024 count=3052871 status=none | tee <filename> | md5sum
> ...and then recalculates the checksum to verify that it matches. In
> this scenario we use it to simulate I/O by running multiple tests.
>
> Expected result:
> The RAID member is removed from the volume and the container, and the
> array continues operating on one drive.
>
> Actual result:
> The RAID member is set to faulty on the volume and does not disappear
> (it is not removed), but it is removed from the container. I/O on the
> mounted volume hangs.
>
> Additional notes:
> The issue reproduces on kernel-next. Bisection points to the patch
> "md: use new apis to suspend array for adding/removing rdev from
> state_store()" (cfa078c8b80d0daf8f2fd4a2ab8e26fa8c33bca1) as a
> potential cause, since it is the first commit on which we observe the
> issue in our reproduction setup.
>
> Having said that, we also observed the issue on, for example,
> SLES15SP6 with kernel 6.4.0-150600.10-default, which might indicate
> that the problem was already present but only became apparent for
> some reason (a race condition or something else).
>
> I will work on simplifying the scenario and try to provide a script
> for reproduction.
>
> Thanks,
> Mateusz
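For anyone trying to reproduce this before the promised script arrives, the checksum test described in the report can be approximated as below. This is a minimal sketch, assuming a POSIX shell with GNU coreutils (`dd status=none`, `md5sum`); the function names `checksum_test` and `run_parallel` and the variables `SRC` and `COUNT` are hypothetical and not taken from the actual test harness. Reading the full ~3GB from /proc/kcore requires root.

```sh
#!/bin/sh
# Hypothetical sketch of the "checksum test" from the report.
# SRC and COUNT are assumptions; the defaults mirror the dd command
# quoted in the report (bs=1024, count=3052871 from /proc/kcore).
SRC=${SRC:-/proc/kcore}
COUNT=${COUNT:-3052871}

# checksum_test FILE: write FILE while checksumming the stream, then
# re-read FILE and checksum it again. Prints PASS or FAIL.
checksum_test() {
    file=$1
    # First pass: dd | tee | md5sum, as in the report.
    first=$(dd if="$SRC" bs=1024 count="$COUNT" status=none \
        | tee "$file" | md5sum | cut -d' ' -f1)
    # Second pass: re-read the written file. Note that this read may
    # be served from the page cache rather than from the array.
    second=$(md5sum < "$file" | cut -d' ' -f1)
    if [ "$first" = "$second" ]; then
        echo "PASS $file"
    else
        echo "FAIL $file"
        return 1
    fi
}

# run_parallel DIR [N]: start N checksum tests in DIR concurrently
# (default 4) and wait for all of them to finish.
run_parallel() {
    dir=$1
    n=${2:-4}
    i=1
    while [ "$i" -le "$n" ]; do
        checksum_test "$dir/checksum_$i.bin" &
        i=$((i + 1))
    done
    wait
}
```

Usage, with the ext4 filesystem from step 2 mounted at a hypothetical /mnt/r1d2: source the script, run `run_parallel /mnt/r1d2 8`, wait a while, then unplug one member (step 4). Running each test against its own file keeps several independent write streams in flight, which is the point of the scenario.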