On Tue, 2 Jul 2024 16:57:38 +0200
Mateusz Kusiak <mateusz.kusiak@xxxxxxxxxxxxxxx> wrote:

> Hello,
> I'm back with another regression found in SLES15SP6.
>
> The scenario is as follows:
>
> 1. Create a RAID 1 volume with native metadata.
> # mdadm -CR /dev/md126 -l1 -n2 /dev/nvme[0-1]n1 --assume-clean --size=5G
>
> 2. Create a partition and a filesystem on the RAID volume.
> # parted -a optimal /dev/md126 mktable gpt mkpart primary ext4 0% 100% -s
> # mkfs.ext4 -F /dev/md126p1
>
> 3. Remove a device via "--incremental --fail".
> # mdadm -If nvme0n1
>
> Result:
> mdadm hangs, and hung-task info from multiple components starts
> appearing on the serial console.
>
> A few notes:
> * The issue does not reproduce without creating the partition and
>   filesystem.
> * If the array is stopped and reassembled before step 3, the issue
>   does not reproduce.
> * If the partition is "reused" (metadata cleared, a new RAID volume
>   created, the partition left intact rather than recreated), the
>   issue does not reproduce.
> * If "--set-faulty" and then "--remove" are used (instead of
>   "--incremental --fail"), "--set-faulty" succeeds but "--remove"
>   hangs.
> * I verified this is not an mdadm issue by installing mdadm-4.2
>   (SLES15SP6 ships mdadm-4.3 in-box) and rerunning the test. The
>   outcome is the same.
> * Writing "remove" to sysfs directly has the same result.
>
> Thanks,
> Mateusz

More info:

As Mateusz said,

echo "remove" > /sys/block/md126/md/rd0/state

hangs. The same hang is observed with the HOT_REMOVE_DISK ioctl.

We can simulate the scenario with:

echo "faulty" > /sys/block/md126/md/rd0/state
echo "remove" > /sys/block/md126/md/rd0/state

It is really interesting that this happens only with partitions, and
only right after they are created.

Mariusz
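
For anyone reproducing this, a sketch of how one might capture more
detail on where the remove path blocks (the sysrq and procfs
interfaces below are standard kernel facilities; the pidof lookup
assumes a single hung mdadm process, so adjust it if the writer is a
shell doing the sysfs echo instead):

# echo 1 > /proc/sys/kernel/sysrq        # enable all sysrq functions
# echo w > /proc/sysrq-trigger           # dump blocked (D-state) tasks to dmesg
# cat /proc/$(pidof mdadm)/stack         # kernel stack of the hung writer
# cat /proc/mdstat                       # current array state for reference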