Hello,
I'm back with another regression found in SLES15SP6.
The scenario is as follows:
1. Create a RAID 1 volume with native metadata.
# mdadm -CR /dev/md126 -l1 -n2 /dev/nvme[0-1]n1 --assume-clean --size=5G
2. Create a partition and filesystem on the RAID volume.
# parted -a optimal /dev/md126 mktable gpt mkpart primary ext4 0% 100% -s
# mkfs.ext4 -F /dev/md126p1
3. Remove a device via "--incremental --fail".
# mdadm -If nvme0n1
Result:
mdadm hangs, and hung task info from multiple components starts appearing on the serial console.
A few notes:
* The issue does not reproduce if the partition and filesystem are not created.
* If the array is stopped and reassembled before step 3, the issue does not reproduce.
* If the partition is "reused" (metadata cleared, a new RAID volume created, the partition left intact
and not recreated), the issue does not reproduce.
* If "--set-faulty" followed by "--remove" is used (instead of "--incremental --fail"), "--set-faulty"
succeeds and "--remove" hangs.
* I verified this is not an mdadm issue by installing mdadm-4.2 (SLES15SP6 has mdadm-4.3 inbox) and
rerunning the test. The outcome is the same.
* Writing "remove" to sysfs directly has the same result.
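
For reference, the two alternative removal paths from the notes above can be sketched roughly as
follows. This is only an illustration: the names md126 and nvme0n1 come from the steps above, the
sysfs path assumes the standard md layout (/sys/block/<array>/md/dev-<member>/state), and marking
the member "faulty" through sysfs before writing "remove" is an assumption, not something stated in
the notes. The helper names are hypothetical. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch of the two alternative removal paths (assumed names and layout, see above).
# DRY_RUN=1 (default) only prints each command; DRY_RUN=0 executes it (needs root).
DRY_RUN=${DRY_RUN:-1}

# Build the path of the per-member "state" attribute in md sysfs.
# $1 = array name (e.g. md126), $2 = member name (e.g. nvme0n1)
member_state_path() {
    printf '/sys/block/%s/md/dev-%s/state\n' "$1" "$2"
}

# Print the command; run it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# Variant A: mdadm manage mode, "--set-faulty" followed by "--remove".
remove_via_mdadm() {
    run mdadm "/dev/$1" --set-faulty "/dev/$2"
    run mdadm "/dev/$1" --remove "/dev/$2"
}

# Variant B: write to the sysfs state attribute directly.
# Assumption: the member is marked faulty first, then removed.
remove_via_sysfs() {
    state=$(member_state_path "$1" "$2")
    run sh -c "echo faulty > $state"
    run sh -c "echo remove > $state"
}
```

After sourcing the file, "remove_via_mdadm md126 nvme0n1" or "remove_via_sysfs md126 nvme0n1" prints
the commands; with DRY_RUN=0 they are executed, and on an affected system the remove step would be
expected to hang as described above.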
Thanks,
Mateusz