On Wed, Sep 07, 2022 at 09:33:24AM +0200, Christoph Hellwig wrote:
> On Thu, Sep 01, 2022 at 03:06:08PM +0800, Ming Lei wrote:
> > It is a bit hard to associate the above commit with the reported issue.
>
> So the messages clearly are about something trying to open a device
> that went away at the block layer, but somehow does not get removed
> in time by udev (which seems to be a userspace bug in CoreOS). But
> even with that we really should not hang.

Xiao Ni provided a script[1] which can more or less reproduce the issue.

- create the raid

# ./imsm.sh imsm /dev/md/test 1 /dev/sda /dev/sdb
# ls /dev/md/
[root@ktest-36 md]# ls -l /dev/md/
total 0
lrwxrwxrwx. 1 root root 8 Sep  9 08:10 imsm -> ../md127
lrwxrwxrwx. 1 root root 8 Sep  9 08:10 test -> ../md126

- destroy the two raid devices

# mdadm --stop /dev/md/test /dev/md/imsm
mdadm: stopped /dev/md/test
mdadm: stopped /dev/md/imsm
# lsblk
...
md126    9:126    0    0B  0 md
md127    9:127    0    0B  0 md

md126 is actually re-added after it has been deleted, together with the
"block device autoloading is deprecated and will be removed." message in
the log, and the bcc stack traces below show that the device is re-added
by mdadm (a rough bpftrace sketch for reproducing the observation is in
[3]):

08:20:03 456    456    kworker/6:2     del_gendisk      disk b'md126'
        b'del_gendisk+0x1 [kernel]'
        b'md_kobj_release+0x34 [kernel]'
        b'kobject_put+0x87 [kernel]'
        b'process_one_work+0x1c4 [kernel]'
        b'worker_thread+0x4d [kernel]'
        b'kthread+0xe6 [kernel]'
        b'ret_from_fork+0x1f [kernel]'

08:20:03 2476   2476   mdadm           device_add_disk  disk b'md126'
        b'device_add_disk+0x1 [kernel]'
        b'md_alloc+0x3ba [kernel]'
        b'md_probe+0x25 [kernel]'
        b'blk_request_module+0x5f [kernel]'
        b'blkdev_get_no_open+0x5c [kernel]'
        b'blkdev_get_by_dev.part.0+0x1e [kernel]'
        b'blkdev_open+0x52 [kernel]'
        b'do_dentry_open+0x1ce [kernel]'
        b'path_openat+0xc43 [kernel]'
        b'do_filp_open+0xa1 [kernel]'
        b'do_sys_openat2+0x7c [kernel]'
        b'__x64_sys_openat+0x5c [kernel]'
        b'do_syscall_64+0x37 [kernel]'
        b'entry_SYSCALL_64_after_hwframe+0x63 [kernel]'

Also, removal of the md device is deferred to a workqueue, and the
gendisk is only deleted in mddev's kobject release handler:

        mddev_delayed_delete():
                kobject_put(&mddev->kobj)
        ...
        md_kobj_release():
                del_gendisk(mddev->gendisk);

> Now the fact that it did hang before and this now becomes reproducible
> also makes me assume the change is not the root cause. It might still
> be a good vehicle to fix the issue for real, but it really broadens
> the scope.

[1] create one imsm raid1

./imsm.sh imsm /dev/md/test 1 /dev/sda /dev/sdb

#!/bin/bash

export IMSM_NO_PLATFORM=1
export IMSM_DEVNAME_AS_SERIAL=1

echo ""
echo "==========================================================="
echo "./imsm.sh container raid level devlist"
echo "example: ./imsm.sh imsm /dev/md/test 1 /dev/loop0 /dev/loop1"
echo "==========================================================="
echo ""

container=$1
raid=$2
level=$3
shift 3
dev_num=$#
dev_list=$@

# create the imsm container first, then the raid device on top of it
mdadm -CR $container -e imsm -n $dev_num $dev_list
mdadm -CR $raid -l $level -n $dev_num $dev_list

[2] destroy the created raid devices

mdadm --stop /dev/md/test /dev/md/imsm


Thanks,
Ming
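
[3] a minimal sketch (not part of Xiao Ni's script) for reproducing the
observation above, to be run after the raid has been created with [1].
It uses bpftrace instead of the bcc tool that produced the output quoted
in this mail, assumes a kernel with BTF so the stacks can be resolved,
and reuses the device names from the reproducer. The dd at the end just
opens the stale /dev/md126 node, which is enough to go
blkdev_get_no_open() -> blk_request_module() -> md_probe() and re-create
the disk:

#!/bin/bash
# Sketch only: dump kernel stacks of del_gendisk()/device_add_disk(),
# roughly matching the bcc output quoted above, then re-trigger the
# block device autoload by opening the stale md node.

# trace both functions in the background, printing task and kernel stack
bpftrace -e '
kprobe:del_gendisk,
kprobe:device_add_disk
{
        printf("%s pid=%d comm=%s\n%s\n", probe, pid, comm, kstack);
}' &
trace_pid=$!
sleep 2         # give bpftrace time to attach

mdadm --stop /dev/md/test /dev/md/imsm

# udev may not have removed /dev/md126 yet; merely opening the stale
# node re-creates the disk, which is what the device_add_disk stack
# in this mail shows happening from mdadm itself.
dd if=/dev/md126 of=/dev/null count=0 status=none
lsblk

sleep 2
kill "$trace_pid"

The output format differs from the bcc trace above, but the two stacks
(del_gendisk from the kworker running mddev_delayed_delete, and
device_add_disk from the task that opened the node) should be the same.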