On 9/12/22 21:55, Ming Lei wrote:
> On Mon, Sep 12, 2022 at 09:16:18AM +0200, Christoph Hellwig wrote:
>> On Fri, Sep 09, 2022 at 04:24:40PM +0800, Ming Lei wrote:
>>> On Wed, Sep 07, 2022 at 09:33:24AM +0200, Christoph Hellwig wrote:
>>>> On Thu, Sep 01, 2022 at 03:06:08PM +0800, Ming Lei wrote:
>>>>> It is a bit hard to associate the above commit with reported issue.
>>>>
>>>> So the messages clearly are about something trying to open a device
>>>> that went away at the block layer, but somehow does not get removed
>>>> in time by udev (which seems to be a userspace bug in CoreOS). But
>>>> even with that we really should not hang.
>>>
>>> Xiao Ni provides one script[1] which can reproduce the issue more or less.
>>
>> I've run the reproduced 10000 times on current mainline, and while
>> it prints one of the autoloading messages per run, I've not actually
>> seen any kind of hang.
>
> I can't reproduce the hang too.

I can reliably reproduce the issue with the test in our Fedora CoreOS
test suite. It's part of a framework (i.e. it's not simply a script you
can run), but it is very reproducible, so one can add some
instrumentation to the kernel and feed it through a build/test cycle to
see different results or logs.

I'm willing to share this with other people (maybe a screen share or
some written-down instructions) if anyone is interested.

>
> What I meant is that new raid disk can be added by mdadm after stopping
> the imsm container and raid disk with the autoloading messages printed,
> I understand this behavior isn't correct, but I am not familiar with
> raid enough.
>
> It might be related with the delay deleting gendisk from wq & md kobj
> release handler.
>
> During reboot, if mdadm does this stupid thing without stopping, the hang
> could be caused.
>
> I think the root cause is that why mdadm tries to open/add new raid bdev
> crazily during reboot.
>

Dusty