Hi, this is your Linux kernel regression tracker.

On 13.09.22 04:36, Dusty Mabe wrote:
> On 9/12/22 21:55, Ming Lei wrote:
>> On Mon, Sep 12, 2022 at 09:16:18AM +0200, Christoph Hellwig wrote:
>>> On Fri, Sep 09, 2022 at 04:24:40PM +0800, Ming Lei wrote:
>>>> On Wed, Sep 07, 2022 at 09:33:24AM +0200, Christoph Hellwig wrote:
>>>>> On Thu, Sep 01, 2022 at 03:06:08PM +0800, Ming Lei wrote:
>>>>>> It is a bit hard to associate the above commit with reported issue.
>>>>>
>>>>> So the messages clearly are about something trying to open a device
>>>>> that went away at the block layer, but somehow does not get removed
>>>>> in time by udev (which seems to be a userspace bug in CoreOS). But
>>>>> even with that we really should not hang.
>>>>
>>>> Xiao Ni provides one script[1] which can reproduce the issue more or less.
>>>
>>> I've run the reproduced 10000 times on current mainline, and while
>>> it prints one of the autoloading messages per run, I've not actually
>>> seen any kind of hang.
>>
>> I can't reproduce the hang too.
>
> I obviously can reproduce the issue with the test in our Fedora CoreOS
> test suite. It's part of a framework (i.e. it's not simple some script
> you can run) but it is very reproducible so one can add some instrumentation
> to the kernel and feed it through a build/test cycle to see different
> results or logs.
>
> I'm willing to share this with other people (maybe a screen share or
> some written down instructions) if anyone would be interested.

This thread looks stalled, or was there any progress in the past week?
If not: Fedora apparently removed the patch from their kernels a while
ago, as quite a few users were hitting the issue. What is preventing us
from doing the same in mainline and 5.19.y until the issue can be resolved?
The description of a09b314005f3 ("block: freeze the queue earlier in
del_gendisk") doesn't sound like the change does something crucial that
can't wait a bit.
I might be totally wrong with that, but I think it's my duty to ask that
question at this point.

>> What I meant is that new raid disk can be added by mdadm after stopping
>> the imsm container and raid disk with the autoloading messages printed,
>> I understand this behavior isn't correct, but I am not familiar with
>> raid enough.
>>
>> It might be related with the delay deleting gendisk from wq & md kobj
>> release handler.
>>
>> During reboot, if mdadm does this stupid thing without stopping, the hang
>> could be caused.
>>
>> I think the root cause is that why mdadm tries to open/add new raid bdev
>> crazily during reboot.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S.: As the Linux kernel's regression tracker I deal with a lot of
reports and sometimes miss something important when writing mails like
this. If that's the case here, don't hesitate to tell me in a public
reply, it's in everyone's interest to set the public record straight.