Hi, I'm trying to help progress a kernel regression hitting Fedora infrastructure in which dozens of VMs run concurrently to execute QA testing. The problem doesn't happen immediately, but all the VM's get stuck and then any new process also gets stuck, so extracting information from the system has been difficult and there's not a lot to go on, but this is what I've got so far. Systems (Fedora openQA worker hosts) on kernel 5.12.12+ wind up in a state where forking does not work correctly, breaking most things https://bugzilla.redhat.com/show_bug.cgi?id=2009585 In that bug some items of interest ... This megaraid_sas trace. The hang hasn't happened at this point though, so it may not be related at all or it might be an instigator. https://bugzilla.redhat.com/show_bug.cgi?id=2009585#c31 Once there is a hang, we have these traces from reducing the time for the kernel to report blocked tasks. Much of the messages I'm told from kvm/qemu folks are pretty ordinary/expected locks. But the XFS portions might give a clue what's going on? 5.15-rc7 https://bugzilla-attachments.redhat.com/attachment.cgi?id=1840941 5.15+ https://bugzilla-attachments.redhat.com/attachment.cgi?id=1840939 So I can imagine the VM's are stuck because XFS is stuck. And XFS is stuck because something in the block layer or megaraid driver is stuck, but I don't know that for certain. Thanks, -- Chris Murphy