On Tue, Nov 08, 2022 at 09:16:53PM +0100, Lukas Wunner wrote:
> On Tue, Nov 08, 2022 at 09:12:44AM -0700, Keith Busch wrote:
> > On Mon, Nov 07, 2022 at 04:14:54PM -0500, James Puthukattukaran wrote:
> > >
> > > There is a path to disable the controller and that code ran but did
> > > not help. I checked with the nvme folks and Keith mentioned that there
> > > might be an issue with the nvme queue management. Unfortunately, we
> > > can't try newer kernels in the field. So, looking for a way to just
> > > "shut off the device" when we have scenarios like this where we can't
> > > untangle the mess.
> >
> > Well, I didn't request you try new kernels in the field. I asked if you
> > could experiment with a newer one on a development machine to confirm if
> > the bug was fixed by some of the significant changes in this path so
> > that we could confirm a reason to port to stable. You're going to have
> > to change your kernel to fix this observation, so it would be worth the
> > effort to know if the changes being considered actually address the
> > problem.
>
> Current mainline still contains this problematic sequence:
>
>   nvme_reset_work()
>     nvme_wait_freeze()
>       blk_mq_freeze_queue_wait()
>
> So I'm inclined to believe that the issue still persists, but I agree

Yeah, that sequence exists, but there are some subtle changes in how the
workqueues account for unquiescing hardware queues that can affect whether
a freeze can make forward progress.

> I think nvme_reset_work() is overly optimistic that resetting the drive
> succeeded. It just freezes and unfreezes the I/O queue without checking
> for errors.

I'm not sure what you mean. An nvme reset is a CC.EN 0->1 transition, and
we definitely confirm that succeeds. If you're referring to the 1->0
transition, that has to happen after the initial freeze/quiesce steps, but
whether or not that succeeds shouldn't be relevant to the rest of the
sequence: we're about to disable the device at the PCI level.
> In particular, nvme_wait_freeze() should call the _timeout variant of
> blk_mq_freeze_queue_wait() and cope with failure of freezing.

That would indicate we have a mismatched freeze depth or an unbalanced
quiesce problem, so the timeout freeze would just mask the underlying issue.