> From: Bart Van Assche <bvanassche@xxxxxxx>
> Sent: Saturday, April 11, 2020 8:03 AM
>
> On 2020-04-10 23:01, Dexuan Cui wrote:
> > Please let me know if the change to scsi_device_set_state() is OK.
>
> Hadn't Ming Lei already root-caused this issue for you? From his e-mail:
> "So you can't free related vmbus ringbuffer cause BLK_MQ_REQ_PREEMPT
> request is still to be handled."
>
> Please follow that advice.
>
> Bart.

Hi Bart, Ming,
I agree Ming has root-caused the issue, but it looks like the advice
cannot be applied to the hibernation scenario. :-)

Sorry for my lack of knowledge of the complex SCSI subsystem -- could
you please elaborate on what a low-level SCSI device driver (like
hv_storvsc) should do to safely save/restore the device state upon
hibernation?

The nature of "free related vmbus ringbuffer" in hv_storvsc is this:
the driver cannot handle any I/O after the device is quiesced in

  software_resume() ->
    load_image_and_restore() ->
      hibernation_restore() ->
        dpm_suspend_start() -> ... -> storvsc_suspend().

BTW, after the SCSI device is quiesced, the hibernation resume path
also quiesces the other devices, disables the non-boot CPUs, and
finally jumps to the old kernel's entry point, where the old kernel was
suspended, and the old kernel then resumes.

My intuition is that the upper-level SCSI layer should provide an API
to flush any pending I/O and block any new I/O after a SCSI device is
"quiesced". It looks like scsi_host_block()/scsi_host_unblock() are
such APIs; they are already used by aac_suspend()/aac_resume() in
drivers/scsi/aacraid/linit.c. That's why I proposed a patch that does
the same thing for hv_storvsc, and the patch seems to work for me:
without the patch I can easily hit the panic I reported in the first
email; with the patch, I have successfully done more than 30 rounds of
hibernation without the panic.
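
To make the ordering in question concrete, here is a toy userspace
model of "block new I/O, drain, then tear down the channel" on suspend
and the reverse on resume. It is only a sketch: it is not kernel code
and not the proposed patch, and every name in it (toy_host,
toy_submit(), toy_suspend(), toy_resume(), ...) is hypothetical. The
assert in toy_submit() stands in for the panic that occurs when a
request reaches the driver after the channel is gone.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy userspace model, NOT kernel code: all names are hypothetical. */

enum toy_result { TOY_OK = 0, TOY_DEFERRED = 1 };

struct toy_host {
    bool blocked;      /* true between suspend and resume */
    bool channel_open; /* stands in for the vmbus channel/ring buffer */
    int in_flight;     /* requests handed to the "device" */
    int deferred;      /* requests held back while the host is blocked */
};

void toy_host_init(struct toy_host *h)
{
    h->blocked = false;
    h->channel_open = true;
    h->in_flight = 0;
    h->deferred = 0;
}

/* Submit one request. While the host is blocked the request is held
 * back by the upper layer and never reaches the driver. */
enum toy_result toy_submit(struct toy_host *h)
{
    if (h->blocked) {
        h->deferred++;
        return TOY_DEFERRED;
    }
    /* Reaching the driver with a closed channel is the modelled bug. */
    assert(h->channel_open);
    h->in_flight++;
    return TOY_OK;
}

/* Suspend: block new I/O first, drain what is in flight, and only
 * then close the channel. */
void toy_suspend(struct toy_host *h)
{
    h->blocked = true;
    h->in_flight = 0; /* pretend all outstanding requests completed */
    h->channel_open = false;
}

/* Resume: reopen the channel first, then unblock and replay the
 * deferred requests. Returns the number of replayed requests. */
int toy_resume(struct toy_host *h)
{
    int replayed;

    h->channel_open = true;
    h->blocked = false;
    replayed = h->deferred;
    h->deferred = 0;
    h->in_flight += replayed;
    return replayed;
}
```

In this model a request submitted between toy_suspend() and
toy_resume() is deferred rather than dispatched, which is the behavior
being asked for from the upper layer.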

However, it looks like you implied that my intuition is wrong and that
it's *expected* that the upper-level SCSI layer can still submit I/O
requests with the BLK_MQ_REQ_PREEMPT flag after the SCSI device is
"quiesced"?

If this is the case, then how is hv_storvsc supposed to handle the I/O
after the SCSI device is quiesced? I can keep the related vmbus
ringbuffer, but the real issue is that the driver is unable to handle
any I/O at all, since the vmbus connection to the Hyper-V host is
disconnected soon after the SCSI device is quiesced. Should hv_storvsc
return an error for such I/O, or block such I/O until the SCSI device
is resumed? Neither option looks good to me, and I really think the
upper-level SCSI layer should provide an API to block any new I/O after
a SCSI device is "quiesced". Again, can you please clarify whether
scsi_host_block()/scsi_host_unblock() are such APIs?

Looking forward to your replies!

Thanks,
-- Dexuan