> From: Bart Van Assche <bvanassche@xxxxxxx> > Sent: Thursday, April 23, 2020 4:25 PM > On 2020-04-23 11:29, Dexuan Cui wrote: > > So it looks the below patch also works for me: > > > > --- a/kernel/power/hibernate.c > > +++ b/kernel/power/hibernate.c > > @@ -898,6 +898,11 @@ static int software_resume(void) > > error = freeze_processes(); > > if (error) > > goto Close_Finish; > > + > > + error = freeze_kernel_threads(); > > + if (error) > > + goto Close_Finish; > > + > > error = load_image_and_restore(); > > thaw_processes(); > > Finish: > > > > Just to be sure, I'll do more tests, but I believe the panic can be fixed > > by this according to my tests I have done so far. > > If a freeze_kernel_threads() call is added in software_resume(), should > a thaw_kernel_threads() call be added too? Good catch! Note: thaw_processes() thaws every frozen process, including both user space processes and kernel processes. In software_resume(): 1. If freeze_kernel_threads() fails, I should add a "thaw_processes(); " before "goto Close_Finish; " so that all the user space processes can be thawed. 2. If freeze_kernel_threads() succeeds, but load_image_and_restore() Fails, there is already a thaw_processes(). 3. If load_image_and_restore() succeeds, it won't return, and the execution will return from the 'old' kernel's hibernate() -> hibernation_snapshot() -> create_image() -> swsusp_arch_suspend(), and later hibernate() -> thaw_processes() will thaw every frozen process of the 'old' kernel. > Anyway, please Cc me if a patch for software_resume() is submitted. Sure. Will do. > > I'm still not sure what the comment before scsi_device_quiesce() means: > > * ... Since special requests may also be requeued requests, > > * a successful return doesn't guarantee the device will be > > * totally quiescent. > > > > I don't know if there can be some other I/O submitted after > > scsi_device_quiesce() returns in the case of hibernation, and I don't > > know if aac_suspend() -> scsi_host_block() should be fixed/removed, > > but as far as the panic is concerned, I'm very glad I have found a better > > fix with your help. > > The function blk_set_pm_only() increments the q->pm_only counter while > the blk_clear_pm_only() function decrements the q->pm_only counter. > If q->pm_only > 0, blk_queue_enter() only succeeds if the flag > BLK_MQ_REQ_PREEMPT is set in the second argument passed to that > function. blk_get_request() calls blk_queue_enter(). The result is that > while q->pm_only > 0 blk_get_request() only submits a request without > waiting if the BLK_MQ_REQ_PREEMPT flag is set in its second argument. > scsi_execute() sets the BLK_MQ_REQ_PREEMPT flag. In other words, > scsi_device_quiesce() blocks requests submitted by filesystems but still > allows SCSI commands submitted by the SCSI core to be executed. > "special" refers to requests with the BLK_MQ_REQ_PREEMPT flag set. > > Bart. Thanks for the detailed clarification! So it sounds like we're safe here, and I guess the scsi_host_block() in aac_suspend() should be removed to discourage people from trying to use scsi_host_block() in a .suspend() callback. :-) Thanks, -- Dexuan