> From: Bart Van Assche <bvanassche@xxxxxxx> > Sent: Thursday, April 23, 2020 9:38 AM > > On 4/23/20 12:04 AM, Dexuan Cui wrote: > > It looks the sd suspend callbacks are only for the I/O from the disk, e.g. > > from the file system that lives in some partition of some disk. > > > > The panic I'm seeing is not from sd. I think it's from a kernel thread > > that tries to detect the status of the SCSI CDROM. This is the snipped > > messages (the full version is at ...): here > > the suspend callbacks of the sd, sr and scsi_bus_type.pm have been called, > > and later the storvsc LLD's suspend callback is also called, but > > sr_block_check_events() can still try to submit SCSI commands to storvsc: > > > > [ 11.668741] sr 0:0:0:1: bus quiesce > > [ 11.668804] sd 0:0:0:0: bus quiesce > > [ 11.698082] scsi target0:0:0: bus quiesce > > [ 11.703296] scsi host0: bus quiesce > > [ 11.781730] hv_storvsc bf78936f-7d8f-45ce-ab03-6c341452e55d: noirq > bus quiesce > > [ 11.796479] hv_netvsc dda5a2be-b8b8-4237-b330-be8a516a72c0: noirq > bus quiesce > > [ 11.804042] BUG: kernel NULL pointer dereference, address: > 0000000000000090 > > [ 11.804996] Workqueue: events_freezable_power_ disk_events_workfn > > [ 11.804996] RIP: 0010:storvsc_queuecommand+0x261/0x714 > [hv_storvsc] > > [ 11.804996] Call Trace: > > [ 11.804996] scsi_queue_rq+0x593/0xa10 > > [ 11.804996] blk_mq_dispatch_rq_list+0x8d/0x510 > > [ 11.804996] blk_mq_sched_dispatch_requests+0xed/0x170 > > [ 11.804996] __blk_mq_run_hw_queue+0x55/0x110 > > [ 11.804996] __blk_mq_delay_run_hw_queue+0x141/0x160 > > [ 11.804996] blk_mq_sched_insert_request+0xc3/0x170 > > [ 11.804996] blk_execute_rq+0x4b/0xa0 > > [ 11.804996] __scsi_execute+0xeb/0x250 > > [ 11.804996] sr_check_events+0x9f/0x270 [sr_mod] > > [ 11.804996] cdrom_check_events+0x1a/0x30 [cdrom] > > [ 11.804996] sr_block_check_events+0xcc/0x110 [sr_mod] > > [ 11.804996] disk_check_events+0x68/0x160 > > [ 11.804996] process_one_work+0x20c/0x3d0 > > [ 11.804996] worker_thread+0x2d/0x3e0 > > [ 11.804996] kthread+0x10c/0x130 > > [ 11.804996] ret_from_fork+0x35/0x40 > > > > It looks the issue is: scsi_bus_freeze() -> ... -> scsi_dev_type_suspend -> > > scsi_device_quiesce() does not guarantee the device is totally quiescent: > > During hibernation processes are frozen before devices are quiesced. > freeze_processes() calls try_to_freeze_tasks() and that function in turn > calls freeze_workqueues_begin() and freeze_workqueues_busy(). > freeze_workqueues_busy() freezes all freezable workqueues including > system_freezable_power_efficient_wq, the workqueue from which > check_events functions are called. Some time after freezable workqueues > are frozen dpm_suspend(PMSG_FREEZE) is called. That last call triggers > the pm_ops.freeze callbacks, including the pm_ops.freeze callbacks > defined in the SCSI core. > > The above trace seems to indicate that freezing workqueues has not > happened before devices were frozen. How about doing the following to > retrieve more information about what is going on? > * Enable CONFIG_PM_DEBUG in the kernel configuration. > * Run echo 1 > /sys/power/pm_print_times and echo 1 > > /sys/power/pm_debug_messages before hibernation starts. > > Bart. Good point! My panic happens in the "restore" path, not the "save" path. In the "save" path, as you described, it looks everything is done correctly. BTW, freeze_processes() only freezes the userspace processes. After hibernate() -> freeze_processes(), hibernate() -> hibernation_snapshot() -> freeze_kernel_threads() also freezes the "freezable" kernel processes, and then we call dpm_suspend(PMSG_FREEZE), and next we do a lot of other things, and finally the system is powered off. In the "restore" path of the hibernation: 1. We start the system and a fresh new kernel starts to run. 2. The 'new' kernel itself finishes the initialization and the 'init' script in the initramfs starts to run. The 'init' script notices there is a saved state of the previous 'old' kernel in some swap file/partition, so it won't do a usual start-up; instead, the 'init' script runs a program called 'resume'. 3. The 'resume' program talks to the kernel via /sys/power/resume and asks the kernel to do a resume-from-disk. 4. The kernel starts to run resume_store() -> software_resume -> freeze_processes(), which freezes the userspace, but the "freezable" kernel threads are not frozen!!! So it looks the below patch also works for me: --- a/kernel/power/hibernate.c +++ b/kernel/power/hibernate.c @@ -898,6 +898,11 @@ static int software_resume(void) error = freeze_processes(); if (error) goto Close_Finish; + + error = freeze_kernel_threads(); + if (error) + goto Close_Finish; + error = load_image_and_restore(); thaw_processes(); Finish: Just to be sure, I'll do more tests, but I believe the panic can be fixed by this according to my tests I have done so far. I'm still not sure what the comment before scsi_device_quiesce() means: * ... Since special requests may also be requeued requests, * a successful return doesn't guarantee the device will be * totally quiescent. I don't know if there can be some other I/O submitted after scsi_device_quiesce() returns in the case of hibernation, and I don't know if aac_suspend() -> scsi_host_block() should be fixed/removed, but as far as the panic is concerned, I'm very glad I have found a better fix with your help. Thank you so much, Bart! Thanks, -- Dexuan