On Thu, Jul 28, 2022 at 03:18:49PM -0700, Bart Van Assche wrote:
> From: Ming Lei <ming.lei@xxxxxxxxxx>
>
> Fix the race conditions between SCSI LLD kernel module unloading and SCSI
> device and target removal by making sure that SCSI hosts are destroyed after
> all associated target and device objects have been freed.
>
> Cc: Christoph Hellwig <hch@xxxxxx>
> Cc: Ming Lei <ming.lei@xxxxxxxxxx>
> Cc: Mike Christie <michael.christie@xxxxxxxxxx>
> Cc: Hannes Reinecke <hare@xxxxxxx>
> Cc: John Garry <john.garry@xxxxxxxxxx>
> Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxx>
> Signed-off-by: Bart Van Assche <bvanassche@xxxxxxx>
> [ bvanassche: Reworked Ming's patch and split it ]

I know this has been reported before, but it is still seen in the upstream
kernel, so:

This patch results in a deadlock if a USB storage device is removed.

[ 29.291148] sd 0:0:0:0: [sda] Synchronizing SCSI cache
[ 29.300064] ci_hdrc ci_hdrc.1: remove, state 4
[ 29.300317] usb usb2: USB disconnect, device number 1
[ 29.305090] ci_hdrc ci_hdrc.1: USB bus 2 deregistered
[ 29.307052] ci_hdrc ci_hdrc.0: remove, state 1
[ 29.307214] usb usb1: USB disconnect, device number 1
[ 29.307321] usb 1-1: USB disconnect, device number 2
[ 29.344575] sd 0:0:0:0: [sda] Synchronizing SCSI cache
[ 29.345323] sd 0:0:0:0: [sda] Synchronize Cache(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ 63.358569] INFO: task init:347 blocked for more than 30 seconds.
[ 63.358928] Tainted: G W N 6.0.0-rc4-00017-gcec18aa4b63a #1
[ 63.359200] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 63.359600] task:init state:D stack: 0 pid: 347 ppid: 1 flags:0x00000000
[ 63.360104] __schedule from schedule+0x60/0xbc
[ 63.360368] schedule from scsi_remove_host+0x154/0x1c0
[ 63.360602] scsi_remove_host from usb_stor_disconnect+0x4c/0xac
[ 63.360852] usb_stor_disconnect from usb_unbind_interface+0x74/0x268
[ 63.361100] usb_unbind_interface from device_release_driver_internal+0x1a0/0x22c
[ 63.361383] device_release_driver_internal from bus_remove_device+0xcc/0xfc
[ 63.361651] bus_remove_device from device_del+0x16c/0x3f8
[ 63.361877] device_del from usb_disable_device+0xcc/0x178
[ 63.362097] usb_disable_device from usb_disconnect+0xd0/0x230
[ 63.362325] usb_disconnect from usb_disconnect+0x9c/0x230
[ 63.362536] usb_disconnect from usb_remove_hcd+0xd0/0x16c
[ 63.362741] usb_remove_hcd from host_stop+0x38/0xa8
[ 63.362946] host_stop from ci_hdrc_remove+0x44/0x120
[ 63.363148] ci_hdrc_remove from platform_remove+0x20/0x4c
[ 63.363367] platform_remove from device_release_driver_internal+0x1a0/0x22c
[ 63.363635] device_release_driver_internal from bus_remove_device+0xcc/0xfc
[ 63.363897] bus_remove_device from device_del+0x16c/0x3f8
[ 63.364117] device_del from platform_device_del.part.0+0x10/0x74
[ 63.364353] platform_device_del.part.0 from platform_device_unregister+0x18/0x24
[ 63.364623] platform_device_unregister from ci_hdrc_remove_device+0xc/0x20
[ 63.364886] ci_hdrc_remove_device from ci_hdrc_imx_remove+0x28/0x110
[ 63.365131] ci_hdrc_imx_remove from device_shutdown+0x174/0x250
[ 63.365372] device_shutdown from __do_sys_reboot+0x124/0x270
[ 63.365616] __do_sys_reboot from ret_fast_syscall+0x0/0x1c
[ 63.365849] Exception stack(0xd1859fa8 to 0xd1859ff0)
[ 63.366054] 9fa0: 01234567 000c623f fee1dead 28121969 01234567 00000000
[ 63.366343] 9fc0: 01234567 000c623f 00000001 00000058 000d85c0 00000000 00000000 00000000
[ 63.366620] 9fe0: 000d8298 bef49de4 000918bc b6e8cedc
[ 63.366881] INFO: lockdep is turned off.
[ 63.367069] Kernel panic - not syncing: hung_task: blocked tasks

I understand that it looks like the problem is caused by the shutdown
function in the imx driver calling remove_device, but that is not really
the problem.

As can be seen in the backtrace, usb_stor_disconnect() calls
scsi_remove_host(). Thanks to this patch, scsi_remove_host() now waits
for the scsi release function to be called. However, usb_stor_disconnect()
only calls release_everything() and with it scsi_host_put() _after_
scsi_remove_host() has returned. Since scsi_remove_host() now waits for
the resource which is released by calling scsi_host_put(), this causes
a deadlock.

If my analysis is correct, any USB storage device removal should result
in the deadlock. My analysis may of course be wrong. If so, please let me
know what I missed.

Thanks,
Guenter
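
For illustration only, the ordering problem described above can be sketched
as a small userspace toy model. This is not kernel code: struct toy_host and
every toy_* function are hypothetical names invented for this sketch, the
single reference stands in for the one the driver drops via scsi_host_put(),
and a boolean return value stands in for the wait, so the model reports
"would block forever" instead of actually hanging.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a SCSI host; NOT the kernel's struct Scsi_Host. */
struct toy_host {
	int refcount;   /* references still held (here: the driver's one) */
	bool released;  /* set when the last reference is dropped */
};

/* Hypothetical analogue of dropping a host reference. */
static void toy_host_put(struct toy_host *h)
{
	assert(h->refcount > 0);
	if (--h->refcount == 0)
		h->released = true;     /* "release function" has run */
}

/*
 * Hypothetical analogue of host removal.
 * Returns true if removal completes. Returns false where a real
 * implementation that waits for the release would sleep forever,
 * because nobody else is going to drop the remaining reference.
 */
static bool toy_remove_host(struct toy_host *h, bool wait_for_release)
{
	if (!wait_for_release)
		return true;            /* no wait: returns immediately */
	return h->released;             /* wait: needs the release first */
}

/*
 * Hypothetical analogue of the disconnect path in the report:
 * remove the host first, drop the driver's reference afterwards.
 */
static bool toy_disconnect(struct toy_host *h, bool wait_for_release)
{
	if (!toy_remove_host(h, wait_for_release))
		return false;           /* stuck: the put below is never reached */
	toy_host_put(h);                /* only runs after removal returned */
	return true;
}
```

Starting from refcount = 1, toy_disconnect(&h, false) completes and marks the
host released, while toy_disconnect(&h, true) returns false with the
reference still held, which mirrors the circular wait described in the
analysis.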