On Wed, Oct 04, 2017 at 03:33:01AM +0800, Ming Lei wrote:
> On Tue, Oct 03, 2017 at 11:53:08AM -0700, Luis R. Rodriguez wrote:
> > INFO: task kworker/u8:8:1320 blocked for more than 10 seconds.
> > Tainted: G E 4.13.0-next-20170907+ #88
> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > kworker/u8:8 D 0 1320 2 0x80000000
> > Workqueue: events_unbound async_run_entry_fn
> > Call Trace:
> > __schedule+0x2ec/0x7a0
> > schedule+0x36/0x80
> > io_schedule+0x16/0x40
> > get_request+0x278/0x780
> > ? remove_wait_queue+0x70/0x70
> > blk_get_request+0x9c/0x110
> > scsi_execute+0x7a/0x310 [scsi_mod]
> > sd_sync_cache+0xa3/0x190 [sd_mod]
> > ? blk_run_queue+0x3f/0x50
> > sd_suspend_common+0x7b/0x130 [sd_mod]
> > ? scsi_print_result+0x270/0x270 [scsi_mod]
> > sd_suspend_system+0x13/0x20 [sd_mod]
> > do_scsi_suspend+0x1b/0x30 [scsi_mod]
> > scsi_bus_suspend_common+0xb1/0xd0 [scsi_mod]
> > ? device_for_each_child+0x69/0x90
> > scsi_bus_suspend+0x15/0x20 [scsi_mod]
> > dpm_run_callback+0x56/0x140
> > ? scsi_bus_freeze+0x20/0x20 [scsi_mod]
> > __device_suspend+0xf1/0x340
> > async_suspend+0x1f/0xa0
> > async_run_entry_fn+0x38/0x160
> > process_one_work+0x191/0x380
> > worker_thread+0x4e/0x3c0
> > kthread+0x109/0x140
> > ? process_one_work+0x380/0x380
> > ? kthread_create_on_node+0x70/0x70
> > ret_from_fork+0x25/0x30
>
> Actually we are trying to fix this issue inside block layer/SCSI, please
> see the following link:
>
> https://marc.info/?l=linux-scsi&m=150703947029304&w=2
>
> Even though this patch can make kthread to not do I/O during
> suspend/resume, the SCSI quiesce still can cause similar issue
> in other case, like when sending SCSI domain validation
> to transport_spi, which happens in revalidate path, nothing
> to do with suspend/resume.

Are you saying that the SCSI layer can generate IO even without the
filesystem triggering it? If so, then by all means these are certainly
other areas where we should address quiescing, as I noted in my email.

Also, *iff* the generated IO is triggered on the SCSI suspend callback,
then clearly the next question is whether this is truly needed. If so,
then yes, it should be quiesced and all restrictions should be
considered.

Note that device pm ops get called first, then later the notifiers are
processed, and only later is userspace frozen. It's this gap that this
patch set addresses, and it's also where I saw the issue creep in.
Depending on the questions above we may or may not need more work in
other layers.

So I am not saying this patch set is sufficient to address all IO
quiescing. Quite the contrary: I acknowledged that each subsystem should
vet whether it has non-FS generated IO (it seems you and Bart are doing
a great job at this analysis on SCSI). This patch set, however, should
help with odd corner cases which *are* triggered by the FS and for which
the spaghetti code requirements of kthread freezing clearly do not
suffice.

> So IMO the root cause is in SCSI's quiesce.
>
> You can find the similar description in above link:
>
> Once SCSI device is put into QUIESCE, no new request except for
> RQF_PREEMPT can be dispatched to SCSI successfully, and
> scsi_device_quiesce() just simply waits for completion of I/Os
> dispatched to SCSI stack. It isn't enough at all.

I see, so the race here is that *on* the pm ops of SCSI we have
generated IO to QUIESCE.

>
> Because new request still can be coming, but all the allocated
> requests can't be dispatched successfully, so request pool can be
> consumed up easily. Then RQF_PREEMPT can't be allocated, and
> hang forever, just like the stack trace you posted.
>

I see. Makes sense. So SCSI quiesce has restrictions and they're being
violated.

Anyway, don't think of this as a replacement for your or Bart's work
then, but rather as supplemental.

Are you saying we should not move forward with this patch set, or simply
that the above splat is more properly fixed with SCSI quiescing? Given
your explanation I'd have to agree.
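
To make the exhaustion described in the quoted text concrete, here is a
toy model of it. This is only a sketch under stated assumptions, not
kernel code: ToyQueue, get_request(), dispatch() and simulate() are
hypothetical names, the request pool is a plain counter, and an
allocation that would block is modeled as returning False instead of
sleeping.

```python
from dataclasses import dataclass


@dataclass
class ToyQueue:
    """Hypothetical stand-in for a request queue with a fixed pool."""
    pool_size: int
    quiesced: bool = False
    allocated: int = 0  # requests currently holding a pool slot

    def get_request(self) -> bool:
        """Take a pool slot. False models a caller that would block."""
        if self.allocated >= self.pool_size:
            return False
        self.allocated += 1
        return True

    def dispatch(self, preempt: bool = False) -> bool:
        """Dispatch one allocated request.

        While quiesced, only preempt requests get through. A request
        that is refused keeps its pool slot.
        """
        if self.quiesced and not preempt:
            return False
        self.allocated -= 1  # request completes and frees its slot
        return True


def simulate(pool_size: int, new_requests: int) -> bool:
    """Quiesce, let normal requests keep arriving, then try to allocate
    the preempt request. Returns whether that allocation succeeds."""
    q = ToyQueue(pool_size=pool_size)
    q.quiesced = True
    for _ in range(new_requests):
        if q.get_request():
            q.dispatch()  # refused while quiesced, slot stays taken
    return q.get_request()


if __name__ == "__main__":
    for n in (3, 4, 10):
        print(n, "new requests -> preempt allocation ok:", simulate(4, n))
```

In this model, with a pool of 4, three stuck requests still leave a slot
for the preempt request, while four or more do not. That second case is
loosely the state the blocked get_request() in the trace would be
waiting in.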

But even with this considered and accepted, from a theoretical
perspective -- why would this patch set actually seem to fix the same
issue? Is it that it just *seems* to fix it?

  Luis