Hi,

I found two hang problems between the iscsid service and the iscsi kernel
module, and I can still reliably reproduce one of them on the latest kernel,
so I believe the problems really exist. It took me a long time to find out
why, due to my limited knowledge of iscsi, and I cannot find a good way to
solve both of them. Please help to take a look at them. Thanks.

=========
Problem 1:

*************** [What it looks like] ***************

First, we connect to about 10 remote LUNs with the iscsid service, using at
least two different sessions. When a network error occurs, a session can go
into an error state. If we then do login and logout, the iscsid service can
end up in D state. My colleague posted an email reporting this problem
before, including a long call trace, but it barely got any feedback.
(https://lkml.org/lkml/2017/6/19/330)

************** [Why it happens] **************

In the latest kernel, the asynchronous part of sd_probe() is executed in
scsi_sd_probe_domain, and sd_remove() waits until all works in
scsi_sd_probe_domain have finished. When we use iscsi-based remote storage
and the network is broken, the following deadlock can happen:

1. An iscsi session login is in progress and calls sd_probe() to probe a
   remote LUN. The synchronous part has finished, and the asynchronous part
   is scheduled in scsi_sd_probe_domain, where it will submit io to execute
   scsi commands to obtain device info. When the network is broken, the
   session goes into ISCSI_SESSION_FAILED state, and the io retries until
   the session becomes ISCSI_SESSION_FREE. As a result, the work in
   scsi_sd_probe_domain hangs.

2. On the other hand, the iscsi kernel module detects the network ping
   timeout and triggers an ISCSI_KEVENT_CONN_ERROR event. iscsid in user
   space handles this event by triggering an ISCSI_UEVENT_DESTROY_SESSION
   event. The destroy-session process is synchronous, and when it calls
   sd_remove() to remove the LUN, it waits until all the works in
   scsi_sd_probe_domain have finished.
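The shape of the deadlock in the two steps above can be sketched as a small
userspace analogy (this is only an illustration, assuming a glibc system,
since pthread_timedjoin_np is a GNU extension): the worker thread stands in
for the probe work stuck in scsi_sd_probe_domain, and the timed join stands
in for sd_remove() synchronizing with the async domain.

#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <time.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t net_recovered = PTHREAD_COND_INITIALIZER;

/* Stands in for the async part of sd_probe(): it blocks waiting for io
 * that can only complete once the failed session recovers or is freed. */
static void *probe_work(void *arg)
{
	pthread_mutex_lock(&lock);
	pthread_cond_wait(&net_recovered, &lock);	/* never signalled */
	pthread_mutex_unlock(&lock);
	return NULL;
}

/* Stands in for sd_remove() waiting on the probe domain. Returns the
 * pthread_timedjoin_np() result: ETIMEDOUT means "remove would block
 * forever here"; the 1s deadline is only so the demo terminates. */
static int simulated_remove(pthread_t worker)
{
	struct timespec deadline;

	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += 1;
	return pthread_timedjoin_np(worker, NULL, &deadline);
}

int main(void)
{
	pthread_t worker;

	pthread_create(&worker, NULL, probe_work, NULL);
	assert(simulated_remove(worker) == ETIMEDOUT);
	printf("remove blocked: probe work never finished\n");
	return 0;
}

In the kernel the second leg is sd_remove() synchronizing with
scsi_sd_probe_domain, which has no such timeout, so the destroy-session
path blocks indefinitely.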
As a result, it hangs, and iscsid in user space goes into D state, which is
not killable and unable to handle any other events.

**************** [How to reproduce] ****************

With the script below, I can always reproduce it on the latest kernel:

# create network errors
tc qdisc add dev eth1 root netem loss 60%

while [ 1 ]
do
    iscsiadm -m node -T xxxxxx --login
    sleep 5
    iscsiadm -m node -T xxxxxx --logout &
    iscsiadm -m node -T yyyyyy --login &
done

xxxxxx and yyyyyy are two different target names. Connect to about 10
remote LUNs and run the script for about half an hour to reproduce the
problem.

******************* [How I avoid it for now] *******************

To avoid this problem, I simply remove scsi_sd_probe_domain and call
sd_probe_async() synchronously in sd_probe(), so sd_remove() no longer
needs to wait for the domain:

@@ -2986,7 +2986,40 @@ static int sd_probe(struct device *dev)
 	get_device(&sdkp->dev);	/* prevent release before async_schedule */
-	async_schedule_domain(sd_probe_async, sdkp, &scsi_sd_probe_domain);
+	sd_probe_async((void *)sdkp, 0);

I know this is not a good way, so would you please give some advice on it?

=========
Problem 2:

*************** [What it looks like] ***************

When removing a scsi device while a network error happens,
__blk_drain_queue() can hang forever.
# cat /proc/19160/stack
[<ffffffff8005886d>] msleep+0x1d/0x30
[<ffffffff80201a84>] __blk_drain_queue+0xe4/0x160
[<ffffffff80202766>] blk_cleanup_queue+0x106/0x2e0
[<ffffffffa000fb02>] __scsi_remove_device+0x52/0xc0 [scsi_mod]
[<ffffffffa000fb9b>] scsi_remove_device+0x2b/0x40 [scsi_mod]
[<ffffffffa000fbc0>] sdev_store_delete_callback+0x10/0x20 [scsi_mod]
[<ffffffff801a4e75>] sysfs_schedule_callback_work+0x15/0x80
[<ffffffff80062d69>] process_one_work+0x169/0x340
[<ffffffff800667e3>] worker_thread+0x183/0x490
[<ffffffff8006a526>] kthread+0x96/0xa0
[<ffffffff8041ebb4>] kernel_thread_helper+0x4/0x10
[<ffffffffffffffff>] 0xffffffffffffffff

The request queue of this device was stopped, so the following check stays
true forever:

void __blk_run_queue(struct request_queue *q)
{
	if (unlikely(blk_queue_stopped(q)))
		return;

	__blk_run_queue_uncond(q);
}

So __blk_run_queue_uncond() is never called, and the process hangs.

************** [Why it happens] **************

When the network error happens, the iscsi kernel module detects the ping
timeout and tries to recover the session. Here, the queue is stopped, or
you could also say the session is blocked:

iscsi_start_session_recovery(session, conn, flag)
  |-> iscsi_block_session(session->cls_session)
        |-> blk_stop_queue(q)

The session should be unblocked once it is recovered or the recovery times
out. But it is not unblocked properly, because scsi_remove_device() deletes
the device first and then calls __blk_drain_queue():

__scsi_remove_device()
  |-> device_del(dev)
  |-> blk_cleanup_queue()
        |-> scsi_request_fn()
        |-> __blk_drain_queue()

At this point, the device is no longer on the children list of its parent
device. So when __iscsi_unblock_session() tries to unblock the parent
device and its children, the removed device cannot be unblocked, and its
queue stays stopped forever.
__iscsi_unblock_session()
  |-> scsi_target_unblock()
        |-> device_for_each_child()

**************** [How to reproduce] ****************

Unfortunately I cannot reproduce it on the latest kernel. The script below
helps to reproduce it, but not very often:

# create network errors
tc qdisc add dev eth1 root netem loss 60%

# restart iscsid and rescan the scsi bus again and again
while [ 1 ]
do
    systemctl restart iscsid
    rescan-scsi-bus
done

(rescan-scsi-bus:
http://manpages.ubuntu.com/manpages/trusty/man8/rescan-scsi-bus.8.html)

************** [How I resolve it] **************

For now, I resolve this problem by checking the QUEUE_FLAG_DYING flag in
__blk_run_queue(). blk_cleanup_queue() sets QUEUE_FLAG_DYING and then calls
__blk_drain_queue(). At this point, __scsi_remove_device() should already
have set the scsi_device state to SDEV_DEL. So if the queue is dying, no
matter whether the queue is stopped, we go on to __blk_run_queue_uncond(),
and scsi_request_fn() will then kill the remaining requests.

---
 void __blk_run_queue(struct request_queue *q)
 {
-	if (unlikely(blk_queue_stopped(q)))
+	if (unlikely(blk_queue_stopped(q)) && unlikely(!blk_queue_dying(q)))
 		return;

 	__blk_run_queue_uncond(q);
--

Thanks