The patch 0ab710458da1 ("scsi: iscsi: Perform connection failure entirely in kernel space") has the following regressions/bugs that this patch set fixes:

1. It can return cmds to upper layers like dm-multipath, which can retry them. After the retries are successful, the fs/app can send new IO to the same sectors, but we've left the original cmds running in FW or in the net layer. We need to be calling ep_disconnect.

2. Drivers that implement ep_disconnect expect it to be called before conn_stop. Besides crashes, if the cleanup_task callout is called before ep_disconnect, it might free up driver/card resources for session1 and those could then be allocated to session2. But because the driver's ep_disconnect was not called, it has not cleaned up the firmware, so the card is still using the resources for the original cmd.

3. The system shutdown case does not work for the eh path. Passing STOP_CONN_TERM to stop_conn will never block the session and start the recovery timer, because for that flag userspace will do the unbind and destroy events, which would remove the devices and wake up and kill the eh. We should be using STOP_CONN_RECOVER (see the teardown-ordering sketch at the end of this mail).

4. stop_conn_work_fn can run after userspace has done its recovery and we are happily using the session. We will then end up with various bugs depending on what is going on at the time. We may also run stop_conn_work_fn late, after userspace has called stop_conn and ep_disconnect and is now going to call the conn start/bind events. If stop_conn_work_fn runs after bind but before start, we would leave the conn in an unbound but sort-of-started state where IO might be allowed even though the drivers have been set to a state where they no longer expect IO.

5. Returning -EAGAIN in iscsi_if_destroy_conn if we haven't yet run the in-kernel stop_conn function breaks userspace. We should have been doing that cleanup for the caller.

The patchset should also maintain support for the fix in 7e7cd796f277 ("scsi: iscsi: Fix deadlock on recovery path during GFP_IO reclaim"). I'm not 100% sure about that though. This patchset allows us to do max_active conn cleanups in parallel (256 by default and up to 512) where we used to do only 1 at a time (see the workqueue sketch at the end of this mail). I'm not sure if this will let us hit the issue described in that patch more easily, or if it will be better because we have a higher chance of cleaning up commands that can be failed over to another path and freeing up dirty memory.

I'm still testing the patches, but wanted to get some feedback from the Google and Collabora devs that made the original patches.

V2:
- Handle the second part of #4 above and fix missing locking.
- Include the iscsi_tcp kernel sock shutdown patch.
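
For reference, here is a rough sketch of the teardown ordering described in #2 and #3. The helper name and the way the ep is passed in are made up for illustration; the ep_disconnect/stop_conn callouts and the STOP_CONN_* flags are the existing iscsi_transport ones:

#include <scsi/iscsi_if.h>
#include <scsi/scsi_transport_iscsi.h>

/*
 * Sketch only, not the code in the patches. Disconnect the endpoint
 * first so the driver/FW aborts the outstanding cmds, then stop the
 * conn with STOP_CONN_RECOVER so the session gets blocked and the
 * recovery timer armed. STOP_CONN_TERM relies on userspace sending
 * the unbind/destroy events, which never happens on the shutdown/eh
 * path.
 */
static void example_cleanup_conn(struct iscsi_cls_conn *conn,
				 struct iscsi_endpoint *ep)
{
	struct iscsi_transport *transport = conn->transport;

	if (ep && transport->ep_disconnect)
		transport->ep_disconnect(ep);

	transport->stop_conn(conn, STOP_CONN_RECOVER);
}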
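
And a rough sketch of how the parallel cleanups could be set up with a plain unbound workqueue; the queue name and function are illustrative, not the exact code in the patches:

#include <linux/workqueue.h>

/*
 * An unbound workqueue lets up to max_active conn cleanups run in
 * parallel instead of one at a time. Passing 0 selects the default
 * of 256 in-flight work items, and the hard cap is WQ_MAX_ACTIVE
 * (512), which is where the numbers above come from.
 */
static struct workqueue_struct *conn_cleanup_workq;

static int example_setup_cleanup_workq(void)
{
	conn_cleanup_workq = alloc_workqueue("iscsi_conn_cleanup",
					     WQ_SYSFS | WQ_UNBOUND, 0);
	if (!conn_cleanup_workq)
		return -ENOMEM;
	return 0;
}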