Re: [PATCH 0/4] iscsi target: Fix oops during relogin

"Nicholas A. Bellinger" <nab@xxxxxxxxxxxxxxx> · Thu, 25 May 2017 22:49:07 -0700

Hey MNC,

Apologies for taking so long to get back to this one.  Comments below.

(Adding HARE, Sagi & Varun to CC.)

On Mon, 2017-04-03 at 16:44 -0500, Mike Christie wrote:
> On 03/31/2017 11:54 AM, Mike Christie wrote:
> > On 03/31/2017 01:00 AM, Nicholas A. Bellinger wrote:

<SNIP>

> >>
> >> What I'm confused by is the particular scenario described in the patch.
> >> That is, it's the same scenario DATERA Q/A and automation routinely
> >> tests on v4.1.y and v3.14.y with a few thousand active volumes.  So far
> >> we've not triggered a reproduction like the one described above.
> >>
> >> Namely, where a backend driver takes an extended amount of time to
> >> complete an outstanding se_cmd, resulting in ABORT_TASK and LUN_RESET,
> >> followed by a session reinstatement that occurs while se_cmd is still
> >> outstanding to backend driver code.
> >>
> >> If a session reinstatement fails due to its login attempt taking longer
> >> than TA_LOGIN_TIMEOUT=15 seconds since the se_cmd in question still
> >> didn't complete, iscsi_handle_login_thread_timeout() fires and sends
> >> SIGINT to iscsi_np->np_thread.
> >>
> >> If iscsi_check_for_session_reinstatement() is already blocked on
> >> iscsi_stop_session() -> wait_for_completion(), it will wait indefinitely
> >> until the se_cmd in question is completed back to target-core before
> >> allowing login to make forward progress, or fail due to the login
> >> timeout.
> > 
> > This is where we hit the problem.
> > 
> > At this time while in the wait, the initiator gives up (normally hit a
> > iscsi login timeout on the initiator side) on the login attempt and just
> > drops the tcp/ip connection. On the target side we detect this and
> > iscsi_target_sk_state_change runs and iscsi_target_do_cleanup which
> > frees the iscsi_login related resources.
> > 
> > When the command eventually completes, we wake from the
> > wait_for_completion and try to access the freed iscsi_login struct.
> > 
> > The problem is that iscsi_check_for_session_reinstatement ->
> > iscsi_target_check_for_existing_instances will return 0 after the
> > command has completed so the login thread does not know that login has
> > failed due to the tcp/ip connection getting dropped and the iscsi_login
> > struct has been freed. It will then try to access the freed iscsi_login
> > struct and proceed with the login process.
> > 
> > 
> > 
> > 
> >>
> >> If iscsi_check_for_session_reinstatement hasn't been reached yet or
> >> hasn't blocked on wait_for_completion(), the SIGINT should fail the
> >> connection the next time it attempts to do socket I/O.
> >>
> >> From what I can gather from the original problem statement, you are
> >> hitting something different than these two cases, right..?
> >>
> >> So I'd really like to reproduce what you've seen to trigger the
> >> scenario, and jump into kgdb and see what's going on.  Would you mind
> >> giving me more details wrt how you've been able to reproduce this, and even
> >> better, some debug code to reproduce at will..?
> >>
> > 
> > I will send a patch for scsi_debug that can simulate the problem.
> > 
> 
> Attached is a patch to scsi_debug, scsi-debug-hang-abort.patch, which
> will hang the abort process so you can simulate commands that get stuck.
> Just export the scsi_debug /dev/sdX as a pscsi backend device and use
> these settings for scsi_debug:
> 
> 1. every_nth = 30 (set this after the initial login through sysfs on the
> target side /sys/module/scsi_debug/parameters/every_nth, so you do not
> hit scanning related issues)
> 2. abort_sleep = 120 (you might need to increase this depending on your
> timeouts below)
> 3. opts = 0x4
> 
> On the initiator side, use these settings to speed up the failure:
> 
> 1. Set /sys/block/sdX/device/timeout to 5.
> 2. node.session.timeo.replacement_timeout = 5
> 3. node.conn[0].timeo.login_timeout = 30
> 4. node.conn[0].timeo.noop_out_timeout and
> node.conn[0].timeo.noop_out_interval = 5
> 5. node.session.err_timeo.abort_timeout = 5
> 
> On the target side, I am using the default settings.
> 
> Then just do some simple IO until you hit the every_nth limit. Do
> something like
> 
> dd if=/dev/sdX of=/dev/null iflag=direct count=1
> 
> A couple times until you hit the every_nth setting, so you do not end up
> with a lot of stuck IO on the target side.
> 
> I also attached the oops I see in the attachment iscsi-relogin-bug. It
> was made against master in target-pending.
> 
> 

OK, so I've finally posted a patch to address the root cause.

iscsi-target: Fix initial login PDU asynchronous socket close OOPs
https://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending.git/commit/?id=61d3d4c9abc1d22cfd444e46837bda090f1343ce

It addresses the original case where iscsi_np process context is blocked
waiting for session reinstatement to complete while processing the
initial login request PDU, but the TCP connection is closed
asynchronously by the initiator.

It just sets LOGIN_FLAGS_CLOSED in iscsi_target_sk_state_change(), and
lets iscsi_np process context detect the failure once session
reinstatement has completed, and perform the associated connection
cleanup for the failed login attempt.

For subsequent login request PDUs handled from delayed workqueue context
in iscsi_target_do_login_rx(), when the TCP connection is closed
asynchronously, iscsi_target_sk_state_change() just kicks
schedule_delayed_work(&conn->login_work, 0) and lets delayed workqueue
process context in iscsi_target_do_login_rx() handle the failure
detection and associated connection cleanup.
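
To make the ownership rule in the two cases above concrete, here is a
rough userspace sketch. It is not the kernel code and not the patch
itself: every model_* name, the MODEL_LOGIN_FLAGS_CLOSED value and the
struct layout are made up for illustration, and the work_scheduled bool
only stands in for schedule_delayed_work(). The point it tries to show
is that the socket callback only records the close, and whichever
context owns the login does the cleanup.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model only; these are NOT the kernel definitions. */
#define MODEL_LOGIN_FLAGS_CLOSED 0x1u

struct model_login {
	atomic_uint flags;	/* set by the socket callback, read by the owner */
	bool initial_pdu;	/* true while the np-thread context owns the login */
	bool work_scheduled;	/* stand-in for schedule_delayed_work(..., 0) */
};

struct model_login *model_login_alloc(bool initial_pdu)
{
	struct model_login *login = malloc(sizeof(*login));

	if (!login)
		return NULL;
	atomic_init(&login->flags, 0);
	login->initial_pdu = initial_pdu;
	login->work_scheduled = false;
	return login;
}

/*
 * Socket close callback: record the close and never free the login
 * here.  If the workqueue owns the login, kick it so it notices.
 */
void model_sk_state_change(struct model_login *login)
{
	assert(login != NULL);
	atomic_fetch_or(&login->flags, MODEL_LOGIN_FLAGS_CLOSED);
	if (!login->initial_pdu)
		login->work_scheduled = true;
}

/*
 * Owner context: called once the reinstatement wait returns, or from
 * the delayed work.  Returns 0 if login may continue, or -1 after the
 * owner has detected the close and released the login itself.
 */
int model_owner_resume(struct model_login *login)
{
	assert(login != NULL);
	login->work_scheduled = false;
	if (atomic_load(&login->flags) & MODEL_LOGIN_FLAGS_CLOSED) {
		free(login);	/* cleanup happens only in the owner */
		return -1;
	}
	return 0;
}
```

In this model a caller allocates with model_login_alloc(true) for the
initial PDU case or model_login_alloc(false) for the workqueue case,
lets model_sk_state_change() run at any point, and treats a -1 from
model_owner_resume() as "login failed, struct already released".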

That said, I've been testing these two particular scenarios on v4.12-rc1
with the patch using the settings you described above, and both cases are
working as expected in small scale vm-tests.

Also, the same patch has been backported to v4.1.y and the DATERA Q/A
team is currently putting it through the scale and error injection
regression test on physical hardware.

So please have a look (Hannes please review) and verify on your end it
addresses the original bug, and doesn't break anything else.

For Sagi + Varun, AFAICT this doesn't break anything wrt iser-target
and cxgbit code, but if you can give it a quick spin on your setup it
would be appreciated.
