Re: [PATCH] scsi: libiscsi: Allow sd_shutdown on bad transport

Lee, Chris,

Some test results.

- Single unmounted disk, with transport connection wiped before final logout:

http://pastebin.ubuntu.com/26139576/

- Multiple mounted disks under dev-mapper multipath, with all transport connections wiped before the final logout, under a heavy write workload:

http://pastebin.ubuntu.com/26139620/

Consider the sd_shutdown logic: sd_shutdown calls sd_sync_cache for each scsi_disk, and each call makes 3 attempts of scsi_execute with a SYNCHRONIZE_CACHE cmd. You can see that, because the transport was down, the first SYNC_CACHE cmd waits for the request timeout and for the abort_timeout. All other cmds fail in the enqueuing phase, because of the transport failure, the previous timeout, and the server shutdown happening simultaneously, so you don't have to wait for the timeout on each command again.
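
To make the timing argument concrete, here is a rough userspace model of that flow. It is not kernel code: the struct, the function names and the timeout values are all made up for illustration, and it only assumes what is described above (3 attempts per disk, the first cmd on a dead transport waits for both timers, later cmds are rejected when queued).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model, not kernel code: every name here is illustrative. */
enum model_state { MODEL_LOGGED_IN, MODEL_FAILED };

struct model {
    enum model_state state; /* stands in for the iSCSI session state */
    bool transport_up;      /* false once the transport was wiped */
    int cmd_timeout_s;      /* request timeout, assumed value */
    int abort_timeout_s;    /* abort timeout, assumed value */
    int waited_s;           /* total seconds spent waiting on timers */
};

/* Model one SYNCHRONIZE_CACHE attempt; returns true on success. */
static bool issue_sync_cache(struct model *m)
{
    if (m->state == MODEL_FAILED)
        return false;       /* rejected when queued, nothing to wait for */
    if (m->transport_up)
        return true;
    /* First cmd on a dead transport: wait for both timers, then fail. */
    m->waited_s += m->cmd_timeout_s + m->abort_timeout_s;
    m->state = MODEL_FAILED;
    return false;
}

/* Model the shutdown loop: 3 attempts per disk, stop on first success.
 * Returns the number of disks whose cache could not be synced. */
static int shutdown_all_disks(struct model *m, int ndisks)
{
    int failed = 0;

    assert(ndisks >= 0);
    for (int d = 0; d < ndisks; d++) {
        bool ok = false;

        for (int attempt = 0; attempt < 3 && !ok; attempt++)
            ok = issue_sync_cache(m);
        if (!ok)
            failed++;
    }
    return failed;
}
```

With 4 disks and assumed timers of 30s and 15s, this model waits for the two timers only once and every remaining attempt returns immediately, which matches what the logs above show.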

This change also covers any pending requests, not only those coming from sd_shutdown, and it lets the OS reboot and shut down again, regardless of how badly userland was configured.
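
For reference, the decision the timeout helper makes can be sketched roughly as below. This is a simplified model and not the patch itself: the enum, the struct and the shutting_down flag are stand-ins I made up, HOST_NO_CONNECT models DID_NO_CONNECT, and the logged-in path is out of scope here, so the sketch simply defers it to the normal error handler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-ins, not the kernel's definitions. */
enum timer_action { TIMER_RESET, TIMER_NOT_HANDLED, TIMER_HANDLED };

#define HOST_NO_CONNECT 0x01 /* models the DID_NO_CONNECT host byte */

struct model_cmd {
    int result; /* host byte lives in bits 16..23, as in a SCSI result */
};

/*
 * Model of the timeout decision:
 *  - session logged in: out of scope, defer to normal error handling
 *  - session down, system running: keep resetting the timer and let
 *    iscsi recovery bring the session back (the old behavior)
 *  - session down, system shutting down: recovery will never happen,
 *    so complete the cmd with a "no connect" result instead of hanging
 */
static enum timer_action on_cmd_timeout(bool logged_in, bool shutting_down,
                                        struct model_cmd *cmd)
{
    assert(cmd != NULL);

    if (logged_in)
        return TIMER_NOT_HANDLED;
    if (!shutting_down)
        return TIMER_RESET;

    cmd->result = HOST_NO_CONNECT << 16;
    return TIMER_HANDLED;
}
```

The point of the third branch is that the upper layer gets a completion carrying an error it can act on, rather than a timer that is re-armed forever.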

Thank you in advance for considering it.

-Rafael

> On 07/12/2017, at 07:59 PM, Rafael David Tinoco <rafael.tinoco@xxxxxxxxxxxxx> wrote:
> 
> If, for any reason, userland shuts down iscsi transport interfaces
> before proper logouts - like when logging in to LUNs manually,
> without logging out on server shutdown, or when automated scripts
> can't umount/logout from logged LUNs - the kernel will hang forever
> in its sd_sync_cache() logic, after issuing the SYNCHRONIZE_CACHE
> cmd to all paths that still exist.
> 
> PID: 1 TASK: ffff8801a69b8000 CPU: 1 COMMAND: "systemd-shutdow"
> #0 [ffff8801a69c3a30] __schedule at ffffffff8183e9ee
> #1 [ffff8801a69c3a80] schedule at ffffffff8183f0d5
> #2 [ffff8801a69c3a98] schedule_timeout at ffffffff81842199
> #3 [ffff8801a69c3b40] io_schedule_timeout at ffffffff8183e604
> #4 [ffff8801a69c3b70] wait_for_completion_io_timeout at ffffffff8183fc6c
> #5 [ffff8801a69c3bd0] blk_execute_rq at ffffffff813cfe10
> #6 [ffff8801a69c3c88] scsi_execute at ffffffff815c3fc7
> #7 [ffff8801a69c3cc8] scsi_execute_req_flags at ffffffff815c60fe
> #8 [ffff8801a69c3d30] sd_sync_cache at ffffffff815d37d7
> #9 [ffff8801a69c3da8] sd_shutdown at ffffffff815d3c3c
> 
> This happens because iscsi_eh_cmd_timed_out(), the transport layer
> timeout helper, would tell the queue timeout function (scsi_times_out)
> to reset the request timer over and over, until the session state is
> back to logged in state. Unfortunately, during server shutdown, this
> might never happen again.
> 
> Another option would be "not to handle" the issue in the transport
> layer. That would trigger the error handler logic, which would also
> need the session state to be logged in again.
> 
> The best option, for such a case, is to tell upper layers that the
> command was handled by the transport layer error handler helper,
> marking it as DID_NO_CONNECT, which will allow completion and
> inform about the problem.
> 
> After the session was marked as ISCSI_STATE_FAILED, due to the first
> timeout during the server shutdown phase, all subsequent cmds will
> fail to be queued, allowing upper logic to fail faster.
> 
> Signed-off-by: Rafael David Tinoco <rafael.tinoco@xxxxxxxxxxxxx>



