Re: [PATCH 3/3] tcmu: remove cmd timeout code

"Nicholas A. Bellinger" <nab@xxxxxxxxxxxxxxx> · Sat, 18 Mar 2017 18:51:13 -0700

On Thu, 2017-03-09 at 02:53 -0600, Mike Christie wrote:
> On 03/08/2017 10:22 PM, Nicholas A. Bellinger wrote:
> > On Wed, 2017-03-08 at 20:16 -0800, Nicholas A. Bellinger wrote:
> >> On Wed, 2017-03-08 at 12:02 -0600, Mike Christie wrote:
> > 
> > <SNIP>
> > 
> >>>
> >>> My tcmu-runner plan was:
> >>>
> >>> 1. For hung commands, we are going to add callouts to the backend module
> >>> for TMFs. target_core_tmr.c would call them during TMFs like ABORT_TASK
> >>> to have the backend perform the TMF. target_core_user would then call
> >>> into tcmu-runner and that would call into the userspace handler.
> >>>
> >>
> >> Mmmm.  This would mean TMRs would function very differently between TCMU
> >> and non-TCMU backends.
> 
> Do we ever want support for that? I actually started implementing
> support so lio backends like iblock could call into the initiator
> driver's error handlers:
> 
> http://www.spinics.net/lists/linux-scsi/msg97064.html
> 

Propagating target LUN_RESETs into LLD code, specifically in
cases where the LLD has a higher host reset timeout, would certainly be
useful to reduce the overall end-to-end host-reset recovery time.

That said, I don't think the various pros/cons of this approach have
ever been verbalized on the list, so I'd like to give it some more
thought to understand what the downsides (if any) are.

> For tcmu, your idea below works nicely. I will work on it. Thanks.
> 

Ok, I'll post an updated patch to make target_complete_cmd() handle
SAM_STAT_BUSY + SAM_STAT_QUEUE_FULL as a v4.12 item.
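
For anyone following along, here is a rough user-space sketch of the policy
quoted below: once an I/O has been outstanding longer than some threshold,
hand it back with BUSY or TASK SET FULL rather than waiting for the
initiator to time out and send a TMR. This is not the target-core patch
itself. The helper name, the macro names and the 3000 ms threshold (taken
from the "say 3 seconds" figure) are assumptions made for illustration; only
the status values 0x08 and 0x28 come from SAM.

```c
#include <assert.h>
#include <stdint.h>

/* SCSI status codes as defined by SAM (local names, for illustration). */
#define STATUS_BUSY          0x08
#define STATUS_TASK_SET_FULL 0x28 /* referred to as QUEUE_FULL in this thread */

/* Assumed threshold, based on the "say 3 seconds" figure quoted below. */
#define SLOW_IO_THRESHOLD_MS 3000u

/*
 * Hypothetical helper: decide whether an I/O that has been outstanding for
 * elapsed_ms should be returned early to the initiator.
 *
 * Returns 0 if the I/O should keep waiting (*status is left untouched).
 * Returns 1 if it should be completed early, with *status set to
 * TASK SET FULL when the caller also wants the host to lower its
 * queue depth, or to BUSY otherwise.
 */
static int slow_io_status(uint32_t elapsed_ms, int shrink_queue_depth,
                          uint8_t *status)
{
    if (elapsed_ms <= SLOW_IO_THRESHOLD_MS)
        return 0;

    *status = shrink_queue_depth ? STATUS_TASK_SET_FULL : STATUS_BUSY;
    return 1;
}
```

A handler would call something like this from whatever periodic check it
already runs over its outstanding commands. How the chosen status is then
reported back to the kernel depends on the handler and is left out here.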

> >>
> >> We've always had the assumption that once a TMR is processed, code
> >> blocks until the backend completes outstanding se_cmd(s) in question
> >> without an explicit cancellation trigger.
> >>
> >> One approach I've taken for an out-of-tree make_request_fn() based bio
> >> driver is to return bios with -EAGAIN if a bio takes more than say 3
> >> seconds to complete, in order to signal to IBLOCK to propagate up a
> >> SAM_STAT_BUSY or SAM_STAT_QUEUE_FULL.  This ends up working quite well
> >> to keep initiators happy, and virtually eliminates spurious TMRs in host
> >> environments (like ESX for example) with a very low SCSI timeout.
> >>
> >> Of course, this pushes a lot of smarts into the driver below the backend,
> >> but in a distributed scale-out backend where I/O delays, et al. are a
> >> fact of life this is difficult to avoid.
> >>
> >> So that said, I'd prefer to have TCMU's user-space code return back I/Os
> >> that are expected to take a long time with SAM_STAT_BUSY or
> >> SAM_STAT_QUEUE_FULL (to reduce host queue_depth), instead of adding an
> >> explicit se_cmd cancellation mechanism in target-core into backend
> >> driver code.
> >>
> >> I'd be happy to post the target-core changes needed to do this, which
> >> were posted at one point but ended up not getting merged.
> > 
> > Btw, here is the patch in question:
> > 
> > http://www.spinics.net/lists/target-devel/msg11835.html
> >
> --
> To unsubscribe from this list: send the line "unsubscribe target-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
