Re: [PATCH 1/2] tcmu: add qfull timeout for ring space waits

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 08/11/2017 12:43 AM, Mike Christie wrote:
> 30 secs (default timeout) is too long to wait for memory.
> For ESX we would have seen several aborts and linux will
> just be starting to send an abort. We would like to fail
> these commands with TASK SET FULL so the initaitor knows
> it is sendng too many commands. We do not want to use
> the same time out value as cmd_time_out because we may not
> want that set if userspace has its own timers for commands
> that have got ring space and made it to the userspace
> daemon.
> 
> This adds a new tcmu timeout for this case, qfull_time-out
> which controls how long we wait for ring space.
> 
> Signed-off-by: Mike Christie <mchristi@xxxxxxxxxx>
> ---
>  drivers/target/target_core_user.c | 42 +++++++++++++++++++++++++++++++++++----
>  1 file changed, 38 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/target/target_core_user.c b/drivers/target/target_core_user.c
> index fd9fcea..7d1da43 100644
> --- a/drivers/target/target_core_user.c
> +++ b/drivers/target/target_core_user.c
> @@ -145,6 +145,7 @@ struct tcmu_dev {
>  
>  	struct timer_list timeout;
>  	unsigned int cmd_time_out;
> +	unsigned int qfull_time_out;
>  
>  	spinlock_t nl_cmd_lock;
>  	struct tcmu_nl_cmd curr_nl_cmd;
> @@ -864,12 +865,15 @@ static inline size_t tcmu_cmd_get_cmd_size(struct tcmu_cmd *tcmu_cmd,
>  		if (udev->cmd_time_out)
>  			ret = schedule_timeout(
>  					msecs_to_jiffies(udev->cmd_time_out));
> +		else if (udev->qfull_time_out)
> +			ret = schedule_timeout(
> +					msecs_to_jiffies(udev->qfull_time_out));
>  		else
>  			ret = schedule_timeout(msecs_to_jiffies(TCMU_TIME_OUT));
>  		finish_wait(&udev->wait_cmdr, &__wait);
>  		if (!ret) {
> -			pr_warn("tcmu: command timed out\n");
> -			return TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE;
> +			pr_warn("tcmu: command timed out waiting for ring space.\n");
> +			return TCM_OUT_OF_RESOURCES;
>  		}
>  

I am self nacking this for now. I will not have internet access for a
couple weeks, so just drop this patch.

The problem with with the ring space wait code still is that for fabric
modules like iscsi where we end up calling this code from the fabric's
module's receiving context, like the iscsi rx thread, it ends up
blocking the entire target. For iscsi, we cannot respond to nops, and
the initiator will drop the connection.

I think the options are:

1. Just drop all the wait code and always fail right away. The problem
would be that because the ring pool pages are shared across all devices
and there are multiple initiators, we might end up with some slow
initiator's getting starved.

2. Keep the timer, add the configfs qfull timeout setting, but if space
is not available then put the cmd on a list and let the unmap thread do
the timer/wait. When space becomes available it would do
tcmu_queue_cmd_ring (we could break that function up so it could be
called by the initial submission path and the unmap thread).

3. Keep the timer, add the configfs qfull timeout setting, but just
always kick off to another thread.

4. Have the initiators always turn off nops :)

5. ?
--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux SCSI]     [Kernel Newbies]     [Linux SCSI Target Infrastructure]     [Share Photos]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Linux IIO]     [Device Mapper]

  Powered by Linux