Re: [PATCH] scsi: scsi_host_queue_ready: increase busy count early

On Wed, Jan 20, 2021 at 07:45:48PM +0100, mwilck@xxxxxxxx wrote:
> From: Martin Wilck <mwilck@xxxxxxxx>
> 
> Donald: please give this patch a try.
> 
> Commit 6eb045e092ef ("scsi: core: avoid host-wide host_busy counter for scsi_mq")
> contained this hunk:
> 
> -       busy = atomic_inc_return(&shost->host_busy) - 1;
>         if (atomic_read(&shost->host_blocked) > 0) {
> -               if (busy)
> +               if (scsi_host_busy(shost) > 0)
>                         goto starved;
> 
> The previous code would increase the busy count before checking host_blocked.
> With 6eb045e092ef, the busy count would be increased (by setting the
> SCMD_STATE_INFLIGHT bit) after the if clause for host_blocked above.
> 
> Users have reported a regression with the smartpqi driver [1] which has been
> shown to be caused by this commit [2].
> 
> It seems that, with the increase of the busy counter moved further down, the
> can_queue limit of the controller could be exceeded if several CPUs were
> executing this code in parallel on different queues.
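To make the ordering argument concrete, here is a deterministic userspace model (not kernel code) that replays every interleaving of two CPUs. Each CPU performs one host_blocked test and one busy-count increment. `busy` is a hypothetical stand-in for scsi_host_busy(), host_blocked is fixed at 1, memory is assumed sequentially consistent, and the model ignores whatever the starved path undoes. With the test done before the count, as after 6eb045e092ef, both CPUs can get past the test. Counting first, and then testing `busy > 1` because the count includes the caller, lets at most one through.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical two-CPU model; "busy" stands in for scsi_host_busy().
 * Assumes sequentially consistent memory; this is not kernel code. */
enum order { CHECK_THEN_COUNT, COUNT_THEN_CHECK };

/* The six ways to interleave two steps of CPU0 with two steps of CPU1:
 * bit i set means global slot i runs CPU0's next step. */
static const unsigned interleavings[6] = { 0x3, 0x5, 0x6, 0x9, 0xA, 0xC };

/* Replays one interleaving and returns how many CPUs passed the test. */
static int replay(enum order ord, unsigned cpu0_slots)
{
    const int host_blocked = 1;       /* host is currently throttled */
    int busy = 0;
    int pc[2] = { 0, 0 };             /* next step of each CPU */
    bool passed[2] = { false, false };

    for (int slot = 0; slot < 4; slot++) {
        int cpu = ((cpu0_slots >> slot) & 1u) ? 0 : 1;
        int step = pc[cpu]++;

        assert(step < 2);
        if (ord == CHECK_THEN_COUNT) {
            if (step == 0)            /* test first ... */
                passed[cpu] = !(host_blocked > 0 && busy > 0);
            else if (passed[cpu])     /* ... count afterwards */
                busy++;
        } else {
            if (step == 0)            /* count first ... */
                busy++;
            else                      /* ... so busy includes this CPU */
                passed[cpu] = !(host_blocked > 0 && busy > 1);
        }
    }
    return passed[0] + passed[1];
}

/* Worst case over all interleavings. */
static int worst_case(enum order ord)
{
    int worst = 0;
    for (int i = 0; i < 6; i++) {
        int n = replay(ord, interleavings[i]);
        if (n > worst)
            worst = n;
    }
    return worst;
}
```

In this model worst_case(CHECK_THEN_COUNT) is 2 and worst_case(COUNT_THEN_CHECK) is 1. Some count-first interleavings starve both CPUs, which costs a retry rather than letting an extra command through.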

The can_queue limit should never be exceeded, because blk-mq enforces it:
each hw queue's queue depth is .can_queue.

smartpqi's issue is that its .can_queue does not represent each hw
queue's depth, instead the .can_queue represents queue depth of the
whole HBA.

As John mentioned, smartpqi should have switched to hosttags.
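A rough sketch of the arithmetic behind the can_queue point, assuming (as described above) that without shared host tags every hw queue gets its own .can_queue-deep tag space. max_tags_in_flight() is a hypothetical helper, not a kernel API. shared_host_tags models what is called "hosttags" here, which is assumed to correspond to the host_tagset flag in struct scsi_host_template. Other limits, such as per-device queue depth, are ignored.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: upper bound on blk-mq tags that can be in flight
 * toward one HBA, ignoring every limit other than tag depth. */
static unsigned long max_tags_in_flight(unsigned int nr_hw_queues,
                                        unsigned int can_queue,
                                        bool shared_host_tags)
{
    assert(nr_hw_queues > 0);
    if (shared_host_tags)
        return can_queue;                          /* one shared tag space */
    return (unsigned long)nr_hw_queues * can_queue; /* one per hw queue */
}
```

If .can_queue is really the whole HBA's depth, any nr_hw_queues above 1 without shared tags lets the block layer hand out more tags than the controller can take.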

BTW, the following code looks like it carries a soft lockup risk:

pqi_alloc_io_request():
        while (1) {
                io_request = &ctrl_info->io_request_pool[i];
                if (atomic_inc_return(&io_request->refcount) == 1)
                        break;
                atomic_dec(&io_request->refcount);
                i = (i + 1) % ctrl_info->max_io_slots;
        }
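One way to remove the unbounded spin is to scan each slot at most once and report failure when the pool is full, leaving the retry to the caller. The sketch below uses C11 atomics and hypothetical names (struct io_slot, alloc_io_slot(), free_io_slot()). It also swaps the quoted increment/decrement probe for a compare-and-swap from 0 to 1, so a probe never modifies a slot it fails to claim. It is neither the smartpqi code nor a proposed patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for pqi_io_request; only the refcount matters. */
struct io_slot {
    atomic_int refcount;   /* 0 = free, 1 = owned */
};

/* Bounded scan: try each slot at most once, starting at 'start', and
 * return NULL instead of spinning when every slot is taken. */
static struct io_slot *alloc_io_slot(struct io_slot *pool, size_t nslots,
                                     size_t start)
{
    for (size_t n = 0; n < nslots; n++) {
        struct io_slot *slot = &pool[(start + n) % nslots];
        int expected = 0;

        /* Claim the slot only if it is currently free (0 -> 1). */
        if (atomic_compare_exchange_strong(&slot->refcount, &expected, 1))
            return slot;
    }
    return NULL;   /* pool exhausted: caller should report busy */
}

static void free_io_slot(struct io_slot *slot)
{
    assert(atomic_load(&slot->refcount) == 1);
    atomic_store(&slot->refcount, 0);
}
```

In a driver the NULL case would presumably map to a "host busy" return so the block layer requeues the request. That choice is an assumption of this sketch.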

> 
> This patch attempts to fix it by moving the setting of SCMD_STATE_INFLIGHT back
> before the host_blocked test. It also inserts barriers to make sure
> scsi_host_busy() on one CPU will notice the increase of the count from another.
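The pairing described here (scsi_host_busy() on one CPU noticing another CPU's increment) has the shape of the classic store-buffering pattern. The userspace sketch below models it with C11 seq_cst fences standing in for the kernel barriers and atomic flags standing in for the SCMD_STATE_INFLIGHT bits. All names are hypothetical. With a full fence between each thread's own store and its load of the other flag, the outcome in which neither thread sees the other is forbidden. The sketch does not try to settle which of the patch's two barriers is strictly required.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* flag[i] models command i's SCMD_STATE_INFLIGHT bit. */
struct sb_test {
    atomic_int flag[2];
    int seen[2];          /* what each thread observed of the other */
};

struct sb_arg {
    struct sb_test *t;
    int self;
};

static void *sb_thread(void *p)
{
    struct sb_arg *a = p;
    int other = 1 - a->self;

    /* Mark own command in flight, full fence, then look at the other. */
    atomic_store_explicit(&a->t->flag[a->self], 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);   /* plays smp_mb() */
    a->t->seen[a->self] =
        atomic_load_explicit(&a->t->flag[other], memory_order_relaxed);
    return NULL;
}

/* One round; returns 1 if at least one thread saw the other's flag. */
static int sb_round(void)
{
    struct sb_test t;
    struct sb_arg args[2] = { { &t, 0 }, { &t, 1 } };
    pthread_t th[2];

    atomic_init(&t.flag[0], 0);
    atomic_init(&t.flag[1], 0);
    for (int i = 0; i < 2; i++) {
        int rc = pthread_create(&th[i], NULL, sb_thread, &args[i]);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < 2; i++)
        pthread_join(th[i], NULL);
    return t.seen[0] || t.seen[1];
}
```

With both fences in place every round should return 1. Dropping the fences makes a 0 result legal, and on x86 such a result can actually show up because stores may be buffered past later loads.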
> 
> [1]: https://marc.info/?l=linux-scsi&m=160271263114829&w=2
> [2]: https://marc.info/?l=linux-scsi&m=161116163722099&w=2

If the above is true of smartpqi's can_queue usage, your patch may not
completely fix the issue you describe as '.can_queue is exceeded'.

> 
> Fixes: 6eb045e092ef ("scsi: core: avoid host-wide host_busy counter for scsi_mq")
> 
> Cc: Ming Lei <ming.lei@xxxxxxxxxx>
> Cc: Don Brace <Don.Brace@xxxxxxxxxxxxx>
> Cc: Kevin Barnett <Kevin.Barnett@xxxxxxxxxxxxx>
> Cc: Donald Buczek <buczek@xxxxxxxxxxxxx>
> Cc: John Garry <john.garry@xxxxxxxxxx>
> Cc: Paul Menzel <pmenzel@xxxxxxxxxxxxx>
> Signed-off-by: Martin Wilck <mwilck@xxxxxxxx>
> ---
>  drivers/scsi/hosts.c    | 2 ++
>  drivers/scsi/scsi_lib.c | 8 +++++---
>  2 files changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
> index 2f162603876f..1c452a1c18fd 100644
> --- a/drivers/scsi/hosts.c
> +++ b/drivers/scsi/hosts.c
> @@ -564,6 +564,8 @@ static bool scsi_host_check_in_flight(struct request *rq, void *data,
>  	int *count = data;
>  	struct scsi_cmnd *cmd = blk_mq_rq_to_pdu(rq);
>  
> +	/* This pairs with set_bit() in scsi_host_queue_ready() */
> +	smp_mb__before_atomic();

So does the above barrier order atomic_read(&shost->host_blocked) against
test_bit()?

>  	if (test_bit(SCMD_STATE_INFLIGHT, &cmd->state))
>  		(*count)++;
>  
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index b3f14f05340a..0a9a36c349ee 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -1353,8 +1353,12 @@ static inline int scsi_host_queue_ready(struct request_queue *q,
>  	if (scsi_host_in_recovery(shost))
>  		return 0;
>  
> +	set_bit(SCMD_STATE_INFLIGHT, &cmd->state);
> +	/* This pairs with test_bit() in scsi_host_check_in_flight() */
> +	smp_mb__after_atomic();
> +
>  	if (atomic_read(&shost->host_blocked) > 0) {
> -		if (scsi_host_busy(shost) > 0)
> +		if (scsi_host_busy(shost) > 1)
>  			goto starved;
>  
>  		/*
> @@ -1379,8 +1383,6 @@ static inline int scsi_host_queue_ready(struct request_queue *q,
>  		spin_unlock_irq(shost->host_lock);
>  	}
>  
> -	__set_bit(SCMD_STATE_INFLIGHT, &cmd->state);
> -
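The change from `> 0` to `> 1` follows from the new ordering: the caller's own SCMD_STATE_INFLIGHT bit is already set when it calls scsi_host_busy(), so the count includes the caller itself. Here is a minimal model, with a hypothetical bool array standing in for the per-command state bits and a hypothetical host_busy() standing in for scsi_host_busy():

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* inflight[i] mirrors SCMD_STATE_INFLIGHT of command i. */
static int host_busy(const bool *inflight, size_t n)
{
    int count = 0;
    for (size_t i = 0; i < n; i++)
        if (inflight[i])
            count++;
    return count;
}

/* The caller's own flag is set before it counts, so "some other command
 * is in flight" means a count above 1. */
static bool others_in_flight(const bool *inflight, size_t n, size_t self)
{
    assert(self < n && inflight[self]);   /* own command marked first */
    return host_busy(inflight, n) > 1;
}
```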

This patch looks fine.

However, I'd suggest confirming smartpqi's .can_queue usage first, since that
looks like a big issue.

-- 
Ming



