Re: [PATCH] qla2xxx: Fix extraneous ref on sp's after adapter break

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



> On May 25, 2017, at 12:26 PM, Bill Kuzeja <william.kuzeja@xxxxxxxxxxx> wrote:
> 
> Hung task timeouts can result if a qlogic board breaks unexpectedly while
> running I/O. These tasks become hung because command srb reference counts
> are not going to zero, hence the affected srbs and commands do not get 
> freed. This fix accounts for this extra reference in the srbs in the case 
> of a board failure. 
> 
> Fixes: a465537ad1a4 ("qla2xxx: Disable the adapter and skip error recovery in case of register disconnect")
> Signed-off-by: Bill Kuzeja <william.kuzeja@xxxxxxxxxxx>
> ---
> 
> I encountered this hang during by adapter break testing (on Stratus
> hardware, we need to survive adapter breaks). I noticed that we wait
> indefinitely for several commands which all have outstanding
> references to them, but which have all followed this code path:
> 
> qla2x00_abort_all_cmds
>   qla2xxx_eh_abort
>       Exit early because qla2x00_isp_reg_stat is non-zero.
> 
> Note that before calling qla2xxx_eh_abort from this path, we do an extra
> sp_get(sp), which takes out another reference on the current sp. If we
> do not early exit immediately from qla2xxx_eh_abort, we'll get rid of this
> extra reference through the abort - which is the normal case.
> 
> However, if we exit immediately, this extra sp_get is never dereferenced.
> Each one of the commands flowing through this code will be stuck forever,
> resulting in hung tasks.
> 
> This early exit in qla2xxx_eh_abort was introduced by this commit:
> 
> commit a465537ad1a4 ("qla2xxx: Disable the adapter and skip error recovery in case of register disconnect.")
> 
> The solution for this is somewhat tricky because qla2xxx_eh_abort can be
> invoked from the scsi error handler as well as qla2x00_abort_all_cmds.
> I originally thought of having qla2xxx_eh_abort remove the extra reference
> before early exiting, but not knowing the context of its invocation,
> this seemed dangerous.
> 
> Alternatively we could also just remove the early exit case from
> qla2xxx_eh_abort, but the aforementioned commit was put in for a reason, so 
> that doesn't seem correct either.
> 
> The right thing to do seems to be putting the fix in qla2x00_abort_all_cmds,
> checking the conditions we where we have exited early from qla2xxx_eh_abort
> before removing the extra reference.
> 
> ---
> drivers/scsi/qla2xxx/qla_os.c | 7 +++++--
> 1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
> index 1c79579..1a93365 100644
> --- a/drivers/scsi/qla2xxx/qla_os.c
> +++ b/drivers/scsi/qla2xxx/qla_os.c
> @@ -1632,7 +1632,7 @@ uint32_t qla2x00_isp_reg_stat(struct qla_hw_data *ha)
> void
> qla2x00_abort_all_cmds(scsi_qla_host_t *vha, int res)
> {
> -	int que, cnt;
> +	int que, cnt, status;
> 	unsigned long flags;
> 	srb_t *sp;
> 	struct qla_hw_data *ha = vha->hw;
> @@ -1662,8 +1662,12 @@ uint32_t qla2x00_isp_reg_stat(struct qla_hw_data *ha)
> 					 */
> 					sp_get(sp);
> 					spin_unlock_irqrestore(&ha->hardware_lock, flags);
> -					qla2xxx_eh_abort(GET_CMD_SP(sp));
> +					status = qla2xxx_eh_abort(GET_CMD_SP(sp));
> 					spin_lock_irqsave(&ha->hardware_lock, flags);
> +					/* Get rid of extra reference if immediate exit
> +					 * from ql2xxx_eh_abort */
> +					if (status == FAILED && (qla2x00_isp_reg_stat(ha)))
> +						atomic_dec(&sp->ref_count);
> 				}
> 				req->outstanding_cmds[cnt] = NULL;
> 				sp->done(sp, res);
> -- 
> 1.8.3.1
> 

Looks Good. 

Acked-by: Himanshu Madhani <himanshu.madhani@xxxxxxxxxx>

Thanks,
- Himanshu





[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [SCSI Target Devel]     [Linux SCSI Target Infrastructure]     [Kernel Newbies]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Linux IIO]     [Samba]     [Device Mapper]

  Powered by Linux