Re: blk_requeue_request BUG_ON

Hello,

On Wed, May 13, 2015 at 3:54 PM, Brian King <brking@xxxxxxxxxxxxxxxxxx> wrote:
> I've been chasing a BUG_ON in blk_requeue_request which seems to occur
> in scenarios where we are seeing lots of SCSI aborts. As I've been digging
> through the completion paths and abort paths, I've noticed if the following
> sequence occurs, we are likely to hit this issue:
>
> 1. scsi_cmd times out and an async abort is issued.
> 2. The LLDD aborts the command. When the aborted command comes back, the LLDD calls scsi_done
>    for it from its interrupt handler.
> 3. If the result of the aborted command is something like DID_ERROR and retries are allowed,
>    then scsi_done processing calls scsi_queue_insert, which in turn calls blk_requeue_request.
> 4. When the LLDD's eh_abort handler returns, scsi_error sees that the abort was successful
>    and calls scsi_queue_insert for the aborted command, which also calls blk_requeue_request.
>    This is where we hit the BUG_ON, because the command has already been queued again.
>
> For reference, this is occurring in the non-blk_rq_tagged case, so blk_requeue_request
> doesn't call blk_queue_end_tag, which might otherwise keep this from being hit...
>
> Should an LLDD NOT call scsi_done for commands it aborts?

I think the right statement would be that the LLDD should not call
scsi_done() AFTER the eh_abort_handler() has returned SUCCESS. Once
eh_abort_handler returns SUCCESS, the expectation is that the LLDD and
the HW have completely forgotten about that command (and hence will
not call scsi_done()). Also, from the time eh_abort_handler returns,
the mid level is free to do whatever it pleases with that scsi_cmnd
(requeue it, free it, or finish it), so the LLDD must not keep holding
a pointer to the scsi_cmnd: by then it may point to something other
than what the LLDD thinks.

Thanks,

Rajat

> We've seen the issue above on both ibmvfc and mpt2sas, but I know there are other
> LLDDs that call scsi_done in this case and just do it before eh_abort returns. Or is
> it expected that the LLDD will only ever return DID_ABORT for the aborted command?
> That looks like it might prevent this issue as well; however, it seems racy, and I
> would think we'd then also need to add some memory barriers around the checking and
> setting of scmd->eh_eflags.
>
> Or am I missing something and headed down the wrong path?
>
> Thanks,
>
> Brian
>
> --
> Brian King
> Power Linux I/O
> IBM Linux Technology Center
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html