Re: kernel BUG at drivers/scsi/scsi_error.c:197! - git 4.17.0-x64-08428-g7d3bf613e99a

"hch@xxxxxx" <hch@xxxxxx> · Wed, 13 Jun 2018 16:04:11 +0200

> I suspect this happens because the same request can be expired twice or even more.
> The SCSI mid-layer returns BLK_EH_DONE from .timeout, but the request is not actually
> completed there; it only queues a delayed abort_work (HZ/100). If blk_mq_timeout_work
> runs again before the abort_work, the request will be timed out again, because nothing
> marks the request as having already timed out.
> 
> Would you please try the attached patch to see whether it fixes this issue?
> (The patch currently only works for SCSI devices.)

The patch isn't really going to work without a caller of your new
__blk_mq_complete_request helper, is it?

Either way the concept of doing error handling without quiescing the
queue just looks bogus to me and will end up with some sort of race
here or there.
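
For illustration only, the double-expiry scenario described in the quoted
text can be modelled in a few lines of user-space C. This is a rough sketch,
not kernel code: struct sim_request, sim_timeout_handler(), sim_timeout_scan()
and sim_double_scan() are all hypothetical names made up for this example, and
the "mark" is a plain boolean rather than anything blk-mq actually provides.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a request; NOT the kernel's struct request. */
struct sim_request {
	unsigned long deadline;	/* tick at which the request times out */
	bool completed;		/* set only once the delayed abort work has run */
	bool timed_out;		/* the "mark" the quoted text says is missing */
	int expire_count;	/* how many times the timeout handler fired */
};

/*
 * Models a .timeout handler that only queues delayed abort work:
 * the request is left incomplete when the handler returns.
 */
void sim_timeout_handler(struct sim_request *rq)
{
	rq->expire_count++;
}

/* Models one pass of the timeout scan over a single request. */
void sim_timeout_scan(struct sim_request *rq, unsigned long now, bool use_mark)
{
	if (rq->completed || now < rq->deadline)
		return;

	if (use_mark) {
		if (rq->timed_out)
			return;		/* already expired once, skip it */
		rq->timed_out = true;
	}

	sim_timeout_handler(rq);
}

/*
 * Runs the scan twice before the (simulated) abort work completes the
 * request, then once more afterwards. Returns the number of expirations.
 */
int sim_double_scan(bool use_mark)
{
	struct sim_request rq = { .deadline = 100 };

	sim_timeout_scan(&rq, 100, use_mark);	/* first expiry */
	sim_timeout_scan(&rq, 101, use_mark);	/* scan runs again too early */
	rq.completed = true;			/* abort work finally runs */
	sim_timeout_scan(&rq, 102, use_mark);	/* no effect: completed */

	return rq.expire_count;
}
```

In this model sim_double_scan(false) expires the request twice, while
sim_double_scan(true) expires it once. The sketch is single-threaded, so it
says nothing about the races that remain when the queue is not quiesced.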