Hi Douglas,

Can you check whether this patch is already part of the driver? If not,
please try the patch below. It fixes the case where the abort completes
before the IO completion. With it, the driver processes the IO's reply
first, followed by the TM.

author    Suganath prabu Subramani <suganath-prabu.subramani@xxxxxxxxxxxxx>  2016-01-28 12:07:06 +0530
committer Martin K. Petersen <martin.petersen@xxxxxxxxxx>  2016-02-23 21:27:02 -0500
commit    03d1fb3a65783979f23bd58b5a0387e6992d9e26
tree      6aca275e2ebe7fbcd5fac1654cedd8f56d0947d0 /drivers/scsi/mpt3sas
parent    5c739b6157bd090942e5847ddd12bfb99cd4240d

mpt3sas: Fix for Asynchronous completion of timedout IO and task abort of timedout IO.

Track msix of each IO and use the same msix for issuing abort to timed out IO.
With this driver will process IO's reply first followed by TM.

Signed-off-by: Suganath prabu Subramani <suganath-prabu.subramani@xxxxxxxxxxxxx>
Signed-off-by: Chaitra P B <chaitra.basappa@xxxxxxxxxxxxx>
Reviewed-by: Tomas Henzl <thenzl@xxxxxxxxxx>
Signed-off-by: Martin K. Petersen <martin.petersen@xxxxxxxxxx>

Thanks,
Suganath Prabu S

On Wed, Jun 6, 2018 at 7:50 PM, Douglas Miller <dougmill@xxxxxxxxxxxxxxxxxx> wrote:
> Running a heavy I/O load on multipath/dual-ported SSD disks attached to a
> SAS3008 adapter (mpt3sas driver), we are seeing I/Os get aborted and tasks
> stuck in blk_complete_request() and this sometimes results in hitting a
> BUG_ON in blk_start_request(). It would appear that we are seeing two
> completions performed on an I/O, and the second completion is racing with
> re-use of the request for a new I/O.
>
> I saw this upstream commit:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v4.17-rc3&id=9961c9bbf2b43acaaf030a0fbabc9954d937ad8c
>
> which addresses the case where the normal completion occurs before the abort
> completion.
> But the situation I am seeing appears to be that the abort
> completion occurs before the normal completion (due to tasks getting delayed
> in blk_complete_request()). I don't find any commit to fix this second case.
>
> Of course, tasks being delayed like this is a concern, and is being worked
> separately. But it seems that the alternate double-completion case is being
> ignored here.
>
> Does everyone concur that this second case needs to be addressed? Is there a
> proposed fix?
>
> Thanks,
>
> Doug
>
> FYI, system is a Power9 running RHEL-ALT 7.5, two SAS3008 adapters connected
> to an IBM EXP24SX SAS Storage Enclosure with 24 HUSMM8040ASS201 drives. FIO
> was being used to drive the I/O load.
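
For readers following the ordering argument in the commit message ("use the
same msix for issuing abort"), here is a toy model in Python. It is not
driver code. It assumes each MSI-X index maps to its own reply queue that is
drained in FIFO order, and that the host may service the queues in any order.
The names simulate() and process_replies() are hypothetical and exist only
for this illustration.

```python
from collections import deque


def process_replies(queues, service_order):
    """Drain each reply queue in FIFO order, visiting queues in service_order."""
    seen = []
    for idx in service_order:
        q = queues[idx]
        while q:
            seen.append(q.popleft())
    return seen


def simulate(io_msix, tm_msix, service_order, num_queues=2):
    """Post an IO reply, then a TM (abort) reply, and return the processing order.

    Assumption: replies posted to the same queue keep their posting order,
    while nothing orders replies that sit on different queues.
    """
    queues = [deque() for _ in range(num_queues)]
    queues[io_msix].append("IO reply")  # the IO's reply is posted first
    queues[tm_msix].append("TM reply")  # the abort's reply is posted afterwards
    return process_replies(queues, service_order)


# Abort issued on a different MSI-X index than the IO: if the host happens to
# service queue 0 before queue 1, the TM reply is processed ahead of the IO reply.
different = simulate(io_msix=1, tm_msix=0, service_order=[0, 1])

# Abort issued on the same MSI-X index as the IO: the TM reply sits behind the
# IO reply in one FIFO, so the IO reply is always processed first.
same = simulate(io_msix=1, tm_msix=1, service_order=[0, 1])

print(different)  # ['TM reply', 'IO reply']
print(same)       # ['IO reply', 'TM reply']
```

In this model the same-queue case gives the IO-then-TM order no matter which
queue is serviced first, which is the property the commit message describes.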