Thanks, Suganath,
That commit was introduced with driver version 12.100.00.00. The distro
version we're running is 15.100.00.01 (RHEL-ALT 7.5), which appears to
include this fix, although the code is not identical, probably due to
backported patches. This driver also does not include
commit 9961c9bbf2b43acaaf030a0fbabc9954d937ad8c, which was added much
later (added on top of driver 17.100.00.00). So, I guess I am still
looking for a companion (opposite scenario) patch to
9961c9bbf2b43acaaf030a0fbabc9954d937ad8c.
Do you have any reason to believe that both situations (normal
completion before abort, and abort before normal completion) do not need
to be handled?
Thanks,
Doug
On 06/07/2018 01:24 AM, Suganath Prabu Subramani wrote:
Hi Douglas,
Can you check whether this patch is already part of the driver? If not,
please try the patch below.
This patch fixes the case where the abort completes before the IO completes.
With it, the driver processes the IO's reply first, followed by the TM.
author: Suganath prabu Subramani <suganath-prabu.subramani@xxxxxxxxxxxxx>, 2016-01-28 12:07:06 +0530
committer: Martin K. Petersen <martin.petersen@xxxxxxxxxx>, 2016-02-23 21:27:02 -0500
commit: 03d1fb3a65783979f23bd58b5a0387e6992d9e26
tree: 6aca275e2ebe7fbcd5fac1654cedd8f56d0947d0 (/drivers/scsi/mpt3sas)
parent: 5c739b6157bd090942e5847ddd12bfb99cd4240d
mpt3sas: Fix for Asynchronous completion of timedout IO and task abort
of timedout IO.
Track msix of each IO and use the same msix for issuing abort to timed
out IO. With this driver will process IO's reply first followed by TM.
Signed-off-by: Suganath prabu Subramani <suganath-prabu.subramani@xxxxxxxxxxxxx>
Signed-off-by: Chaitra P B <chaitra.basappa@xxxxxxxxxxxxx>
Reviewed-by: Tomas Henzl <thenzl@xxxxxxxxxx>
Signed-off-by: Martin K. Petersen <martin.petersen@xxxxxxxxxx>
Thanks,
Suganath Prabu S
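For readers following the thread, the idea in the commit message quoted above (remember which MSI-X vector each IO was submitted on, then issue the abort on that same vector) can be sketched roughly as follows. This is only an illustration and is not the mpt3sas code: struct io_tracker, MAX_IO, MSIX_NONE and the function names are all hypothetical, and the real driver's data structures may look quite different.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_IO    64    /* hypothetical number of outstanding I/O slots */
#define MSIX_NONE 0xFF  /* hypothetical marker: nothing recorded for this slot */

/* Hypothetical per-I/O tracking table, indexed by a driver-assigned I/O id. */
struct io_tracker {
    uint8_t msix[MAX_IO];
};

static void io_tracker_init(struct io_tracker *t)
{
    assert(t != NULL);
    for (int i = 0; i < MAX_IO; i++)
        t->msix[i] = MSIX_NONE;
}

/* Remember which MSI-X vector (reply queue) the I/O was submitted on. */
static bool io_tracker_record(struct io_tracker *t, unsigned int io_id,
                              uint8_t msix)
{
    assert(t != NULL);
    if (io_id >= MAX_IO)
        return false;
    t->msix[io_id] = msix;
    return true;
}

/*
 * Choose the vector for an abort: the one the timed-out I/O used if it is
 * known, otherwise the caller-supplied fallback.
 */
static uint8_t io_tracker_abort_msix(const struct io_tracker *t,
                                     unsigned int io_id, uint8_t fallback)
{
    assert(t != NULL);
    if (io_id >= MAX_IO || t->msix[io_id] == MSIX_NONE)
        return fallback;
    return t->msix[io_id];
}
```

With the abort issued on the same vector as the IO, both replies arrive on one reply queue, which is what lets the driver see the IO's reply before the TM reply, as the commit message describes.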
On Wed, Jun 6, 2018 at 7:50 PM, Douglas Miller
<dougmill@xxxxxxxxxxxxxxxxxx> wrote:
Running a heavy I/O load on multipath/dual-ported SSD disks attached to a
SAS3008 adapter (mpt3sas driver), we are seeing I/Os get aborted and tasks
stuck in blk_complete_request() and this sometimes results in hitting a
BUG_ON in blk_start_request(). It would appear that we are seeing two
completions performed on an I/O, and the second completion is racing with
re-use of the request for a new I/O.
I saw this upstream commit:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v4.17-rc3&id=9961c9bbf2b43acaaf030a0fbabc9954d937ad8c
which addresses the case where the normal completion occurs before the abort
completion. But the situation I am seeing appears to be that the abort
completion occurs before the normal completion (due to tasks getting delayed
in blk_complete_request()). I don't find any commit to fix this second case.
Of course, tasks being delayed like this is a concern, and is being worked
on separately. But it seems that the alternate double-completion case is being
ignored here.
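To make the failure mode concrete, here is a minimal single-threaded sketch of one way a late second completion can be told apart from the completion of a new I/O that reuses the same request. It is not taken from the block layer or from mpt3sas; struct io_slot, the generation counter and both function names are hypothetical, and a real implementation would need atomics or locking around these fields.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical request slot that gets recycled for new I/Os. */
struct io_slot {
    uint32_t generation;  /* bumped each time the slot is reused */
    bool     in_flight;   /* true between issue and the first completion */
    int      result;      /* status recorded by the winning completion */
};

/* Issue a new I/O on the slot; returns the generation a completer must present. */
static uint32_t io_slot_issue(struct io_slot *s)
{
    assert(s != NULL);
    s->generation++;
    s->in_flight = true;
    s->result = 0;
    return s->generation;
}

/*
 * Complete the I/O only if the completion belongs to the current use of the
 * slot and that use has not been completed yet. Returns false for a second
 * completion or for a stale one arriving after the slot was reused.
 */
static bool io_slot_complete(struct io_slot *s, uint32_t gen, int status)
{
    assert(s != NULL);
    if (gen != s->generation || !s->in_flight)
        return false;
    s->in_flight = false;
    s->result = status;
    return true;
}
```

In this sketch, if the abort completion wins, the delayed normal completion is rejected both before the slot is reused (in_flight is already false) and after (its generation no longer matches), so it cannot complete the new I/O by mistake.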
Does everyone concur that this second case needs to be addressed? Is there a
proposed fix?
Thanks,
Doug
FYI, system is a Power9 running RHEL-ALT 7.5, two SAS3008 adapters connected
to an IBM EXP24SX SAS Storage Enclosure with 24 HUSMM8040ASS201 drives. FIO
was being used to drive the I/O load.