Re: virtio-scsi issues duplicate tags when async_abort is enabled

On Fri, Jun 13, 2014 at 11:37 AM, Hannes Reinecke <hare@xxxxxxx> wrote:
> On 06/13/2014 07:58 PM, Venkatesh Srinivas wrote:
>>
>> Hi,
>>
>> In Linux 3.14+, SCSI timeouts are handled first without invoking EH;
>> this behavior is on by default but can be disabled with the
>> per-shost-template no_async_abort flag.
>>
>> When a SCSI target is attached to a virtio-scsi HBA and is under I/O
>> stress (lots of concurrent I/O + some I/O running slowly), we see
>> Linux issue commands with duplicate tags, sometimes with tags matching
>> commands which are in the process of being aborted; we see this
>> readily in the Google Compute Engine hypervisor.
>>
>> This behaviour is not seen on Linux <= 3.13 and is not seen if 3.14's
>> virtio_scsi driver has no_async_abort set to 1.
>>
>> An ordering we have seen, from the device perspective:
>> t0: I/O with tag 18446612135224154432 issued
>> t1: TMF ABORT for tag 18446612135224154432
>> t2: Another I/O with the same tag, 18446612135224154432, issued; same
>> offset/size as at t0
>> [neither the t0 I/O nor the TMF ABORT have yet returned!]
>>
>> Another ordering we have seen, from the device perspective:
>> t0: I/O with tag 18446612135454768576 issued
>> t1: TMF ABORT for tag 18446612135454768576
>> t2: I/O 18446612135454768576 completes with appropriate cancelled status
>> t3: TMF ABORT completes with OK status
>> t4: New I/O with tag 18446612135454768576, matching size/offset as t0
>> t5...: [Some other I/Os issued to the same SCSI target]
>> t6...: [TMF ABORT for one of the new I/Os; proper return sequence]
>> t7...: New I/O with tag 18446612135454768576.
>>    [The t4 I/O with tag 18446612135454768576 has neither completed
>> nor been aborted by Linux.]
>>
>> CC-ing stable as 3.14 and 3.15 are affected; a conservative fix is to
>> enable no_async_abort until the problem is better understood.
>>
> Paolo, you had some fixes for virtio_scsi which should solve this, right?

The outstanding patches for virtio_scsi would not explain this bug,
its dependence on Linux 3.14+, or the fact that it does not reproduce
with no_async_abort=1.
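
For anyone who wants to check their own device-side traces for the same
pattern, the two orderings quoted above can be replayed through a small
in-flight tag tracker. This is only a rough sketch, not the hypervisor's
code: TagTracker, replay() and the event names ("io", "io_done", "abort",
"abort_done") are hypothetical, and it assumes a tag is released only
when the I/O itself completes, never by the TMF alone. Events t5 and t6
of the second ordering involve other tags and are omitted.

```python
from dataclasses import dataclass, field


@dataclass
class TagTracker:
    """Tracks tags the device currently considers outstanding (hypothetical helper)."""
    in_flight: set = field(default_factory=set)
    duplicates: list = field(default_factory=list)

    def issue(self, when, tag):
        # A new I/O whose tag is still outstanding is a duplicate.
        if tag in self.in_flight:
            self.duplicates.append((when, tag))
        self.in_flight.add(tag)

    def complete(self, tag):
        # Assumption: only completion of the I/O itself releases the tag.
        self.in_flight.discard(tag)


def replay(events):
    """Replay (time, kind, tag) events and return the duplicate issues seen."""
    tracker = TagTracker()
    for when, kind, tag in events:
        if kind == "io":
            tracker.issue(when, tag)
        elif kind == "io_done":
            tracker.complete(tag)
        # "abort" and "abort_done" do not release a tag in this sketch.
    return tracker.duplicates


TAG_A = 18446612135224154432
TAG_B = 18446612135454768576

# First ordering: the tag is reissued while the I/O and the TMF are pending.
first_ordering = [
    ("t0", "io", TAG_A),
    ("t1", "abort", TAG_A),
    ("t2", "io", TAG_A),
]

# Second ordering: the reuse at t4 is legitimate, the one at t7 is not.
second_ordering = [
    ("t0", "io", TAG_B),
    ("t1", "abort", TAG_B),
    ("t2", "io_done", TAG_B),
    ("t3", "abort_done", TAG_B),
    ("t4", "io", TAG_B),
    ("t7", "io", TAG_B),
]

if __name__ == "__main__":
    print(replay(first_ordering))
    print(replay(second_ordering))
```

With these rules the tracker flags the t2 issue in the first ordering
and only the t7 issue in the second; the t4 reuse passes because the
original command had already completed by then.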

Thanks,
-- vs;