Re: [RFC] virtio_scsi: to poll and kick the virtqueue in timeout handler

Dongli Zhang <dongli.zhang@xxxxxxxxxx> · Tue, 25 May 2021 10:34:24 -0700

On 5/25/21 10:24 AM, Hannes Reinecke wrote:
> On 5/25/21 6:47 PM, Stefan Hajnoczi wrote:
>> On Mon, May 24, 2021 at 11:33:33PM -0700, Dongli Zhang wrote:
>>> On 5/24/21 6:24 AM, Stefan Hajnoczi wrote:
>>>> On Sun, May 23, 2021 at 09:39:51AM +0200, Hannes Reinecke wrote:
>>>>> On 5/23/21 8:38 AM, Dongli Zhang wrote:
>>>>>> This RFC is to trigger the discussion about to poll and kick the
>>>>>> virtqueue on purpose in virtio-scsi timeout handler.
>>>>>>
>>>>>> The virtio-scsi relies on the virtio vring shared between VM and host.
>>>>>> The VM side produces requests to vring and kicks the virtqueue, while the
>>>>>> host side produces responses to vring and interrupts the VM side.
>>>>>>
>>>>>> By default the virtio-scsi handler depends on the host timeout handler
>>>>>> by BLK_EH_RESET_TIMER to give host a chance to perform EH.
>>>>>>
>>>>>> However, this is not helpful for the case that the responses are available
>>>>>> on vring but the notification from host to VM is lost.
>>>>>>
>>>>> How can this happen?
>>>>> If responses are lost the communication between VM and host is broken, and
>>>>> we should rather reset the virtio rings themselves.
>>>>
>>>> I agree. In principle it's fine to poll the virtqueue at any time, but I
>>>> don't understand the failure scenario here. It's not clear to me why the
>>>> device-to-driver vq notification could be lost.
>>>>
>>>
>>> One example is the CPU hotplug issue before the commit bf0beec0607d ("blk-mq:
>>> drain I/O when all CPUs in a hctx are offline") was available. The issue is
>>> equivalent to loss of interrupt. Without the CPU hotplug fix, while NVMe driver
>>> relies on the timeout handler to complete inflight IO requests, the PV
>>> virtio-scsi may hang permanently.
>>>
>>> In addition, as the virtio/vhost/QEMU are complex software, we are not able to
>>> guarantee there is no further lost of interrupt/kick issue in the future. It is
>>> really painful if we encounter such issue in production environment.
>>
>> Any number of hardware or software bugs might exist that we don't know
>> about, yet we don't pre-emptively add workarounds for them because where
>> do you draw the line?
>>
>> I checked other SCSI/block drivers and found it's rare to poll in the
>> timeout function so there does not seem to be a consensus that it's
>> useful to do this.
>>
> Not only this; it's downright dangerous attempting to do that in SCSI.
> In SCSI we don't have fixed lifetime guarantees that NVMe has, so there will be
> a race condition between timeout and command completion.

Thank you very much for the explanation. Yes, we cannot do that due to the race.

Dongli Zhang

> Plus there is no interface in SCSI allowing to 'poll' for completions in a
> meaningful manner.
> 
>> That said, it's technically fine to do it, the virtqueue APIs are there
>> and can be used like this. So if you and others think this is necessary,
>> then it's a pretty small change and I'm not against merging a patch like
>> this.
>>
> I would rather _not_ put more functionality into the virtio_scsi timeout
> handler; this only serves to assume that the timeout handler has some
> functionality in virtio.
> Which it patently hasn't, as the prime reason for a timeout handler is to
> _abort_ a command, which we can't on virtio.
> Well, we can on virtio, but qemu as the main user will re-route the I/O from
> virtio into doing async-I/O, and there is no way how we can abort outstanding
> asynchronous I/O.
> Or any other ioctl, for that matter.
> 
> Cheers,
> 
> Hannes