On 06.08.24 10:18, Dragos Tatulea wrote:
> (Re-sending. I messed up the previous message, sorry about that.)
>
> On 06.08.24 04:57, Jason Wang wrote:
>> On Mon, Aug 5, 2024 at 11:59 PM Dragos Tatulea <dtatulea@xxxxxxxxxx> wrote:
>>>
>>> On 05.08.24 05:17, Jason Wang wrote:
>>>> On Fri, Aug 2, 2024 at 2:51 PM Dragos Tatulea <dtatulea@xxxxxxxxxx> wrote:
>>>>>
>>>>> On Fri, 2024-08-02 at 11:29 +0800, Jason Wang wrote:
>>>>>> On Thu, Aug 1, 2024 at 11:38 PM Dragos Tatulea <dtatulea@xxxxxxxxxx> wrote:
>>>>>>>
>>>>>>> The following workflow triggers the crash referenced below:
>>>>>>>
>>>>>>> 1) vhost_vdpa_unsetup_vq_irq() unregisters the irq bypass producer,
>>>>>>>    but the producer->token is still valid.
>>>>>>> 2) vq context gets released and reassigned to another vq.
>>>>>>
>>>>>> Just to make sure I understand here, which structure is referred to as
>>>>>> "vq context" here? I guess it's not call_ctx, as that is part of the vq
>>>>>> itself.
>>>>>>
>>>>>>> 3) That other vq registers its producer with the same vq context
>>>>>>>    pointer as token in vhost_vdpa_setup_vq_irq().
>>>>>>
>>>>>> Or did you mean when a single eventfd is shared among different vqs?
>>>>>>
>>>>> Yes, that's what I mean: vq->call_ctx.ctx, which is an eventfd_ctx.
>>>>>
>>>>> But I don't think it's shared in this case, only that the old eventfd_ctx value
>>>>> is lingering in producer->token. And this old eventfd_ctx is now assigned to
>>>>> another vq.
>>>>
>>>> Just to make sure I understand the issue: the eventfd_ctx should still be
>>>> valid until a new VHOST_SET_VRING_CALL().
>>>>
>>> I think it's not about the validity of the eventfd_ctx. It's more about
>>> the lingering ctx value of the producer after vhost_vdpa_unsetup_vq_irq().
>>
>> Probably, but
>>
>>> That value is the eventfd ctx, but it could be anything else really...
>>
>> I mean we hold a refcnt of the eventfd, so it should be valid until the
>> next set_vring_call() or vhost_dev_cleanup().
>>
>> But I do spot a possible issue:
>>
>> 1) We swap in the new ctx in vhost_vring_ioctl():
>>
>>         swap(ctx, vq->call_ctx.ctx);
>>
>> 2) and the old ctx is put there as well:
>>
>>         if (!IS_ERR_OR_NULL(ctx))
>>                 eventfd_ctx_put(ctx);
>>
>> 3) but in vdpa, we try to unregister the producer with the new token:
>>
>> static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, unsigned int cmd,
>>                                    void __user *argp)
>> {
>> ...
>>         r = vhost_vring_ioctl(&v->vdev, cmd, argp);
>> ...
>>         switch (cmd) {
>> ...
>>         case VHOST_SET_VRING_CALL:
>>                 if (vq->call_ctx.ctx) {
>>                         cb.callback = vhost_vdpa_virtqueue_cb;
>>                         cb.private = vq;
>>                         cb.trigger = vq->call_ctx.ctx;
>>                 } else {
>>                         cb.callback = NULL;
>>                         cb.private = NULL;
>>                         cb.trigger = NULL;
>>                 }
>>                 ops->set_vq_cb(vdpa, idx, &cb);
>>                 vhost_vdpa_setup_vq_irq(v, idx);
>>
>> In vhost_vdpa_setup_vq_irq() we have:
>>
>>         irq_bypass_unregister_producer(&vq->call_ctx.producer);
>>
>> and here the producer->token still points to the old one...
>>
>> Is this what you have seen?
> Yup. That is the issue. The unregister already happened at
> vhost_vdpa_unsetup_vq_irq(). So this second unregister will
> work on an already unregistered element due to the token still
> being set.
>
>>
>>>
>>>
>>>> I may miss something, but the only way to assign exactly the same
>>>> eventfd_ctx value to another vq is when the guest tries to share the
>>>> MSI-X vector among virtqueues; then qemu will use a single eventfd as
>>>> the callback for multiple virtqueues. If this is true:
>>>>
>>> I don't think this is the case. I see the issue happening when running qemu vdpa
>>> live migration tests on the same host. From a vdpa device's point of view it's
>>> basically a device starting on a VM over and over.
>>>
>>>> For bypass registering, only the first registering can succeed, as any
>>>> following registering will fail because the irq bypass manager already
>>>> has exactly the same producer token.
>>>> For unregistering, all unregistering can succeed:
>>>>
>>>> 1) the first unregistering does the real job of unregistering the token
>>>> 2) any following unregistering does nothing, iterating the
>>>>    producer token list without finding a matching one
>>>>
>>>> Maybe you can show me the userspace behaviour (ioctls) when you see this?
>>>>
>>> Sure, what would you need? qemu traces?
>>
>> Yes, that would be helpful.
>>
> Will try to get them.

As the traces are quite large (~5MB), I uploaded them to this location [0].

I used the following qemu trace options:
--trace vhost_vdpa*
--trace virtio_net_handle*

[0] https://drive.google.com/file/d/1XyXYyockJ_O7zMgI7vot6AxYjze9Ljju/view?usp=sharing

Thanks,
Dragos
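
---

Editor's note: to make the sequence discussed above easier to follow, here is a simplified C sketch of the two vhost-vdpa helpers involved. It is not the actual kernel source: the *_sketch names are made up, the bodies are reduced to the irq bypass calls, and clearing producer->token after unregistering is only one possible mitigation suggested by the discussion, not a confirmed fix. It assumes struct vhost_virtqueue from drivers/vhost/vhost.h and the irq bypass API from <linux/irqbypass.h>.

/*
 * Simplified sketch (not the kernel code) of the calls discussed above.
 * The irq bypass manager matches producers by token, so a token that
 * keeps pointing at an old eventfd_ctx makes a later unregister act on
 * an entry that is already gone, while that old eventfd_ctx may by then
 * belong to a different vq.
 */

static void unsetup_vq_irq_sketch(struct vhost_virtqueue *vq)
{
        irq_bypass_unregister_producer(&vq->call_ctx.producer);
        /*
         * Hypothetical mitigation: forget the stale token here, so the
         * second unregister issued from vhost_vdpa_setup_vq_irq() becomes
         * a no-op instead of matching against the old eventfd_ctx.
         */
        vq->call_ctx.producer.token = NULL;
}

static void setup_vq_irq_sketch(struct vhost_virtqueue *vq, int irq)
{
        /* With a lingering token, this targets the old eventfd_ctx. */
        irq_bypass_unregister_producer(&vq->call_ctx.producer);

        /* Only now does the producer learn about the new call ctx. */
        vq->call_ctx.producer.token = vq->call_ctx.ctx;
        vq->call_ctx.producer.irq = irq;
        irq_bypass_register_producer(&vq->call_ctx.producer);
}

Whether dropping the token at unsetup time, or only assigning it when the producer is actually registered, is the right approach is exactly what the thread above is probing.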