Re: [PATCH 15/17] vfio/pci: Let enable and disable of interrupt types use same signature

Reinette Chatre <reinette.chatre@xxxxxxxxx> · Wed, 14 Feb 2024 11:37:05 -0800




Hi Alex,

Apologies for the delay. This was due to two parts:
* I studied these code paths more and I think there may be one more
  flow that can be made more robust (more later),
* I spent a lot of time trying to trigger all the affected code paths but
  here I did not have much luck. I would prefer to run tests that trigger
  these flows so that I can get some help from the kernel lock debugging.
  From what I can tell the irqfd flows can be triggered with the help of the
  kernel-irqchip option to Qemu but on the hardware I have access to I
  could only try kernel-irqchip=auto and that did not trigger the flows
  (the irqfd is set up but the eventfd is never signaled).
  I am not familiar with this area, do you perhaps have guidance on how
  I can test these flows or do you perhaps have access to needed environments
  to test this?

On 2/8/2024 1:08 PM, Alex Williamson wrote:
> On Wed, 7 Feb 2024 15:30:15 -0800
> Reinette Chatre <reinette.chatre@xxxxxxxxx> wrote:

>> I studied the code more and have one more observation related to this portion
>> of the flow:
>> From what I can tell this change makes the INTx code more robust. If I
>> understand current implementation correctly it seems possible to enable
>> INTx but not have interrupt allocated. In this case the interrupt context
>> (ctx) will exist but ctx->trigger will be NULL. Current
>> vfio_pci_set_intx_trigger()->vfio_send_intx_eventfd() only checks if
>> ctx is valid. It looks like it may call eventfd_signal(NULL) where
>> pointer is dereferenced.
>>
>> If this is correct then I think a separate fix that can easily be
>> backported may be needed. Something like:
> 
> Good find.  I think it's a bit more complicated though.  There are
> several paths to vfio_send_intx_eventfd:
> 
>  - vfio_intx_handler
> 
> 	This can only be called between request_irq() and free_irq()
> 	where trigger is always valid.  igate is not held.
> 
>  - vfio_pci_intx_unmask
> 
> 	Callers hold igate, additional test of ctx->trigger makes this
> 	safe.

Two callers of vfio_pci_intx_unmask() do not seem to hold igate:
vfio_basic_config_write() and vfio_pci_core_runtime_resume().

Considering this I wonder if we could add something like below to the
solution you propose. On a high level the outside callers (VFIO PCI core)
will keep using vfio_pci_intx_unmask() that will now take igate while
the interrupt management code gets a new internal __vfio_pci_intx_unmask()
that should be called with igate held. This results in:

@@ -215,12 +223,20 @@ static int vfio_pci_intx_unmask_handler_virqfd(void *opaque, void *unused)
 	return ret;
 }
 
-void vfio_pci_intx_unmask(struct vfio_pci_core_device *vdev)
+static void __vfio_pci_intx_unmask(struct vfio_pci_core_device *vdev)
 {
+	lockdep_assert_held(&vdev->igate);
 	if (vfio_pci_intx_unmask_handler(vdev, NULL) > 0)
 		vfio_send_intx_eventfd(vdev, NULL);
 }
 
+void vfio_pci_intx_unmask(struct vfio_pci_core_device *vdev)
+{
+	mutex_lock(&vdev->igate);
+	__vfio_pci_intx_unmask(vdev);
+	mutex_unlock(&vdev->igate);
+}
+
 static irqreturn_t vfio_intx_handler(int irq, void *dev_id)
 {
 	struct vfio_pci_core_device *vdev = dev_id;
@@ -581,11 +597,11 @@ static int vfio_pci_set_intx_unmask(struct vfio_pci_core_device *vdev,
 		return -EINVAL;
 
 	if (flags & VFIO_IRQ_SET_DATA_NONE) {
-		vfio_pci_intx_unmask(vdev);
+		__vfio_pci_intx_unmask(vdev);
 	} else if (flags & VFIO_IRQ_SET_DATA_BOOL) {
 		uint8_t unmask = *(uint8_t *)data;
 		if (unmask)
-			vfio_pci_intx_unmask(vdev);
+			__vfio_pci_intx_unmask(vdev);
 	} else if (flags & VFIO_IRQ_SET_DATA_EVENTFD) {
 		struct vfio_pci_irq_ctx *ctx = vfio_irq_ctx_get(vdev, 0);
 		int32_t fd = *(int32_t *)data;

> 
>  - vfio_pci_set_intx_trigger
> 
> 	Same as above.
> 
>  - Through unmask eventfd (virqfd)
> 
> 	Here be dragons.
> 
> In the virqfd case, a write to the eventfd calls virqfd_wakeup() where
> we'll call the handler, vfio_pci_intx_unmask_handler(), and based on
> the result schedule the thread, vfio_send_intx_eventfd().  Both of
> these look suspicious.  They're not called under igate, so testing
> ctx->trigger doesn't resolve the race.
> 
> I think an option is to wrap the virqfd entry points in igate where we
> can then do something similar to your suggestion.  I don't think we
> want to WARN_ON(!ctx->trigger) because that's then a user reachable
> condition.  Instead we can just quietly follow the same exit paths.
> 
> I think that means we end up with something like this:
> 
> diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
> index 237beac83809..ace7e1dbc607 100644
> --- a/drivers/vfio/pci/vfio_pci_intrs.c
> +++ b/drivers/vfio/pci/vfio_pci_intrs.c
> @@ -92,12 +92,21 @@ static void vfio_send_intx_eventfd(void *opaque, void *unused)
>  		struct vfio_pci_irq_ctx *ctx;
>  
>  		ctx = vfio_irq_ctx_get(vdev, 0);
> -		if (WARN_ON_ONCE(!ctx))
> +		if (WARN_ON_ONCE(!ctx) || !ctx->trigger)
>  			return;
>  		eventfd_signal(ctx->trigger);
>  	}
>  }
>  
> +static void vfio_send_intx_eventfd_virqfd(void *opaque, void *unused)
> +{
> +	struct vfio_pci_core_device *vdev = opaque;
> +
> +	mutex_lock(&vdev->igate);
> +	vfio_send_intx_eventfd(opaque, unused);
> +	mutex_unlock(&vdev->igate);
> +}
> +
>  /* Returns true if the INTx vfio_pci_irq_ctx.masked value is changed. */
>  bool vfio_pci_intx_mask(struct vfio_pci_core_device *vdev)
>  {
> @@ -170,7 +179,7 @@ static int vfio_pci_intx_unmask_handler(void *opaque, void *unused)
>  	}
>  
>  	ctx = vfio_irq_ctx_get(vdev, 0);
> -	if (WARN_ON_ONCE(!ctx))
> +	if (WARN_ON_ONCE(!ctx) || !ctx->trigger)
>  		goto out_unlock;
>  
>  	if (ctx->masked && !vdev->virq_disabled) {
> @@ -194,6 +203,18 @@ static int vfio_pci_intx_unmask_handler(void *opaque, void *unused)
>  	return ret;
>  }
>  
> +static int vfio_pci_intx_unmask_handler_virqfd(void *opaque, void *unused)
> +{
> +	struct vfio_pci_core_device *vdev = opaque;
> +	int ret;
> +
> +	mutex_lock(&vdev->igate);
> +	ret = vfio_pci_intx_unmask_handler(opaque, unused);
> +	mutex_unlock(&vdev->igate);
> +
> +	return ret;
> +}
> +
>  void vfio_pci_intx_unmask(struct vfio_pci_core_device *vdev)
>  {
>  	if (vfio_pci_intx_unmask_handler(vdev, NULL) > 0)
> @@ -572,10 +593,10 @@ static int vfio_pci_set_intx_unmask(struct vfio_pci_core_device *vdev,
>  		if (WARN_ON_ONCE(!ctx))
>  			return -EINVAL;
>  		if (fd >= 0)
> -			return vfio_virqfd_enable((void *) vdev,
> -						  vfio_pci_intx_unmask_handler,
> -						  vfio_send_intx_eventfd, NULL,
> -						  &ctx->unmask, fd);
> +			return vfio_virqfd_enable((void *)vdev,
> +					vfio_pci_intx_unmask_handler_virqfd,
> +					vfio_send_intx_eventfd_virqfd, NULL,
> +					&ctx->unmask, fd);
>  
>  		vfio_virqfd_disable(&ctx->unmask);
>  	}
> 
> 
> WDYT?

This looks good to me. Thank you very much for taking the time to
write it.

Reinette