Re: [PATCH 15/17] vfio/pci: Let enable and disable of interrupt types use same signature

Alex Williamson <alex.williamson@xxxxxxxxxx> · Thu, 8 Feb 2024 14:08:36 -0700

On Wed, 7 Feb 2024 15:30:15 -0800
Reinette Chatre <reinette.chatre@xxxxxxxxx> wrote:

> Hi Alex,
> 
> On 2/6/2024 3:19 PM, Alex Williamson wrote:
> > On Tue, 6 Feb 2024 14:22:04 -0800
> > Reinette Chatre <reinette.chatre@xxxxxxxxx> wrote:  
> >> On 2/6/2024 2:03 PM, Alex Williamson wrote:  
> >>> On Tue, 6 Feb 2024 13:46:37 -0800
> >>> Reinette Chatre <reinette.chatre@xxxxxxxxx> wrote:  
> >>>> On 2/5/2024 2:35 PM, Alex Williamson wrote:    
> >>>>> On Thu,  1 Feb 2024 20:57:09 -0800
> >>>>> Reinette Chatre <reinette.chatre@xxxxxxxxx> wrote:      
> >>>>
> >>>> ..
> >>>>    
> >>>>>> @@ -715,13 +724,13 @@ static int vfio_pci_set_intx_trigger(struct vfio_pci_core_device *vdev,
> >>>>>>  		if (is_intx(vdev))
> >>>>>>  			return vfio_irq_set_block(vdev, start, count, fds, index);
> >>>>>>  
> >>>>>> -		ret = vfio_intx_enable(vdev);
> >>>>>> +		ret = vfio_intx_enable(vdev, start, count, index);      
> >>>>>
> >>>>> Please trace what happens when a user calls SET_IRQS to setup a trigger
> >>>>> eventfd with start = 0, count = 1, followed by any other combination of
> >>>>> start and count values once is_intx() is true.  vfio_intx_enable()
> >>>>> cannot be the only place we bounds check the user, all of the INTx
> >>>>> callbacks should be an error or nop if vector != 0.  Thanks,
> >>>>>       
> >>>>
> >>>> Thank you very much for catching this. I plan to add the vector
> >>>> check to the device_name() and request_interrupt() callbacks. I do
> >>>> not think it is necessary to add the vector check to disable() since
> >>>> it does not operate on a range and from what I can tell it depends on
> >>>> a successful enable() that already contains the vector check. Similar,
> >>>> free_interrupt() requires a successful request_interrupt() (that will
> >>>> have vector check in next version).
> >>>> send_eventfd() requires a valid interrupt context that is only
> >>>> possible if enable() or request_interrupt() succeeded.    
> >>>
> >>> Sounds reasonable.
> >>>     
> >>>> If user space creates an eventfd with start = 0 and count = 1
> >>>> and then attempts to trigger the eventfd using another combination then
> >>>> the changes in this series will result in a nop while the current
> >>>> implementation will result in -EINVAL. Is this acceptable?    
> >>>
> >>> I think by nop, you mean the ioctl returns success.  Was the call a
> >>> success?  Thanks,    
> >>
> >> Yes, I mean the ioctl returns success without taking any
> >> action (nop).
> >>
> >> It is not obvious to me how to interpret "success" because from what I
> >> understand current INTx and MSI/MSI-x are behaving differently when
> >> considering this flow. If I understand correctly, INTx will return
> >> an error if user space attempts to trigger an eventfd that has not
> >> been set up while MSI and MSI-x will return 0.
> >>
> >> I can restore existing INTx behavior by adding more logic and a return
> >> code to the send_eventfd() callback so that the different interrupt types
> >> can maintain their existing behavior.  
> > 
> > Ah yes, I see the dilemma now.  INTx always checked start/count were
> > valid but MSI/X plowed through regardless, and with this series we've
> > standardized the loop around the MSI/X flow.
> > 
> > Tricky, but probably doesn't really matter.  Unless we break someone.
> > 
> > I can ignore that INTx can be masked and signaling a masked vector
> > doesn't do anything, but signaling an unconfigured vector feels like an
> > error condition and trying to create verbiage in the uAPI header to
> > weasel out of that error and unconditionally return success makes me
> > cringe.
> > 
> > What if we did this:
> > 
> >         uint8_t *bools = data;
> > 	...
> >         for (i = start; i < start + count; i++) {
> >                 if ((flags & VFIO_IRQ_SET_DATA_NONE) ||
> >                     ((flags & VFIO_IRQ_SET_DATA_BOOL) && bools[i - start])) {
> >                         ctx = vfio_irq_ctx_get(vdev, i);
> >                         if (!ctx || !ctx->trigger)
> >                                 return -EINVAL;
> >                         intr_ops[index].send_eventfd(vdev, ctx);
> >                 }
> >         }
> >   
> 
> This looks good. Thank you very much. Will do.
> 
> I studied the code more and have one more observation related to this portion
> of the flow:
> From what I can tell this change makes the INTx code more robust. If I
> understand current implementation correctly it seems possible to enable
> INTx but not have interrupt allocated. In this case the interrupt context
> (ctx) will exist but ctx->trigger will be NULL. Current
> vfio_pci_set_intx_trigger()->vfio_send_intx_eventfd() only checks if
> ctx is valid. It looks like it may call eventfd_signal(NULL) where
> pointer is dereferenced.
> 
> If this is correct then I think a separate fix that can easily be
> backported may be needed. Something like:

Good find.  I think it's a bit more complicated though.  There are
several paths to vfio_send_intx_eventfd:

 - vfio_intx_handler

	This can only be called between request_irq() and free_irq()
	where trigger is always valid.  igate is not held.

 - vfio_pci_intx_unmask

	Callers hold igate, additional test of ctx->trigger makes this
	safe.

 - vfio_pci_set_intx_trigger

	Same as above.

 - Through unmask eventfd (virqfd)

	Here be dragons.

In the virqfd case, a write to the eventfd calls virqfd_wakeup() where
we'll call the handler, vfio_pci_intx_unmask_handler(), and based on
the result schedule the thread, vfio_send_intx_eventfd().  Both of
these look suspicious.  They're not called under igate, so testing
ctx->trigger doesn't resolve the race.

I think an option is to wrap the virqfd entry points in igate where we
can then do something similar to your suggestion.  I don't think we
want to WARN_ON(!ctx->trigger) because that's then a user reachable
condition.  Instead we can just quietly follow the same exit paths.

I think that means we end up with something like this:

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 237beac83809..ace7e1dbc607 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -92,12 +92,21 @@ static void vfio_send_intx_eventfd(void *opaque, void *unused)
 		struct vfio_pci_irq_ctx *ctx;
 
 		ctx = vfio_irq_ctx_get(vdev, 0);
-		if (WARN_ON_ONCE(!ctx))
+		if (WARN_ON_ONCE(!ctx) || !ctx->trigger)
 			return;
 		eventfd_signal(ctx->trigger);
 	}
 }
 
+static void vfio_send_intx_eventfd_virqfd(void *opaque, void *unused)
+{
+	struct vfio_pci_core_device *vdev = opaque;
+
+	mutex_lock(&vdev->igate);
+	vfio_send_intx_eventfd(opaque, unused);
+	mutex_unlock(&vdev->igate);
+}
+
 /* Returns true if the INTx vfio_pci_irq_ctx.masked value is changed. */
 bool vfio_pci_intx_mask(struct vfio_pci_core_device *vdev)
 {
@@ -170,7 +179,7 @@ static int vfio_pci_intx_unmask_handler(void *opaque, void *unused)
 	}
 
 	ctx = vfio_irq_ctx_get(vdev, 0);
-	if (WARN_ON_ONCE(!ctx))
+	if (WARN_ON_ONCE(!ctx) || !ctx->trigger)
 		goto out_unlock;
 
 	if (ctx->masked && !vdev->virq_disabled) {
@@ -194,6 +203,18 @@ static int vfio_pci_intx_unmask_handler(void *opaque, void *unused)
 	return ret;
 }
 
+static int vfio_pci_intx_unmask_handler_virqfd(void *opaque, void *unused)
+{
+	struct vfio_pci_core_device *vdev = opaque;
+	int ret;
+
+	mutex_lock(&vdev->igate);
+	ret = vfio_pci_intx_unmask_handler(opaque, unused);
+	mutex_unlock(&vdev->igate);
+
+	return ret;
+}
+
 void vfio_pci_intx_unmask(struct vfio_pci_core_device *vdev)
 {
 	if (vfio_pci_intx_unmask_handler(vdev, NULL) > 0)
@@ -572,10 +593,10 @@ static int vfio_pci_set_intx_unmask(struct vfio_pci_core_device *vdev,
 		if (WARN_ON_ONCE(!ctx))
 			return -EINVAL;
 		if (fd >= 0)
-			return vfio_virqfd_enable((void *) vdev,
-						  vfio_pci_intx_unmask_handler,
-						  vfio_send_intx_eventfd, NULL,
-						  &ctx->unmask, fd);
+			return vfio_virqfd_enable((void *)vdev,
+					vfio_pci_intx_unmask_handler_virqfd,
+					vfio_send_intx_eventfd_virqfd, NULL,
+					&ctx->unmask, fd);
 
 		vfio_virqfd_disable(&ctx->unmask);
 	}


WDYT?
 
> > And we note the behavior change for MSI/X in the commit log and if
> > someone shouts that we broke them, we can make that an -errno or
> > continue based on is_intx().  Sound ok?  Thanks,  
> 
> I'll be sure to highlight the impact on MSI/MSI-x. Please do expect this
> in the final patch "vfio/pci: Remove duplicate interrupt management flow"
> though since that is where the different flows are merged.
> 
> I am not familiar with how all user space interacts with this flow and if/how
> this may break things. I did look at Qemu code and I was not able to find
> where it intentionally triggers MSI/MSI-x interrupts, I could only find it
> for INTx.

Being able to trigger the interrupt via ioctl is more of a diagnostic
feature, not typically used in production.
 
> If this does break things I would like to also consider moving the
> different behavior into the interrupt type's respective send_eventfd()
> callback instead of adding interrupt type specific code (like
> is_intx()) into the shared flow.

Sure, we can pick the best option in the slim (imo) chance the change
affects anyone.  Thanks,

Alex