On 8/13/21 3:09 PM, Alex Williamson wrote: > On Fri, 13 Aug 2021 11:34:51 -0700 > Russ Weight <russell.h.weight@xxxxxxxxx> wrote: > >> Bug Description: >> >> A bug in the vfio_pci driver was reported in junction with work on FPGA > This looks like the documented behavior of an IRQ index reporting the > VFIO_IRQ_INFO_NORESIZE flag. We can certainly work towards trying to > remove the flag from this index, but it seems the userspace driver is > currently ignoring the flag and expecting exactly the behavior the flag > indicates is not available. Thanks, Thanks for the quick reply, Alex. Yes, we misunderstood the expected behavior. We have adapted our library code and everything is working now. - Russ > > Alex > >> cards. We were able to reproduce and root-cause the bug using system-tap. >> The original bug description is below. An understanding of the referenced >> dfl and opae tools is not required - it is the sequence of IOCTL calls and >> IRQ vectors that matters: >> >>> I’m trying to get an example AFU working that uses 2 IRQs, active at the same >>> time. I’m hitting what looks to be a dfl_pci driver bug. >>> >>> The code tries to allocate two IRQ vectors: 0 and 1. I see opaevfio.c doing the >>> right thing, picking the MSIX index. Allocating either IRQ 0 or IRQ 1 works fine >>> and I confirm that the VFIO_DEVICE_SET_IRQS looks reasonable, choosing MSIX and >>> either start of 0 or 1 and count 1. >>> >>> Note that opaevfio.c always passes count 1, so it will make separate calls for >>> each IRQ vector. >>> >>> When I try to allocate both, I see the following: >>> >>> * If the VFIO_DEVICE_SET_IRQS ioctl is called first with start 0 and then >>> start 1 (always count 1), the start 1 (second) ioctl trap returns EINVAL. >>> * If I set up the vectors in decreasing order, so start 1 followed by start 0, >>> the program works! >>> * I ruled out OPAE SDK user space problems by setting up my program to >>> allocate in increasing order, which would normally fail. I changed only the >>> ioctl call in user space opaevfio.c, inverting bit 0 of start so that the >>> driver is called in decreasing index order. Of course this binds the wrong >>> vectors to the fds, but I don’t care about that for now. This works! From >>> this, I conclude that it can’t be a user space problem since the difference >>> between working and failing is solely the order in which IRQ vectors are >>> bound in ioctl calls. >> The EINVAL is coming from vfio_msi_set_block() here: >> https://github.com/torvalds/linux/blob/master/drivers/vfio/pci/vfio_pci_intrs.c#L373 >> >> vfio_msi_set_block() is being called from vfio_pci_set_msi_trigger() here on >> the second IRQ request: >> https://github.com/torvalds/linux/blob/master/drivers/vfio/pci/vfio_pci_intrs.c#L530 >> >> We believe the bug is in vfio_pci_set_msi_trigger(), in the 2nd parameter to the call >> to vfio_msi_enable() here: >> https://github.com/torvalds/linux/blob/master/drivers/vfio/pci/vfio_pci_intrs.c#L533 >> >> In both the passing and failing cases, the first IRQ request results in a call >> to vfio_msi_enable() at line 533 and the second IRQ request results in the >> call to vfio_msi_set_block() at line 530. It is during the first IRQ request >> that vfio_msi_enable() sets vdev->num_ctx based on the 2nd parameter (nvec). >> vdev->num_ctx is part of the conditional that results in the EINVAL for the >> failing case. >> >> In the passing case, vdev->num_ctx is 2. In the failing case, it is 1. >> >> I am attaching two text files containing trace information from systemtap: one for >> the failing case and one for the passing case. They contain a lot more information >> than is needed, but if you search for vfio_pci_set_msi_trigger and vfio_msi_set_block, >> you will see values for some of the call parameters. >> >> - Russ >>