On Wed, 29 Nov 2017 17:03:31 -0500
Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx> wrote:

> On Wed, Nov 22, 2017 at 03:44:55PM +1100, David Gibson wrote:
> > On Tue, Nov 21, 2017 at 09:28:46PM -0700, Alex Williamson wrote:
> > > On Wed, 22 Nov 2017 15:09:32 +1100
> > > Alexey Kardashevskiy <aik@xxxxxxxxx> wrote:
> > >
> > > > By default VFIO disables mapping of MSIX BAR to the userspace as
> > > > the userspace may program it in a way allowing spurious interrupts;
> > > > instead the userspace uses the VFIO_DEVICE_SET_IRQS ioctl.
> > > >
> > > > This works fine as long as the system page size equals to the MSIX
> > > > alignment requirement which is 4KB. However with a bigger page size
> > > > the existing code prohibits mapping non-MSIX parts of a page with MSIX
> > > > structures so these parts have to be emulated via slow reads/writes on
> > > > a VFIO device fd. If these emulated bits are accessed often, this has
> > > > serious impact on performance.
> > > >
> > > > This adds an ioctl to the vfio-pci device which hides the sparse
> > > > capability and allows the userspace to map a BAR with MSIX structures.
> > >
> > > So the user is in control of telling the kernel whether they're allowed
> > > to mmap the msi-x vector table. That makes absolutely no sense. If
> > > you're trying to figure out how userspace knows whether to implicitly
> > > avoid mmap'ing the msix region, I think there are far better ways in
> > > the existing region info ioctl. We could use a flag, or maybe the
> > > existence of a capability chain pointer, or a new capability. But
> > > absolutely not this. The kernel needs to decide whether it's going to
> > > let the user do this, not the user. Thanks,
> >
> > No, it doesn't. This is actually the approach we discussed in Prague.
> >
> > Remember that intercepting access to the MSI-X table is not a host
> > safety / security issue. It's just that without that we won't wire up
>
> How is that not a security issue? Having an guest or an user-space
> access to muck with the MSI-X vectors allows them to do some form
> of memory writes using the IOAPIC.

The MSI-X vector table only specifies the address and data used for a
DMA write to trigger an interrupt. Any device capable of DMA is
already able to generate those same DMA writes regardless of what's in
the MSI-X vector table. Therefore preventing mmap of the vector table
is at best an obfuscation, which isn't worth the potentially
significant handicap it imposes for architectures with system page
sizes larger than the PCI spec recommended alignments. All DMA,
including DMA writes to interrupt controllers, is expected to be
handled by the IOMMU or the user should have had to opt-in to insecure
interrupts already. Thanks,

Alex
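
For reference, here is a minimal sketch (not code from this thread or from the kernel) of the two points above: an MSI-X table entry holds nothing but a message address, message data and a mask bit, and excluding whole pages around the table costs far more BAR space with 64KB pages than with 4KB pages. The struct name, the helper names and the example offsets are hypothetical and chosen only for illustration; the 16-byte entry layout follows the PCI spec.

```c
#include <assert.h>
#include <stdint.h>

/* One MSI-X table entry: 16 bytes, per the PCI spec layout.
 * The struct name is illustrative, not a kernel type. */
struct msix_entry_layout {
    uint32_t msg_addr_lo;  /* low 32 bits of the address the device writes to */
    uint32_t msg_addr_hi;  /* high 32 bits of that address */
    uint32_t msg_data;     /* value written to signal the interrupt */
    uint32_t vector_ctrl;  /* bit 0 masks the vector */
};
static_assert(sizeof(struct msix_entry_layout) == 16,
              "MSI-X table entry must be 16 bytes");

/* Hypothetical helper: number of BAR bytes that become unmappable when
 * every page overlapping the MSI-X table is excluded from mmap.
 * page_size must be a power of two. */
static uint64_t msix_excluded_bytes(uint64_t table_offset,
                                    uint32_t nr_vectors,
                                    uint64_t page_size)
{
    uint64_t table_size = (uint64_t)nr_vectors * sizeof(struct msix_entry_layout);
    uint64_t start = table_offset & ~(page_size - 1);            /* round down */
    uint64_t end = (table_offset + table_size + page_size - 1)
                   & ~(page_size - 1);                           /* round up */
    return end - start;
}

/* Hypothetical helper: excluded bytes that are NOT part of the table,
 * i.e. device registers that would have to be emulated via read/write. */
static uint64_t msix_collateral_bytes(uint64_t table_offset,
                                      uint32_t nr_vectors,
                                      uint64_t page_size)
{
    uint64_t table_size = (uint64_t)nr_vectors * sizeof(struct msix_entry_layout);
    return msix_excluded_bytes(table_offset, nr_vectors, page_size) - table_size;
}
```

With an assumed 16-vector table at BAR offset 0x2000, a 4KB page size excludes a single 4KB page, while a 64KB page size excludes the entire first 64KB of the BAR, nearly all of which is non-MSI-X register space.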