On 22/11/17 16:14, Alex Williamson wrote:
> On Wed, 22 Nov 2017 15:44:55 +1100
> David Gibson <david@xxxxxxxxxxxxxxxxxxxxx> wrote:
>
>> On Tue, Nov 21, 2017 at 09:28:46PM -0700, Alex Williamson wrote:
>>> On Wed, 22 Nov 2017 15:09:32 +1100
>>> Alexey Kardashevskiy <aik@xxxxxxxxx> wrote:
>>>
>>>> By default VFIO disables mapping of MSIX BAR to the userspace as
>>>> the userspace may program it in a way allowing spurious interrupts;
>>>> instead the userspace uses the VFIO_DEVICE_SET_IRQS ioctl.
>>>>
>>>> This works fine as long as the system page size equals to the MSIX
>>>> alignment requirement which is 4KB. However with a bigger page size
>>>> the existing code prohibits mapping non-MSIX parts of a page with MSIX
>>>> structures so these parts have to be emulated via slow reads/writes on
>>>> a VFIO device fd. If these emulated bits are accessed often, this has
>>>> serious impact on performance.
>>>>
>>>> This adds an ioctl to the vfio-pci device which hides the sparse
>>>> capability and allows the userspace to map a BAR with MSIX structures.
>>>
>>> So the user is in control of telling the kernel whether they're allowed
>>> to mmap the msi-x vector table. That makes absolutely no sense. If
>>> you're trying to figure out how userspace knows whether to implicitly
>>> avoid mmap'ing the msix region, I think there are far better ways in
>>> the existing region info ioctl. We could use a flag, or maybe the
>>> existence of a capability chain pointer, or a new capability. But
>>> absolutely not this. The kernel needs to decide whether it's going to
>>> let the user do this, not the user. Thanks,
>>
>> No, it doesn't. This is actually the approach we discussed in Prague.
>>
>> Remember that intercepting access to the MSI-X table is not a host
>> safety / security issue. It's just that without that we won't wire up
>> the device's MSI-X vectors properly so they won't work.
>>
>> Basically the decision here is between
>>
>>    A) Allow MSI-X configuration via standard PCI mechanisms, at the
>>       cost of making access slow for any registers sharing a page with
>>       the MSI-X table.
>>
>>    or
>>
>>    B) Make access to BAR registers sharing a page with the MSI-X table
>>       fast, at the cost of requiring some alternative mechanism to
>>       configure MSI-X vectors.
>>
>> And that is a tradeoff that it is reasonable for userspace to make.
>>
>> In the case of KVM guests, the decision depends entirely on the
>> *guest* platform. Usually we need (A) because the guest expects to be
>> able to poke the MSI-X table in the usual way. However for PAPR
>> guests, there's an alternative mechanism via an RTAS call, which means
>> we can use (B).
>>
>> The host kernel can't make this decision, because it doesn't know the
>> guest platform (well, KVM might, but VFIO doesn't).
>>
>> A userspace VFIO program could also elect for (B) if it does care
>> about performance of access to registers in the same BAR as the MSI-X
>> table, but doesn't need MSI-X for example.
>
> You're asking for an ioctl to allow the kernel to allow the user to
> mmap the page, when instead we could just allow the user to mmap the
> page and

How do we decide if we allow the user to mmap that? I tried several
approaches, with no clear reaction from the community...


-- 
Alexey
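
For illustration only (not part of the patch): a rough sketch of the
page-size arithmetic behind the problem described in the commit message
quoted above. It assumes that only the MSI-X vector table is excluded
from mmap and that the exclusion is rounded out to whole system pages.
The function name and the BAR layout used in the example (64KB BAR,
4KB table at offset 0x2000) are hypothetical.

```python
def mmapable_ranges(bar_size, table_offset, table_size, page_size):
    """Return the (start, end) byte ranges of a BAR that do not share
    a system page with the MSI-X vector table."""
    # Round the start of the table down to a page boundary.
    lo = (table_offset // page_size) * page_size
    # Round the end of the table up to a page boundary, capped at the BAR end.
    hi = -(-(table_offset + table_size) // page_size) * page_size
    hi = min(hi, bar_size)

    ranges = []
    if lo > 0:
        ranges.append((0, lo))          # part of the BAR before the table's pages
    if hi < bar_size:
        ranges.append((hi, bar_size))   # part of the BAR after the table's pages
    return ranges


# Hypothetical layout: 64KB BAR, 4KB MSI-X table at offset 0x2000.
four_k = mmapable_ranges(0x10000, 0x2000, 0x1000, 0x1000)
sixty_four_k = mmapable_ranges(0x10000, 0x2000, 0x1000, 0x10000)
```

With 4KB pages only the table's own page is left unmapped; with 64KB
pages the table's page covers the whole BAR in this layout, so nothing
can be mmap'ed and every access goes through the slow path.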