On 22/11/17 17:51, David Gibson wrote: > On Tue, Nov 21, 2017 at 10:14:45PM -0700, Alex Williamson wrote: >> On Wed, 22 Nov 2017 15:44:55 +1100 >> David Gibson <david@xxxxxxxxxxxxxxxxxxxxx> wrote: >> >>> On Tue, Nov 21, 2017 at 09:28:46PM -0700, Alex Williamson wrote: >>>> On Wed, 22 Nov 2017 15:09:32 +1100 >>>> Alexey Kardashevskiy <aik@xxxxxxxxx> wrote: >>>> >>>>> By default VFIO disables mapping of MSIX BAR to the userspace as >>>>> the userspace may program it in a way allowing spurious interrupts; >>>>> instead the userspace uses the VFIO_DEVICE_SET_IRQS ioctl. >>>>> >>>>> This works fine as long as the system page size equals to the MSIX >>>>> alignment requirement which is 4KB. However with a bigger page size >>>>> the existing code prohibits mapping non-MSIX parts of a page with MSIX >>>>> structures so these parts have to be emulated via slow reads/writes on >>>>> a VFIO device fd. If these emulated bits are accessed often, this has >>>>> serious impact on performance. >>>>> >>>>> This adds an ioctl to the vfio-pci device which hides the sparse >>>>> capability and allows the userspace to map a BAR with MSIX structures. >>>> >>>> So the user is in control of telling the kernel whether they're allowed >>>> to mmap the msi-x vector table. That makes absolutely no sense. If >>>> you're trying to figure out how userspace knows whether to implicitly >>>> avoid mmap'ing the msix region, I think there are far better ways in >>>> the existing region info ioctl. We could use a flag, or maybe the >>>> existence of a capability chain pointer, or a new capability. But >>>> absolutely not this. The kernel needs to decide whether it's going to >>>> let the user do this, not the user. Thanks, >>> >>> No, it doesn't. This is actually the approach we discussed in Prague. >>> >>> Remember that intercepting access to the MSI-X table is not a host >>> safety / security issue. It's just that without that we won't wire up >>> the device's MSI-X vectors properly so they won't work. >>> >>> Basically the decision here is between >>> >>> A) Allow MSI-X configuration via standard PCI mechanisms, at the >>> cost of making access slow for any registers sharing a page with >>> the MSI-X table. >>> >>> or >>> >>> B) Make access to BAR registers sharing a page with the MSI-X table >>> fast, at the cost of requiring some alternative mechanism to >>> configure MSI-X vectors. >>> >>> And that is a tradeoff that it is reasonable for userspace to make. >>> >>> In the case of KVM guests, the decision depends entirely on the >>> *guest* platform. Usually we need (A) because the guest expects to be >>> able to poke the MSI-X table in the usual way. However for PAPR >>> guests, there's an alternative mechanism via an RTAS call, which means >>> we can use (B). >>> >>> The host kernel can't make this decision, because it doesn't know the >>> guest platform (well, KVM might, but VFIO doesn't). >>> >>> A userspace VFIO program could also elect for (B) if it does care >>> about performance of access to registers in the same BAR as the MSI-X >>> table, but doesn't need MSI-X for example. >> >> You're asking for an ioctl to allow the kernel to allow the user to >> mmap the page, when instead we could just allow the user to mmap the >> page and whether the user does that and how they make use of it is up >> to them... > > Duh. Sorry. For some reason I was thinking the magic MSI-X > interception was happening in the host kernel rather than in qemu. > >> I understand that there are different virtualization techniques at play >> here, it just doesn't seem relevant. In the case of (A), the user can >> choose not to mmap the page overlapping the vector table even if the >> kernel allows it. The user can also choose to mmap that page, but not >> use the portion overlapping the vector table. QEMU already does this >> by overlaying a MemoryRegion for vector table emulation. We might even >> be able to get away with mmaping that page and emulating the vector >> table elsewhere, which seems like the only option for a 64k page ARM >> system. For (B), clearly it's just a nuisance that we can't currently >> mmap this page, but I still don't see how the user allowing the kernel >> to allow the user to mmap that page makes any sense. I can't even >> describe it without it sounding ridiculous. Thanks, > > Right. Rethinking.. it seems to me we should just completely remove > the logic from the kernel banning mmap()s overlapping the MSI-X > table. All it does is poorly attempt to stop the user shooting > themselves in the foot. > > Then we just need logic in qemu to avoid doing the overlapping memory > region nonsense on a per-machine basis So is there still any plan or we just ditch the feature? I am confused now. -- Alexey
Attachment:
signature.asc
Description: OpenPGP digital signature