Re: [RFC PATCH kernel] vfio-pci: Allow mapping MSIX BAR

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Nov 29, 2017 at 03:38:30PM +1100, Alexey Kardashevskiy wrote:
> On 29/11/17 14:57, David Gibson wrote:
> > On Wed, Nov 29, 2017 at 02:25:43PM +1100, Alexey Kardashevskiy wrote:
> >> On 22/11/17 17:51, David Gibson wrote:
> >>> On Tue, Nov 21, 2017 at 10:14:45PM -0700, Alex Williamson wrote:
> >>>> On Wed, 22 Nov 2017 15:44:55 +1100
> >>>> David Gibson <david@xxxxxxxxxxxxxxxxxxxxx> wrote:
> >>>>
> >>>>> On Tue, Nov 21, 2017 at 09:28:46PM -0700, Alex Williamson wrote:
> >>>>>> On Wed, 22 Nov 2017 15:09:32 +1100
> >>>>>> Alexey Kardashevskiy <aik@xxxxxxxxx> wrote:
> >>>>>>   
> >>>>>>> By default VFIO disables mapping of MSIX BAR to the userspace as
> >>>>>>> the userspace may program it in a way allowing spurious interrupts;
> >>>>>>> instead the userspace uses the VFIO_DEVICE_SET_IRQS ioctl.
> >>>>>>>
> >>>>>>> This works fine as long as the system page size equals to the MSIX
> >>>>>>> alignment requirement which is 4KB. However with a bigger page size
> >>>>>>> the existing code prohibits mapping non-MSIX parts of a page with MSIX
> >>>>>>> structures so these parts have to be emulated via slow reads/writes on
> >>>>>>> a VFIO device fd. If these emulated bits are accessed often, this has
> >>>>>>> serious impact on performance.
> >>>>>>>
> >>>>>>> This adds an ioctl to the vfio-pci device which hides the sparse
> >>>>>>> capability and allows the userspace to map a BAR with MSIX structures.  
> >>>>>>
> >>>>>> So the user is in control of telling the kernel whether they're allowed
> >>>>>> to mmap the msi-x vector table.  That makes absolutely no sense.  If
> >>>>>> you're trying to figure out how userspace knows whether to implicitly
> >>>>>> avoid mmap'ing the msix region, I think there are far better ways in
> >>>>>> the existing region info ioctl.  We could use a flag, or maybe the
> >>>>>> existence of a capability chain pointer, or a new capability.  But
> >>>>>> absolutely not this.  The kernel needs to decide whether it's going to
> >>>>>> let the user do this, not the user.  Thanks,  
> >>>>>
> >>>>> No, it doesn't.  This is actually the approach we discussed in Prague.
> >>>>>
> >>>>> Remember that intercepting access to the MSI-X table is not a host
> >>>>> safety / security issue.  It's just that without that we won't wire up
> >>>>> the device's MSI-X vectors properly so they won't work.
> >>>>>
> >>>>> Basically the decision here is between
> >>>>>
> >>>>>    A) Allow MSI-X configuration via standard PCI mechanisms, at the
> >>>>>       cost of making access slow for any registers sharing a page with
> >>>>>       the MSI-X table.
> >>>>>
> >>>>> or
> >>>>>
> >>>>>    B) Make access to BAR registers sharing a page with the MSI-X table
> >>>>>       fast, at the cost of requiring some alternative mechanism to
> >>>>>       configure MSI-X vectors.
> >>>>>
> >>>>> And that is a tradeoff that it is reasonable for userspace to make.
> >>>>>
> >>>>> In the case of KVM guests, the decision depends entirely on the
> >>>>> *guest* platform.  Usually we need (A) because the guest expects to be
> >>>>> able to poke the MSI-X table in the usual way.  However for PAPR
> >>>>> guests, there's an alternative mechanism via an RTAS call, which means
> >>>>> we can use (B).
> >>>>>
> >>>>> The host kernel can't make this decision, because it doesn't know the
> >>>>> guest platform (well, KVM might, but VFIO doesn't).
> >>>>>
> >>>>> A userspace VFIO program could also elect for (B) if it does care
> >>>>> about performance of access to registers in the same BAR as the MSI-X
> >>>>> table, but doesn't need MSI-X for example.
> >>>>
> >>>> You're asking for an ioctl to allow the kernel to allow the user to
> >>>> mmap the page, when instead we could just allow the user to mmap the
> >>>> page and whether the user does that and how they make use of it is up
> >>>> to them...
> >>>
> >>> Duh.  Sorry.  For some reason I was thinking the magic MSI-X
> >>> interception was happening in the host kernel rather than in qemu.
> >>>
> >>>> I understand that there are different virtualization techniques at play
> >>>> here, it just doesn't seem relevant.  In the case of (A), the user can
> >>>> choose not to mmap the page overlapping the vector table even if the
> >>>> kernel allows it.  The user can also choose to mmap that page, but not
> >>>> use the portion overlapping the vector table.  QEMU already does this
> >>>> by overlaying a MemoryRegion for vector table emulation.  We might even
> >>>> be able to get away with mmaping that page and emulating the vector
> >>>> table elsewhere, which seems like the only option for a 64k page ARM
> >>>> system.  For (B), clearly it's just a nuisance that we can't currently
> >>>> mmap this page, but I still don't see how the user allowing the kernel
> >>>> to allow the user to mmap that page makes any sense.  I can't even
> >>>> describe it without it sounding ridiculous.  Thanks,
> >>>
> >>> Right.  Rethinking..  it seems to me we should just completely remove
> >>> the logic from the kernel banning mmap()s overlapping the MSI-X
> >>> table.  All it does is poorly attempt to stop the user shooting
> >>> themselves in the foot.
> >>>
> >>> Then we just need logic in qemu to avoid doing the overlapping memory
> >>> region nonsense on a per-machine basis
> >>
> >>
> >> So is there still any plan or we just ditch the feature? I am confused now.
> > 
> > The plan is what I said above.  Remove the bogus check logic from the
> > kernel, then solve within qemu, by not creating the MSI-X intercept
> > region for pseries guests.
> 
> 
> There were 2 proposals how to do that. Both included platform code to
> decide whether to allow mapping or not  and  some transport to pass that
> enablement flag from the plafform code to the VFIO-PCI driver, one was via
> an IOMMU group flag, the other via a PCI bus flag. Neither was accepted so
> reposting those won't make any progress, what do I miss here? Was there any
> agreement on how to do this?

As we've discussed the filtering of mmap()able regions in the kernel
is not a security concern.  All it accomplishes is poorly trying to
stop userspace from shooting itself in the foot by directly mapping
and accessing the MSI-X table instead of using the VFIO interfaces to
set up MSI vectors.

We should simply remove - unconditionally - the checks in the kernel
which prevent parts of the BARs from being mapped.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

Attachment: signature.asc
Description: PGP signature


[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux