Re: [RFC PATCH kernel] vfio-pci: Allow mapping MSIX BAR

Alexey Kardashevskiy <aik@xxxxxxxxx> · Wed, 29 Nov 2017 14:25:43 +1100

On 22/11/17 17:51, David Gibson wrote:
> On Tue, Nov 21, 2017 at 10:14:45PM -0700, Alex Williamson wrote:
>> On Wed, 22 Nov 2017 15:44:55 +1100
>> David Gibson <david@xxxxxxxxxxxxxxxxxxxxx> wrote:
>>
>>> On Tue, Nov 21, 2017 at 09:28:46PM -0700, Alex Williamson wrote:
>>>> On Wed, 22 Nov 2017 15:09:32 +1100
>>>> Alexey Kardashevskiy <aik@xxxxxxxxx> wrote:
>>>>   
>>>>> By default VFIO disables mapping of MSIX BAR to the userspace as
>>>>> the userspace may program it in a way allowing spurious interrupts;
>>>>> instead the userspace uses the VFIO_DEVICE_SET_IRQS ioctl.
>>>>>
>>>>> This works fine as long as the system page size equals to the MSIX
>>>>> alignment requirement which is 4KB. However with a bigger page size
>>>>> the existing code prohibits mapping non-MSIX parts of a page with MSIX
>>>>> structures so these parts have to be emulated via slow reads/writes on
>>>>> a VFIO device fd. If these emulated bits are accessed often, this has
>>>>> serious impact on performance.
>>>>>
>>>>> This adds an ioctl to the vfio-pci device which hides the sparse
>>>>> capability and allows the userspace to map a BAR with MSIX structures.  
>>>>
>>>> So the user is in control of telling the kernel whether they're allowed
>>>> to mmap the msi-x vector table.  That makes absolutely no sense.  If
>>>> you're trying to figure out how userspace knows whether to implicitly
>>>> avoid mmap'ing the msix region, I think there are far better ways in
>>>> the existing region info ioctl.  We could use a flag, or maybe the
>>>> existence of a capability chain pointer, or a new capability.  But
>>>> absolutely not this.  The kernel needs to decide whether it's going to
>>>> let the user do this, not the user.  Thanks,  
>>>
>>> No, it doesn't.  This is actually the approach we discussed in Prague.
>>>
>>> Remember that intercepting access to the MSI-X table is not a host
>>> safety / security issue.  It's just that without that we won't wire up
>>> the device's MSI-X vectors properly so they won't work.
>>>
>>> Basically the decision here is between
>>>
>>>    A) Allow MSI-X configuration via standard PCI mechanisms, at the
>>>       cost of making access slow for any registers sharing a page with
>>>       the MSI-X table.
>>>
>>> or
>>>
>>>    B) Make access to BAR registers sharing a page with the MSI-X table
>>>       fast, at the cost of requiring some alternative mechanism to
>>>       configure MSI-X vectors.
>>>
>>> And that is a tradeoff that it is reasonable for userspace to make.
>>>
>>> In the case of KVM guests, the decision depends entirely on the
>>> *guest* platform.  Usually we need (A) because the guest expects to be
>>> able to poke the MSI-X table in the usual way.  However for PAPR
>>> guests, there's an alternative mechanism via an RTAS call, which means
>>> we can use (B).
>>>
>>> The host kernel can't make this decision, because it doesn't know the
>>> guest platform (well, KVM might, but VFIO doesn't).
>>>
>>> A userspace VFIO program could also elect for (B) if it does care
>>> about performance of access to registers in the same BAR as the MSI-X
>>> table, but doesn't need MSI-X for example.
>>
>> You're asking for an ioctl to allow the kernel to allow the user to
>> mmap the page, when instead we could just allow the user to mmap the
>> page and whether the user does that and how they make use of it is up
>> to them...
> 
> Duh.  Sorry.  For some reason I was thinking the magic MSI-X
> interception was happening in the host kernel rather than in qemu.
> 
>> I understand that there are different virtualization techniques at play
>> here, it just doesn't seem relevant.  In the case of (A), the user can
>> choose not to mmap the page overlapping the vector table even if the
>> kernel allows it.  The user can also choose to mmap that page, but not
>> use the portion overlapping the vector table.  QEMU already does this
>> by overlaying a MemoryRegion for vector table emulation.  We might even
>> be able to get away with mmaping that page and emulating the vector
>> table elsewhere, which seems like the only option for a 64k page ARM
>> system.  For (B), clearly it's just a nuisance that we can't currently
>> mmap this page, but I still don't see how the user allowing the kernel
>> to allow the user to mmap that page makes any sense.  I can't even
>> describe it without it sounding ridiculous.  Thanks,
> 
> Right.  Rethinking..  it seems to me we should just completely remove
> the logic from the kernel banning mmap()s overlapping the MSI-X
> table.  All it does is poorly attempt to stop the user shooting
> themselves in the foot.
> 
> Then we just need logic in qemu to avoid doing the overlapping memory
> region nonsense on a per-machine basis

So is there still any plan or we just ditch the feature? I am confused now.

-- 
Alexey

Attachment:
signature.asc

Description: OpenPGP digital signature