Re: [PATCH v4 06/17] PCI: add SIOV and IMS capability detection

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Ashok,

On Mon, Nov 09 2020 at 21:14, Ashok Raj wrote:
> On Mon, Nov 09, 2020 at 11:42:29PM +0100, Thomas Gleixner wrote:
>> On Mon, Nov 09 2020 at 13:30, Jason Gunthorpe wrote:
> Approach to IMS is more of a phased approach. 
>
> #1 Allow physical device to scale beyond limits of PCIe MSIx
>    Follows current methodology for guest interrupt programming and
>    evolutionary changes rather than drastic.

Trapping MSI[X] writes is there because it allows to hand a device to an
unmodified guest OS and to handle the case where the MSI[X] entries
storage cannot be mapped exclusively to the guest.

But aside of this, it's not required if the storage can be mapped
exclusively, the guest is hypervisor aware and can get a host composed
message via a hypercall. That works for physical functions and SRIOV,
but not for SIOV.

> #2 Long term we should work together on enabling IMS in guest which
>    requires changes in both HW and SW eco-system.
>
> For #1, the immediate need is to find a way to limit guest from using IMS
> due to current limitations. We have couple options.
>
> a) CPUID based method to disallow IMS when running in a guest OS. Limiting
>    use to existing virtual MSIx to guest devices. (Both you/Jason alluded)
> b) We can extend DMAR table to have a flag for opt-out. So in real platform
>    this flag is clear and in guest VMM will ensure vDMAR will have this flag
>    set. Along the lines as Jason alluded, platform level and via ACPI
>    methods. We have similar use for x2apic_optout today.
>
> Think a) is probably more generic.

But incomplete as I explained before. If the VMM does not set the
hypervisor bit in CPUID then the guest OS assumes to run on bare
metal. It needs more than just relying on CPUID.

Aside of that neither Jason nor myself said that IMS cannot be supported
in a guest. PF and VF IMS can and has to be supported. SIOV is a
different story due to the PASID requirement which obviously needs to be
managed host side and needs HW changes.

> From SW improvements:
>
> - Hypercall to retrieve addr/data from host

You need to have that even for the non SIOV case in order to hand in a
full device which has the IMS storage in queue memory.

> Devices such as idxd that do not have these entries on page-boundaries for
> isolation to permit direct programming from GuestOS will continue to use
> trap-emulate as used today.

That's a restriction of that particular hardware.

> Until then, IMS will be restricted to host VMM only, and we can use the
> methods above to prevent IMS in guest and continue to use the legacy
> virtual MSIx.

SIOV IMS.

But as things stand now not even PF/VF pass through are possible. This
might not be an issue for IDXD, but it's an issue in general and this
want's the be thought of _now_ before we put a lot of infrastructure in
to place which needs then to be ripped apart again.

>> The current specification puts massive restrictions on IMS storage which
>> are _not_ allowing to optimize it in a device specific manner as
>> demonstrated in this discussion.
>
> IMS doesn't restrict this optimization, but to allow it requires more
> OS support as you had mentioned.

Right, IMS per se does not put an restriction on it.

The specification and the HW limitations on the remapping unit put that
restriction into place.

OS support is an obvious requirement, but OS support cannot make
the restrictions of HW go away magically.

But again, we need to think about the path forward _now_.

Just slapping some 'works for IDXD' solution into place can severly
restrict the options for going beyond these limitations simply because
we have to support that 'works for IDXD thing' forever.

Thanks,

        tglx



[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux