Re: [PATCH v4 06/17] PCI: add SIOV and IMS capability detection

"Raj, Ashok" <ashok.raj@xxxxxxxxx> · Fri, 13 Nov 2020 09:38:45 -0800

On Fri, Nov 13, 2020 at 08:12:39AM -0800, Luck, Tony wrote:
> > Of course is this not only an x86 problem. Every architecture which
> > supports virtualization has the same issue. ARM(64) has no way to tell
> > for sure whether the machine runs bare metal either. No idea about the
> > other architectures.
> 
> Sounds like a hypervisor problem. If the VMM provides perfect emulation
> of every weird quirk of h/w, then it is OK to let the guest believe that it is
> running on bare metal.

That's true, which is why there isn't an immutable bit in cpuid or
otherwise telling you are running under a hypervisor. Providing something
like that would make certain features not virtualizable. Apparently before we
had faulting cpuid, what you had in guest was the real raw cpuid. 

Waiver: I'm not saying this is perfect, I'm just replaying the reason
behind it. Not trying to defend it... flames > /dev/null
> 
> If it isn't perfect, then it should make sure the guest knows *for sure*, so that
> the guest can take appropriate actions to avoid the sharp edges.
> 

There are indeed 2 problems to solve.

1. How does device driver know if device is IMS capable.

   IMS is a device attribute. Each vendor can provide its own method to
   provide that indication. One such mechanism is the DVSEC.SIOV.IMS
   property. Some might believe this is for use only by Intel. For DVSEC I
   don't believe there is such a connection as in device vendor id in
   standard header. TBH, there are other device vendors using the exact
   same method to indicate SIOV and IMS propeties. What a DVSEC vendor ID
   states is "As defined by Vendor X". 

   Why we choose a config vs something in device specific mmio is because
   today VFIO being that one common mechanism, it only exposes known
   standard and some extended headers to guest. When we expose a full PF,
   the guest doens't see the DVSEC, so drivers know this isn't available.

   This is our mechanism to stop drivers from calling
   pci_ims_array_create_msi_irq_domain(). It may not be perfect for all
   devices, it is a device specific mechanism. For devices under
   consideration following the SIOV spec it meets the sprit of the
   requirement even without #2 below. When devices have no way to detect
   this, #2 is required as a second way to block IMS.

2. How does platform component (IOMMU) inform if they can support all forms
   of IMS. (On device, or in memory). 

   On device would require some form trap/emulate. Legacy MSIx already has
   that solved, but for device specific store you need some additional
   work.

   When its system memory (say IMS is in GPA space), you need some form of
   hypercall. There is no way around it since we can't intercept. Yes, you
   can maybe map those as RO and trap, but its not pretty.

   To solve this rather than a generic platform capability, maybe we should
   flip this to IOMMU instead, because that's the one that offers this
   capability today.

   iommu_ims_supported() 
   	When platform has no IOMMU or no hypervisor calls, it returns
	false. So device driver can tell, even if it supports IMS
	capability deduction, does the platform support IMS.

        On platforms where iommu supports capability.

	Either there is a vIOMMU with a Virtual Command Register that can
	provide a way to get the interrupt handle similar to what you would
	get from an hypercall for instance. Or there is a real hypercall
	that supports giving the guest OS the physical IRTE handle. 

-- 
Cheers,
Ashok

[Forgiveness is the attribute of the STRONG - Gandhi]