Re: [PATCH v15 1/1] vfio/nvgrace-gpu: Add vfio pci variant module for grace hopper

Jason Gunthorpe <jgg@xxxxxxxxxx> · Wed, 3 Jan 2024 20:40:18 -0400

On Wed, Jan 03, 2024 at 05:24:26PM -0700, Alex Williamson wrote:
> > Why does it need to do anything special? If the VM read/writes from
> > memory that the master bit is disabled on it gets undefined
> > behavior. The system doesn't crash and it does something reasonable.
> 
> The behavior is actually defined (6.0.1 Table 7-4):
> 
>     Memory Space Enable - Controls a Function's response to Memory
>     Space accesses. When this bit is Clear, all received Memory Space
>     accesses are caused to be handled as Unsupported Requests. When
>     this bit is Set, the Function is enabled to decode the address and
>     further process Memory Space accesses.
> 
> From there we get into system error handling decisions where some
> platforms claim to protect data integrity by generating a fault before
> allowing drivers to consume the UR response and others are more lenient.

Sure, PCIe defines more detail, but the actual behavior the SW
experiences when triggering this corner is effectively undefined, as
"machine crash" is something that actually happens.

> AIUI, the address space enable bits are primarily to prevent the device
> from decoding accesses during BAR sizing operations or prior to BAR
> programming.  

Yes. It is not functionally relevant to devices like this that have a
fixed aperture, or to virtual devices that can't move the physical
aperture.

I think the layers have become a bit confused here. The vfio side
should care only about kernel self-protection from hostile
userspace, which is why we have to zap/etc.

However the VMM still controls the "address decoder" and if the memory
(or IO) enable is off then the VMM should already prevent the VM
address space from decoding into the VFIO regions at all. I.e. it should
unmap it from KVM for mmapable regions, and stop matching the address
for emulated regions.

This is effectively necessary because the VM might choose to reprogram
the BAR registers and move the region. It can't do this atomically, so
we have to fully ignore the BAR value when the decoders are disabled.

IOW the corner case of the memory enable being disabled while the VM
touches the memory is not something the kernel VFIO should be
emulating, and indeed, I think there is probably no reason to allow
the VM to manipulate the physical control either.

> unprogrammed BARs are ignored (ie. not exposed to userspace), so perhaps
> as long as it can be guaranteed that an access with the address space
> enable bit cleared cannot generate a system level fault, we actually
> have no strong argument to strictly enforce the address space bits.

This is what I think, yes.

> > I think that has just become too pedantic, accessing the regions with
> > the enable bits turned off is broadly undefined behavior. So long as
> > the platform doesn't crash, I think it is fine to behave in a simple
> > way.
> > 
> > There is no use case for providing stronger emulation of this.
> 
> As above, I think I can be convinced this is acceptable given that the
> platform and device are essentially one and the same here with
> understood lack of a system wide error response.

Right

> Now I'm wondering if we should do something different with
> virtio-vfio-pci.  As a VF, the memory space is effectively always
> enabled, governed by the SR-IOV MSE bit on the PF which is assumed to
> be static.  It doesn't make a lot of sense to track the IO enable bit
> for the emulated IO BAR when the memory BAR is always enabled.  It's a
> fairly trivial amount of code though, so it's not harmful either.

As above, it was probably unnecessary to put this into the VFIO kernel
side, but I don't think there is any functional harm in allowing it.

Jason