Re: [RFC] /dev/ioasid uAPI proposal

Alex Williamson <alex.williamson@xxxxxxxxxx> · Mon, 7 Jun 2021 13:41:28 -0600

On Mon, 7 Jun 2021 16:08:02 -0300
Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:

> On Mon, Jun 07, 2021 at 12:59:46PM -0600, Alex Williamson wrote:
> 
> > > It is up to qemu if it wants to proceed or not. There is no issue with
> > > allowing the use of no-snoop and blocking wbinvd, other than some
> > > drivers may malfunction. If the user is certain they don't have
> > > malfunctioning drivers then no issue to go ahead.  
> > 
> > A driver that knows how to use the device in a coherent way can
> > certainly proceed, but I suspect that's not something we can ask of
> > QEMU.  QEMU has no visibility to the in-use driver and sketchy ability
> > to virtualize the no-snoop enable bit to prevent non-coherent DMA from
> > the device.  There might be an experimental ("x-" prefixed) QEMU device
> > option to allow user override, but QEMU should disallow the possibility
> > of malfunctioning drivers by default.  If we have devices that probe as
> > supporting no-snoop, but actually can't generate such traffic, we might
> > need a quirk list somewhere.  
> 
> Compatibility is important, but when I look in the kernel code I see
> very few places that call wbinvd(). Basically all DRM for something
> relevant to qemu.
> 
> That tells me that the vast majority of PCI devices do not generate
> no-snoop traffic.

Unfortunately, even just looking at the devices across a couple of
laptops, most of them do support no-snoop and have NoSnoop+ set by
default.  I don't see anything in the kernel that actually tries to set
this enable bit (only a handful of places that actively clear it), so I
assume it's done by the firmware.  It's not safe for QEMU to assume that
only GPUs will actually make use of it.
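
For anyone who wants to repeat the check, the NoSnoop+ flag that lspci
prints under DevCtl is the Enable No Snoop bit (bit 11) of the PCIe
Device Control register.  A minimal sketch of decoding it from a raw
config space dump follows; the function name nosnoop_enabled is made up
for illustration, and the constants mirror the usual pci_regs.h values.

```python
import struct
from typing import Optional

# Register offsets and bit masks (same values as Linux's pci_regs.h).
PCI_STATUS = 0x06                   # 16-bit Status register
PCI_STATUS_CAP_LIST = 0x10          # device implements a capability list
PCI_CAPABILITY_LIST = 0x34          # pointer to the first capability
PCI_CAP_ID_EXP = 0x10               # PCI Express capability ID
PCI_EXP_DEVCTL = 0x08               # Device Control, relative to the cap
PCI_EXP_DEVCTL_NOSNOOP_EN = 0x0800  # Enable No Snoop (bit 11)


def nosnoop_enabled(config: bytes) -> Optional[bool]:
    """Return the Enable No Snoop bit, or None if there is no PCIe cap.

    `config` is a raw dump of the device's config space.  This is a
    hypothetical helper written for illustration, not an existing API.
    """
    if len(config) < 0x40:
        return None
    (status,) = struct.unpack_from("<H", config, PCI_STATUS)
    if not status & PCI_STATUS_CAP_LIST:
        return None

    ptr = config[PCI_CAPABILITY_LIST] & 0xFC
    seen = set()  # guard against a malformed, looping capability list
    while ptr >= 0x40 and ptr not in seen and ptr + PCI_EXP_DEVCTL + 2 <= len(config):
        seen.add(ptr)
        if config[ptr] == PCI_CAP_ID_EXP:
            (devctl,) = struct.unpack_from("<H", config, ptr + PCI_EXP_DEVCTL)
            return bool(devctl & PCI_EXP_DEVCTL_NOSNOOP_EN)
        ptr = config[ptr + 1] & 0xFC  # next-capability pointer
    return None
```

The input can be read from /sys/bus/pci/devices/<BDF>/config; note that
reads without root privilege only return the first 64 bytes, which is
too short to reach the capability list.  The bit only says the device is
permitted to issue no-snoop transactions, not that it ever does.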

> > > I think it makes the software design much simpler if the security
> > > check is very simple. Possessing a suitable device in an ioasid fd
> > > container is enough to flip on the feature and we don't need to track
> > > changes from that point on. We don't need to revoke wbinvd if the
> > > ioasid fd changes, for instance. Better to keep the kernel very simple
> > > in this regard.  
> > 
> > You're suggesting that a user isn't forced to give up wbinvd emulation
> > if they lose access to their device?    
> 
> Sure, why do we need to be stricter? It is the same logic I gave
> earlier, once an attacker process has access to wbinvd an attacker can
> just keep its access indefinitely.
> 
> The main use case for revocation assumes that qemu would be
> compromised after a device is hot-unplugged and you want to block off
> wbinvd. But I have a hard time seeing that as useful enough to justify
> all the complicated code to do it...

It's currently just a matter of the kvm-vfio device holding a reference
to the group so that it cannot be used elsewhere so long as it's being
used to elevate privileges on a given KVM instance.  If we conclude that
access to a device with the right capability is required to gain a
privilege, I don't really see how we can justify that privilege not
being lost along with the device.
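
To make the coupling concrete, here is a toy model of it.  This is not
kernel code: Group, KvmInstance and wbinvd_allowed are invented names,
and the exclusivity check is a simplification of what holding a group
reference achieves.  The point is only that the privilege is derived
from the references currently held, so it goes away with the last
qualifying group.

```python
class Group:
    """Toy stand-in for a vfio group (hypothetical, for illustration)."""

    def __init__(self, name: str, noncoherent: bool):
        self.name = name
        self.noncoherent = noncoherent  # group may do non-coherent DMA
        self.holder = None              # KVM instance holding a reference


class KvmInstance:
    """Toy stand-in for a KVM instance with a kvm-vfio style group list."""

    def __init__(self):
        self._groups = []

    def add_group(self, group: Group) -> None:
        # Holding the reference keeps the group from being used elsewhere.
        if group.holder is not None:
            raise RuntimeError(f"group {group.name} is already in use")
        group.holder = self
        self._groups.append(group)

    def del_group(self, group: Group) -> None:
        # Dropping the reference frees the group for other users.
        self._groups.remove(group)
        group.holder = None

    @property
    def wbinvd_allowed(self) -> bool:
        # Recomputed from the held groups, never cached, so the
        # privilege cannot outlive the device that granted it.
        return any(g.noncoherent for g in self._groups)


kvm = KvmInstance()
gpu = Group("gpu", noncoherent=True)
kvm.add_group(gpu)
granted_while_held = kvm.wbinvd_allowed   # True while the group is held
kvm.del_group(gpu)
granted_after_release = kvm.wbinvd_allowed  # False once it is released
```

The alternative being proposed would amount to latching the flag the
first time it becomes true, which is exactly the behavior this model
rules out.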

> For KVM, qemu can turn it on/off on hot plug events as required to
> provide VM security. It doesn't need to rely on the kernel to control
> this.

Yes, QEMU can reject a hot-unplug event, but then QEMU retains the
privilege that the device grants it.  Releasing the device and
retaining the privilege gained through it seems wrong.  Thanks,

Alex