RE: [RFC PATCH] vfio: Update/Clarify migration uAPI, add NDMA state

"Tian, Kevin" <kevin.tian@xxxxxxxxx> · Wed, 5 Jan 2022 03:06:37 +0000

> From: Tian, Kevin
> Sent: Wednesday, January 5, 2022 9:59 AM
> 
> > From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> > Sent: Wednesday, January 5, 2022 12:10 AM
> >
> > On Tue, Jan 04, 2022 at 03:49:07AM +0000, Tian, Kevin wrote:
> >
> > > btw can you elaborate on the DOS concern? The device is assigned
> > > to a user application, which has one thread (migration thread)
> > > blocked on another thread (vcpu thread) when transitioning the
> > > device to NDMA state. What service outside of this application
> > > is denied here?
> >
> > The problem is the VM controls when the vPRI is responded and
> > migration cannot proceed until this is done.
> >
> > So the basic DOS is for a hostile VM to trigger a vPRI and then never
> > answer it. Even trivially done from userspace with a vSVA and
> > userfaultfd, for instance.
> >
> > This will block the hypervisor from ever migrating the VM in a very
> > poor way - it will just hang in the middle of a migration request.
> 
> It's poor, but a 'hang' won't happen. The PCI spec defines a completion
> timeout for ATS translation requests. On timeout, the device will abort
> the in-flight request and report an error back to software.
> 
> >
> > Regardless of the complaints of the IP designers, this is a very poor
> > direction.
> >
> > Progress in the hypervisor should never be contingent on a guest VM.
> >
> 
> Whether the said DOS is a real concern and how severe it is are usage
> specific things. Why would we want to hardcode such a restriction in
> a uAPI? Just give the choice to the admin (as long as this restriction is
> clearly communicated to userspace)...
> 
> IMHO encouraging IP designers to work out better hardware shouldn't
> block supporting current hardware which has limitations but also values
> in scenarios where those limitations are tolerable.
> 

btw although the uAPI is named 'migration', it's really about device
state management. Whether the managed device state is then
migrated, and how severe a failure to migrate is, are really not
the kernel's business.

Simply put, changing device state can fail, and vPRI here is just one
possible failure reason: no response from the user within a certain
timeout (for a user-managed page table).
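
To illustrate the point, here is a minimal sketch of a state change that
fails on timeout instead of hanging. Everything in it (struct dev,
dev_set_ndma(), the tick-based wait, the user_responds flag) is a
hypothetical toy model for this discussion, not the VFIO uAPI or any
driver code:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model; none of these names exist in the VFIO uAPI. */
enum dev_state { DEV_RUNNING, DEV_NDMA };

struct dev {
	enum dev_state state;
	unsigned int pending_faults;	/* vPRI requests the user has not answered */
};

/*
 * Try to move the device to NDMA. Outstanding page requests must drain
 * first; user_responds models whether the user (guest) answers them.
 * Returns 0 on success or -ETIMEDOUT if requests are still pending
 * after timeout_ticks, in which case the state is left unchanged.
 */
static int dev_set_ndma(struct dev *d, unsigned int timeout_ticks,
			bool user_responds)
{
	assert(d != NULL);

	for (unsigned int t = 0; t < timeout_ticks && d->pending_faults; t++) {
		if (user_responds)
			d->pending_faults--;	/* one response per tick */
	}

	if (d->pending_faults)
		return -ETIMEDOUT;	/* just one more reason a transition fails */

	d->state = DEV_NDMA;
	return 0;
}
```

In this model the caller sees an ordinary error and decides what to do
with it (retry, abort the migration, etc.), which is the policy part
that sits outside the kernel.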

Then it's QEMU that should document the restriction and provide
options for the admin to decide whether to expose vPRI vs. migration
based on the specific usage requirements. The choices could be vPRI-off/
migration-on, vPRI-on/migration-off, or enabling both (if migration
failure is tolerable or there is no 'hostile' VM in the setup)...
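
As a rough sketch of how userspace could encode those three choices
(the struct, field names and policy_is_valid() are made up for
illustration and are not existing QEMU options):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical admin-facing knobs; not actual QEMU properties. */
struct dev_policy {
	bool vpri;			/* expose vPRI to the guest */
	bool migration;			/* allow migrating the device */
	bool tolerate_mig_failure;	/* admin accepts that migration may fail */
};

/*
 * vPRI-off/migration-on and vPRI-on/migration-off are always fine.
 * Enabling both is only accepted when the admin has explicitly stated
 * that a failed migration is tolerable (or the guest is trusted).
 */
static bool policy_is_valid(const struct dev_policy *p)
{
	assert(p != NULL);

	if (p->vpri && p->migration)
		return p->tolerate_mig_failure;
	return true;
}
```

The point of the explicit third flag is only that the restriction gets
communicated to the admin rather than silently hardcoded.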

Thanks
Kevin