RE: [PATCH RFC] vfio: Revise and update the migration uAPI description

"Tian, Kevin" <kevin.tian@xxxxxxxxx> · Thu, 27 Jan 2022 00:53:54 +0000

> From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> Sent: Wednesday, January 26, 2022 8:15 PM
> 
> On Wed, Jan 26, 2022 at 01:49:09AM +0000, Tian, Kevin wrote:
> 
> > > As STOP_PRI can be defined as halting any new PRIs and always returning
> > > immediately.
> >
> > The problem is that on such devices PRIs are continuously triggered
> > when the driver tries to drain the in-flight requests to enter STOP_P2P
> > or STOP_COPY. If we simply halt any new PRIs in STOP_PRI, it
> > essentially implies no migration support for such devices.
> 
> So what can this HW even do? It can't immediately stop and disable its
> queues?
> 
> Are you sure it can support migration?

It uses a draining model, so it cannot stop immediately. Instead it has to
wait for in-flight requests to complete (even leaving vPRI aside).

It cannot support mandatory migration, but it can definitely support
optional migration, per our earlier discussions. Because the time to
complete in-flight requests is unbounded, there is a higher likelihood of
breaking the SLA. In this case, a way for the user to specify a timeout
would be beneficial even for the base arc <RUNNING -> STOP>.
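To make the draining model concrete, below is a minimal sketch of a
RUNNING -> STOP transition bounded by a user-supplied timeout. It is a
userspace simulation, not driver code: struct drain_dev,
drain_with_timeout() and the fixed per-request completion time are all
hypothetical and are not part of the VFIO uAPI or any real driver.

```c
#include <assert.h>
#include <errno.h>

/* HYPOTHETICAL model of a draining device: it cannot stop immediately,
 * it can only wait for its outstanding requests to complete. */
struct drain_dev {
	unsigned int in_flight;   /* requests still outstanding */
	unsigned int ms_per_req;  /* simulated time to complete one request */
};

/*
 * Sketch of the RUNNING -> STOP arc with a caller-supplied budget.
 * Returns 0 once nothing is in flight, or -ETIMEDOUT if completing the
 * next request would exceed timeout_ms. *elapsed_ms reports the
 * simulated time spent waiting in either case.
 */
static int drain_with_timeout(struct drain_dev *dev, unsigned int timeout_ms,
			      unsigned int *elapsed_ms)
{
	unsigned int elapsed = 0;

	assert(dev && elapsed_ms);

	while (dev->in_flight > 0) {
		/* Give up before waiting past the budget. */
		if (elapsed + dev->ms_per_req > timeout_ms) {
			*elapsed_ms = elapsed;
			return -ETIMEDOUT;
		}
		/* Wait for one in-flight request to complete. */
		elapsed += dev->ms_per_req;
		dev->in_flight--;
	}

	*elapsed_ms = elapsed;
	return 0;
}
```

The device never stops on its own schedule; the only knob is how long the
caller is willing to wait before giving up on the arc.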

> 
> > > STOP_P2P can hang if PRIs are open
> >
> > In earlier discussions we agreed on a timeout mechanism to avoid such
> > hangs.
> 
> It is very ugly; ideally I'd prefer userspace to handle the
> timeout policy.

The timeout policy always lives in userspace. We just need an interface
for the user to communicate it to the kernel.
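As an illustration of that split (policy in userspace, enforcement in the
kernel), a hypothetical argument block could carry the timeout alongside
the target state. Everything below (struct hyp_set_state,
hyp_handle_set_state(), the "0 means no limit" convention, and leaving the
device RUNNING on timeout) is an assumption for discussion, not the
existing VFIO migration uAPI.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* HYPOTHETICAL states, loosely following the names in this thread. */
enum hyp_mig_state { HYP_RUNNING = 0, HYP_STOP = 1 };

/*
 * HYPOTHETICAL request passed from userspace. The timeout is chosen by
 * userspace (e.g. from its downtime/SLA budget); the kernel only enforces
 * it. A timeout of 0 means "wait indefinitely".
 */
struct hyp_set_state {
	uint32_t target_state;  /* enum hyp_mig_state */
	uint32_t timeout_ms;    /* userspace policy */
};

struct hyp_device {
	enum hyp_mig_state state;
	uint32_t drain_ms;      /* simulated time the drain would take */
};

/*
 * Kernel-side handler sketch: enforce the timeout, don't choose it.
 * On timeout the device stays RUNNING (an assumption), so userspace can
 * retry later or abandon the optional migration.
 */
static int hyp_handle_set_state(struct hyp_device *dev,
				const struct hyp_set_state *req)
{
	assert(dev && req);

	if (req->target_state != HYP_RUNNING && req->target_state != HYP_STOP)
		return -EINVAL;

	if (dev->state == HYP_RUNNING && req->target_state == HYP_STOP &&
	    req->timeout_ms != 0 && dev->drain_ms > req->timeout_ms)
		return -ETIMEDOUT;

	dev->state = (enum hyp_mig_state)req->target_state;
	return 0;
}
```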

Thanks
Kevin