> From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> Sent: Thursday, January 27, 2022 9:11 AM
>
> On Thu, Jan 27, 2022 at 12:53:54AM +0000, Tian, Kevin wrote:
> > > From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> > > Sent: Wednesday, January 26, 2022 8:15 PM
> > >
> > > On Wed, Jan 26, 2022 at 01:49:09AM +0000, Tian, Kevin wrote:
> > >
> > > > > As STOP_PRI can be defined as halting any new PRIs and always return
> > > > > immediately.
> > > >
> > > > The problem is that on such devices PRIs are continuously triggered
> > > > when the driver tries to drain the in-flight requests to enter STOP_P2P
> > > > or STOP_COPY. If we simply halt any new PRIs in STOP_PRI, it
> > > > essentially implies no migration support for such device.
> > >
> > > So what can this HW even do? It can't immediately stop and disable its
> > > queues?
> > >
> > > Are you sure it can support migration?
> >
> > It's a draining model thus cannot immediately stop. Instead it has to
> > wait for in-flight requests to be completed (not even talking about vPRI).
>
> So, it can't complete draining without completing an unknown number of
> vPRIs?

Right.

> > timeout policy is always in userspace. We just need an interface for the
> > user to communicate it to the kernel.
>
> Can the HW tell if the draining is completed somehow? Ie can it
> trigger an eventfd or something?

Yes. Software can specify an interrupt to be triggered when the draining
command is completed.

>
> The v2 API has this nice feature where it can return an FD, so we
> could possibly go into a 'stopping PRI' state and that can return an
> eventfd for the user to poll on to know when it is OK to move onwards.
>
> That was the sticking point before, we want completing RUNNING_P2P to
> mean the device is halted, but vPRI ideally wants to do a background
> halting - now we have a way to do that.. this is nice.
>
> Returning to running would abort the draining.
>
> Userspace does the timeout with poll on the event fd..

Yes.
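
To make the "userspace does the timeout with poll" idea concrete, here is a minimal sketch of what the userspace side might look like. It is only an illustration: `wait_for_drain` is a hypothetical helper name, the timeout value is arbitrary, and it assumes the kernel would signal completion by making the returned FD readable, which is not something settled in this thread.

```python
import os
import select


def wait_for_drain(event_fd, timeout_ms):
    """Return True if event_fd became readable within timeout_ms, else False.

    event_fd stands for the FD that the hypothetical 'stopping PRI' arc
    would return; readability is assumed to mean "draining completed".
    """
    poller = select.poll()
    poller.register(event_fd, select.POLLIN)
    events = poller.poll(timeout_ms)  # blocks for at most timeout_ms
    return any(fd == event_fd and (mask & select.POLLIN) for fd, mask in events)


def demo():
    # A pipe is used here purely as a stand-in for the eventfd.
    read_fd, write_fd = os.pipe()
    try:
        before = wait_for_drain(read_fd, 50)   # nothing signalled yet
        os.write(write_fd, b"\x01")            # simulate "drain completed"
        after = wait_for_drain(read_fd, 50)
        return before, after
    finally:
        os.close(read_fd)
        os.close(write_fd)
```

If the wait times out, the policy decision stays in userspace: it could, for example, return the device to running, which per the discussion above would abort the draining.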
>
> This also logically justifies why this is not backwards compatible as
> one of the rules in the FSM construction is any arc that can return a
> FD must be the final arc.
>
> So, if the FSM sequence is
>
>    RUNNING -> RUNNING_STOP_PRI -> RUNNING_STOP_P2P_AND_PRI -> STOP_COPY
>
> Then by the design rules we cannot pass through RUNNING_STOP_PRI
> automatically, it must be explicit.
>
> A cap like "running_p2p returns an event fd, doesn't finish until the
> VCPU does stuff, and stops pri as well as p2p" might be all that is
> required here (and not an actual new state)
>
> It is somewhat bizarre from a wording perspective, but does
> potentially allow qemu to be almost unchanged for the two cases..
>

Let me think more about this part. I need a better understanding of the
existing design rules before agreeing here, though it does sound like a
good signal. 😊

Thanks
Kevin
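
The "FD-returning arc must be the final arc" rule quoted above can be sketched as a small check. This is a toy model, not the kernel's FSM code: the state names come from the thread, while `RETURNS_FD`, `arcs_between` and `is_allowed` are hypothetical, and which arcs return an FD is an assumption made for illustration.

```python
# Linear sequence of states as written in the thread.
SEQUENCE = [
    "RUNNING",
    "RUNNING_STOP_PRI",
    "RUNNING_STOP_P2P_AND_PRI",
    "STOP_COPY",
]

# Assumption for illustration: these arcs hand an FD back to userspace.
RETURNS_FD = {
    ("RUNNING", "RUNNING_STOP_PRI"),
    ("RUNNING_STOP_P2P_AND_PRI", "STOP_COPY"),
}


def arcs_between(cur, new):
    """List the single-step arcs walked when moving forward from cur to new."""
    i, j = SEQUENCE.index(cur), SEQUENCE.index(new)
    if i >= j:
        raise ValueError("this sketch only models forward transitions")
    hops = SEQUENCE[i:j + 1]
    return list(zip(hops, hops[1:]))


def is_allowed(cur, new):
    """A request is allowed only if no arc before the last one returns an FD."""
    arcs = arcs_between(cur, new)
    return all(arc not in RETURNS_FD for arc in arcs[:-1])
```

Under these assumptions, `is_allowed("RUNNING", "STOP_COPY")` is false because the walk would pass through the FD-returning arc into RUNNING_STOP_PRI, while requesting RUNNING_STOP_PRI explicitly and then continuing to STOP_COPY is accepted.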