On Tue, Jan 11, 2022 at 03:14:04AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> > Sent: Tuesday, January 11, 2022 2:12 AM
> >
> > On Mon, Jan 10, 2022 at 07:55:16AM +0000, Tian, Kevin wrote:
> >
> > > > > {SAVING} -> {RESUMING}
> > > > > If not supported, user can achieve this via:
> > > > > {SAVING}->{RUNNING}->{RESUMING}
> > > > > {SAVING}-RESET->{RUNNING}->{RESUMING}
> > > >
> > > > This can be:
> > > >
> > > > SAVING -> STOP -> RESUMING
> > >
> > > From Alex's original description the default device state is RUNNING.
> > > This supposed to be the initial state on the dest machine for the
> > > device assigned to Qemu before Qemu resumes the device state.
> > > Then how do we eliminate the RUNNING state in above flow? Who
> > > makes STOP as the initial state on the dest node?
> >
> > All of this notation should be read with the idea that the
> > device_state is already somehow moved away from RESET. Ie the above
> > notation is about what is possible once qemu has already moved the
> > device to SAVING.
>
> Qemu moves the device to SAVING on the src node.
>
> On the dest the device is in RUNNING (after reset) which can be directly
> transitioned to RESUMING. I didn't see the point of adding a STOP here.

Alex is talking about the same node case where qemu has put the device
into SAVING and then, for whatever reason, decides it now wants the
device to be in RESUMING.

We are talking about the state space of commands the driver has to
process here. If we can break down things like SAVING -> RESUMING into
two commands:

 SAVING -> STOP
 STOP -> RESUMING

Then the driver has to implement fewer arcs, and the arcs it does
implement are much simpler.

It also resolves the precedence question nicely as we have a core FSM
that is built on the arcs the drivers implement and that in turn gives
a natural answer to the question of how do you transit between any two
states.

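To make the core FSM idea concrete, here is a rough sketch in Python
(not kernel code). It assumes the driver supplies a small table of
arcs and the core finds the shortest chain of those arcs between any
two states. The arc table, find_path() and set_state() are all
hypothetical names made up for illustration; only the state names come
from this thread, and the real set of arcs may well differ.

```python
from collections import deque

# Hypothetical arc table: each entry is one transition a driver would
# implement directly. The state names follow this thread, but the exact
# set of arcs is an assumption made for illustration only.
DRIVER_ARCS = {
    "RUNNING": ["STOP", "PRE_COPY"],
    "STOP": ["RUNNING", "RESUMING", "SAVING"],
    "SAVING": ["STOP"],
    "RESUMING": ["STOP"],
    "PRE_COPY": ["RUNNING", "PRE_COPY_NDMA"],
    "PRE_COPY_NDMA": ["PRE_COPY"],
}


def find_path(cur, new):
    """Return the shortest list of states from cur to new, inclusive.

    Breadth-first search over DRIVER_ARCS; returns None if new is
    unreachable from cur.
    """
    prev = {cur: None}
    queue = deque([cur])
    while queue:
        state = queue.popleft()
        if state == new:
            # Walk the predecessor links back to cur, then reverse.
            path = []
            while state is not None:
                path.append(state)
                state = prev[state]
            return path[::-1]
        for nxt in DRIVER_ARCS.get(state, []):
            if nxt not in prev:
                prev[nxt] = state
                queue.append(nxt)
    return None


def set_state(cur, new, do_arc):
    """Walk from cur to new, calling do_arc(src, dst) once per arc."""
    path = find_path(cur, new)
    if path is None:
        raise ValueError(f"no path from {cur} to {new}")
    for src, dst in zip(path, path[1:]):
        do_arc(src, dst)
    return path[-1]
```

With this table, set_state("SAVING", "RESUMING", do_arc) calls do_arc
twice, for SAVING -> STOP and then STOP -> RESUMING, so the driver
callback never sees SAVING -> RESUMING as a single command.
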
Eg using the state names I gave earlier we can look at going from
RESUMING -> PRE_COPY_NDMA and decomposing it into these four steps:

 RESUMING -> STOP -> RUNNING -> PRE_COPY -> PRE_COPY_NDMA

In the end the driver needs to implement only about half of the total
arcs and the ones it does need to implement are simpler and have a
more obvious implementation.

> Later when supporting hw mdev (with pasid granular isolation in
> iommu), this restriction can be uplifted as it doesn't use dma api
> and is pretty much like a pdev regarding to ioas management.

When I say 'mdev' I really mean things that use the vfio pinning
interface - which we don't quite have a proper name for yet (though
emulated iommu perhaps is sticking)

Things that use iommu_domain would not be a problem

Jason