On Thu, Sep 30, 2021 at 07:51:22PM +0300, Max Gurtovoy wrote:
>
> On 9/30/2021 7:24 PM, Jason Gunthorpe wrote:
> > On Thu, Sep 30, 2021 at 06:32:07PM +0300, Max Gurtovoy wrote:
> > > > Just prior to open device the vfio pci layer will generate a FLR to
> > > > the function so we expect that post open_device has a fresh from reset
> > > > fully running device state.
> > > running also mean that the device doesn't have a clue on its internal state
> > > ? or running means unfreezed and unquiesced ?
> > The device just got FLR'd and it should be in a clean state and
> > operating. Think the VM is booting for the first time.
>
> During the resume phase in the dst, the VM is paused and not booting.
> Migration SW is waiting to get memory and state from SRC. The device will
> start from the exact point that was in the src.
>
> it's exactly "000b => Device Stopped, not saving or resuming"

For this case qemu should open the VFIO device and immediately issue a
command to go to resuming.

The kernel cannot know at open_device time which case userspace is
trying to do. Due to backwards compat we assume userspace is going to
boot a fresh VM.

> Well, this is your design for the driver implementation. Nobody is
> preventing other drivers to start deserializing device state into the device
> during RESUMING bit on.

It is a logical model. Devices can stream the migration data directly
into the internal state if they like. It just creates more conditions
where they have to report an error state.

> So if we moved from 100b to 010b somehow, one should deserialized its buffer
> to the device, and then serialize it to migration region again ?

Yes.

> I guess its doable since the device is freeze and quiesced. But moving from
> 100b to 011b is not possible, right ?

Why not?

100b to 011b is no different than going indirectly 100b -> 001b -> 011b.

The time spent in 001b is just negligible.

Jason
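
For illustration only, the "logical model" point about 100b -> 011b can be
sketched in C. This is a rough sketch and not code from any driver: the
ST_* names, state_is_valid() and plan_transition() are hypothetical, the
bit values are the ones quoted in this thread, and the rule that RESUMING
cannot be combined with other bits is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

/* Bit values as quoted in this thread (000b, 001b, 010b, 100b). */
#define ST_STOP     0x0u
#define ST_RUNNING  0x1u
#define ST_SAVING   0x2u
#define ST_RESUMING 0x4u

/* Assumption: RESUMING may not be combined with any other bit. */
static bool state_is_valid(unsigned int s)
{
	if (s & ~0x7u)
		return false;
	if (s & ST_RESUMING)
		return s == ST_RESUMING;
	return true;
}

/*
 * Hypothetical helper: expand a requested transition into logical steps.
 * Stores the states to pass through (ending with 'to') in out[] and
 * returns how many were stored, or 0 if the request is invalid or a no-op.
 */
static size_t plan_transition(unsigned int from, unsigned int to,
			      unsigned int out[2])
{
	size_t n = 0;

	if (!state_is_valid(from) || !state_is_valid(to) || from == to)
		return 0;

	/* 100b -> 011b is handled as 100b -> 001b -> 011b. */
	if (from == ST_RESUMING && to == (ST_RUNNING | ST_SAVING))
		out[n++] = ST_RUNNING;

	out[n++] = to;
	return n;
}
```

With this sketch, plan_transition(ST_RESUMING, ST_RUNNING | ST_SAVING, out)
yields the two steps 001b then 011b, while 100b -> 010b stays a single step.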