On Tue, 2 Nov 2021 12:54:20 -0300 Jason Gunthorpe <jgg@xxxxxxxxxx> wrote: > On Tue, Nov 02, 2021 at 08:56:51AM -0600, Alex Williamson wrote: > > > > Still, this is something that needs clear definition, I would expect > > > the SET_IRQS to happen after resuming clears but before running sets > > > to give maximum HW flexibility and symmetry with saving. > > > > There's no requirement that the device enters a null state (!_RESUMING > > | !_SAVING | !_RUNNING), the uAPI even species the flows as _RESUMING > > transitioning to _RUNNING. > > If the device saves the MSI-X state inside it's migration data (as > apparently would be convenient for other OSs) then when RESUMING > clears and the migration data is de-serialized the device will > overwrite the MSI-X data. > > Since Linux as an OS wants to control the MSI-X it needs to load it > after RESUMING, but before RUNNING. This is not how it works today, QEMU enables MSI/X based on the config space information, which is also outside of the device migration stream. > > There's no point at which we can do SET_IRQS other than in the > > _RESUMING state. Generally SET_IRQS ioctls are coordinated with the > > guest driver based on actions to the device, we can't be mucking > > with IRQs while the device is presumed running and already > > generating interrupt conditions. > > We need to do it in state 000 > > ie resume should go > > 000 -> 100 -> 000 -> 001 > > With SET_IRQS and any other fixing done during the 2nd 000, after the > migration data has been loaded into the device. Again, this is not how QEMU works today. > > > And we should really define clearly what a device is supposed to do > > > with the interrupt vectors during migration. Obviously there are races > > > here. > > > > The device should not be generating interrupts while !_RUNNING, pending > > interrupts should be held until the device is _RUNNING. To me this > > means the sequence must be that INTx/MSI/MSI-X are restored while in > > the !_RUNNING state. > > Yes Except I suppose them to be restored while _RESUMING is set. > > > > In any case, it requires that the device cannot be absolutely static > > > > while !_RUNNING. Does (_RESUMING) have different rules than > > > > (_SAVING)? > > > > > > I'd prever to avoid all device touches during both resuming and > > > saving, and do them during !RUNNING > > > > There's no such state required by the uAPI. > > The uAPI comment does not define when to do the SET_IRQS, it seems > this has been missed. > > We really should fix it, unless you feel strongly that the > experimental API in qemu shouldn't be changed. I think the QEMU implementation fills in some details of how the uAPI is expected to work. MSI/X is expected to be restored while _RESUMING based on the config space of the device, there is no intermediate step between _RESUMING and _RUNNING. Introducing such a requirement precludes the option of a post-copy implementation of (_RESUMING | _RUNNING). Thanks, Alex