On Wed, Oct 20, 2021 at 10:52:30AM -0600, Alex Williamson wrote:

> I'm wondering if we're imposing extra requirements on the !_RUNNING
> state that don't need to be there. For example, if we can assume that
> all devices within a userspace context are !_RUNNING before any of the
> devices begin to retrieve final state, then clearing of the _RUNNING
> bit becomes the device quiesce point and the beginning of reading
> device data is the point at which the device state is frozen and
> serialized. No new states required and essentially works with a slight
> rearrangement of the callbacks in this series. Why can't we do that?

It sounds worth checking carefully. I didn't come up with a major
counter scenario.

We would need to specifically define which user action triggers the
device to freeze and serialize - reading pending_bytes, I suppose?
Since freezing is a device command we need to be able to report failure
and move the state to error, and that feels bad hidden inside a read of
pending_bytes.

> Maybe a clarification of the uAPI spec is sufficient to achieve this,
> ex. !_RUNNING devices may still update their internal state machine
> based on external access. Userspace is expected to quiesce all external
> access prior to initiating the retrieval of the final device state from
> the data section of the migration region. Failure to do so may result
> in inconsistent device state or optionally the device driver may induce
> a fault if a quiescent state is not maintained.

On the other hand, this seems like such a subtle and tricky way to
design a uAPI that devices and userspace are unlikely to implement it
completely correctly.

IMHO the point of the state is to control the current behavior of the
device - yes, we can control behavior through other operations, but it
feels obfuscated. Especially since the userspace that needs to know
about this isn't even deployed yet, let's just do it cleanly?

Jason
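
[Editor's note: for concreteness, a rough sketch of the userspace
sequence the proposal above would imply, written against the v1
migration uAPI layout. The struct and MIG_* bits below are local
stand-ins mirroring struct vfio_device_migration_info from linux/vfio.h;
discovery of the migration region offset via VFIO_DEVICE_GET_REGION_INFO
is elided and mig_off is assumed to hold it.]

#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Local mirror of the v1 migration region header layout. */
struct migration_info {
	uint32_t device_state;
#define MIG_STATE_RUNNING	(1 << 0)
#define MIG_STATE_SAVING	(1 << 1)
	uint32_t reserved;
	uint64_t pending_bytes;
	uint64_t data_offset;
	uint64_t data_size;
};

/* Clear _RUNNING; under the proposal this only quiesces the device. */
static int quiesce_device(int device_fd, off_t mig_off)
{
	off_t state_off = mig_off +
		offsetof(struct migration_info, device_state);
	uint32_t state;

	if (pread(device_fd, &state, sizeof(state), state_off) !=
	    sizeof(state))
		return -1;
	state = (state & ~MIG_STATE_RUNNING) | MIG_STATE_SAVING;
	if (pwrite(device_fd, &state, sizeof(state), state_off) !=
	    sizeof(state))
		return -1;
	return 0;
}

/*
 * Under the proposal, the first read of pending_bytes is what would
 * implicitly freeze and serialize the device state. A serialization
 * failure could only surface as a read error or by the driver moving
 * device_state to an error state, which is the objection above.
 */
static int begin_save(int device_fd, off_t mig_off)
{
	uint64_t pending;

	if (pread(device_fd, &pending, sizeof(pending),
		  mig_off + offsetof(struct migration_info, pending_bytes)) !=
	    sizeof(pending))
		return -1;
	return 0;
}

The subtle part is that quiesce_device() must have completed on every
device in the user's context before begin_save() is called on any of
them; nothing in the uAPI itself enforces that ordering, it would only
be documented.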