On Wed, 20 Oct 2021 15:59:19 -0300
Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:

> On Wed, Oct 20, 2021 at 10:52:30AM -0600, Alex Williamson wrote:
>
> > I'm wondering if we're imposing extra requirements on the !_RUNNING
> > state that don't need to be there.  For example, if we can assume
> > that all devices within a userspace context are !_RUNNING before any
> > of the devices begin to retrieve final state, then clearing of the
> > _RUNNING bit becomes the device quiesce point and the beginning of
> > reading device data is the point at which the device state is frozen
> > and serialized.  No new states required and essentially works with a
> > slight rearrangement of the callbacks in this series.  Why can't we
> > do that?
>
> It sounds worth checking carefully.  I didn't come up with a major
> counter scenario.
>
> We would need to specifically define which user action triggers the
> device to freeze and serialize.  Reading pending_bytes I suppose?

The first read of pending_bytes after clearing the _RUNNING bit would be
the logical place to do this since that's what we define as the start of
the cycle for reading the device state.

"Freezing" the device is a valid implementation, but I don't think it's
strictly required per the uAPI.  For instance, there's no requirement
that pending_bytes is reduced by data_size on each iteration; we only
define that the state is complete when the user reads a pending_bytes
value of zero.  So a driver could restart the device state if the device
continues to change (though it's debatable whether triggering an -errno
on the next migration region access might be a more supportable approach
to enforce that userspace has quiesced external access).

> Since freeze is a device command we need to be able to report failure
> and to move the state to error, that feels bad hidden inside reading
> pending_bytes.

That seems like the wrong model.  Reading pending_bytes can return
-errno should an error occur during freeze/serialize, but device_state
is never implicitly modified.  Upon encountering an error reading
pending_bytes, userspace would abort the migration and move the
device_state to a new value.  This is the point at which the driver
would return another -errno to indicate an unrecoverable internal
error, or honor the requested new state.

> > Maybe a clarification of the uAPI spec is sufficient to achieve
> > this, ex. !_RUNNING devices may still update their internal state
> > machine based on external access.  Userspace is expected to quiesce
> > all external access prior to initiating the retrieval of the final
> > device state from the data section of the migration region.  Failure
> > to do so may result in inconsistent device state or optionally the
> > device driver may induce a fault if a quiescent state is not
> > maintained.
>
> But on the other hand this seems such a subtle and tricky way to
> design a uAPI that devices and users are unlikely to implement it
> completely correctly.

I think it's only tricky if device_state is implicitly modified by
other actions; stopping all devices before collecting the state of any
of them seems like a reasonable requirement.  It's the same requirement
we'd add with an NDMA bit.

> IMHO the point of the state is to control the current behavior of the
> device - yes we can control behavior on other operations, but it
> feels obfuscated.

It is; don't do that.  Fail the migration region operation, and fail
the next device state transition if the internal error is irrecoverable.
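To make that ordering concrete, the stop-and-copy side I'm describing
would look something like the sketch below from userspace (untested,
against the current v1 uAPI in linux/vfio.h; save_device_state, mig_off,
and the buffer handling are made up for illustration, and locating the
migration region via the region info capability chain is elided):

    #include <errno.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <sys/types.h>
    #include <unistd.h>
    #include <linux/vfio.h>

    /*
     * Rough sketch only: assumes the caller has already located the
     * migration region and passes its file offset as mig_off.
     */
    static int save_device_state(int device_fd, off_t mig_off,
                                 void *buf, size_t bufsz)
    {
    #define MIG_FIELD(f) \
            (mig_off + offsetof(struct vfio_device_migration_info, f))
            uint32_t state = VFIO_DEVICE_STATE_SAVING; /* !_RUNNING */
            uint64_t pending, data_offset, data_size;
            size_t used = 0;

            /* Quiesce point: clearing _RUNNING stops the device. */
            if (pwrite(device_fd, &state, sizeof(state),
                       MIG_FIELD(device_state)) != sizeof(state))
                    return -errno;

            for (;;) {
                    /*
                     * The first read of pending_bytes after clearing
                     * _RUNNING is where the driver freezes and
                     * serializes its state.  A failure here is just
                     * -errno from the read; device_state is never
                     * implicitly changed.
                     */
                    if (pread(device_fd, &pending, sizeof(pending),
                              MIG_FIELD(pending_bytes)) != sizeof(pending))
                            goto abort;

                    if (!pending)
                            break;  /* state is complete per the uAPI */

                    if (pread(device_fd, &data_offset, sizeof(data_offset),
                              MIG_FIELD(data_offset)) != sizeof(data_offset) ||
                        pread(device_fd, &data_size, sizeof(data_size),
                              MIG_FIELD(data_size)) != sizeof(data_size))
                            goto abort;

                    if (data_size > bufsz - used ||
                        pread(device_fd, (char *)buf + used, data_size,
                              mig_off + data_offset) != (ssize_t)data_size)
                            goto abort;
                    used += data_size;
            }
            return 0;

    abort:
            /*
             * Abort is explicit: userspace writes a new device_state.
             * The driver honors it, or returns -errno again if the
             * internal error is unrecoverable.
             */
            state = VFIO_DEVICE_STATE_RUNNING;
            pwrite(device_fd, &state, sizeof(state),
                   MIG_FIELD(device_state));
            return -EIO;
    #undef MIG_FIELD
    }

The only writes to device_state here are explicit ones from userspace;
a freeze/serialize failure surfaces as -errno from the pending_bytes
read, and the explicit state write that follows is where the driver
would report an unrecoverable internal error.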
> Especially when the userspace that needs to know about this isn't
> even deployed yet, let's just do it cleanly?

That's an option, but again I'm wondering if we're imposing a
requirement on the !_RUNNING state that doesn't really exist and
whether a clean solution already exists.  Thanks,

Alex