On Wed, Oct 20 2021, Alex Williamson <alex.williamson@xxxxxxxxxx> wrote:

> On Wed, 20 Oct 2021 15:59:19 -0300
> Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:
>
>> On Wed, Oct 20, 2021 at 10:52:30AM -0600, Alex Williamson wrote:
>>
>> > I'm wondering if we're imposing extra requirements on the !_RUNNING
>> > state that don't need to be there. For example, if we can assume that
>> > all devices within a userspace context are !_RUNNING before any of the
>> > devices begin to retrieve final state, then clearing of the _RUNNING
>> > bit becomes the device quiesce point and the beginning of reading
>> > device data is the point at which the device state is frozen and
>> > serialized. No new states required and essentially works with a slight
>> > rearrangement of the callbacks in this series. Why can't we do that?
>>
>> It sounds worth checking carefully. I didn't come up with a major
>> counter scenario.
>>
>> We would need to specifically define which user action triggers the
>> device to freeze and serialize. Reading pending_bytes I suppose?
>
> The first read of pending_bytes after clearing the _RUNNING bit would
> be the logical place to do this since that's what we define as the start
> of the cycle for reading the device state.
>
> "Freezing" the device is a valid implementation, but I don't think it's
> strictly required per the uAPI. For instance there's no requirement
> that pending_bytes is reduced by data_size on each iteration; we
> specifically only define that the state is complete when the user reads
> a pending_bytes value of zero. So a driver could restart the device
> state if the device continues to change (though it's debatable whether
> triggering an -errno on the next migration region access might be a
> more supportable approach to enforce that userspace has quiesced
> external access).

Hm, not so sure. From my reading of the uAPI, transitioning from
pre-copy to stop-and-copy (i.e. clearing _RUNNING) implies that we
freeze the device (at least, that's how I interpret "On state transition
from pre-copy to stop-and-copy, the driver must stop the device, save
the device state and send it to the user application through the
migration region.")

Maybe the uAPI is simply not yet clear enough.
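
To make the ordering under discussion concrete, here is a rough
standalone sketch of the "freeze on the first pending_bytes read after
_RUNNING is cleared" variant. It is a toy model and not driver code:
struct mock_dev and the mock_*() / save_final_state() helpers are made
up for illustration, the two state bits are redefined locally (values
chosen to match my understanding of the v1 migration uAPI) instead of
pulling in <linux/vfio.h>, and region accesses are modelled as plain
function calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for the _RUNNING and _SAVING device_state bits. */
#define DEV_STATE_RUNNING (1u << 0)
#define DEV_STATE_SAVING  (1u << 1)

/* Hypothetical driver-side model; not a real vfio structure. */
struct mock_dev {
    uint32_t device_state;
    bool     frozen;     /* true once the state has been serialized */
    uint64_t live_size;  /* size of the device state while it may change */
    uint64_t remaining;  /* bytes of the frozen snapshot not yet read */
};

/* Models a write to device_state; any transition drops an old snapshot. */
static void mock_set_state(struct mock_dev *d, uint32_t state)
{
    d->device_state = state;
    d->frozen = false;
    d->remaining = 0;
}

/*
 * Models a read of pending_bytes. The first read with _RUNNING clear is
 * the point where the device state is frozen and serialized.
 */
static uint64_t mock_read_pending_bytes(struct mock_dev *d)
{
    if (d->device_state & DEV_STATE_RUNNING)
        return d->live_size;         /* pre-copy: only an estimate */

    if (!d->frozen) {
        d->frozen = true;
        d->remaining = d->live_size; /* take the snapshot */
    }
    return d->remaining;
}

/* Models reading one chunk of data; the return value plays data_size. */
static uint64_t mock_read_data(struct mock_dev *d, uint64_t chunk)
{
    assert(d->frozen);               /* data is only valid after freezing */

    uint64_t n = d->remaining < chunk ? d->remaining : chunk;
    d->remaining -= n;
    return n;
}

/* Userspace side: stop-and-copy loop, done when pending_bytes reads 0. */
static uint64_t save_final_state(struct mock_dev *d, uint64_t chunk)
{
    uint64_t total = 0;

    assert(chunk > 0);                   /* a zero chunk would never finish */
    mock_set_state(d, DEV_STATE_SAVING); /* clear _RUNNING, keep _SAVING */

    while (mock_read_pending_bytes(d) != 0)
        total += mock_read_data(d, chunk);

    return total;
}
```

In this model pending_bytes happens to shrink by data_size on every
iteration, but the loop only relies on eventually reading zero, which is
the one guarantee Alex points out above. Under my reading of the uAPI
text, the snapshot would instead have to be taken in mock_set_state().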