On Sun, 20 Feb 2022 11:57:10 +0200 Yishai Hadas <yishaih@xxxxxxxxxx> wrote: > From: Jason Gunthorpe <jgg@xxxxxxxxxx> > > Replace the existing region based migration protocol with an ioctl based > protocol. The two protocols have the same general semantic behaviors, but > the way the data is transported is changed. > > This is the STOP_COPY portion of the new protocol, it defines the 5 states > for basic stop and copy migration and the protocol to move the migration > data in/out of the kernel. > > Compared to the clarification of the v1 protocol Alex proposed: > > https://lore.kernel.org/r/163909282574.728533.7460416142511440919.stgit@omen > > This has a few deliberate functional differences: > > - ERROR arcs allow the device function to remain unchanged. > > - The protocol is not required to return to the original state on > transition failure. Instead userspace can execute an unwind back to > the original state, reset, or do something else without needing kernel > support. This simplifies the kernel design and should userspace choose > a policy like always reset, avoids doing useless work in the kernel > on error handling paths. > > - PRE_COPY is made optional, userspace must discover it before using it. > This reflects the fact that the majority of drivers we are aware of > right now will not implement PRE_COPY. > > - segmentation is not part of the data stream protocol, the receiver > does not have to reproduce the framing boundaries. I'm not sure how to reconcile the statement above with: "The user must consider the migration data segments carried over the FD to be opaque and non-fungible. During RESUMING, the data segments must be written in the same order they came out of the saving side FD." This is subtly conflicting that it's not segmented, but segments must be written in order. We'll naturally have some segmentation due to buffering in kernel and userspace, but I think referring to it as a stream suggests that the user can cut and join segments arbitrarily so long as byte order is preserved, right? I suspect the commit log comment is referring to the driver imposed segmentation and framing relative to region offsets. Maybe something like: "The user must consider the migration data stream carried over the FD to be opaque and must preserve the byte order of the stream. The user is not required to preserve buffer segmentation when writing the data stream during the RESUMING operation." This statement also gives me pause relative to Jason's comments regarding async support: > + * The kernel migration driver must fully transition the device to the new state > + * value before the operation returns to the user. The above statement certainly doesn't preclude asynchronous availability of data on the stream FD, but it does demand that the device state transition itself is synchronous and can cannot be shortcut. If the state transition itself exceeds migration SLAs, we're in a pickle. Thanks, Alex