On Wed, 13 Nov 2019 11:24:17 +0100 Cornelia Huck <cohuck@xxxxxxxxxx> wrote: > On Tue, 12 Nov 2019 15:30:05 -0700 > Alex Williamson <alex.williamson@xxxxxxxxxx> wrote: > > > On Tue, 12 Nov 2019 22:33:36 +0530 > > Kirti Wankhede <kwankhede@xxxxxxxxxx> wrote: > > > > > - Defined MIGRATION region type and sub-type. > > > - Used 3 bits to define VFIO device states. > > > Bit 0 => _RUNNING > > > Bit 1 => _SAVING > > > Bit 2 => _RESUMING > > > Combination of these bits defines VFIO device's state during migration > > > _RUNNING => Normal VFIO device running state. When its reset, it > > > indicates _STOPPED state. when device is changed to > > > _STOPPED, driver should stop device before write() > > > returns. > > > _SAVING | _RUNNING => vCPUs are running, VFIO device is running but > > > start saving state of device i.e. pre-copy state > > > _SAVING => vCPUs are stopped, VFIO device should be stopped, and > > > > s/should/must/ > > > > > save device state,i.e. stop-n-copy state > > > _RESUMING => VFIO device resuming state. > > > _SAVING | _RESUMING and _RUNNING | _RESUMING => Invalid states > > > > A table might be useful here and in the uapi header to indicate valid > > states: > > I like that. > > > > > | _RESUMING | _SAVING | _RUNNING | Description > > +-----------+---------+----------+------------------------------------------ > > | 0 | 0 | 0 | Stopped, not saving or resuming (a) > > +-----------+---------+----------+------------------------------------------ > > | 0 | 0 | 1 | Running, default state > > +-----------+---------+----------+------------------------------------------ > > | 0 | 1 | 0 | Stopped, migration interface in save mode > > +-----------+---------+----------+------------------------------------------ > > | 0 | 1 | 1 | Running, save mode interface, iterative > > +-----------+---------+----------+------------------------------------------ > > | 1 | 0 | 0 | Stopped, migration resume interface active > > +-----------+---------+----------+------------------------------------------ > > | 1 | 0 | 1 | Invalid (b) > > +-----------+---------+----------+------------------------------------------ > > | 1 | 1 | 0 | Invalid (c) > > +-----------+---------+----------+------------------------------------------ > > | 1 | 1 | 1 | Invalid (d) > > > > I think we need to consider whether we define (a) as generally > > available, for instance we might want to use it for diagnostics or a > > fatal error condition outside of migration. > > > > Are there hidden assumptions between state transitions here or are > > there specific next possible state diagrams that we need to include as > > well? > > Some kind of state-change diagram might be useful in addition to the > textual description anyway. Let me try, just to make sure I understand > this correctly: > > 1) 0/0/1 ---(trigger driver to start gathering state info)---> 0/1/1 > 2) 0/0/1 ---(tell driver to stop)---> 0/0/0 > 3) 0/1/1 ---(tell driver to stop)---> 0/1/0 > 4) 0/0/1 ---(tell driver to resume with provided info)---> 1/0/0 I think this is to switch into resuming mode, the data will follow > 5) 1/0/0 ---(driver is ready)---> 0/0/1 > 6) 0/1/1 ---(tell driver to stop saving)---> 0/0/1 I think also: 0/0/1 --> 0/1/0 If user chooses to go directly to stop and copy 0/0/0 and 0/0/1 should be reachable from any state, though I could see that a vendor driver could fail transition from 1/0/0 -> 0/0/1 if the received state is incomplete. Somehow though a user always needs to return the device to the initial state, so how does device_state interact with the reset ioctl? Would this automatically manipulate device_state back to 0/0/1? > Not sure about the usefulness of 2). Also, is 4) the only way to > trigger resuming? And is the change in 5) performed by the driver, or > by userspace? > > Are any other state transitions valid? > > (...) > > > > + * Sequence to be followed for _SAVING|_RUNNING device state or pre-copy phase > > > + * and for _SAVING device state or stop-and-copy phase: > > > + * a. read pending_bytes. If pending_bytes > 0, go through below steps. > > > + * b. read data_offset, indicates kernel driver to write data to staging buffer. > > > + * Kernel driver should return this read operation only after writing data to > > > + * staging buffer is done. > > > > "staging buffer" implies a vendor driver implementation, perhaps we > > could just state that data is available from (region + data_offset) to > > (region + data_offset + data_size) upon return of this read operation. > > > > > + * c. read data_size, amount of data in bytes written by vendor driver in > > > + * migration region. > > > + * d. read data_size bytes of data from data_offset in the migration region. > > > + * e. process data. > > > + * f. Loop through a to e. Next read on pending_bytes indicates that read data > > > + * operation from migration region for previous iteration is done. > > > > I think this indicate that step (f) should be to read pending_bytes, the > > read sequence is not complete until this step. Optionally the user can > > then proceed to step (b). There are no read side-effects of (a) afaict. > > > > Is the use required to reach pending_bytes == 0 before changing > > device_state, particularly transitioning to !_RUNNING? Presumably the > > user can exit this sequence at any time by clearing _SAVING. > > That would be transition 6) above (abort saving and continue). I think > it makes sense not to forbid this. > > > > > > + * > > > + * Sequence to be followed while _RESUMING device state: > > > + * While data for this device is available, repeat below steps: > > > + * a. read data_offset from where user application should write data. > > > + * b. write data of data_size to migration region from data_offset. > > > + * c. write data_size which indicates vendor driver that data is written in > > > + * staging buffer. Vendor driver should read this data from migration > > > + * region and resume device's state. > > > > The device defaults to _RUNNING state, so a prerequisite is to set > > _RESUMING and clear _RUNNING, right? > > Transition 4) above. Do we need > 7) 0/0/0 ---(tell driver to resume with provided info)---> 1/0/0 > as well? (Probably depends on how sensible the 0/0/0 state is.) I think we must unless we require the user to transition from 0/0/1 to 1/0/0 in a single operation, but I'd prefer to make 0/0/0 generally available. Thanks, Alex