Re: [PATCH v10 Kernel 1/5] vfio: KABI for migration interface for device state

Kirti Wankhede <kwankhede@xxxxxxxxxx> · Tue, 7 Jan 2020 12:58:22 +0530

On 1/7/2020 4:48 AM, Alex Williamson wrote:
On Thu, 2 Jan 2020 18:25:37 +0000
"Dr. David Alan Gilbert" <dgilbert@xxxxxxxxxx> wrote:

* Alex Williamson (alex.williamson@xxxxxxxxxx) wrote:
On Fri, 20 Dec 2019 01:40:35 +0530
Kirti Wankhede <kwankhede@xxxxxxxxxx> wrote:

On 12/19/2019 10:57 PM, Alex Williamson wrote:

<snip>

Suppose the device is in the pre-copy state (011b) and the transition,
i.e. the write of the stop-and-copy state (010b) to device state, fails.
By previous state I meant that the device should return the pre-copy
state (011b), i.e. the previous state that was successfully set, or, as
you said, the current state that was successfully set.

Yes, the point I'm trying to make is that this version of the spec
tries to tell the user what they should do upon error according to our
current interpretation of the QEMU migration protocol.  We're not
defining the QEMU migration protocol, we're defining something that can
be used in a way to support that protocol.  So I think we should be
concerned with defining our spec, for example my proposal would be: "If
a state transition fails the user can read device_state to determine the
current state of the device.  This should be the previous state of the
device unless the vendor driver has encountered an internal error, in
which case the device may report the invalid device_state 110b.  The
user must use the device reset ioctl in order to recover the device
from this state.  If reading device_state shows the device in a valid
state, the user may attempt to transition the device to any valid state
reachable from the current state."
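
The user-side decision described in the proposed wording could be sketched roughly as follows. This is only an illustration, not code from the patch series: the macro and function names are hypothetical, and the bit patterns are the ones quoted in this thread (010b, 011b, 110b).

```c
#include <stdint.h>

/* Bit patterns as quoted in this thread; the names are hypothetical. */
#define DEV_STATE_STOP_AND_COPY 0x2u /* 010b */
#define DEV_STATE_PRE_COPY      0x3u /* 011b */
#define DEV_STATE_INVALID       0x6u /* 110b, reported on internal error */

enum recovery_action {
    RECOVERY_NONE,           /* the transition succeeded */
    RECOVERY_RETRY_OR_ABORT, /* device is still in a valid state */
    RECOVERY_RESET           /* device must go through the reset ioctl */
};

/*
 * write_result: result of writing the new device_state (0 on success).
 * readback:     device_state as re-read by the user after the write.
 */
static enum recovery_action after_transition(int write_result,
                                             uint32_t readback)
{
    if (write_result == 0)
        return RECOVERY_NONE;
    if (readback == DEV_STATE_INVALID)
        return RECOVERY_RESET;
    /* Write failed but the device reports a valid (previous) state. */
    return RECOVERY_RETRY_OR_ABORT;
}
```

In this sketch only the 110b read-back forces a reset; any other read-back after a failed write leaves the choice (retry, or fail the migration) to the user.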

We might want to be able to distinguish between:
   a) The device has failed and needs a reset
   b) The migration has failed

I think the above provides this.  For Kirti's example above of
transitioning from pre-copy to stop-and-copy, the device could refuse
to transition to stop-and-copy, generating an error on the write() of
device_state.  The user re-reading device_state would allow them to
determine the current device state, still in pre-copy or failed.  Only
the latter would require a device reset.

If some part of the device's mechanics for migration fails, but the device
is otherwise operational then we should be able to decide to fail the
migration without taking the device down, which might be very bad for
the VM.
Losing a VM during migration due to a problem with migration really
annoys users; it's one thing for the migration to fail, but taking the VM
out as well really gets to them.

Having the device automatically transition back to the 'running' state
seems a bad idea to me; much better to tell the hypervisor and provide
it with a way to clean up; for example, imagine a system with multiple
devices that are being migrated, most of them have happily transitioned
to stop-and-copy, but then the last device decides to fail - so now
someone is going to have to take all of them back to running.
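
A rough sketch of that multi-device case, with the hypervisor rather than the device doing the cleanup, might look like the following. The devices are simulated in memory, all names are hypothetical, and 001b for "running" is an assumption (only 010b and 011b are quoted in this thread).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* 010b and 011b are quoted in the thread; 001b is an assumption. */
#define SIM_RUNNING       0x1u
#define SIM_STOP_AND_COPY 0x2u
#define SIM_PRE_COPY      0x3u

/* Simulated device standing in for a real migration region. */
struct sim_device {
    uint32_t state;
    bool refuses_stop_and_copy; /* hook to make the transition fail */
};

/* Returns 0 on success, -1 on failure; state is unchanged on failure. */
static int sim_set_state(struct sim_device *dev, uint32_t new_state)
{
    if (new_state == SIM_STOP_AND_COPY && dev->refuses_stop_and_copy)
        return -1;
    dev->state = new_state;
    return 0;
}

/*
 * Try to move every device to stop-and-copy. If any device refuses,
 * the caller (not the device) takes all devices back to running.
 * Returns 0 if all devices switched, -1 if the attempt was abandoned.
 */
static int stop_all_or_roll_back(struct sim_device *devs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (sim_set_state(&devs[i], SIM_STOP_AND_COPY) != 0) {
            for (size_t j = 0; j < n; j++)
                (void)sim_set_state(&devs[j], SIM_RUNNING);
            return -1;
        }
    }
    return 0;
}
```

The point of the sketch is only that the rollback is driven from one place that can see every device, which a single device transitioning itself back to running could not do.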

Right, unless I'm missing one, it seems invalid->running is the only
self transition the device should make, though still by way of user
interaction via the reset ioctl.  Thanks,

Instead of the vendor driver using the invalid state on device failure, I
think it is better to reserve one bit in device state which the vendor
driver can set on device failure. When the error bit is set, the other
bits in device state should be ignored.
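
As a rough illustration of the idea only: the bit position, the mask and the names below are all hypothetical and are not taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: low three bits carry the state, and one
 * reserved bit (position chosen arbitrarily here) flags failure. */
#define DEV_STATE_MASK  0x7u
#define DEV_STATE_ERROR (1u << 31)

/* True if the vendor driver has flagged a device failure. */
static bool dev_state_failed(uint32_t device_state)
{
    return (device_state & DEV_STATE_ERROR) != 0;
}

/*
 * Stores the state bits in *state_out and returns true when they are
 * meaningful; returns false, leaving *state_out untouched, when the
 * error bit is set and the other bits must be ignored.
 */
static bool dev_state_decode(uint32_t device_state, uint32_t *state_out)
{
    if (dev_state_failed(device_state))
        return false;
    *state_out = device_state & DEV_STATE_MASK;
    return true;
}
```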

Thanks,
Kirti