Re: [RFC PATCH] vfio: Update/Clarify migration uAPI, add NDMA state

Cornelia Huck <cohuck@xxxxxxxxxx> · Tue, 14 Dec 2021 13:08:31 +0100

On Mon, Dec 13 2021, Alex Williamson <alex.williamson@xxxxxxxxxx> wrote:

> We do specify that a migration driver has discretion in using the error
> state for failed transitions, so there are options to simplify error
> handling.
>
> If we look at bit flips, we have:
>
> Initial state
> |  Resuming
> |  |  Saving
> |  |  |  Running
> |  |  |  |  Next states with multiple bit flips
>
> a) 0, 0, 0  (d)
> b) 0, 0, 1  (c) (e)
> c) 0, 1, 0  (b) (e)
> d) 0, 1, 1  (a) (e)
> e) 1, 0, 0  (b) (c) (d)
> f) 1, 0, 1 UNSUPPORTED
> g) 1, 1, 0 ERROR
> h) 1, 1, 1 INVALID
>
> We specify that we cannot pass through any invalid states during
> transition, so if I consider each bit to be a discrete operation and
> map all these multi-bit changes to a sequence of single bit flips, the
> only requirements are:
>
>   1) RESUMING must be cleared before setting SAVING or RUNNING
>   2) SAVING or RUNNING must be cleared before setting RESUMING
>
> I think the basis of your priority scheme comes from that.  Ordering
> of the remaining items is more subtle though, for instance
> 0 -> SAVING | RUNNING can be broken down as:
>
>   0 -> SAVING
>   SAVING -> SAVING | RUNNING 
>
>   vs
>
>   0 -> RUNNING
>   RUNNING -> SAVING | RUNNING
>
> I'd give preference to enabling logging before running and I believe
> that holds for transition (e) -> (d) as well.

Agreed.

>
> In the reverse case, SAVING | RUNNING -> 0
>
>   SAVING | RUNNING -> RUNNING
>   RUNNING -> 0
>
>   vs
>
>   SAVING | RUNNING -> SAVING
>   SAVING -> 0
>
> This one is more arbitrary.  I tend to favor clearing RUNNING to stop
> the device first, largely because it creates nice symmetry in the
> resulting algorithm and follows the general principle that you
> discovered as well, converge towards zero by addressing bit clearing
> before setting.

That also makes sense to me.

> So a valid algorithm would include:
>
> int set_device_state(u32 old, u32 new, bool is_unwind)
> {
> 	if (old.RUNNING && !new.RUNNING) {
> 		curr.RUNNING = 0;
> 		if (ERROR) goto unwind;
> 	}
> 	if (old.SAVING && !new.SAVING) {
> 		curr.SAVING = 0;
> 		if (ERROR) goto unwind;
> 	}
> 	if (old.RESUMING && !new.RESUMING) {
> 		curr.RESUMING = 0;
> 		if (ERROR) goto unwind;
> 	}
> 	if (!old.RESUMING && new.RESUMING) {
> 		curr.RESUMING = 1;
> 		if (ERROR) goto unwind;
> 	}
> 	if (!old.SAVING && new.SAVING) {
> 		curr.saving = 1;
> 		if (ERROR) goto unwind;
> 	}
> 	if (!old.RUNNING && new.RUNNING) {
> 		curr.RUNNING = 1;
> 		if (ERROR) goto unwind;
>         }
>
> 	return 0;
>
> unwind:
> 	if (!is_unwind) {
> 		ret = set_device_state(curr, old, true);
> 		if (ret) {
> 			curr.raw = ERROR;
> 			return ret;
> 		}
> 	}
>
> 	return -EIO;
> }
>

I've stared at this and scribbled a bit on paper and I think this would
work.

> And I think that also addresses the claim that we're doomed to untested
> and complicated error code handling, we unwind by simply swapping the
> args to our set state function and enter the ERROR state should that
> recursive call fail.

Nod. As we clear RUNNING as the first thing and would only set RUNNING
again as the last step, any transition to ERROR should be save.

>
> I think it would be reasonable for documentation to present similar
> pseudo code as a recommendation, but ultimately the migration driver
> needs to come up with something that fits all the requirements.

Yes, something like this should go into Documentation/.