On Thu, Feb 10, 2022 at 09:48:11AM -0700, Alex Williamson wrote:

> Specifically, I suspect we can trigger this race if the VM reboots as
> we're initiating a migration in the STOP_COPY phase, but that's maybe
> less interesting if we expect the VM to be halted before the device
> state is stepped.

Yes, STOP_COPY drivers like mlx5/acc are fine here inherently.

We have already restricted what device touches are allowed in
STOP_COPY, and this must include reset too. Neither of the two drivers
posted can tolerate a reset during the serialization step. mlx5 will
fail the STOP_COPY FW command and I guess acc will 'tear' its register
reads and produce a corrupted state.

> More interesting might be how a PRE_COPY transition works relative
> to asynchronous VM resets triggering device resets. Are we
> serializing all access to reset vs this DEVICE_FEATURE op or are we
> resorting to double checking the device state, and how do we plan to
> re-initiate migration states if a VM reset occurs during migration?
> Thanks,

The device will be in PRE_COPY with vCPUs running. An async reset will
be triggered in the guest, so the device returns to RUNNING and the
data_fds immediately return an errno.

There are three ways qemu can observe this:

 1) it is actively using the data_fds, so it immediately gets an
    error and propagates it up, aborting the migration,
    e.g. it is doing read(), poll(), io_uring, etc.

 2) it is done with the PRE_COPY phase of the data_fd and is moving
    toward STOP_COPY.

    In this case the vCPU is halted and the SET_STATE to STOP_COPY
    will execute, without any race, either:
      PRE_COPY -> STOP_COPY (data_fd == -1)
      RUNNING -> STOP_COPY (data_fd != -1)

    The WIP qemu patch does check for the expected data_fd, however
    it mishandles the error; we will fix it.

 3) it is aborting the PRE_COPY migration, closing the data_fd and
    doing SET_STATE to RUNNING.

    In this case it doesn't know the device was reset. close()
    succeeds and SET_STATE RUNNING -> RUNNING is a nop.

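To make the three cases concrete, here is a toy model of the FSM and
data_fd behaviour described above. It is only a sketch: ToyDevice,
DataFd and the case_* helpers are made-up names, the EIO errno is an
assumption (the real errno is whatever the driver returns), and none
of this is the actual VFIO uAPI or qemu code.

```python
import errno


class DataFd:
    """Hypothetical stand-in for a migration data_fd."""

    def __init__(self):
        self.errored = False

    def read(self):
        if self.errored:
            # Assumed errno; the thread only says "an errno" is returned.
            raise OSError(errno.EIO, "device was reset")
        return b"device-state"


class ToyDevice:
    """Hypothetical model of the migration FSM, not a real driver."""

    def __init__(self):
        self.state = "RUNNING"
        self.data_fd = None

    def set_state(self, new_state):
        """Return a new DataFd if the arc creates one, else None (the -1 case)."""
        old, self.state = self.state, new_state
        if old == "RUNNING" and new_state in ("PRE_COPY", "STOP_COPY"):
            self.data_fd = DataFd()
            return self.data_fd
        if new_state == "RUNNING":
            self.data_fd = None
        return None  # e.g. PRE_COPY -> STOP_COPY keeps the existing data_fd

    def reset(self):
        """Async reset: outstanding data_fd starts failing, state is RUNNING."""
        if self.data_fd is not None:
            self.data_fd.errored = True
        self.state = "RUNNING"
        self.data_fd = None


def case_1_active_read():
    dev = ToyDevice()
    fd = dev.set_state("PRE_COPY")
    dev.reset()
    try:
        fd.read()
    except OSError:
        return "abort migration"
    return "continue"


def case_2_stop_copy(reset_during_pre_copy):
    dev = ToyDevice()
    dev.set_state("PRE_COPY")
    if reset_during_pre_copy:
        dev.reset()
    new_fd = dev.set_state("STOP_COPY")  # vCPUs are halted here, so no race
    # A new data_fd means the arc was RUNNING -> STOP_COPY: PRE_COPY data is stale.
    return "PRE_COPY data stale" if new_fd is not None else "PRE_COPY data valid"


def case_3_abort():
    dev = ToyDevice()
    dev.set_state("PRE_COPY")
    dev.reset()
    # close(data_fd) would succeed; RUNNING -> RUNNING is then a nop.
    return dev.set_state("RUNNING"), dev.state
```

In this model case 2 is the one that matters: whether SET_STATE hands
back a new data_fd tells the caller which arc was actually taken.
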
Today's qemu must abort the migration at this point and fully restart
it because it has no mechanism to serialize a 'discard all of this
device's PRE_COPY state up to here' tag.

Some future qemu could learn to do this and then the receiver would
discard already sent device state, by triggering reset and a new
RUNNING -> RESUMING on the receiving device. In this case qemu would
have a choice of:
  - abort the entire migration
  - restart just this device back to PRE_COPY
  - stop the vCPUs and use STOP_COPY

In any case, qemu fully detects this race as a natural part of its
operations and knows with certainty, when it commands the device to
go to STOP_COPY with vCPUs halted, whether the preceding PRE_COPY
state is correct or not.

It is interesting you bring this up, I'm not sure this worked properly
with v1. It seems we have solved it, inadvertently even, by using the
basic logic of the FSM and FD.

Jason