On Wed, Oct 20, 2021 at 11:46:07AM +0300, Yishai Hadas wrote:

> What is the expectation for a reasonable delay ? we may expect this system
> WQ to run only short tasks and be very responsive.

If the expectation is that qemu will see the error return and then turn
around and issue FLR followed by another state operation, then it does seem
strange that there would be a delay.

On the other hand, this doesn't seem that useful. If qemu tries to migrate
and the device fails, then the migration operation is toast and the device
is possibly wrecked. qemu can't really issue a FLR without coordinating with
the VM, and it cannot resume the VM as the device is now irrecoverably
messed up.

If we look at this from a RAS perspective, what would be useful here is a
way for qemu to request fail-safe migration data. This must always be
available and cannot fail. When the fail-safe data is loaded into the device
it would trigger the device's built-in RAS features to co-ordinate with the
VM driver and recover. Perhaps qemu would also have to inject an AER or
something.

Basically, instead of the device starting in an "empty, ready to use" state,
it would start in a "failure detected, needs recovery" state. Not hitless,
but it preserves overall availability vs a failed migration == VM crash.

That said, it is just a thought, and I don't know if anyone has put any
resources into what to do if migration operations fail right now. But
failure is possible, ie the physical device could have crashed and perhaps
the migration is to move the VMs off the broken HW. In this scenario all the
migration operations will time out and fail in the driver.

However, since the guest VM could issue a FLR at any time, we really
shouldn't have this kind of operation floating around in the background.
Things must be made deterministic for qemu. eg if qemu gets a guest request
for FLR during the pre-copy stage it really should abort the pre-copy, issue
the FLR and then restart the migration. I think it is unreasonable to ask a
device to be able to maintain pre-copy across FLR.

To make this work, the restarting of the migration must not race with
scheduled work wiping out all the state. So, regrettably, something is
needed here.

Ideally more of this logic would be in shared code, but I'm not sure I have
a good feeling what that should look like at this point. Something to
attempt once there are a few more implementations. For instance, the if
predicate ladder I mentioned in the last email should be shared code, not
driver code, as it is fundamental ABI.

Jason
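
[Editorial sketch, not part of the original mail.] One way to read the
"restart must not race with scheduled work" point is that the reset path
should tear down in-flight migration state synchronously under the same lock
that the migration operations take, rather than punting to a workqueue. A
minimal sketch of that shape, where my_vf_dev, my_migf and my_disable_saving
are hypothetical stand-ins for driver-specific pieces:

#include <linux/mutex.h>

/* Illustrative only: my_vf_dev, my_migf and my_disable_saving are not a
 * real driver API, they stand in for whatever the driver keeps per VF. */
struct my_migf;				/* opaque in-flight saving context */

struct my_vf_dev {
	struct mutex state_mutex;	/* serializes reset vs migration ops */
	struct my_migf *saving_migf;	/* non-NULL while pre-copy is active */
};

static void my_disable_saving(struct my_migf *migf)
{
	/* release the device-side saving context, free staged data, etc. */
}

/* Called from the driver's FLR / reset_done notification. */
static void my_vf_reset_done(struct my_vf_dev *vdev)
{
	mutex_lock(&vdev->state_mutex);
	/*
	 * Tear down in-flight migration state synchronously instead of
	 * deferring it to a workqueue: by the time this returns, userspace
	 * can restart the migration without racing a background cleanup.
	 */
	if (vdev->saving_migf) {
		my_disable_saving(vdev->saving_migf);
		vdev->saving_migf = NULL;
	}
	mutex_unlock(&vdev->state_mutex);
}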
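
[Editorial sketch, not part of the original mail.] The "if predicate ladder"
refers to the per-driver checks that a requested device_state combination is
legal at all. A hedged sketch of the kind of check that could live in shared
code instead; the bit names mirror the running/saving/resuming bits of the
migration uAPI, while the MY_MIG_STATE_* constants and the helper name are
made up for illustration:

#include <linux/bits.h>
#include <linux/types.h>

/* Mirrors the running/saving/resuming bits of the migration uAPI; the
 * constants and helper below are hypothetical, not an existing API. */
#define MY_MIG_STATE_RUNNING	BIT(0)
#define MY_MIG_STATE_SAVING	BIT(1)
#define MY_MIG_STATE_RESUMING	BIT(2)

static bool my_mig_transition_allowed(u32 old_state, u32 new_state)
{
	/* Saving and resuming are mutually exclusive in a single state. */
	if ((new_state & MY_MIG_STATE_SAVING) &&
	    (new_state & MY_MIG_STATE_RESUMING))
		return false;

	/*
	 * A device that is still resuming cannot be asked to start saving
	 * in the same step; it has to leave resuming first.
	 */
	if ((old_state & MY_MIG_STATE_RESUMING) &&
	    (new_state & MY_MIG_STATE_SAVING))
		return false;

	return true;
}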