On Tue, 8 Feb 2022 22:39:18 -0400 Jason Gunthorpe <jgg@xxxxxxxxxx> wrote: > On Tue, Feb 08, 2022 at 05:08:01PM -0700, Alex Williamson wrote: > > > @@ -477,10 +499,34 @@ static int mlx5vf_pci_get_device_state(struct vfio_device *vdev, > > > > > > mutex_lock(&mvdev->state_mutex); > > > *curr_state = mvdev->mig_state; > > > - mutex_unlock(&mvdev->state_mutex); > > > + mlx5vf_state_mutex_unlock(mvdev); > > > return 0; > > > > I still can't see why it wouldn't be a both fairly trivial to implement > > and a usability improvement if the unlock wrapper returned -EAGAIN on a > > deferred reset so we could avoid returning a stale state to the user > > and a dead fd in the former case. Thanks, > > It simply is not useful - again, we always resolve this race that > should never happen as though the two events happened consecutively, > which is what would normally happen if we could use a simple mutex. We > do not need to add any more complexity to deal with this already > troublesome thing.. So walk me through how this works with QEMU, it's easy to hand-wave userspace race and move on, but device reset can be triggered by guest behavior while migration is supposed to be transparent to the guest. These are essentially asynchronous threads where we're imposing a synchronization point or lots of double checking in userspace whether the device actually entered the state we think it did and if the returned FD is usable. Specifically, I suspect we can trigger this race if the VM reboots as we're initiating a migration in the STOP_COPY phase, but that's maybe less interesting if we expect the VM to be halted before the device state is stepped. More interesting might be how a PRE_COPY transition works relative to asynchronous VM resets triggering device resets. Are we serializing all access to reset vs this DEVICE_FEATURE op or are we resorting to double checking the device state, and how do we plan to re-initiate migration states if a VM reset occurs during migration? Thanks, Alex