Re: [PATCH V7 mlx5-next 14/15] vfio/mlx5: Use its own PCI reset_done error handler

Alex Williamson <alex.williamson@xxxxxxxxxx> · Thu, 10 Feb 2022 09:48:11 -0700

On Tue, 8 Feb 2022 22:39:18 -0400
Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:

> On Tue, Feb 08, 2022 at 05:08:01PM -0700, Alex Williamson wrote:
> > > @@ -477,10 +499,34 @@ static int mlx5vf_pci_get_device_state(struct vfio_device *vdev,
> > >  
> > >  	mutex_lock(&mvdev->state_mutex);
> > >  	*curr_state = mvdev->mig_state;
> > > -	mutex_unlock(&mvdev->state_mutex);
> > > +	mlx5vf_state_mutex_unlock(mvdev);
> > >  	return 0;  
> > 
> > I still can't see why it wouldn't be a both fairly trivial to implement
> > and a usability improvement if the unlock wrapper returned -EAGAIN on a
> > deferred reset so we could avoid returning a stale state to the user
> > and a dead fd in the former case.  Thanks,  
> 
> It simply is not useful - again, we always resolve this race that
> should never happen as though the two events happened consecutively,
> which is what would normally happen if we could use a simple mutex. We
> do not need to add any more complexity to deal with this already
> troublesome thing..

So walk me through how this works with QEMU, it's easy to hand-wave
userspace race and move on, but device reset can be triggered by guest
behavior while migration is supposed to be transparent to the guest.
These are essentially asynchronous threads where we're imposing a
synchronization point or lots of double checking in userspace whether
the device actually entered the state we think it did and if the
returned FD is usable.

Specifically, I suspect we can trigger this race if the VM reboots as
we're initiating a migration in the STOP_COPY phase, but that's maybe
less interesting if we expect the VM to be halted before the device
state is stepped.  More interesting might be how a PRE_COPY transition
works relative to asynchronous VM resets triggering device resets.  Are
we serializing all access to reset vs this DEVICE_FEATURE op or are we
resorting to double checking the device state, and how do we plan to
re-initiate migration states if a VM reset occurs during migration?
Thanks,

Alex