Re: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver for mlx5 devices

On Wed, Oct 20 2021, Alex Williamson <alex.williamson@xxxxxxxxxx> wrote:

> On Wed, 20 Oct 2021 15:59:19 -0300
> Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:
>
>> On Wed, Oct 20, 2021 at 10:52:30AM -0600, Alex Williamson wrote:
>> 
>> > I'm wondering if we're imposing extra requirements on the !_RUNNING
>> > state that don't need to be there.  For example, if we can assume that
>> > all devices within a userspace context are !_RUNNING before any of the
>> > devices begin to retrieve final state, then clearing of the _RUNNING
>> > bit becomes the device quiesce point and the beginning of reading
>> > device data is the point at which the device state is frozen and
>> > serialized.  No new states required and essentially works with a slight
>> > rearrangement of the callbacks in this series.  Why can't we do that?  
>> 
>> It sounds worth checking carefully. I didn't come up with a major
>> counter scenario.
>> 
>> We would need to specifically define which user action triggers the
>> device to freeze and serialize. Reading pending_bytes I suppose?
>
> The first read of pending_bytes after clearing the _RUNNING bit would
> be the logical place to do this since that's what we define as the start
> of the cycle for reading the device state.
>
> "Freezing" the device is a valid implementation, but I don't think it's
> strictly required per the uAPI.  For instance there's no requirement
> that pending_bytes is reduced by data_size on each iteration; we
> specifically only define that the state is complete when the user reads
> a pending_bytes value of zero.  So a driver could restart the device
> state if the device continues to change (though it's debatable whether
> triggering an -errno on the next migration region access might be a
> more supportable approach to enforce that userspace has quiesced
> external access).

Hm, not so sure. From my reading of the uAPI, transitioning from
pre-copy to stop-and-copy (i.e. clearing _RUNNING) implies that we
freeze the device (at least, that's how I interpret "On state transition
from pre-copy to stop-and-copy, the driver must stop the device, save
the device state and send it to the user application through the
migration region.")

Maybe the uAPI is simply not yet clear enough.
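
To make the difference concrete, here is a rough sketch of the userspace
side under the proposal quoted above, where clearing _RUNNING only
quiesces the device and the first read of pending_bytes afterwards is
what freezes and serializes it. This is a toy model, not the real uAPI:
struct mock_dev and all of the mock_* helpers are made up for
illustration, and a real implementation would go through the migration
region instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a device; NOT the real migration region layout. */
struct mock_dev {
    bool running;          /* models the _RUNNING bit */
    bool frozen;           /* set once the state has been frozen/serialized */
    uint64_t state_bytes;  /* size of the serialized device state */
    uint64_t sent;         /* bytes already handed to userspace */
};

/* Clearing _RUNNING is only the quiesce point in this model. */
static void mock_clear_running(struct mock_dev *dev)
{
    assert(dev != NULL);
    dev->running = false;
}

/*
 * Reading pending_bytes: the first read after _RUNNING was cleared
 * freezes the device and starts a fresh serialization (assumption
 * taken from the proposal, not from the uAPI text).
 */
static uint64_t mock_read_pending_bytes(struct mock_dev *dev)
{
    assert(dev != NULL);
    if (!dev->running && !dev->frozen) {
        dev->frozen = true;
        dev->sent = 0;
    }
    return dev->state_bytes - dev->sent;
}

/* Hand out at most max_chunk bytes of state; returns the chunk size. */
static uint64_t mock_read_data(struct mock_dev *dev, uint64_t max_chunk)
{
    uint64_t left = dev->state_bytes - dev->sent;
    uint64_t n = left < max_chunk ? left : max_chunk;

    dev->sent += n;
    return n;
}

/*
 * Userspace stop-and-copy loop: the state is complete only once a
 * pending_bytes read returns zero. Returns the total bytes collected.
 */
static uint64_t save_device_state(struct mock_dev *dev, uint64_t max_chunk)
{
    uint64_t total = 0;

    assert(max_chunk > 0);  /* a zero chunk size would never terminate */
    mock_clear_running(dev);
    while (mock_read_pending_bytes(dev) != 0)
        total += mock_read_data(dev, max_chunk);
    return total;
}
```

Under my reading of the uAPI, the freeze would instead have to happen
in the equivalent of mock_clear_running(); the loop itself would look
the same to userspace either way.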



