Re: vfio migration discussions (was: [PATCH V2 mlx5-next 00/14] Add mlx5 live migration driver)

On Wed, Nov 17, 2021 at 05:42:58PM +0100, Cornelia Huck wrote:
> Ok, here's the contents (as of 2021-11-17 16:30 UTC) of the etherpad at
> https://etherpad.opendev.org/p/VFIOMigrationDiscussions -- in the hope
> of providing a better starting point for further discussion (I know that
> discussions are still ongoing in other parts of this thread; but
> frankly, I'm getting a headache trying to follow them, and I think it
> would be beneficial to concentrate on the fundamental questions
> first...)

In my mind several of these topics now have answers:

>       * Jason proposed a new NDMA (no-dma) state that seems to match the

NDMA solves the PRI problem too, and allows dirty tracking to be
iterative. So yes to adding it to device_state rather than making it
implicit via !RUNNING.
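As a rough illustration of what "explicit in device_state" means, here is
a minimal sketch. The RUNNING/SAVING/RESUMING bit positions mirror the
existing v1 migration uAPI, but the DEV_STATE_* names, the NDMA bit value
and both helper functions are made up for this example and are not taken
from any posted patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the v1 device_state bits, so the sketch is self-contained. */
#define DEV_STATE_RUNNING  (1u << 0)
#define DEV_STATE_SAVING   (1u << 1)
#define DEV_STATE_RESUMING (1u << 2)
/* Hypothetical: an explicit NDMA bit. The value is an assumption. */
#define DEV_STATE_NDMA     (1u << 3)

/* The device may initiate DMA only while RUNNING and not in NDMA. */
bool dma_allowed(uint32_t state)
{
    return (state & DEV_STATE_RUNNING) && !(state & DEV_STATE_NDMA);
}

/* Request NDMA on a device that is still RUNNING; other bits are kept. */
uint32_t enter_ndma(uint32_t state)
{
    /* In this sketch NDMA is only requested while the device is RUNNING. */
    assert(state & DEV_STATE_RUNNING);
    return state | DEV_STATE_NDMA;
}
```

With a separate bit, RUNNING|NDMA is a distinct state the user can ask
for, instead of "no DMA" only being a side effect of clearing RUNNING.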

>     * No definition of what HW needs to preserve when RESUMING toggles
>     off - (eg today SET_IRQS must work, what else?).

Everything in the device controlled by other kernel subsystems (IRQs,
MSI, PCI config space) must continue to work across !RUNNING and must
not be disturbed by the migration driver during RESUME.

So, a clear yes: SET_IRQS during !RUNNING must be supported.

>     * In general, what operations or accesses is the user restricted
>     from performing on the device while !RUNNING

This still needs an answer, other than the carve-out above. HNS won't
work without restrictions, for instance.

>     * PRI into the guest (guest user process SVA) has a sequencing
>     problem with RUNNING - can not migrate a vIOMMU in the middle of a
>     page fault, must stop and flush faults before stopping vCPUs

NDMA|RUNNING allows the vIOMMU to be suspended.
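The ordering this enables can be sketched with a toy model. Everything
here is hypothetical: struct vm, vm_quiesce(), the DEV_STATE_* names and
the NDMA bit value are invented for illustration, and clearing RUNNING
only after the vCPUs are stopped is an assumption about how a VMM would
finish the sequence:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEV_STATE_RUNNING (1u << 0)
#define DEV_STATE_SAVING  (1u << 1)
#define DEV_STATE_NDMA    (1u << 3)   /* hypothetical bit value */

/* Toy model of the pieces that have to be quiesced in order. */
struct vm {
    uint32_t device_state;
    bool faults_pending;     /* outstanding PRI page faults */
    bool viommu_suspended;
    bool vcpus_running;
};

void vm_quiesce(struct vm *vm)
{
    assert(vm->device_state & DEV_STATE_RUNNING);

    /* 1. RUNNING|NDMA: no new DMA, so no new PRI requests are raised. */
    vm->device_state |= DEV_STATE_NDMA;

    /* 2. vCPUs are still running, so outstanding faults can be served
     *    and flushed (modelled here as simply clearing the flag). */
    vm->faults_pending = false;

    /* 3. With no fault in flight the vIOMMU can be suspended. */
    vm->viommu_suspended = true;

    /* 4. Only now stop the vCPUs. */
    vm->vcpus_running = false;

    /* 5. Finally clear RUNNING (assumed final step). */
    vm->device_state &= ~DEV_STATE_RUNNING;
}
```

The point is only the order: DMA stops first, faults drain while the
guest can still answer them, and the vCPUs stop last.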

> The uAPI could benefit from some more detailed documentation
> (e.g. how to use it, what to do in edge cases, ...) outside of the
> header file.

We have an internal draft of this now

> Trying to use the mlx5 support currently on the list has unearthed
> some problems in QEMU <please summarize :)>

If the kernel does anything odd, QEMU calls abort().

Performance is bad; Yishai sent a patch.

Jason


