RE: [RFC PATCH] vfio: Update/Clarify migration uAPI, add NDMA state

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



> From: Alex Williamson <alex.williamson@xxxxxxxxxx>
> Sent: Tuesday, January 11, 2022 1:34 AM
> 
> On Mon, 10 Jan 2022 07:55:16 +0000
> "Tian, Kevin" <kevin.tian@xxxxxxxxx> wrote:
> 
> > > From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> > > Sent: Friday, January 7, 2022 5:21 AM
> > >
> > > We were also thinking to retain STOP. SAVING -> STOP could possibly
> > > serve as the abort path to avoid a double action, and some of the use
> > > cases you ID'd below are achievable if STOP remains.
> >
> > what is the exact difference between a null state {} (implying !RUNNING)
> > and STOP in this context?
> >
> > If they are different, who (user or driver) should conduct and when do
> > we expect transitioning the device into a null state?
> 
> Sorry if I added confusion here, the null, ie. {}, state fit my
> notation better.  The null state is simply no bits set in device_state,
> it's equivalent to "STOP" without coming up with a new name for every
> set of bit combinations.

This matches my thought. Earlier reading this thread I was left with
the impression that Jason means 'STOP' as a new state.

> 
> > > > We have 20 possible transitions.  I've marked those available via the
> > > > "odd ascii art" diagram as (a), that's 7 transitions.  We could
> > > > essentially remove the NULL state as unreachable as there seems little
> > > > value in the 2 transitions marked (a)* if we look only at migration,
> > > > that would bring us down to 5 of 12 possible transitions.  We need to
> > > > give userspace an abort path though, so we minimally need the 2
> > > > transitions marked (b) (7/12).
> > >
> > > > So now we can discuss the remaining 5 transitions:
> > > >
> > > > {SAVING} -> {RESUMING}
> > > > 	If not supported, user can achieve this via:
> > > > 		{SAVING}->{RUNNING}->{RESUMING}
> > > > 		{SAVING}-RESET->{RUNNING}->{RESUMING}
> > >
> > > This can be:
> > >
> > > SAVING -> STOP -> RESUMING
> >
> > From Alex's original description the default device state is RUNNING.
> > This supposed to be the initial state on the dest machine for the
> > device assigned to Qemu before Qemu resumes the device state.
> > Then how do we eliminate the RUNNING state in above flow? Who
> > makes STOP as the initial state on the dest node?
> 
> The device must be RUNNING by default.  This is a requirement that
> introduction of migration support for a device cannot break
> compatibility with existing userspace that may no support migration
> features.  It would be QEMU's responsibility to transition a migration
> target device from the default state to a state to accept a migration.
> There's no discussion here of eliminating the {RUNNING}->{RESUMING}
> transition.

Then having STOP in the flow is meaningless. It actually means:

{SAVING}-RESET->{RUNNING}->{STOP}->{RESUMING}

> 
> > > > drivers follow the previously provided pseudo algorithm with the
> > > > requirement that they cannot pass through an invalid state, we need to
> > > > formally address whether the NULL state is invalid or just not
> > > > reachable by the user.
> > >
> > > What is a NULL state?
> >
> > Hah, seems I'm not the only one having this confusion. 😊
> 
> Sorry again, I thought it could easily be deduced by including it in
> the state transitions.
> 
> > > We have defined (from memory, forgive me I don't have access to
> > > Yishai's latest code at the moment) 8 formal FSM states:
> > >
> > >  RUNNING
> > >  PRECOPY
> > >  PRECOPY_NDMA
> > >  STOP_COPY
> > >  STOP
> > >  RESUMING
> > >  RESUMING_NDMA
> > >  ERROR (perhaps MUST_RESET)
> >
> > ERROR->SHUTDOWN? Usually a shutdown state implies reset required...
> 
> The userspace process can go away at any time, what exactly is a
> "SHUTDOWN" state representing?

I suppose closing fd for previous process transitions the device state
to NULL or STOP state from whatever current state is. Then when 
another process opens the same device its state becomes RUNNING.

All other states including SHUTDOWN only have effect when the
device fd is opened.

> 
> > > > But I think you've identified two classes of DMA, MSI and everything
> > > > else.  The device can assume that an MSI is special and not included in
> > > > NDMA, but it can't know whether other arbitrary DMAs are p2p or
> memory.
> > > > If we define that the minimum requirement for multi-device migration
> is
> > > > to quiesce p2p DMA, ex. by not allowing it at all, then NDMA is
> > > > actually significantly more restrictive while it's enabled.
> > >
> > > You are right, but in any practical physical device NDMA will be
> > > implemented by halting all DMAs, not just p2p ones.
> > >
> > > I don't mind what we label this, so long as we understand that halting
> > > all DMA is a valid device implementation.
> > >
> > > Actually, having reflected on this now, one of the things on my list
> > > to fix in iommufd, is that mdevs can get access to P2P pages at all.
> > >
> > > This is currently buggy as-is because they cannot DMA map these
> > > things, touch them with the CPU and kmap, or do, really, anything with
> > > them.
> >
> > Can you elaborate why mdev cannot access p2p pages?
> >
> > >
> > > So we should be blocking mdev's from accessing P2P, and in that case a
> > > mdev driver can quite rightly say it doesn't support P2P at all and
> > > safely NOP NDMA if NDMA is defined to only impact P2P transactions.
> > >
> > > Perhaps that answers the question for the S390 drivers as well.
> > >
> > > > Should a device in the ERROR state continue operation or be in a
> > > > quiesced state?  It seems obvious to me that since the ERROR state is
> > > > essentially undefined, the device should cease operations and be
> > > > quiesced by the driver.  If the device is continuing to operate in the
> > > > previous state, why would the driver place the device in the ERROR
> > > > state?  It should have returned an errno and left the device in the
> > > > previous state.
> > >
> > > What we found while implementing is the use of ERROR arises when the
> > > driver has been forced to do multiple actions and is unable to fully
> > > unwind the state. So the device_state is not fully the original state
> > > or fully the target state. Thus it is ERROR.
> > >
> > > The additional requirement that the driver do another action to
> > > quiesce the device only introduces the possiblity for triple failure.
> > >
> > > Since it doesn't bring any value to userspace, I prefer we not define
> > > things in this complicated way.
> >
> > So ERROR is really about unrecoverable failures. If recoverable suppose
> > errno should have been returned and the device is still in the original
> > state. Is this understanding correct?
> 
> Yes, that's how I understand it.  The ERROR state should be used if the
> driver can neither provide the requested new state nor remain/return to
> the old state.
> 
> > btw which errno indicates to the user that the device is back to the
> > original state or in the ERROR state? or want the user to always check
> > the device state upon any transition error?
> 
> No such encoding is defined or required, per the existing uAPI the user
> is to re-evaluate device_state on any non-zero errno.  Thanks,
> 

Makes sense.

Thanks
Kevin




[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux