On Mon, 10 Jan 2022 07:55:16 +0000 "Tian, Kevin" <kevin.tian@xxxxxxxxx> wrote: > > From: Jason Gunthorpe <jgg@xxxxxxxxxx> > > Sent: Friday, January 7, 2022 5:21 AM > > > > We were also thinking to retain STOP. SAVING -> STOP could possibly > > serve as the abort path to avoid a double action, and some of the use > > cases you ID'd below are achievable if STOP remains. > > what is the exact difference between a null state {} (implying !RUNNING) > and STOP in this context? > > If they are different, who (user or driver) should conduct and when do > we expect transitioning the device into a null state? Sorry if I added confusion here, the null, ie. {}, state fit my notation better. The null state is simply no bits set in device_state, it's equivalent to "STOP" without coming up with a new name for every set of bit combinations. > > > We have 20 possible transitions. I've marked those available via the > > > "odd ascii art" diagram as (a), that's 7 transitions. We could > > > essentially remove the NULL state as unreachable as there seems little > > > value in the 2 transitions marked (a)* if we look only at migration, > > > that would bring us down to 5 of 12 possible transitions. We need to > > > give userspace an abort path though, so we minimally need the 2 > > > transitions marked (b) (7/12). > > > > > So now we can discuss the remaining 5 transitions: > > > > > > {SAVING} -> {RESUMING} > > > If not supported, user can achieve this via: > > > {SAVING}->{RUNNING}->{RESUMING} > > > {SAVING}-RESET->{RUNNING}->{RESUMING} > > > > This can be: > > > > SAVING -> STOP -> RESUMING > > From Alex's original description the default device state is RUNNING. > This supposed to be the initial state on the dest machine for the > device assigned to Qemu before Qemu resumes the device state. > Then how do we eliminate the RUNNING state in above flow? Who > makes STOP as the initial state on the dest node? The device must be RUNNING by default. This is a requirement that introduction of migration support for a device cannot break compatibility with existing userspace that may no support migration features. It would be QEMU's responsibility to transition a migration target device from the default state to a state to accept a migration. There's no discussion here of eliminating the {RUNNING}->{RESUMING} transition. > > > drivers follow the previously provided pseudo algorithm with the > > > requirement that they cannot pass through an invalid state, we need to > > > formally address whether the NULL state is invalid or just not > > > reachable by the user. > > > > What is a NULL state? > > Hah, seems I'm not the only one having this confusion. 😊 Sorry again, I thought it could easily be deduced by including it in the state transitions. > > We have defined (from memory, forgive me I don't have access to > > Yishai's latest code at the moment) 8 formal FSM states: > > > > RUNNING > > PRECOPY > > PRECOPY_NDMA > > STOP_COPY > > STOP > > RESUMING > > RESUMING_NDMA > > ERROR (perhaps MUST_RESET) > > ERROR->SHUTDOWN? Usually a shutdown state implies reset required... The userspace process can go away at any time, what exactly is a "SHUTDOWN" state representing? > > > But I think you've identified two classes of DMA, MSI and everything > > > else. The device can assume that an MSI is special and not included in > > > NDMA, but it can't know whether other arbitrary DMAs are p2p or memory. > > > If we define that the minimum requirement for multi-device migration is > > > to quiesce p2p DMA, ex. by not allowing it at all, then NDMA is > > > actually significantly more restrictive while it's enabled. > > > > You are right, but in any practical physical device NDMA will be > > implemented by halting all DMAs, not just p2p ones. > > > > I don't mind what we label this, so long as we understand that halting > > all DMA is a valid device implementation. > > > > Actually, having reflected on this now, one of the things on my list > > to fix in iommufd, is that mdevs can get access to P2P pages at all. > > > > This is currently buggy as-is because they cannot DMA map these > > things, touch them with the CPU and kmap, or do, really, anything with > > them. > > Can you elaborate why mdev cannot access p2p pages? > > > > > So we should be blocking mdev's from accessing P2P, and in that case a > > mdev driver can quite rightly say it doesn't support P2P at all and > > safely NOP NDMA if NDMA is defined to only impact P2P transactions. > > > > Perhaps that answers the question for the S390 drivers as well. > > > > > Should a device in the ERROR state continue operation or be in a > > > quiesced state? It seems obvious to me that since the ERROR state is > > > essentially undefined, the device should cease operations and be > > > quiesced by the driver. If the device is continuing to operate in the > > > previous state, why would the driver place the device in the ERROR > > > state? It should have returned an errno and left the device in the > > > previous state. > > > > What we found while implementing is the use of ERROR arises when the > > driver has been forced to do multiple actions and is unable to fully > > unwind the state. So the device_state is not fully the original state > > or fully the target state. Thus it is ERROR. > > > > The additional requirement that the driver do another action to > > quiesce the device only introduces the possiblity for triple failure. > > > > Since it doesn't bring any value to userspace, I prefer we not define > > things in this complicated way. > > So ERROR is really about unrecoverable failures. If recoverable suppose > errno should have been returned and the device is still in the original > state. Is this understanding correct? Yes, that's how I understand it. The ERROR state should be used if the driver can neither provide the requested new state nor remain/return to the old state. > btw which errno indicates to the user that the device is back to the > original state or in the ERROR state? or want the user to always check > the device state upon any transition error? No such encoding is defined or required, per the existing uAPI the user is to re-evaluate device_state on any non-zero errno. Thanks, Alex