On Fri, 5 Nov 2021 10:24:04 -0300 Jason Gunthorpe <jgg@xxxxxxxxxx> wrote: > On Wed, Nov 03, 2021 at 12:04:11PM -0600, Alex Williamson wrote: > > > We agreed that it's easier to add a feature than a restriction in a > > uAPI, so how do we resolve that some future device may require a new > > state in order to apply the SET_IRQS configuration? > > I would say don't support those devices. If there is even a hint that > they could maybe exist then we should fix it now. Once the uapi is set > and documented we should expect device makers to consider it when > building their devices. > > As for SET_IRQs, I have been looking at making documentation and I > don't like the way the documentation has to be wrriten because of > this. > > What I see as an understandable, clear, documentation is: > > - SAVING set - no device touches allowed beyond migration operations > and reset via XX I'd suggest defining reset via ioctl only. > Must be set with !RUNNING Not sure what this means. Pre-copy requires SAVING and RUNNING together, is this only suggesting that to get the final device state we need to do so in a !RUNNING state? > - RESUMING set - same as SAVING I take it then that we're defining a new protocol if we can't do SET_IRQS here. > - RUNNING cleared - limited device touches in this list: SET_IRQs, XX > config, XX. > Device may assume no touches outside the above. (ie no MMIO) > Implies NDMA SET_IRQS is MMIO, is the distinction userspace vs kernel? > - NDMA set - full device touches > Device may not issue DMA or interrupts (??) > Device may not dirty pages Is this achievable? We can't bound the time where incoming DMA is possible, devices don't have infinite buffers. > - RUNNING set - full functionality > * In no state may a device generate an error TLP, device > hang/integrity failure or kernel intergity failure, no matter > what userspace does. > The device is permitted to corrupt the migration/VM or SEGV > userspace if userspace doesn't follow the rules. > > (we are trying to figure out what the XX's are right now, would > appreciate any help) > > This is something I think we could expect a HW engineering team to > follow and implement in devices. It doesn't complicate things. > > Overall, at this moment, I would prioritize documentation clarity over > strict compatability with qemu, because people have to follow this > documentation and make their devices long into the future. If the > documentation is convoluted for compatibility reasons HW people are > more likely to get it wrong. When HW people get it wrong they are more > likely to ask for "quirks" in the uAPI to fix their mistakes. I might still suggest a v2 migration sub-type, we'll just immediately deprecate the original as we have no users and QEMU would modify all support to find only the new sub-type as code is updated. "v1" never really materialized, but we can avoid future confusion if it's never produced by in-tree drivers and never consume by mainstream userspace. > The pending_bytes P2P idea is also quite complicated to document as > now we have to describe an HW state not in terms of a NDMA control > bit, but in terms of a bunch of implicit operations in a protocol. Not > so nice. > > So, here is what I propose. Let us work on some documentation and come > up with the sort of HW centric docs like above and we can then decide > if we want to make the qemu changes it will imply, or not. We'll > include the P2P stuff, as we see it, so it shows a whole picture. > > I think that will help everyone participate fully in the discussion. Good plan. > > If we're going to move forward with the existing uAPI, then we're going > > to need to start factoring compatibility into our discussions of > > missing states and protocols. For example, requiring that the device > > is "quiesced" when the _RUNNING bit is cleared and "frozen" when > > pending_bytes is read has certain compatibility advantages versus > > defining a new state bit. > > Not entirely, to support P2P going from RESUMING directly to RUNNING > is not possible. There must be an in between state that all devices > reach before they go to RUNNING. It seems P2P cannot be bolted into > the existing qmeu flow with a kernel only change? Perhaps, yes. > > clarifications were trying for within the existing uAPI rather than > > toss out new device states and protocols at every turn for the sake of > > API purity. The rate at which we're proposing new states and required > > transitions without a plan for the uAPI is not where I want to be for > > adding the driver that could lock us in to a supported uAPI. Thanks, > > Well, to be fair, the other cases I suggested new stats was when you > asked about features we don't have at all today (like post-copy). I > think adding new states is a very reasonable way to approach adding > new features. As long as new features can be supported with new states > we have a forward compatability story. That has a viable upgrade path, I'm onboard with that. A device that imposes it can't do SET_IRQS while RESUMING when we have no required state in between RESUMING and RUNNING are the sorts of issues that I'm going to get hung up on. I take it from the above that you're building that state transition requirement into the uAPI now. Thanks, Alex