On Wed, 3 Nov 2021 13:10:19 -0300
Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:

> On Wed, Nov 03, 2021 at 09:44:09AM -0600, Alex Williamson wrote:
>
> > In one email I read that QEMU clearly should not be performing SET_IRQS
> > while the device is _RESUMING (which it does) and we need to require an
> > interim state before the device becomes _RUNNING to poke at the device
> > (which QEMU doesn't do and the uAPI doesn't require), and the next I
> > read that we should proceed with some useful quanta of work despite
> > that we clearly don't intend to retain much of the protocol of the
> > current uAPI long term...
>
> mlx5 implements the protocol as is today, in a way that is compatible
> with today's qemu. Qemu has various problems like the P2P issue we
> talked about, but it is something working.
>
> If you want to do a full re-review of the protocol and make changes,
> then fine, let's do that, but everything should be on the table, and
> changing qemu shouldn't be a blocker.

I don't think changing QEMU is a blocker, but QEMU should be seen as the
closest thing we currently have to a reference user implementation
against the uAPI and therefore may define de facto behaviors that are
not sufficiently clear in the uAPI. So if we see issues with the QEMU
implementation, that's a reflection on gaps and disagreements in the
uAPI itself. If we think we need new device states and protocols to
handle the issues being raised, we need plans to incrementally add
those to the uAPI, otherwise we should halt and reevaluate the existing
uAPI for a full overhaul.

We agreed that it's easier to add a feature than a restriction in a
uAPI, so how do we resolve that some future device may require a new
state in order to apply the SET_IRQS configuration? Existing userspace
would fail with such a device.

> In one email you are saying we need to document and decide things
> as a pre-condition to move the driver forward, and then in the next
> email you say whatever qemu does is the specification, and can't
> change it.

I don't think I ever said we can't change it. I'm being presented with
new information, new requirements, new protocols that existing QEMU
code does not follow. We can change QEMU, but as I noted before we're
getting dangerously close to having a formal, non-experimental user
while we're poking holes in the uAPI and we need to consider how the
uAPI extends to fill those holes and remains backwards compatible to
the current implementation.

> Part of this messy discussion is my fault as I've been a little
> unclear in mixing my "community view" of how the protocol should be
> designed to maximize future HW support and then switching to topics
> that have direct relevance to mlx5 itself.

Better sooner than later to evaluate the limitations and compatibility
issues against what we think is reasonable hardware behavior with
respect to migration states and transitions.

> I want to see devices like hns be supportable and, from experience,
> I'm very skeptical about placing HW design restrictions into a
> uAPI. So I don't like those things.
>
> However, mlx5's HW is robust and more functional than hns, and doesn't
> care which way things are decided.

Regardless, the issues are already out on the table. We want migration
for mlx5, but we also want it to be as reasonably close to what we
think can support any device designed for this use case. You seem to
have far more visibility into that than I do.

> > Too much is in flux and we're only getting breadcrumbs of the
> > changes to come.
>
> We have no intention to go in and change the uapi after merging beyond
> solving the P2P issue.

Then I'm confused where we're at with the notion that we shouldn't be
calling SET_IRQS while in the _RESUMING state.

> Since we now have confirmation that hns cannot do P2P I see no issue
> to keep the current design as the non-p2p baseline that hns will
> implement and the P2P upgrade should be designed separately.
>
> > It's becoming more evident that we're likely to sufficiently modify
> > the uAPI to the point where I'd probably suggest a new "v2" subtype
> > for the region.
>
> I don't think this is evident. It is really your/community choice what
> to do in VFIO.
>
> If vfio sticks with the uAPI "as is" then it places additional
> requirements on future HW designs.
>
> If you want to relax these requirements before stabilizing the uAPI,
> then we need to make those changes now.
>
> It is your decision. I don't know of any upcoming HW designs that have
> a problem with any of the choices.

If we're going to move forward with the existing uAPI, then we're going
to need to start factoring compatibility into our discussions of
missing states and protocols. For example, requiring that the device
is "quiesced" when the _RUNNING bit is cleared and "frozen" when
pending_bytes is read has certain compatibility advantages versus
defining a new state bit. Likewise, it might be fair to define that
userspace should not touch device MMIO during _RESUMING until after the
last bit of the device migration stream has been written, and then it's
free to touch MMIO before transitioning directly to the _RUNNING state.

IOW, we at least need to entertain methods to achieve the
clarifications we're trying for within the existing uAPI rather than
toss out new device states and protocols at every turn for the sake of
API purity. The rate at which we're proposing new states and required
transitions without a plan for the uAPI is not where I want to be for
adding the driver that could lock us in to a supported uAPI. Thanks,

Alex
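
To make the "within the existing uAPI" idea above a bit more concrete,
here is a rough userspace-side sketch in C. It only models the suggested
interpretation and is not code from QEMU or the kernel. The three bit
values are assumed to mirror the device_state bits of the current
migration region. The struct and every function name are hypothetical
bookkeeping invented for illustration, and the flag reset rules in
set_state() are an assumption as well.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror the device_state bits of the current migration region. */
#define DEV_STATE_RUNNING  (1u << 0)
#define DEV_STATE_SAVING   (1u << 1)
#define DEV_STATE_RESUMING (1u << 2)

/* Hypothetical userspace bookkeeping; not part of any uAPI. */
struct mig_tracker {
    uint32_t device_state;   /* last value written to device_state */
    bool pending_bytes_read; /* pending_bytes read since _RUNNING was cleared */
    bool stream_complete;    /* last chunk of resume data has been written */
};

/* Record a write to device_state. */
void set_state(struct mig_tracker *t, uint32_t state)
{
    t->device_state = state;
    /* Assumption: a running device is no longer frozen. */
    if (state & DEV_STATE_RUNNING)
        t->pending_bytes_read = false;
    /* Assumption: every state write starts a fresh resume session. */
    t->stream_complete = false;
}

/* Record a read of pending_bytes; it only freezes a stopped device. */
void note_pending_bytes_read(struct mig_tracker *t)
{
    if (!(t->device_state & DEV_STATE_RUNNING))
        t->pending_bytes_read = true;
}

/* Record that the final piece of the migration stream was written. */
void mark_stream_complete(struct mig_tracker *t)
{
    assert(t->device_state & DEV_STATE_RESUMING);
    t->stream_complete = true;
}

/* "Quiesced": the _RUNNING bit is clear, no new state bit needed. */
bool device_quiesced(const struct mig_tracker *t)
{
    return !(t->device_state & DEV_STATE_RUNNING);
}

/* "Frozen": quiesced and pending_bytes has been read. */
bool device_frozen(const struct mig_tracker *t)
{
    return device_quiesced(t) && t->pending_bytes_read;
}

/* During _RESUMING, MMIO is off limits until the stream is complete. */
bool mmio_allowed(const struct mig_tracker *t)
{
    if (t->device_state & DEV_STATE_RESUMING)
        return t->stream_complete;
    return true;
}
```

Under this reading a user would check mmio_allowed() before something
like SET_IRQS triggers device access during _RESUMING, and existing
userspace would keep working since no new state bit or transition is
introduced.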